Predicting psycho-somatic disorders in online activity using multi-layer perceptron

ABSTRACT


INTRODUCTION
The rise of social networking websites such as LinkedIn, Facebook, and Twitter has brought about a sea change in how people communicate [1]. This shift has made it far more challenging to predict accurately the problems that a sizeable proportion of users will encounter. Studies indicate that people who spend a lot of time on social media are more likely to experience mental health issues, including depression and anxiety, and to be exposed to inappropriate content. Predicting user behaviour from data on the amount of time spent on social networks is becoming an increasingly popular way of anticipating user activity [2]. Data collected from social networks is often imbalanced compared with traditional data [3]. As a result, it can be difficult to predict user problems accurately from such data.
Machine learning algorithms can learn from large data sets, generalise, and make predictions [4]. Understanding them is useful when the time comes to put them into practice. Because machine learning plays a role in both computational statistics and decision-making, the two fields are closely related [5]. Machine learning is used in a wide range of applications, such as estimating how many units of a product will be sold or calculating the chance of rain in a particular area. Systematic analysis combined with machine learning algorithms facilitates the building of prediction models for specific problems related to the amount of time spent on social networks [6]. This involves monitoring adverse events in users during a trial run and determining the best forecast for each user. In this research, machine learning methods were used to predict such problems, and to this end we proposed an integration framework for a decision support system. The decision support system was built with the support vector machine kernel technique, and its performance was assessed [7].
ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 1, March 2024: 687-694
Studying machine learning algorithms gives a deeper understanding of how they learn from massive data sets, generalise their results, and make predictions based on what they find. This knowledge will be valuable when deciding precisely how to implement these methods. Machine learning draws on both computational statistics and decision-making research, so the two areas are closely connected [8]. Machine learning methods, which are also used in robotics, are applied in many different areas, such as predicting product sales and estimating the likelihood of rain [9]. In our setting, this involves monitoring negative user events during a trial run and choosing the most accurate forecast for each user. Systems analysis combined with machine learning techniques helps to create prediction models for specific scenarios, of which the amount of time spent on social networks is one example. To predict such problems, we used machine learning techniques and developed an integration framework for a decision support system. In our previous study, we built a decision support system for predicting the problem using models such as logistic regression, support vector machines, random forests, and neural networks [10], and then evaluated its performance. The article is organised as follows: Section 1 provides an introduction, Section 2 discusses related work on problem prediction, Section 3 describes the data and methods for social network problem prediction, and Section 4 presents the implementation of the prediction model and the results. Section 5 provides the conclusion.

RELATED WORK
Although conventional statistical modelling can produce reliable models, artificial intelligence (AI) techniques can facilitate the development of high-quality prediction models [11], [12]. In [13], the authors propose a machine learning solution in the form of a multi-layer perceptron (MLP) artificial neural network (ANN) to model the progression of a disease. Their model forecasts, per location and time unit, the maximum number of cases of illness, the maximum number of recoveries, and the maximum number of deaths. The MLP was chosen over other AI techniques because it is the simplest AI algorithm and its workings are easier to understand. The authors also wanted to test the viability of modelling with relatively simple methods, which train faster; this matters in disease modelling, where models with adequate regression performance are needed as quickly as possible. Statistical analysis makes it possible to build models from previously obtained data. However, it may fail to capture the intricacies of the data when the underlying model is particularly complex [14]. More complex algorithms, especially AI and machine learning algorithms, can "learn" not only the general trend of the data but also its intricacies, which leads to models of higher quality [15]. AI algorithms are now used in a wide range of scientific and commercial domains, including medicine, for the classification of a wide range of disorders and for the construction of regression models for estimation and forecasting [16]. These models adjust their parameters to fit their predictions to the available data, whether or not the data explicitly contain the information being predicted. In doing so, they account for interactions among a wide range of input variables that conventional modelling methods would not capture [17].
Yu et al. [18] used support vector regression (SVR) to demonstrate the effectiveness of the maximal information coefficient technique for feature selection when determining dissolved oxygen (DO) concentration. In terms of root mean square error (RMSE), the results obtained with the optimised dataset were considerably better (by 28.5%) than those produced with the initial input configuration. Csábrági et al. [19] showed that three standard ANN types, namely the multi-layer perceptron (MLP), the radial basis function (RBF) network, and the general regression neural network (GRNN), are successful for the same purpose. Heddam [20] proposed an innovative ANN-based model, referred to as an evolving fuzzy neural network, for modelling purposes. Fuzzy-based models have also been studied widely, with a total of fifty-one studies investigating their applicability. The adaptive neuro-fuzzy inference system (ANFIS) is an efficient data mining approach that has been investigated in a variety of studies. Further work on the application of machine learning can be found in [21]. Ouma et al. [22] studied the ability of a feed-forward ANN and a multiple linear regression (MLR) model to reproduce the DO levels found in the Nyando River in Kenya. The correlation achieved by the ANN was much higher than that of the MLR (0.8546 versus 0.6199). The accuracy of the suggested model was found to be roughly 8%, 17%, and 12% higher than that of typical data mining approaches such as feed-forward ANN, SVR, and GRNN, respectively. DO prediction for the following hour also showed the highest accuracy (coefficient of determination R² = 0.908). Tyesha et al. [23] analysed a number of water quality parameters by combining two well-known machine learning models, random forest (RF) and extreme gradient boosting, with a denoising method called "complete ensemble empirical mode decomposition with adaptive noise". Comparable work can be found in [24], [25]. The RF-based ensemble was shown to simulate DO, temperature, and specific conductance accurately. The authors also illustrated the applicability of the proposed strategies by comparing them with several commonly used tools. Similarly, Heino et al. [14] showed that RF is superior to MLR for DO modelling, and added that water temperature and pH are the two most critical factors. Pflüger and Glorius [15] compared the MLP, RBF, ANFIS (sub-clustering), and ANFIS (grid partitioning) models. The MLP outputs were the most closely related to the measured DO, as evidenced by R² values of 0.98, 0.96, 0.95, and 0.86 for a single station (number 02156500).
Predicting psycho-somatic disorders in online activity using multi-layer perceptron (Manjunath Gadiparthi)

MATERIALS AND METHODS
The data were collected through survey questionnaires completed by 1,092 individuals from around the world. Respondents were asked how much time they spend on specific social network applications and whether they experience any problems as a result of using them. Figure 1 shows the table structure of the collected data used for our model. The collected data were then refined so that the machine learning model could accept them for training and testing.
Correlation between the problems and social network apps
The first step is to examine the correlations between social networking applications and the variables used as predictors, in order to determine which variables should be included in the model. A strong association was found between obesity and the use of YouTube, as depicted in Figure 2.
Obesity can lead to a wide range of complications, some of them serious, that affect an individual's physical health. Some of these consequences can be avoided by maintaining a healthy weight. In addition, correlation coefficients ranging from 0.43 to 0.52 suggest that the amount of time spent on social media platforms such as Facebook, Telegram, and YouTube is a key risk factor in the prediction of anxiety and depression. WhatsApp has a weaker link to any of the problems than the other social networking applications, which suggests that it poses a lower health risk than they do.
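The correlation analysis described above can be illustrated with a Pearson correlation coefficient between the time spent on one application and a binary problem indicator. This is only a minimal sketch: the paper does not state which correlation measure was used, and the function name `pearson` and the toy values below are assumptions made for illustration, not survey data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of centred products; the 1/n factors cancel in the ratio.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Made-up toy values (NOT from the survey): daily hours on one app and
# whether the respondent reports a given problem (1) or not (0).
hours_on_app = [0.5, 1.0, 2.0, 3.0, 4.5, 6.0]
reports_problem = [0, 0, 0, 1, 1, 1]

r = pearson(hours_on_app, reports_problem)
print(f"correlation between usage and problem: {r:.2f}")
```

Computing this coefficient for every pair of application and problem would give a matrix of the kind summarised in Figure 2.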

Figure 2. Correlation between the problems and social network apps
A multi-layer perceptron (MLP) is a type of neural network that is trained in a supervised manner using the back-propagation method. As shown in Figure 3, the MLP has a layered structure consisting of an input layer, one or more hidden layers, and an output layer. In this configuration, each neuron is connected to all of the neurons in the following layer. Several studies report that MLPs play a significant role in solving non-linear problems.
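The fully connected structure can be made concrete with a forward pass through a small network. The sketch below is illustrative only: the layer sizes, the random weights, and the input vector are assumptions, and training by back-propagation is deliberately left out.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: every input feeds every neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Random weights in [-1, 1] and zero biases (untrained)."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(x, layers):
    """Propagate an input vector through the layers in order."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


rng = random.Random(0)
n_features = 4  # hypothetical: hours spent on four different apps
hidden_w, hidden_b = init_layer(n_features, 5, rng)
out_w, out_b = init_layer(5, 1, rng)
network = [(hidden_w, hidden_b, relu), (out_w, out_b, sigmoid)]

# The sigmoid output can be read as the predicted probability of a problem.
prob = forward([1.5, 0.5, 3.0, 2.0], network)[0]
print(f"untrained network output: {prob:.3f}")
```

In practice the weights would be learned from the survey data rather than drawn at random.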

Breeding now begins in order to fill the remaining 12 places in the population, so that the next generation has a full set of 16 networks. This is done as part of the process known as "propagation". In crossover, the population of the next generation is produced from the population of the current one. At least two individuals from the current population, referred to as parents, are needed to create one or more children for the next generation. The parents are chosen according to their scores, and their network parameters are then mixed to create a child that is a hybrid of its parents. In this study, each child is a network with a unique combination of randomly chosen parameters inherited from its parents.
Transformation: at this point, we have the population that will be used for the next generation. Some of the properties of the selected networks in this population are then changed at random. The aim is to produce individuals that are even better than the existing ones.
Algorithm: MLP algorithm
i. Generate a population of MLPs and assign random hyperparameters to each network. Choose 15 MLPs at random to serve as the population. The random parameters are:
− Network depth (number of layers): {1, 2, 3}
− Network width (number of neurons in a layer): {5, 10, 20, 40, 80, 160}
− Dense layer activation function: {relu, elu, tanh, sigmoid}
− Network optimizer
ii. Train all of the networks in the population.
iii. Calculate the fitness of each network by scoring it according to its accuracy or the inverse of its cost.
iv. Select some of the highest-scoring networks as well as some lower-scoring networks.
v. Combine the settings of two of the selected networks. Combining the two networks yields a "child" network that has some of the traits of the first and some of the traits of the second.
vi. Randomly adjust some of the parameters of the child networks.
vii. Collect the children into a new population and add it to the variable that holds the previous population.
viii. Repeat steps ii through vii for each subsequent generation.
We used a total of seven generations to arrive at the best network models. In this study, we started with a population of 15 random networks and went through the process of evaluation, selection, crossover, and mutation seven times.
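The steps of the algorithm can be sketched as a small evolutionary search over the hyperparameters. This is a hedged illustration, not the authors' implementation. The optimizer options, the number of retained networks (`n_top`, `n_lucky`), the mutation rate, and all function names are assumptions. The fitness function is also a toy stand-in: in a real run it would train an MLP with the given settings and return its validation accuracy.

```python
import random

# Search space from the algorithm listing. The optimizer options are not
# given in the text, so the two names below are placeholders (assumption).
SEARCH_SPACE = {
    "n_layers": [1, 2, 3],
    "n_neurons": [5, 10, 20, 40, 80, 160],
    "activation": ["relu", "elu", "tanh", "sigmoid"],
    "optimizer": ["adam", "sgd"],
}


def random_network(rng):
    """Step i: one network description with random hyperparameters."""
    return {key: rng.choice(options) for key, options in SEARCH_SPACE.items()}


def crossover(mother, father, rng):
    """Step v: the child takes each setting from one of its two parents."""
    return {key: rng.choice([mother[key], father[key]]) for key in SEARCH_SPACE}


def mutate(network, rng, rate=0.2):
    """Step vi: with probability `rate`, re-draw one setting at random."""
    child = dict(network)
    if rng.random() < rate:
        key = rng.choice(list(SEARCH_SPACE))
        child[key] = rng.choice(SEARCH_SPACE[key])
    return child


def evolve(fitness, generations=7, pop_size=15, n_top=4, n_lucky=1, seed=0):
    """Steps ii to viii; returns the final population, best network first."""
    rng = random.Random(seed)
    population = [random_network(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Steps ii and iii: score every network (scores could be cached
        # when fitness is expensive, as it is for real training runs).
        ranked = sorted(population, key=fitness, reverse=True)
        # Step iv: keep the best networks plus a few lower-ranked ones.
        parents = ranked[:n_top] + rng.sample(ranked[n_top:], n_lucky)
        # Steps v to vii: breed children until the population is full again.
        children = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            children.append(mutate(crossover(mother, father, rng), rng))
        population = parents + children
    return sorted(population, key=fitness, reverse=True)


def toy_fitness(network):
    """Stand-in score that prefers 2 layers of 40 neurons (illustration only)."""
    return -abs(network["n_layers"] - 2) - abs(network["n_neurons"] - 40) / 40.0


final_population = evolve(toy_fitness)
print("best settings found:", final_population[0])
```

Because the top-ranked networks are carried over unchanged, the best score in the population can never decrease from one generation to the next.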
In this work, we trained about 160 MLPs in total, and the population became more robust as the MLPs proliferated. The following section evaluates the prediction accuracy of the final population and ranks the top five neural networks. Prediction accuracy was found to increase from one generation to the next. The models were trained and evaluated using the scikit-learn and Keras libraries in Python. To assess performance, we employed cross-validation, a resampling strategy that is described in the next section. The results of the proposed model were compared with those of traditional machine learning algorithms. According to the simulation results, the proposed model outperforms the support vector machine, logistic regression, and random forest classifiers in terms of prediction.

RESULTS AND DISCUSSION
Our model was developed with Keras, a well-known machine learning library for Python. The model was cross-validated with the scikit-learn package, also in Python. This resampling approach is used for performance evaluation. The data are partitioned into k parts; k-1 parts are used for model training and the remaining part is held out for model testing. The method is called stratified because it tries to balance the number of samples from each class across the k splits. In this study, k was set to 8, and the overall accuracy of each model was calculated as the mean of its results across the folds. We assessed the performance of the classifiers with three metrics that are common in machine learning: precision, recall, and F1-score. When analysing performance, it is also important to consider the reliability of the data being examined. Because we used 10-fold cross-validation, we were able to calculate the mean and standard deviation of each model's accuracy, recall, and F1-score, which allowed us to compare the performance of the different models. Each prediction falls into one of four possible outcomes (true positive, true negative, false positive, and false negative), which provide the framework for our discussion. Figure 6 compares the logistic regression, random forest, and MLP models for forecasting. It examines obesity prediction across these machine learning algorithms. MLP provides the most accurate predictions of obesity compared with the other models.
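The stratified splitting and the averaging of fold scores can be sketched in plain Python. This is a simplified stand-in for the `StratifiedKFold` class in scikit-learn, not the authors' code. The labels are toy values, and `score_fold` is a hypothetical callback that would normally train a model on the training indices and score it on the held-out fold; here a majority-class baseline takes its place.

```python
import statistics
from collections import defaultdict


def stratified_folds(labels, k):
    """Assign sample indices to k folds so each class is spread evenly."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        # Deal the samples of each class round-robin across the folds.
        for index in by_class[label]:
            folds[position % k].append(index)
            position += 1
    return folds


def cross_validate(labels, k, score_fold):
    """Return the mean and standard deviation of the k fold scores."""
    scores = []
    for test_idx in stratified_folds(labels, k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        scores.append(score_fold(train_idx, test_idx))
    return statistics.mean(scores), statistics.stdev(scores)


# Toy labels (NOT survey data): 24 respondents without and 16 with a problem.
labels = [0] * 24 + [1] * 16


def majority_baseline(train_idx, test_idx):
    """Placeholder model: always predict the most common training label."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


mean_acc, std_acc = cross_validate(labels, 8, majority_baseline)
print(f"accuracy: {mean_acc:.2f} +/- {std_acc:.2f}")
```

Reporting the standard deviation alongside the mean shows how much a model's score varies from fold to fold, which is what makes the comparison between models meaningful.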
These results can be compared with previous work carried out on different data sets and in different domains. One study found that career self-efficacy mediates the effect of instructional quality and social support on the career building of civil engineering vocational high school students, with major implications for vocational educators who design career development or strengthening programmes for vocational students [26]. In another study, a correlation-based filter helped classifiers choose the most important features in order to improve classification accuracy; sensitivity, specificity, accuracy, and precision were analysed using Weka and the Statistical Package for the Social Sciences (SPSS), and a decision tree (J48) classified cardiovascular disease (CVD) patients with 95.76% accuracy [27]. Our MLP achieves 94-98% accuracy, which compares favourably with earlier work on data from other fields.

CONCLUSION
In this research, we present a powerful MLP-based machine learning model for identifying potential problems associated with excessive time spent on social networking sites. To analyse and evaluate our data set, we also selected logistic regression, support vector machine (SVM), and random forest models. We ran several iterations, varying both the training and test sets, to obtain the most accurate results possible. After extensive experimentation to validate our hypothesis, we analysed the results using three separate performance metrics. Based on the outcome, we conclude that the MLP is the best model for predicting problems related to time spent on social networks. The findings of this study could in future be applied within social networking applications, so that users are made aware of the risks arising from excessive use of an app.

Figure 1. Sample data sheet collected from user


− $T_p$ - True positive: the prediction is positive and the individual is facing trouble
− $T_n$ - True negative: the prediction is negative and the individual is well
− $F_p$ - False positive: the prediction is positive and the individual is well (a false alarm)
− $F_n$ - False negative: the prediction is negative and the individual is facing trouble (the most serious error)
$$\text{Accuracy} = \frac{T_p + T_n}{T_p + F_p + F_n + T_n}$$
$$\text{Precision} = \frac{T_p}{T_p + F_p}$$
$$\text{Recall} = \frac{T_p}{T_p + F_n}$$
$$\text{F1 Score} = \frac{2 \cdot \text{Recall} \cdot \text{Precision}}{\text{Recall} + \text{Precision}}$$
Figure 4 compares the prediction performance of the logistic regression, random forest, and MLP models on the anxiety problem. MLP gives the best forecasts for identifying anxiety problems compared with the other models.
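The four metrics follow directly from the four outcome counts, as the short sketch below shows. The function name and the example counts are assumptions chosen for illustration and are not results from the study. Returning 0.0 when a denominator is zero is one common convention, not something the paper specifies.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 score from the outcome counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * recall * precision / (recall + precision)
        if (recall + precision)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up counts for 100 predictions (NOT results from the study).
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision and recall weigh the two kinds of error differently: false positives lower precision, while false negatives lower recall, and the F1 score balances the two.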

Figure 4. Visualizing anxiety forecasting with different models