Machine learning-based stress classification system using wearable sensor devices

Varun Chandra et al.
Int J Artif Intell, Vol. 13, No. 1, March 2024: 337-347
ISSN: 2252-8938


INTRODUCTION
Stress occurs when a person is unable to handle their mental and emotional states during a challenging situation. Hans Selye described it as "an unspecific response of a human body to the demand of task" [1]. Stress is commonly categorized as short-term or long-term. Cohen et al. [2] concluded that there is a direct association between long-term psychological stress and diseases such as depression, human immunodeficiency virus (HIV)/acquired immune deficiency syndrome (AIDS), and cardiovascular disease. High stress can also contribute to chronic illnesses such as stroke and diabetes. Tasks involving a high mental workload can induce stress, especially academic and placement assessments of students. People in academia are constantly engaged in such tasks: students in universities and schools face many mentally demanding and challenging situations, such as examinations, peer pressure, interactions with teachers, and job interviews. In such a socially competitive and mentally exhausting environment, students often become victims of stress and its associated health risks. Nandi et al. [3] surveyed university medical students and found that 53% of the participants were stressed, with significant effects on their mental and social well-being. Behere et al. [4] conducted a questionnaire-based survey of 100 randomly selected students and found that medical and engineering students had high stress levels requiring immediate medical attention. The study also concluded that leaving high stress levels unaddressed could cause severe mental and psychosocial problems.
Although several psychological tests can help assess stress levels, they require assessment by a psychologist and depend on how well the participant answers the questions. Several studies have recorded an individual's stress response using self-report questionnaires, such as the perceived stress scale (PSS) [5]. However, these questionnaires are subjective and may not correlate significantly with an individual's actual stress levels [6].
An alternative is a stress classification system based on physiological signals. Such a system can overcome the limitations of these questionnaires, assess and quantify stress levels, and enable timely preventive measures that avoid health risks. Physiological signals are closely related to the body's vital functions, including cardiac activity measured by electrocardiogram (ECG) and blood volume pulse (BVP), brain function, exocrine activity, and muscle excitability measured by electromyography (EMG). Various wearable sensors on the market can measure such signals accurately. In this study, our primary focus is on four physiological signals: electroencephalogram (EEG), electrodermal activity (EDA), skin temperature (SKT), and heart rate (HR).

Numerous studies use EEG for applications such as assistive smart homes [7], seizure detection [8], driver drowsiness detection [9], autism classification [10], and mental stress detection. EEG can capture localized signals from the brain regions that generate the stress response and can provide many robust features for assessing stress [11]. Many researchers have observed associations between mental states and variations in the alpha, beta, and theta power bands of EEG signals, and have exploited these bands for feature extraction [12]. In addition to EEG, many stress studies consider EDA, a measure of the electrical current flowing through the skin. Human skin has millions of sweat glands that can be activated under stress. Activation of the sweat glands increases the amount of moisture secreted onto the skin [13], which changes the skin conductance in that region. Several studies show a high correlation between an individual's stress level and skin conductance [14]. High stress and anxiety can also cause variations in SKT, and different parts of the body show different patterns of temperature variation. Contrary to common belief, SKT can either increase or decrease during stress. In a study by Vinkers et al. [15], participants took the Trier social stress test (TSST) [16], and the authors observed a significant increase in SKT in the upper region of the participants' arms. Another commonly adopted measure for detecting stress is heart rate (HR), the number of heartbeats per minute. During stress, the body releases adrenaline, a hormone that can raise a person's heart rate and breathing rate. Many studies in the literature have reported that a person's heart rate rises significantly during stress [15], [17]-[19].
Related work: several stressors can be used to study the impact of stress, such as academic and mental tasks. The Montreal imaging stress task (MIST) [20] induces mental workload through a series of mental arithmetic questions with single-digit answers and varying difficulty levels. MIST closely resembles the real-life situation of students taking an exam or a placement test. Hence, this paper uses MIST to monitor stress in university students.
EEG is the most widely used physiological signal for studying brain function, because non-invasive, easy-to-use, portable, and low-cost EEG devices are readily available. In a recent study, Sharma and Khyati [21] built an EEG-based database for early stress detection. They used the Hilbert-Huang transform (HHT) to decompose the EEG signals into intrinsic mode functions (IMFs) and extracted related features in the time-frequency domain. They classified stress into three levels (low, medium, and high stress) and trained a hierarchical support vector machine (SVM) model that achieved an accuracy of 92.86%. Blanco et al. [22] used a computer-based version of the Stroop test [23] and the Emotiv EPOC headset to collect a dataset of 18 subjects. They cleaned the raw EEG signals by subtracting the least-squares line of best fit and applying a bandpass filter network of Chebyshev Type II filters. They used logistic regression, quadratic discriminant analysis (QDA), and k-nearest neighbors (KNN), achieving a maximum accuracy of 78.70% in the stress classification task.
On the other hand, many researchers use physiological signals other than EEG to study stress. Airij et al. [24] proposed a system that monitored only three physiological signals of patients: heart rate, skin conductance, and skin temperature. The system stored the patients' physiological signals and stress level data for record maintenance and sent them to the concerned doctors for remote monitoring. The authors used a rule-based fuzzy logic algorithm for stress classification, which obtained an accuracy of 96.19%.
There have been previous studies that evaluate stress using MIST [20]. Minguillon et al. [25] used MIST with a small sample of 10 subjects. They achieved an accuracy of 50% with linear discriminant analysis (LDA) using EEG, ECG, and GSR features. Xi et al. [26] achieved a lower accuracy with SVM using only EEG features, but they did not explore time domain features. Al-Shargie et al. [27] used MIST [20] to elicit stress in 12 participants and recorded their EEG signals using the Brain Master 24E system. The collected data was filtered with a 0.5 Hz to 30 Hz bandpass filter implemented as a third-order Butterworth filter, and independent component analysis (ICA) was applied to remove artifacts. They used an SVM classifier for three-level classification and obtained mean accuracies of 94%, 85%, and 80% for mental stress levels one, two, and three, respectively. Jun and Smitha [28] designed an automatic EEG-based stress recognition system that used the Stroop test [23] and MIST [20] to induce low and high levels of stress, respectively. They trained an SVM model on power band features from the EEG signals, which achieved an accuracy of 75% in three-level stress classification.

Contribution: these studies motivate us to build intelligent models that evaluate and predict stress levels in undergraduate engineering students, using multimodal physiological sensors and the MIST task [20]. The contributions of this paper are as follows:
− Collection of a dataset of 23 students performing the MIST task [20]. Physiological signals are recorded with the Emotiv EPOC+ for EEG and the E4 Empatica wristband for EDA, SKT, and HR. The participants' stress levels are categorized into three classes: rest (no stress), moderate stress, and high stress.
− Machine learning models that extract and select the optimal features of the dataset and use them to train Random Forest and k-nearest neighbors classifiers for mental stress classification.
− A stress classification model using EDA, HR, and SKT features that achieves a state-of-the-art accuracy of 99.51% in classifying stress into three levels.
− A stress classification model using time-frequency domain features of the EEG data that achieves 99.98% accuracy, higher than all previous works in stress classification using EEG.

Roadmap: the rest of the paper comprises four sections. Section 2 discusses the data collection process and the proposed methodology for analyzing and processing the collected data. Section 3 presents the details of the experimentation techniques. Section 4 describes and discusses the results of this work. Section 5 presents the conclusion and future work.

METHODOLOGY

2.1. Data acquisition

2.1.1. Participants
This study comprises 23 subjects (20 males and 3 females) aged between 17 and 25 years. The participants are university students with at least 14 years of formal education. All participants are physically and psychologically fit, without any emotional or cognitive disability or any verbal or non-verbal learning difficulties.

Equipment
Emotiv EPOC+ headset: the study records the brain's EEG signals using the Emotiv EPOC+, a portable, high-resolution, 14-channel EEG system. The electrodes are placed according to the international 10-20 system and cover the anterior frontal, frontal central, temporal, parietal, and occipital regions of the brain. The 14 channels that record raw EEG signals are AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, and AF4. The positions marked in red represent the placement of the 14 electrodes of the Emotiv EPOC+ device. The device records the raw EEG signals from each channel at a sampling frequency of 128 Hz.

E4 Empatica wristband: the E4 Empatica wristband is a wireless wristband designed to collect real-time data. It features a photoplethysmography (PPG) sensor, an EDA sensor, and an infrared thermopile. The PPG sensor measures blood volume pulse. From this signal the device obtains the heart rate at a sampling frequency of 1 Hz, along with heart rate variability and other cardiovascular signals. The EDA sensor measures skin conductance levels at 4 Hz, and the infrared thermopile measures skin temperature at 4 Hz.

Procedure
The study uses MIST [20] to elicit a short-term stress response in the participant. MIST comprises a series of mental arithmetic questions with single-digit answers and varying difficulty levels. The standard MIST has three phases: rest, control, and experimental. In the rest phase, participants look at a blank screen with no task shown. The control phase presents a series of brief mental arithmetic tasks involving addition, subtraction, multiplication, and division, with various degrees of complexity. Participants must answer all questions within a given time limit. The task is adaptive: question difficulty increases as the participant answers more questions correctly. The MIST protocol times the control phase but does not give the participant any feedback. It instructs participants to answer as many questions correctly as possible and monitors their performance, which introduces an element of stress into the control phase. The experimental phase presents the same arithmetic problems at the same difficulty levels, with an added social threat component. Every two minutes, the participant's score is compared with the average score of the other participants and displayed at the top of the screen. All participants are told that they must score at least the global average to qualify. The experimental phase also times each question and adaptively shortens the time allowed as the participant answers more questions correctly. This pushes participants to perform to the best of their ability.
Compared with the standard MIST, this study has two control phases, one experimental phase, and two relax phases. It also includes a training phase at the beginning to familiarize the participants with the test's keys and controls. In addition, short rest phases separate consecutive control and experimental phases. The test takes a total of 45 minutes to complete.
The study labels the signals recorded during the two relax phases at the beginning and end of the test, together with the two short rest phases in between, as the rest state (no stress). It labels the two control phases as moderate stress. The experimental phase induces more stress than the control phases and is labeled as high stress. Figure 1 shows all the test phases and their durations.

Data processing
The study collects two sets of physiological signals from each subject: EEG signals and non-EEG signals (EDA, HR, and SKT). We apply separate processing steps to each set.
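The text does not say how the 1 Hz heart-rate stream is combined with the 4 Hz EDA and SKT streams before training. The sketch below is one possible approach and makes two assumptions that are not the authors' documented procedure. First, the three E4 streams are taken to share a common start time. Second, the 4 Hz streams are averaged over one-second bins so that each row holds one EDA, HR, and SKT value. `align_e4_streams` is a hypothetical helper name.

```python
import numpy as np


def align_e4_streams(eda_4hz, skt_4hz, hr_1hz):
    """Return an (n_seconds, 3) array with columns EDA, HR, SKT.

    Assumption: all three streams start at the same instant. The 4 Hz EDA
    and skin-temperature samples are averaged over each one-second bin so
    that every row lines up with one 1 Hz heart-rate sample.
    """
    eda = np.asarray(eda_4hz, dtype=float)
    skt = np.asarray(skt_4hz, dtype=float)
    hr = np.asarray(hr_1hz, dtype=float)

    # Keep only the whole seconds covered by all three streams.
    n_sec = min(len(eda) // 4, len(skt) // 4, len(hr))

    eda_1hz = eda[: n_sec * 4].reshape(n_sec, 4).mean(axis=1)
    skt_1hz = skt[: n_sec * 4].reshape(n_sec, 4).mean(axis=1)
    return np.column_stack([eda_1hz, hr[:n_sec], skt_1hz])


# Two seconds of 4 Hz data and three seconds of 1 Hz data -> two aligned rows.
features = align_e4_streams(eda_4hz=np.arange(8), skt_4hz=np.full(8, 33.0), hr_1hz=[70, 72, 75])
print(features)
```

Averaging is only one choice. Upsampling HR to 4 Hz would keep more EDA and SKT detail at the cost of repeated HR values.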

Feature extraction
The model extracts EEG features in three domains: the time domain, the frequency domain, and the time-frequency domain. It does not extract any features from the non-EEG signals and uses them directly to train the machine learning models. The features extracted from the three domains of the EEG data are as follows.
a) Time domain: previous works have used simple time domain features that characterize a signal in the time domain [31]. The proposed model calculates the minimum, maximum, mean ($\bar{y}_t$), and standard deviation ($\sigma$) statistics given by (1)-(4).
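$$\text{Minimum} = \min(y(t)) \tag{1}$$
$$\text{Maximum} = \max(y(t)) \tag{2}$$
$$\bar{y}_t = \frac{1}{N}\sum_{t=1}^{N} y(t) \tag{3}$$
$$\sigma = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y(t) - \bar{y}_t\right)^2} \tag{4}$$

The model also considers the Hjorth parameters: activity, mobility, and complexity [32]. Activity reflects the power, or variance, of the EEG signal in the time domain. Mobility estimates the mean frequency of the power spectrum. It is computed as the ratio of the standard deviation of the signal's first derivative to that of the signal itself. Complexity indicates the change in frequency, that is, how far the signal's shape departs from a pure sine wave. The Hjorth parameters are given by (5)-(7).

$$\text{Activity} = \operatorname{var}(y(t)) \tag{5}$$
$$\text{Mobility} = \sqrt{\frac{\operatorname{var}\left(y'(t)\right)}{\operatorname{var}(y(t))}} \tag{6}$$
$$\text{Complexity} = \frac{\text{Mobility}\left(y'(t)\right)}{\text{Mobility}(y(t))} \tag{7}$$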
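The sketch below shows how these statistics and the Hjorth parameters might be computed for one EEG channel. It rests on two assumptions not stated in the text. Derivatives are approximated with first differences (`np.diff`), so mobility is expressed per sample rather than per second. The standard deviation uses the population form (`ddof=0`). `time_domain_features` is a hypothetical helper name.

```python
import numpy as np


def time_domain_features(y):
    """Statistics (1)-(4) and Hjorth parameters (5)-(7) of one EEG channel."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)    # first derivative, approximated by a first difference
    ddy = np.diff(dy)  # second derivative

    activity = np.var(y)                              # (5)
    mobility = np.sqrt(np.var(dy) / activity)         # (6)
    mobility_of_dy = np.sqrt(np.var(ddy) / np.var(dy))
    complexity = mobility_of_dy / mobility            # (7)

    return {
        "min": float(np.min(y)),
        "max": float(np.max(y)),
        "mean": float(np.mean(y)),
        "std": float(np.std(y)),  # population standard deviation (ddof=0)
        "activity": float(activity),
        "mobility": float(mobility),
        "complexity": float(complexity),
    }


fs = 128  # Emotiv EPOC+ sampling rate in Hz
t = np.arange(10 * fs) / fs
feats = time_domain_features(np.sin(2 * np.pi * 10 * t))  # 10 s of a 10 Hz sine
print(feats)
```

A pure sinusoid is a convenient sanity check. Its complexity is close to 1 and its standard deviation is close to $1/\sqrt{2}$.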
b) Frequency domain: the model analyzes signals by converting them into functions of frequency using the Fourier transform. Frequency domain features of EEG are highly correlated with mental workload activities [33], [34]. This study considers five frequency domain features: spectral entropy, the alpha by beta low power ratio, the alpha by beta high power ratio, the theta by alpha power ratio, and relative band power. Spectral entropy measures signal irregularity, and Tian et al. [35] demonstrated its effectiveness for studying mental workload. The spectral entropy of a signal is given by (8).

$$SE = -\sum_{f} P(f)\,\log_2 P(f) \tag{8}$$
Here, $P(f)$ is the normalized power spectral density (PSD) of the signal. The PSD describes how the power of the time domain EEG signal is distributed over different frequency ranges, and it provides information about the cortical activation of different parts of the brain. Because the normalized PSD sums to one, it can be treated as a probability distribution. The Shannon entropy of this distribution is the spectral entropy of the signal. Relative band power measures the power in a frequency band as a ratio to the signal's total power. We rely on relative band power instead of absolute band power because a stress response corresponds to a change in band power relative to the total power. Relative band power is defined in (9).
$$\text{Relative Band Power} = \frac{\text{Absolute Band Power}}{\text{Total Power}} \tag{9}$$

Niemiec et al. [36] found that mental arithmetic activities reduce alpha activity, with a corresponding rise in beta activity. Beta waves fall into two categories. Low beta waves are often associated with active, busy, and nervous thinking, while high beta waves are more pronounced when an unexpected stimulus is received. Hence, this paper uses the alpha by beta low and alpha by beta high ratios as features, as given by (10), where the beta band is either the low beta or the high beta band.

$$\text{Alpha/Beta} = \frac{\text{Alpha Band Absolute Power}}{\text{Beta Band Absolute Power}} \tag{10}$$
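A minimal sketch of the spectral entropy, relative band power, and alpha by beta ratio features is shown below. It is not the authors' code, and several choices are assumptions. The PSD is estimated with Welch's method (`scipy.signal.welch`, 256-sample segments). Entropy is taken in bits. The band edges in `BANDS` are hypothetical, because the text does not state them.

```python
import numpy as np
from scipy.signal import welch

# Hypothetical band limits in Hz; the text does not state the exact edges.
BANDS = {
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta_low": (13.0, 20.0),
    "beta_high": (20.0, 30.0),
}


def frequency_features(y, fs=128.0):
    """Spectral entropy, relative band powers, and alpha/beta power ratios."""
    freqs, psd = welch(y, fs=fs, nperseg=min(len(y), 256))
    df = freqs[1] - freqs[0]  # Welch frequency bins are evenly spaced

    total_power = psd.sum() * df
    abs_power = {}
    for name, (lo, hi) in BANDS.items():
        in_band = (freqs >= lo) & (freqs < hi)
        abs_power[name] = psd[in_band].sum() * df

    # Normalize the PSD to sum to one and take its Shannon entropy (in bits).
    p = psd / psd.sum()
    p = p[p > 0]
    spectral_entropy = float(-np.sum(p * np.log2(p)))

    feats = {f"rel_{name}": power / total_power for name, power in abs_power.items()}
    feats["alpha_beta_low"] = abs_power["alpha"] / abs_power["beta_low"]
    feats["alpha_beta_high"] = abs_power["alpha"] / abs_power["beta_high"]
    feats["spectral_entropy"] = spectral_entropy
    return feats


rng = np.random.default_rng(0)
t = np.arange(10 * 128) / 128.0
alpha_like = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)
noise = rng.standard_normal(t.size)
print(frequency_features(alpha_like))
```

A signal dominated by a 10 Hz rhythm concentrates its power in the alpha band and has low spectral entropy. White noise spreads its power evenly and gives a much higher entropy. The theta by alpha ratio described next could be computed from the same `abs_power` dictionary.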
Sammer et al. [12] showed that theta activity also increases with cognitive workload. Hence, the theta by alpha ratio is expected to increase during stress.
$$\text{Theta/Alpha} = \frac{\text{Theta Band Absolute Power}}{\text{Alpha Band Absolute Power}} \tag{11}$$

c) Time-frequency domain: time-frequency analysis examines a signal simultaneously in the time and frequency domains. The model computes the time-frequency domain features using the Hilbert-Huang transform (HHT). This is a two-step process: empirical mode decomposition (EMD) of the raw EEG signal, followed by Hilbert spectral analysis (HSA) for feature extraction.
− Empirical mode decomposition (EMD): a method of breaking an EEG signal down into distinct components known as IMFs. The decomposition of a signal $y(t)$ into its IMFs is

$$y(t) = \sum_{i=1}^{n} x_i(t) + r_n(t) \tag{12}$$

where $x_i(t)$ is the $i$-th IMF, $n$ is the number of IMFs, and $r_n(t)$ is the residue after decomposition. We consider only the first four IMFs of an EEG signal.
− Hilbert spectral analysis (HSA): HSA determines the instantaneous frequency of the IMFs. When applied together, EMD and HSA represent the signal's amplitude in the time-frequency domain. The instantaneous frequency $\omega_i(t)$ is the rate of change of the phase:

$$\theta_i(t) = \arctan\left(\frac{H[x_i(t)]}{x_i(t)}\right) \tag{13}$$
$$\omega_i(t) = \frac{d\theta_i(t)}{dt} \tag{14}$$

where $H[\cdot]$ is the Hilbert transform, $\theta_i(t)$ is the phase, and $x_i(t)$ is the IMF of the raw EEG signal. We extract two time-frequency features: the average instantaneous frequency and the variance of the IMFs.
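The HSA step can be sketched with `scipy.signal.hilbert`. The sketch assumes the IMFs have already been produced by an EMD implementation, for example the third-party PyEMD package; the decomposition itself is not shown. The instantaneous frequency is the time derivative of the unwrapped analytic-signal phase, approximated by a first difference and converted from rad/s to Hz. `hsa_features` is a hypothetical helper name, and the toy IMFs are synthetic sinusoids.

```python
import numpy as np
from scipy.signal import hilbert


def hsa_features(imfs, fs=128.0, n_imfs=4):
    """Mean instantaneous frequency (Hz) and variance of the first n_imfs IMFs.

    `imfs` is an array of shape (n_components, n_samples) produced by some
    EMD implementation; the decomposition itself is not shown here.
    """
    features = []
    for x in np.asarray(imfs, dtype=float)[:n_imfs]:
        analytic = hilbert(x)                   # analytic signal x(t) + j H[x(t)]
        phase = np.unwrap(np.angle(analytic))   # instantaneous phase theta_i(t)
        # d(theta)/dt via first differences; dividing by 2*pi converts rad/s to Hz.
        inst_freq = np.diff(phase) * fs / (2 * np.pi)
        features.append((float(np.mean(inst_freq)), float(np.var(x))))
    return features


fs = 128.0
t = np.arange(10 * 128) / fs
toy_imfs = np.vstack([np.sin(2 * np.pi * 10 * t), 0.5 * np.sin(2 * np.pi * 2 * t)])
print(hsa_features(toy_imfs, fs=fs))
```

For pure tones the average instantaneous frequency recovers the tone frequency. Real IMFs are amplitude- and frequency-modulated, so their mean instantaneous frequency summarizes a drifting rhythm.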

Feature selection
Data often contain unnecessary and redundant attributes that do not contribute to a predictive model's accuracy and can even reduce it. Feature selection identifies and removes these attributes. By reducing the dimensionality of the dataset, feature selection makes computation faster and can improve the accuracy of the learning models. With 14 EEG channels, each producing many time domain, frequency domain, and time-frequency domain features, it is essential to retain only the relevant features. For this purpose, we use recursive feature elimination with cross-validation (RFECV), based on the recursive feature elimination technique introduced by Guyon et al. [37]. RFECV repeatedly fits the model, removes the least important features, and evaluates each resulting feature subset with cross-validation. The subset that obtains the highest cross-validation score is selected. Although this approach yields high-performing features, it is computationally expensive, because the model must be refit for every candidate subset in every fold.
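A minimal scikit-learn sketch of RFECV follows. Several details are assumptions, because the text does not specify them: the wrapped estimator (a Random Forest, whose `feature_importances_` drive the elimination), the step size, the fold count, and the synthetic three-class data standing in for the EEG feature matrix.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold

# Synthetic stand-in for the EEG feature matrix: 20 features, 3 stress classes.
X, y = make_classification(
    n_samples=300, n_features=20, n_informative=5, n_redundant=2,
    n_classes=3, random_state=0,
)

selector = RFECV(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    step=1,  # drop one feature per elimination round
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
selector.fit(X, y)

X_selected = selector.transform(X)  # keep only the chosen columns
print("features kept:", selector.n_features_)
print("mask:", selector.support_)
```

A larger `step` removes several features per round. That reduces the computational cost noted above, at the price of a coarser search over subset sizes.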

EXPERIMENTATION
For each participant, the study records 720 seconds of data during the rest phases, 1,200 seconds during the control phases, and 600 seconds during the experimental phase. To avoid class imbalance, it uses only 600 seconds of data from each of these three phase types, which equalizes the number of samples in the three classes. All data points are labeled as rest (0), moderate stress (1), or high stress (2) according to the phase of the test. This work implements two machine learning classifiers, Random Forest and k-nearest neighbors, using the scikit-learn library in Python 3 on local computers. It uses 10-fold cross-validation to train and evaluate the classifiers, ensuring an unbiased comparison of models, and GridSearchCV for hyperparameter optimization. Table 2 lists the hyperparameter ranges explored for each classifier.
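The balancing, labeling, and tuning steps can be sketched with scikit-learn as follows. Several parts are assumptions, not the authors' setup. The data are synthetic per-second feature rows. The first 600 rows of each phase are the ones kept. The search grids are hypothetical; the ranges actually used are in Table 2 and are not reproduced here. `balance_phases` is a hypothetical helper name.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier


def balance_phases(phase_samples, n_keep):
    """Keep the first n_keep rows of each phase and label them 0, 1, 2.

    phase_samples maps a label (0 = rest, 1 = moderate, 2 = high stress)
    to a 2-D array of per-second feature rows for that phase.
    """
    X = np.vstack([phase_samples[label][:n_keep] for label in (0, 1, 2)])
    y = np.repeat([0, 1, 2], n_keep)
    return X, y


# Synthetic stand-in: 720 s rest, 1,200 s control, 600 s experimental (3 features).
rng = np.random.default_rng(0)
phases = {
    0: rng.normal(0.0, 1.0, size=(720, 3)),
    1: rng.normal(1.0, 1.0, size=(1200, 3)),
    2: rng.normal(2.0, 1.0, size=(600, 3)),
}
X, y = balance_phases(phases, n_keep=600)

# Hypothetical search grids, tuned with 10-fold cross-validation.
searches = {
    "knn": GridSearchCV(
        KNeighborsClassifier(),
        {"n_neighbors": [1, 3, 5], "metric": ["minkowski", "manhattan"]},
        cv=10, scoring="accuracy",
    ),
    "random_forest": GridSearchCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        {"criterion": ["gini", "entropy"], "max_depth": [None, 10]},
        cv=10, scoring="accuracy",
    ),
}
for name, search in searches.items():
    search.fit(X, y)
    print(name, search.best_params_, round(search.best_score_, 3))
```

With an integer `cv`, GridSearchCV uses stratified folds for classifiers, so every fold contains all three classes.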

RESULTS AND DISCUSSION
The analysis of the EDA, HR, and SKT signals shows trends during the participants' stress states that are consistent with previous studies [14], [15], [17]-[19]. Heart rate increases sharply during the experimental phase of the test, when the participant is under stress, and drops sharply when the participant returns to the relax phase. Figure 3 illustrates this impact of stress on heart rate, which rises during the experimental phase and dips during the rest and relax phases. The bottom graph shows the average heart rate in each phase.
EDA increases uniformly when the participant moves from the relax phases to the stress phases of the test, and it continues to increase throughout the test, as shown in Figure 4. The graph in Figure 4 shows the average EDA during each period. The participant's skin temperature also increases uniformly through the control and experimental phases of the test (see Figure 5). The graph in Figure 5 shows the average skin temperature during each phase. Some participants even show a uniform decrease in skin temperature during the relax phase at the end of the test. From the analysis of the EEG signals, the theta by alpha ratio is highest during the experimental phase, followed by the control phase, and lowest during the rest and relax phases (see Figure 6). The model implements Random Forest and k-nearest neighbors for stress classification and compares their performance using two metrics: accuracy and the area under the receiver operating characteristic curve (AUC-ROC or AUROC). Accuracy is defined as (15).

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{15}$$
Here, TP, FP, FN, and TN denote true positives, false positives, false negatives, and true negatives, respectively, computed from the confusion matrix of the classifications. Table 3 presents the accuracy and AUC-ROC scores achieved by both classifiers on the four feature sets. KNN performs best with k = 1 and the Minkowski distance. Random Forest achieves its best results with the criterion set to 'gini' and max_depth set to None. The KNN classifier achieves accuracies of 66.32% and 97.35% in the time and frequency domains, respectively, outperforming Random Forest in accuracy in both domains. Random Forest outperforms KNN only in AUC-ROC score in the time domain. Both classifiers achieve their best scores in the time-frequency domain, with Random Forest slightly ahead of KNN. Random Forest achieves the highest classification accuracy across all feature sets, 99.98%, with an AUC-ROC score of 0.99. KNN achieves its best accuracy of 99.97%, also with an AUC-ROC score of 0.99. These results are higher than those of previous stress classification studies using EEG, and they show that time-frequency domain features are highly effective for stress classification. The classifiers are also trained on the EDA, HR, and SKT feature pool. This pool yields better results than both the time and frequency domain features and comes very close to the time-frequency domain features. With this pool, the Random Forest classifier achieves an accuracy of 99.51% and an AUC-ROC score of 0.99, which is higher than [24] and better than all current EEG-based stress classification models.
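Equation (15) is written for a binary decision. For the three-class problem, TP, FP, FN, and TN can be counted one-vs-rest for each class from the confusion matrix, and the overall accuracy is the trace of the matrix divided by its sum. The text does not state how the multiclass AUC-ROC was averaged, so the sketch below assumes scikit-learn's one-vs-rest macro average. The labels and scores are toy values.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Toy predictions for the three classes: 0 = rest, 1 = moderate, 2 = high stress.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_score = np.array([  # predicted class probabilities (rows sum to 1)
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.7, 0.2],
    [0.1, 0.4, 0.5],
    [0.1, 0.2, 0.7],
    [0.2, 0.1, 0.7],
])
y_pred = y_score.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])
total = cm.sum()

# Per-class one-vs-rest counts and accuracy as in (15).
per_class_accuracy = {}
for k in range(3):
    tp = cm[k, k]
    fp = cm[:, k].sum() - tp
    fn = cm[k, :].sum() - tp
    tn = total - tp - fp - fn
    per_class_accuracy[k] = (tp + tn) / (tp + tn + fp + fn)

# Overall multiclass accuracy: correctly classified samples / all samples.
accuracy = np.trace(cm) / total

# Multiclass AUC-ROC, macro-averaged one-vs-rest (an assumption, see above).
auc = roc_auc_score(y_true, y_score, multi_class="ovr")
print(per_class_accuracy, accuracy, auc)
```

In this toy case the AUC is 1.0 while the accuracy is 5/6. AUC measures how well the scores rank positives above negatives, not whether the thresholded predictions are correct, which is why the two metrics can differ.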
In summary, the highest accuracy and AUC-ROC score are obtained with the time-frequency domain features of the EEG data. The classifiers trained on the EDA, HR, and SKT feature pool also outperform those trained on time and frequency domain features. The overall performance of the classifiers increases in the order: time domain < frequency domain < (EDA, HR, SKT) < time-frequency domain.

CONCLUSION AND FUTURE WORK
This paper proposes machine learning models that predict stress levels in undergraduate engineering students performing the MIST, using different wearable physiological biomarkers. The dataset collected in this study comprises EEG, EDA, HR, and SKT signals from 23 participants. Random Forest achieves a classification accuracy of 99.98% and an AUC-ROC score of 0.99, whereas the KNN classifier achieves a classification accuracy of 99.97% and an AUC-ROC score of 0.99. These results, obtained with the time-frequency domain features of the EEG data, are better than those of previous studies. Time-frequency domain features correlate highly with the stress response, and models trained on them can attain near-perfect accuracy. The classifiers also perform much better on the EDA, HR, and SKT feature pool (non-EEG features) than on the time domain and frequency domain features of the EEG data. Using the non-EEG features, Random Forest achieves the best accuracy of 99.51%, and KNN achieves an accuracy of 99.05%. The models trained on the EDA, HR, and SKT feature pool achieve state-of-the-art accuracy, and their results are comparable to those obtained using only EEG signals. These signals are also easier to measure and record accurately with inexpensive, portable, wearable sensor devices. In the future, we aim to collect an even more extensive database and use deep learning and federated learning models for real-time prediction with data privacy.


Figure 1. Overview of the test phases

Figure 2 illustrates the methodologies for building stress classification models using EEG and non-EEG signals. The upcoming sections explain the data processing steps for these signals in detail.

Figure 2. Methodology used for building stress classification models

Figure 3. Impact of stress on heart rate

Figure 6. Average theta by alpha ratio of participants using the AF3 (dominant) channel

Table 1 compares the related work for stress classification using physiological signals with mental/academic tasks.

Table 1. Comparison of work for stress classification using physiological signals with mental/academic tasks

Table 2. Range of hyperparameters for different classifiers

Table 3. Results of classifiers for different feature sets