Towards a system for real-time prevention of drowsiness-related accidents

Traffic accidents always result in great human and material losses. One of the main causes of accidents is the human factor, which usually stems from driver fatigue or drowsiness. To address this issue, several methods for predicting the driver's state and behavior have been proposed. Some approaches are based on measurements of the driver's behavior, such as head movement, blinking duration, and mouth expression, while others are based on physiological measurements that provide information about the internal state of the driver. Several works have used machine learning and deep learning to train models for driver behavior prediction. In this paper, we propose a new deep learning architecture based on residual networks and feature pyramid networks (FPN) for driver drowsiness detection. The trained model is integrated into a system that aims to prevent drowsiness-related accidents in real time: it detects driver drowsiness and alerts the driver in case of danger. Experimental results on benchmark datasets show that our proposed architecture achieves high detection accuracy compared to baseline approaches. This is an open access article under the CC BY-SA license.


INTRODUCTION
According to the Moroccan National Road Safety Strategy 2017-2026 and the World Health Organization, road accidents are a major public health problem globally. Traffic accidents are one of the leading causes of death and injury worldwide: each year, more than 1.35 million people die in road accidents and millions more are injured or disabled. In Morocco, about 3,500 people die and 12,000 are seriously injured in road accidents each year, that is, on average, 10 deaths and 33 serious injuries every day. With this in mind, and in collaboration with all road safety stakeholders, Morocco has decided to implement the National Road Safety Strategy (2017-2026) to prevent all forms of road accidents. This new strategy defines a more ambitious long-term vision to develop 'responsible action in Morocco'. It sets an ambitious and quantifiable goal: to halve road deaths by 2026 compared to current levels (fewer than 1,900 road deaths in 2026).
Advanced driver assistance systems (ADAS) use advanced technologies to assist the driver of a vehicle [1]. An ADAS collects data inside and outside the vehicle and performs processing such as identifying, detecting, and tracking static and dynamic objects, making it an active safety technology that enables potential hazards to be detected in the shortest possible time, drawing the driver's attention and improving safety. Vehicle recognition and tracking, lane line recognition, traffic sign recognition, and pedestrian recognition are examples of common visual ADAS functions. Since an ADAS has various built-in functions, one constraint imposed on the module presented in this work is that it must not distract the driver and must avoid triggering false alarms that would lead the driver to turn off the ADAS. The goal is a non-intrusive system that can detect driver drowsiness from a series of images, which is currently a difficult task. The main research results regarding driver behavior modeling include the following. Cai et al. [2] proposed the concept of a driving characteristics map. Miyajima and Takeda [3] used road driving data to build a model of driver behavior, implemented using statistical machine learning methods. Angkititrakul et al. [4] proposed a probabilistic driver behavior model using Gaussian mixture models. Shi et al. [5] proposed to assess driving style by normalizing driving behavior based on personalized driver modeling [6]; to quantitatively assess driving style in this way, an aggressiveness index is proposed that can be used to identify abnormal driving behavior. Taniguchi et al. [7] proposed an unsupervised learning method based on the original double articulation analyzer model, which predicts possible driving behavior scenarios by segmenting and modeling incoming time-series data about driving behavior. Okuda et al. [8] proposed probability-weighted autoregressive exogenous models, composed of probability weighting functions, that can represent real-world driving behavior.
This work begins with the following assumptions: an on-board camera captures images of the driver from the front, which are analyzed using artificial intelligence (AI) techniques to determine whether the driver is drowsy. The system can then warn the driver in real time to prevent possible accidents. The proposed solution combines a feature pyramid network (FPN) [9] with a deep residual network (ResNet) [10]. The model can recognize patterns in a series of images, so it can predict whether the driver is tired. FPN is one of the most popular feature fusion techniques for solving multi-scale problems in object detection.
The rest of this paper is organized as follows: section 2 provides an overview of recent research on driver drowsiness detection using deep learning. The proposed drowsiness detection system is described in section 3, and the experiments and results are presented in section 4. Section 5 analyzes the results and identified problems and proposes possible improvements and future work directions, and section 6 presents the conclusions of this work.

RELATED WORK
Human eye blinks can be categorized into three types: reflex, involuntary, and voluntary blinks [11]. Reflex blinks are evoked by stimuli such as loud sounds and bright lights. Involuntary blinks occur unconsciously, while voluntary blinks are consciously generated in response to a cue [11]. Blinking has been used to detect drowsiness and concentration, eye fatigue, and dry eye [12], [13]. A blink is the involuntary, momentary or brief closure of both upper eyelids in a coordinated manner.

Eye aspect ratio-based driver's blink detection
Blinking is the quick closure and reopening of the human eye, and each person has a unique blink pattern. A blink lasts around 100-400 milliseconds. The state of the driver's eyes, and in particular their blinking, is crucial for detecting whether he or she is drowsy.
The first step in blink detection is to locate and crop the face in the frontal image. This step uses the well-known facial recognition library Dlib [14]. Dlib works with facial landmarks, points on the face that mark features such as the eyes, mouth, and nose. It provides a pre-trained facial landmark detector that locates 68 points on a face, six of which belong to each eye, as shown in Figure 1. By computing the ratio of the Euclidean distances between the six eye coordinate points using (1), we can determine whether a person is blinking. This ratio is known as the eye aspect ratio (EAR) [15]:

EAR = (‖p2 − p6‖ + ‖p3 − p5‖) / (2‖p1 − p4‖) (1)
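As a minimal sketch, the EAR computation can be written with NumPy, assuming the six landmarks are ordered p1 to p6 as in Figure 1 (the array layout below is illustrative, not Dlib's API):

```python
import numpy as np

def eye_aspect_ratio(eye):
    """Compute the EAR from six (x, y) eye landmarks ordered p1..p6.

    p1 and p4 are the horizontal eye corners; (p2, p6) and (p3, p5)
    are the two vertical landmark pairs.
    """
    eye = np.asarray(eye, dtype=float)
    v1 = np.linalg.norm(eye[1] - eye[5])  # ||p2 - p6||
    v2 = np.linalg.norm(eye[2] - eye[4])  # ||p3 - p5||
    h = np.linalg.norm(eye[0] - eye[3])   # ||p1 - p4||
    return (v1 + v2) / (2.0 * h)

# Open eye: large vertical distances -> higher EAR.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
# Nearly closed eye: small vertical distances -> EAR close to 0.
closed_eye = [(0, 0), (2, 0.2), (4, 0.2), (6, 0), (4, -0.2), (2, -0.2)]
```

An open eye yields a noticeably higher ratio than a closed one, which is what makes a simple threshold on the EAR usable as a blink signal.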
Table 1 depicts a summary comparative study of recent deep learning architectures designed for the problem of driver drowsiness detection and trained on well-known public image datasets. We present works that focus on the real-time aspect. Despite these advances in the use of deep learning architectures for drowsiness detection, there is still room to improve accuracy using recent advances in object detection and classification.

METHOD
In this section, we describe our proposed drowsiness detection system. Countless people are on the road day and night, and professional and long-distance drivers often suffer from sleep deprivation, so drowsy driving is very dangerous: many accidents are due to driver fatigue. To help avoid these accidents, we built a system based on FPN and ResNet architectures that warns drivers in case of fatigue.

Steps of our project
The camera continuously captures video of the driver, a region of interest (ROI) is created in each frame of the video, the eyes are detected in the ROI and fed into a classifier. The classifier detects whether the driver's eyes are open or closed. A score (the number of consecutive frames with eyes closed) is then computed to determine whether the driver is drowsy or not. In case of danger, the driver is alerted in real time. Figure 2 shows the pipeline of the proposed system for real-time warning of the driver in case of drowsiness. If more than one face is detected in the image, the first face in the list returned by the detector is used. In the next step, facial key points are derived using the shape predictor provided by Dlib, loaded from its pre-trained model file [14]. There are many datasets for detecting these landmarks, but we use the intelligent behaviour understanding group (iBUG) dataset, which contains 68 face landmark points [28]. Once built, the shape predictor, given the grayscale image and the face rectangle returned by the frontal face detector, determines and returns the locations of these 68 facial landmarks for subsequent processing. The face/eye detection is done in three steps: i) detect and locate the face in the image; ii) detect and locate the 68 facial landmarks; and iii) derive the 6 landmarks of each eye shown in Figure 1.

Classification
Finally, as before, the classifier predicts two classes as outputs: Eyes-Open and Eyes-Closed. Eyes-Open means the driver is conscious and not fatigued. Eyes-Closed indicates that the driver is at risk of falling asleep within a short period of time. If more than 10 consecutive captured frames show closed eyes, the driver is suffering from severe drowsiness or fatigue and should take a break soon.
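The consecutive-frame rule above can be sketched as a small counter fed with per-frame classifier outputs (the class name and threshold default are illustrative):

```python
class DrowsinessMonitor:
    """Raise an alert after more than `threshold` consecutive Eyes-Closed frames."""

    def __init__(self, threshold=10):
        self.threshold = threshold
        self.closed_streak = 0  # consecutive Eyes-Closed frames so far

    def update(self, eyes_closed):
        """Feed one frame's prediction; return True if the alert should fire."""
        if eyes_closed:
            self.closed_streak += 1
        else:
            self.closed_streak = 0  # any open-eye frame resets the score
        return self.closed_streak > self.threshold
```

Resetting the streak on every open-eye frame ensures that normal blinks (a few frames at 10 FPS) never trigger the alarm, only sustained eye closure does.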

Databases
The MRL Eye Dataset [29]: the dataset was collected from 37 persons (33 men and 4 women), some of them wearing glasses. The collection includes both low- and high-resolution photos, all taken under varied lighting conditions, so several trainable features or classifiers can be tested on it. Examples of open and closed eyes from this dataset are displayed in Figure 3.

System design and network architecture
A key design goal is to minimize the number of false positives: the system should only warn the driver when he is actually feeling sleepy, to avoid false alarms that would lead a bored driver to disable the ADAS. It is also important to set the frame rate at which the camera feeds the ADAS. A high frame rate overloads the system with more frames per second (FPS), while a very low frame rate can degrade detection performance; the rate should be high enough to perceive detail in image sequences of very short duration, such as blinks. A frame rate of 10 FPS is used in this work: since the average blink lasts between 100 and 400 ms, this is enough to detect blinks without overloading the system. Figure 4 depicts our proposed network architecture. We used ResNet50V2 [10] as the backbone network. The FPN [9] we used is similar to the original version except for the concatenation layer. To solve the challenge of detecting objects at various scales, recognition systems (particularly object detectors) use feature pyramid networks, first developed by Tsung-Yi et al. [9]. The goal is for the network to learn the features of the same object across a broad range of scales in the dataset by using a pyramid of the same image at various scales. Consequently, a feature extraction network that creates feature representations at various scales can be described as an FPN. In our work, we adapted the FPN, originally proposed for object detection, to image classification (i.e., Eyes-Open or Eyes-Closed). Many recognition systems use residual networks (ResNets) [10], a traditional state-of-the-art family of deep neural networks, as their foundation; ResNet has been demonstrated to perform well on classification problems.
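As an illustrative sketch (not the authors' exact implementation), one top-down FPN step fuses a coarse pyramid level into a finer backbone feature map by upsampling it and adding a 1x1-projected lateral map; with NumPy and nearest-neighbor upsampling:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbor 2x spatial upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def lateral_project(x, w):
    """1x1 convolution expressed as channel mixing: (H, W, Cin) @ (Cin, Cout)."""
    return x @ w

def fpn_merge(coarse, fine, w_lateral):
    """One top-down FPN step: upsample the coarse map, add the projected fine map."""
    return upsample2x(coarse) + lateral_project(fine, w_lateral)

# Toy maps: a 4x4x8 coarse pyramid level and an 8x8x16 backbone feature.
rng = np.random.default_rng(0)
coarse = rng.standard_normal((4, 4, 8))
fine = rng.standard_normal((8, 8, 16))
w = rng.standard_normal((16, 8))  # projects fine's 16 channels down to 8
merged = fpn_merge(coarse, fine, w)  # shape (8, 8, 8)
```

Repeating this step down the pyramid yields feature maps at several scales that all carry semantic information from the deepest layer, which is what the classification head then exploits.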
Two final feature maps are extracted by the FPN, each representing the input image at a different scale. A dropout layer (to prevent overfitting) is applied, followed by the first classification layer. The SoftMax function is not appropriate here, since it normalizes each output neuron relative to the other output neurons, so the ReLU activation function is used instead. The architecture is completed by connecting the two classification layers, each made up of two neurons, to a dense layer of ten neurons. This layer is then connected to the final classification layer, to which the SoftMax function is applied. In this way, the network combines several classification results based on features at various scales, which enables it to classify the images more accurately.

Model training
Each of the two datasets we used contains training and test data. We trained the model for up to 20 epochs. For network training, we initialized ResNet50V2 with its pretrained weights to speed up convergence. In the fitting phase, we adopted Adam as the optimizer, categorical cross-entropy as the loss function, and a learning rate of 0.0001. We also used data augmentation techniques to make learning more efficient and prevent overfitting.
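The augmentation step can be sketched with simple NumPy transforms; the specific transforms below (horizontal flip and brightness jitter) are illustrative assumptions, as the paper does not list the exact ones used:

```python
import numpy as np

def random_flip(img, rng):
    """Horizontally flip an (H, W) or (H, W, C) image with probability 0.5."""
    return img[:, ::-1] if rng.random() < 0.5 else img

def random_brightness(img, rng, max_delta=0.2):
    """Shift pixel intensities by a uniform delta, keeping values in [0, 1]."""
    delta = rng.uniform(-max_delta, max_delta)
    return np.clip(img + delta, 0.0, 1.0)

def augment(img, rng):
    """Apply one random flip and brightness jitter to a float image in [0, 1]."""
    return random_brightness(random_flip(img, rng), rng)

rng = np.random.default_rng(42)
eye = rng.random((80, 80))  # toy 80x80 grayscale eye crop, as in the datasets
augmented = augment(eye, rng)
```

Applying such label-preserving transforms on the fly effectively enlarges the training set, which is what makes the network less prone to overfitting.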

EXPERIMENTAL RESULTS
In this section, we report the image classification results of the network on the test sets. We implemented the algorithm and network in a Google Colab notebook and used the Keras library with the TensorFlow backend to develop and run the deep networks. The results for each dataset are reported in Table 2 in terms of speed (in frames per second), precision, recall, F1-score, and accuracy. Note that in the first dataset images have three color channels (red, green, and blue; RGB), while in the second dataset images are in grayscale only.
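For reference, the reported metrics follow their standard definitions; a minimal computation from confusion-matrix counts (the counts below are illustrative, not those of Table 2):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1-score, and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, f1, accuracy

# Hypothetical counts for the Eyes-Closed class on a 970-image test set.
p, r, f1, acc = classification_metrics(tp=460, fp=20, fn=19, tn=471)
```

Accuracy alone can be misleading when one class dominates, which is why precision and recall on the Eyes-Closed class are reported alongside it.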

Discussion
The average results of the two models in the single-image evaluation phase in Table 2 show that the models achieved an overall accuracy of 98.17%. Our experiments show that while our model examines eyes very carefully, standard models such as plain CNNs with OpenCV tend to treat all similar eyes alike and misclassify open images as closed. Overall, the model therefore achieves high accuracy. Table 2 shows the average results of our model on the two datasets, properly classifying 931 out of 970 images, which is an acceptable value. It also shows that our model runs almost as fast as the other models in Table 1, so the processing speed is good. Since most images in our datasets have a resolution of (80, 80), the system can process them in about 100 to 132 milliseconds. The variation arises because the algorithm for selecting the photographic images can differ in each dataset image sequence, so the processing speed changes more than expected. What makes this work credible is that it has been developed and tested in real conditions: the input data is treated as a series of images or videos (not a single image), evaluated on a large dataset, and exhibits high accuracy, few false positives, and good inference speed. We hope our joint dataset and code will help other researchers improve their AI models and use them to diagnose driver behavior.

Model evaluation
This model uses video as real-time input to the scoring process to determine whether subjects are feeling sleepy. Figure 5 shows frames of the video used. The camera was not in the same position as the standard computer camera (above the screen) with which the model was trained, which made eye tracking very sensitive. Also, the videos were recorded at different frame rates (15 and 30 FPS), different from the frame rate used to train the model (10 FPS).

CONCLUSION
In this work, we introduced a fully automated system for detecting driver drowsiness from open- and closed-eye datasets. In the first stage, we proposed an image processing approach to filter the appropriate images of the driver's face; this algorithm improves the network's accuracy and speed. In the next step, we implemented a new deep neural network to improve classification. This network can increase accuracy on various classification problems, particularly on images where essential information is carried by small-scale objects. To classify drowsy and non-drowsy database images, we trained three different deep convolutional networks. Our model, built on ResNet50V2 and a modified feature pyramid network, outperformed the others. After training, we used the trained networks to run the fully automated identification system. Our model achieved an overall accuracy of 98.17% for single-image classification (the first evaluation method). In the driver state identification step (sequence of images or video), our model performed better than the other systems, correctly identifying approximately 463 out of 502 images as drowsy. We also tested the classification accuracy using a webcam that continuously captures and observes the driver's eyes. Based on the results, the proposed methods can improve drowsiness detection while being fast enough to be implemented in an ADAS. Finally, we note that this system can also be used in other industrial situations where employee alertness is crucial to avoid accidents.

Figure 1.
Figure 1. The six landmarks (two-dimensional coordinates) representing the eye, automatically detected by Dlib: an open eye (left image) and a closed eye (right image) [15]

Figure 2 .
Figure 2. Pipeline of the proposed system for real-time drowsiness detection and driver notification

Figure 3 .
Figure 3. Examples of the MRL eye dataset

Figure 4 .
Figure 4. General overview of the proposed deep learning architecture for eyes classification

Figure 5 .
Figure 5. Evaluation using video from the webcam: the first image (left) shows face and eye detection where the driver is not drowsy; the second image (right) shows a drowsy driver with the alert message (coupled with sound in the video)

Table 1 .
Comparative study of recent deep learning based approaches for driver drowsiness detection, trained on benchmark image datasets

Table 2 .
Classification results of our model on the two used datasets