Deep learning based biometric authentication using electrocardiogram and iris

ABSTRACT


INTRODUCTION
There has been tremendous growth in the field of security and privacy-preserving techniques by means of authentication systems. These security systems are widely adopted in various real-time online and offline applications such as biomedical systems, cloud computing, and computer vision. Generally, these authentication systems are classified as password-based, multifactor, certificate-based, and token-based authentication. These systems have their own applications, advantages, and disadvantages. However, when considering human authentication, identifying the liveness of the human plays an important role; therefore, in this work we focus on biometric authentication systems. The term "biometrics" refers to our biological, behavioural, or physical traits, and it is seen as a legitimate substitute for passwords, signatures, and other forms of identification. Biometric systems are designed to automatically recognise or validate persons based on their physical, physiological, or behavioural attributes, such as their iris, face, gait, keystroke dynamics, or other characteristics. Biometric technologies are being widely implemented to increase security, convenience, and social inclusion, and to offer prospective applications in several scientific and industrial domains. Current advancements in machine learning techniques have also increased the reliability of these authentication systems.
Traditional authentication systems were based on unimodal systems, which considered a single biometric for authentication; however, these systems suffer from various issues such as noisy data, inter-class similarity, and intra-class variability in different applications. Some unimodal authentication research includes face authentication [1], [2], iris authentication [3], [4], ECG authentication [5], [6], and fingerprint authentication [7], [8]. However, these systems suffer from limited accuracy and the aforementioned issues; therefore, the need for a robust authentication system has gained attention due to its significant applications. To handle these issues, several studies have focused on developing multimodal authentication systems. For instance, Zhang et al. [9] presented a robust multimodal authentication system based on voice and face data. Tarannum et al. [10] used a combination of iris, facial, and fingerprint data to achieve increased authentication performance. Optimization-based methods also play an important role in classification tasks by reducing the error in attributes and finding the best solution for dimension reduction and feature selection. Based on this concept, Sujatha et al. [11] used iris, finger vein, and fingerprint data along with a genetic algorithm-based optimization strategy. Similarly, traditional classification methods suffer from several issues, such as reliance on data pre-processing, overfitting, underfitting, and limitations in speed, scalability, and accuracy. Therefore, deep learning-based methods have gained huge attention in machine learning applications. Hammad and Wang [12] presented a combination of ECG and fingerprint biometrics with a convolutional neural network. Zhao et al. [13] used palm print and dorsal hand vein data and introduced a deep learning model for authentication. Alkeem et al. [14] adopted deep learning for ECG-based authentication systems.
Electrocardiogram (ECG) verification, in comparison, is difficult to compromise without the user's knowledge. ECG biometric feature collection necessitates the use of specialized apparatus, such as an electrocardiograph, which makes this technique challenging to replicate. The ECG technique usually takes 10 seconds or longer to record ECG signals and reach an adequate degree of accuracy for authentication. The ECG signal helps assess the heart's electrical conductivity as well as its cardiovascular alterations. There are two main types of information that an ECG provides. First, a doctor measures the time intervals on the ECG to ascertain how long the electrical wave takes to travel through the heart. The amount of time a wave takes to move from one area of the heart to another reveals whether electrical activity is regular, slow, fast, or irregular. Secondly, by monitoring the amount of electrical activity flowing through the heart muscle, a cardiologist may determine whether areas of the heart are too big or overworked. A normal ECG is depicted in Figure 1. From this figure, we can observe several entities in the ECG signal, namely the P, Q, R, S, and T waves. Based on these, several attributes can be extracted, such as the PR, ST, QRS, and QT intervals. Similarly, the iris is a small, annular structure in the eye that regulates the pupil's size and diameter, hence controlling how much light reaches the retina. Figure 2 shows a sample iris image, which illustrates that the complete eye image is comprised of the pupil, iris, eyelid, collarette, and sclera. Identification of the iris thus becomes an important aspect of performing various tasks on it. Therefore, in this work we focus on developing a multimodal authentication system by combining ECG and iris image modalities.

Figure 2. Eyelid image [16]

ECG and iris-based authentication systems have their own sets of challenges that must be overcome for effective implementation. One of the primary challenges in ECG-based authentication systems is the need for high-quality ECG signals. Poor-quality signals can lead to inaccurate detection of R peaks, which can significantly impact the accuracy of the authentication system. Additionally, ECG signals can be affected by factors such as noise, motion artifacts, and electrode placement, which can further reduce the accuracy of the system. Iris-based authentication systems also face several challenges. One significant challenge is the need for high-quality iris images, as poor-quality images can result in inaccurate feature extraction. Furthermore, factors such as occlusion, dilation, and age-related changes in the iris can affect the accuracy of the authentication system. Another challenge is the need for proper illumination and focus during image capture, as variations in lighting and focus can impact the quality of the iris image.
The main aim of this research is to develop a novel and robust multimodal authentication system using ECG and iris data. To achieve this, the proposed model uses feature extraction methods in which R-peak detection and morphological feature extraction are applied to the ECG signals. For the iris, Gabor wavelet, gray level co-occurrence matrix (GLCM), and gray level difference matrix (GLDM) feature extraction methods are employed. The obtained features are then combined to formulate the feature vector. Finally, we train a deep learning classifier using a convolutional neural network-long short-term memory (CNN-LSTM) approach.
As discussed before, multimodal authentication plays an important role in various real-time security applications. Several biometric combinations have been employed to improve the robustness of authentication systems, among which ECG and iris-based systems have been widely adopted. In this section, we present a brief literature review of these multimodal authentication systems. Regouid et al. [17] focused on ECG, ear, and iris to develop a multimodal biometric authentication system that overcomes the challenges of traditional unimodal systems. ECG provides liveness information, ear biometrics help obtain rich and stable information, and iris features help ensure promising reliability and accuracy. This scheme performs normalization and segmentation as pre-processing steps. Later, 1D-LBP, shifted-1D-LBP, and 1D-MR-LBP features are extracted from the ECG signal, while the ear and iris images are transformed into 1D signals. Finally, K-nearest neighbors (KNN) and radial basis function (RBF) classifiers are applied to classify users as genuine or impostor.
Jiang et al. [18] focused on authentication for body area sensor networks by combining iris and ECG features. This process follows two-level authentication: the first level focuses on iris authentication, after which ECG authentication is performed to improve the overall security. El-Rahiem et al. [19] presented a multimodal authentication system using ECG and finger vein data. The complete process is divided into three stages: pre-processing, where data normalization and filtering techniques are applied; feature extraction, which uses a deep CNN model to extract features from the ECG and finger vein data; and classification, where different classifiers such as KNN, support vector machine (SVM), artificial neural network (ANN), random forest (RF), and naïve Bayes (NB) are used to classify the obtained features. Moreover, this model uses multi-canonical correlation analysis (MCCA) to increase the speed of authentication.
Jadhav et al. [20] presented a multimodal biometric authentication system using the combination of palm print, iris, and face. Their feature extraction process uses a deep learning model for robust feature extraction from raw input images, a modified group search optimization (MGSO) approach is employed to obtain optimized, dimensionality-reduced features, and the classification phase uses a teacher learning based deep neural network (TL-DNN) model to reduce the classification error. Hammad et al. [5] developed a combined ECG and fingerprint based authentication system. A CNN model is used for generating the combined feature set, and biometric templates are generated from these features. These features are trained and classified using a Q-Gaussian multi support vector machine (QG-MSVM) to increase the classification performance. Singh and Tiwari [21] combined ECG, sclera, and fingerprint to develop a multimodal biometric system. This combination is carried out in two modules: i) a decision-level fusion module, where a combined whale optimization artificial neural network (WOA-ANN) is used to generate the sequential model, and ii) a score-level fusion module, which uses a salp swarm algorithm-deep belief network (SSA-DBN) model.
Zhang et al. [9] developed a face and voice based multimodal authentication system. This model uses the local binary pattern (LBP) method for face feature extraction and voice activity detection (VAD) for voice sample analysis to increase voice detection accuracy. The model also uses a feature fusion strategy to fuse the face and voice attributes efficiently.
Amritha et al. [22] emphasized score-level and feature-level fusion to improve the accuracy of multimodal authentication systems. Their model includes ECG, face, and fingerprint data for feature extraction and fusion. The score-level and feature-level fusion are performed separately, and the obtained scores are normalized by employing an overlap extrema-based min-max (OVEBAMM) method. Huang et al. [23] reported that the performance of ECG-based authentication systems is affected by different types of noise and by sample variation. To overcome these issues, the authors presented local binary pattern-based feature extraction along with a robust multi-feature collective non-negative matrix factorization (RMCNMF) approach. This process helps learn the latent semantic space with the help of a convolutional non-negative matrix factorization (CNMF) method. Kumar et al. [24] focused on face and gait biometrics for a multimodal authentication system. This approach uses principal component analysis (PCA) and a deep neural network (DNN), where the deep learning model replaces the Euclidean distance with a cross-entropy function. PCA is used for feature extraction and face reconstruction, and the DNN helps improve accuracy, with the final matching score obtained by applying a softmax function.

METHOD
This section presents the proposed solution for feature extraction and classification in the multimodal authentication system. Figure 3 depicts the complete process of the proposed approach.

ECG feature extraction
This section presents the proposed solution for ECG feature extraction. In ECG signal processing, R-peak detection plays an important role; therefore, we present our R-peak detection method. Several methods have been described in the literature to detect R peaks. In this work, we adopt a deep learning method and present an encoder-decoder CNN architecture that detects peaks by segmenting the input signal. Figure 4 depicts the architecture of the encoder and decoder modules. The encoder path, known as the contracting path, contains repeated convolution operations followed by rectified linear unit (ReLU) activations and max pooling; this structure performs the down-sampling operations. Similarly, the expanding path performs up-sampling operations followed by convolutions and ReLU operations. We formulate peak detection as a 1D segmentation problem for R-peak segmentation using this deep learning approach. The input dataset contains the input ECG signals, which are normalized between +1 and -1. The output of this module is a 1D segmentation map containing the R-peak locations in the given signals. The segmentation map obtained by processing the signal through the encoder and decoder modules is expressed as (1):

S = f_dec(f_enc(X; w_1); w_2)    (1)

where f_enc represents the encoder block, f_dec represents the decoder block, and w_1 and w_2 are the weight vectors, which are optimized by minimizing the binary cross-entropy loss between the actual and predicted R peaks. The loss function is given as (2):

L = -(1/N) Σ_i [y_i log(ŷ_i) + (1 - y_i) log(1 - ŷ_i)]    (2)
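As an illustrative sketch of the loss in (2) (not the authors' implementation; the array names are hypothetical), the binary cross-entropy over a 1D R-peak segmentation map can be computed in NumPy as follows:

```python
import numpy as np

def bce_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy between actual and predicted R-peak maps."""
    y_pred = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# 1D segmentation map: 1 at the R-peak location, 0 elsewhere
y_true = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
good = bce_loss(y_true, np.array([0.1, 0.1, 0.9, 0.1, 0.1]))  # close prediction
bad = bce_loss(y_true, np.array([0.9, 0.9, 0.1, 0.9, 0.9]))   # inverted prediction
```

A prediction that agrees with the ground-truth map yields a much smaller loss than an inverted one, which is the gradient signal that drives the encoder-decoder weights.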

− Based on the R peaks, we obtain the RR interval, which can be used to approximate the P and T waves. The T wave lies after the 1st R peak, whereas the P wave precedes the 2nd R peak of the current RR interval.
− To select the T wave, 15% of the RR interval is added to the 1st R-peak location, and the window continues up to 55% of the RR interval from the same location.
− To select the P wave, 65% of the RR interval is added to the 1st R-peak location, and the window continues up to 95% of the RR interval from the same location.
− The P and T wave peaks are identified as the highest value in their corresponding windows.
− The Q peak is estimated as the minimum value in a window starting 20 ms before the R peak.
− The S peak is identified as the lowest value in a window from the R peak to 20 ms after the peak.
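The window rules above can be sketched in NumPy; this is an illustrative reconstruction with hypothetical names and a synthetic signal, not the authors' code:

```python
import numpy as np

def locate_waves(sig, r1, r2, fs):
    """Locate P, T, Q, S around one RR interval using the window rules above."""
    rr = r2 - r1
    # T wave: highest value between 15% and 55% of RR after the 1st R peak
    t0 = r1 + int(0.15 * rr)
    t_loc = t0 + np.argmax(sig[t0: r1 + int(0.55 * rr)])
    # P wave: highest value between 65% and 95% of RR after the 1st R peak
    p0 = r1 + int(0.65 * rr)
    p_loc = p0 + np.argmax(sig[p0: r1 + int(0.95 * rr)])
    # Q: minimum in the 20 ms window before the R peak
    w = int(0.020 * fs)
    q_loc = r1 - w + np.argmin(sig[r1 - w: r1])
    # S: minimum from the R peak to 20 ms after it
    s_loc = r1 + np.argmin(sig[r1: r1 + w + 1])
    return p_loc, q_loc, s_loc, t_loc

fs = 360
sig = np.zeros(400)
r1, r2 = 100, 300
sig[r1] = sig[r2] = 1.0   # R peaks
sig[95] = -0.3            # Q dip just before R
sig[104] = -0.4           # S dip just after R
sig[160] = 0.5            # T wave inside the 15-55% window
sig[250] = 0.4            # P wave inside the 65-95% window
p, q, s, t = locate_waves(sig, r1, r2, fs)
```

On this toy beat the function recovers the planted fiducial points (P at 250, Q at 95, S at 104, T at 160).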
Based on these peaks, we estimate several attributes such as the PR interval, QRS duration, QT interval, corrected QT, and ventricular rate (BPM). These parameters can be computed as follows:
− PR interval: it is computed as (3):

PR = (loc_R − loc_P) / f_s    (3)

where f_s denotes the sampling frequency, loc_R represents the locations of the R peaks, and loc_P represents the locations of the P peaks.
− QRS duration: it is computed as (4):

QRS = ((loc_S + t_5) − (loc_Q − t_5)) / f_s    (4)

where t_5 is the immediate 5 ms of signal, which is added to loc_S and subtracted from loc_Q because the QRS duration is identified from the start of the Q peak to the end of the S peak, and loc_Q shows the locations of the Q peaks. For the QT estimate, a constant 0.13 is multiplied with the RR-derived term and subtracted accordingly (5).
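A minimal sketch of the interval computations in (3) and (4), assuming the fiducial locations are sample indices (all names are hypothetical; the 0.13-based QT correction is omitted because its full expression is not recoverable from the source):

```python
import numpy as np

def intervals_ms(p_loc, q_loc, s_loc, r_loc, fs, pad_ms=5.0):
    """PR interval (3) and QRS duration (4) in milliseconds."""
    pad = pad_ms / 1000.0 * fs              # the 5 ms padding t5 around Q and S
    pr_ms = (r_loc - p_loc) / fs * 1000.0   # PR interval, eq. (3)
    qrs_ms = ((s_loc + pad) - (q_loc - pad)) / fs * 1000.0  # QRS duration, eq. (4)
    return pr_ms, qrs_ms

def vent_rate_bpm(rr_samples, fs):
    """Ventricular rate from the mean RR interval in samples."""
    return 60.0 * fs / np.mean(rr_samples)

fs = 360
pr, qrs = intervals_ms(p_loc=250, q_loc=95, s_loc=104, r_loc=300, fs=fs)
bpm = vent_rate_bpm([200, 200], fs)  # RR of 200 samples at 360 Hz -> 108 BPM
```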

Iris feature extraction
This subsection describes the proposed approach for iris segmentation and feature extraction. Iris segmentation plays an important role here, where we consider the inner and outer boundaries of the iris; Figure 5 depicts the inner and outer boundary regions. In this work, we extend our previous feature extraction technique for iris image authentication. For simplicity, it is assumed that the iris image acquisition device extracts a square region of the image in which the center of the iris is close to the center of the square region. To obtain the boundary information, we initiate an iterative process from the center point of the image, denoted by (x_0, y_0). Profile operations are then performed in the vertical and horizontal directions, denoted as V and H, respectively. To find the boundaries, we focus on estimating the radius, which is computed as (6).
where the terms of (6) represent the left, right, top, and bottom points of the image. Further, the image is processed to obtain the partial derivatives and normalized contour integrals. This involves a Gaussian smoothing function G_σ(r) with smoothing scale σ, which examines the entire image iteratively over the given parameter set (r, x_0, y_0). This can be expressed as (7):

max_(r, x_0, y_0) | G_σ(r) * ∂/∂r ∮_(r, x_0, y_0) I(x, y) / (2πr) ds |    (7)
To obtain the normalized output, all points inside the iris region are mapped to polar coordinates (r, θ), where r lies in the interval [0, 1] and θ denotes the angle in [0, 2π]. Based on this remapping, we obtain the two circular edges represented in Figure 6.
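The rubber-sheet remapping described above can be sketched as follows; this is an illustrative nearest-neighbour version with hypothetical parameters and a synthetic ring image, not the authors' implementation:

```python
import numpy as np

def normalize_iris(img, cx, cy, r_pupil, r_iris, n_r=8, n_theta=32):
    """Map the iris ring between the two boundaries to a (rho, theta) strip."""
    out = np.zeros((n_r, n_theta))
    for i, rho in enumerate(np.linspace(0.0, 1.0, n_r)):
        r = r_pupil + rho * (r_iris - r_pupil)   # radius between inner and outer boundary
        for j, theta in enumerate(np.linspace(0.0, 2 * np.pi, n_theta, endpoint=False)):
            x = int(round(cx + r * np.cos(theta)))
            y = int(round(cy + r * np.sin(theta)))
            out[i, j] = img[np.clip(y, 0, img.shape[0] - 1),
                            np.clip(x, 0, img.shape[1] - 1)]
    return out

# synthetic eye: bright ring roughly between the pupil (r=5) and outer boundary (r=15)
img = np.zeros((64, 64))
yy, xx = np.mgrid[0:64, 0:64]
rr = np.sqrt((xx - 32) ** 2 + (yy - 32) ** 2)
img[(rr >= 4) & (rr <= 16)] = 1.0
strip = normalize_iris(img, 32, 32, 5, 15)
```

Every sample of the normalized strip falls inside the ring, so the unwrapped texture is sampled uniformly regardless of pupil dilation.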

Gabor-wavelet feature extraction
This portion of the image is considered the region of interest (ROI) and is used for the subsequent feature extraction process. In this work we apply Gabor-wavelet features, GLCM [26], GLDM [27], and PCA [28]. The 1D Gabor filter can be expressed as (8):

g(t) = exp(−t² / (2σ²)) · exp(j2πf_0 t)    (8)
where f_0 denotes the central frequency. In the next stage, we convert the polar coordinates to Cartesian coordinates; thus, the resulting frequency-domain response can be expressed as (9):

G(u, v) = exp(−(u′ − f_0)² / (2σ_u²) − v′² / (2σ_v²))    (9)

where f_0 denotes the central frequency, σ_u is the bandwidth controller along the rotated u′ axis, σ_v is the bandwidth controller along the rotated v′ axis, and θ denotes the orientation of the filter. Based on this, the output of the Gabor filter is obtained by correlating the input image with the Gabor kernel ψ_k(x), and the Gabor wavelet can be expressed as (10):

ψ_k(x) = (‖k‖² / σ²) · exp(−‖k‖² ‖x‖² / (2σ²)) · [exp(j k·x) − exp(−σ² / 2)]    (10)

where k denotes the wave vector. In this work we use five spatial frequencies with varied wavenumbers k_ν, along with eight orientations from 0 to π.
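A bank of 5 frequencies x 8 orientations as described above can be sketched in NumPy. The specific frequency and σ values below are placeholders (the source elides them), and the kernel is a plain complex Gabor rather than the exact wavelet of (10):

```python
import numpy as np

def gabor_kernel(size, f0, theta, sigma=2.0):
    """2D complex Gabor kernel with central frequency f0 and orientation theta."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)          # coordinate rotated by theta
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))  # Gaussian envelope
    carrier = np.exp(1j * 2 * np.pi * f0 * xr)          # complex sinusoidal carrier
    return envelope * carrier

# bank: 5 spatial frequencies x 8 orientations in [0, pi), as described above
bank = [gabor_kernel(15, f0, theta)
        for f0 in (0.05, 0.1, 0.2, 0.3, 0.4)
        for theta in np.arange(8) * np.pi / 8]
```

Each kernel would be correlated with the normalized iris strip, and the magnitude (or quantized phase) of the responses forms the Gabor feature vector.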

GLCM and GLDM feature extraction
Further, we apply gray level co-occurrence matrix (GLCM) based texture feature extraction on the segmented iris ROI. To extract the GLCM features, we consider offsets at orientations of 0°, 45°, 90°, and 135°. With the help of these offsets, we compute contrast, correlation, energy, homogeneity, and entropy features. Table 1 shows the expressions used to compute these features.
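For illustration, the GLCM for one offset and a few of the Table 1 features can be computed directly (a didactic sketch; a library routine such as scikit-image's `graycomatrix` would normally be used):

```python
import numpy as np

def glcm(img, dx, dy, levels):
    """Co-occurrence counts for offset (dx, dy), normalized to probabilities."""
    m = np.zeros((levels, levels))
    h, w = img.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[img[y, x], img[y2, x2]] += 1
    return m / m.sum()

def glcm_features(p):
    """Contrast, energy, homogeneity, and entropy of a normalized GLCM."""
    i, j = np.indices(p.shape)
    contrast = np.sum(p * (i - j) ** 2)
    energy = np.sum(p ** 2)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))
    entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))
    return contrast, energy, homogeneity, entropy

img = np.array([[0, 0, 1], [0, 0, 1], [2, 2, 2]])
p = glcm(img, dx=1, dy=0, levels=3)   # the 0-degree offset
contrast, energy, homogeneity, entropy = glcm_features(p)
```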
Similarly, we consider the GLDM feature extraction model, which is based on the absolute difference between two pixels of a given gray level separated by a displacement Δ, denoted as a motion vector, as (12):

𝛿 = |𝑆(𝑥, 𝑦) − 𝑆(𝑥 + ∆𝑥, 𝑦 + ∆𝑦)| (12)
The probability density function of δ can then be expressed as (13), where Δx and Δy are integer parameters and S(x, y) denotes the input image. To formulate the feature vector, we concatenate the contrast, entropy, and angular second moment computed from this PDF. The obtained feature is further processed through the PCA approach, which has the following stages:
− In the first phase, the mean of each column vector is computed as (14); the obtained mean is then subtracted from the data vectors to generate the zero-mean vectors as (15), where Z_i denotes the zero-mean vector, x_i represents each element of the column vector, and μ is the mean of each column vector.
− In the next stage, the covariance matrix is computed as (16); later, we compute the eigenvectors and eigenvalues, which are further multiplied with the zero-mean vectors to produce the feature vector. The eigenvector is represented as (17), and the obtained feature vector is expressed as (18).
This complete process generates a fused feature vector which is used for training.
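The GLDM and PCA stages above can be sketched as follows (an illustrative reconstruction of (12)-(18) under simplifying assumptions: horizontal displacement only, and toy data):

```python
import numpy as np

def gldm_features(img, dx=1, levels=4):
    """GLDM for horizontal displacement dx: difference PDF plus texture features."""
    delta = np.abs(img[:, dx:] - img[:, :-dx])          # eq. (12) for (dx, 0)
    pdf = np.bincount(delta.ravel(), minlength=levels).astype(float)
    pdf /= pdf.sum()                                     # probability density, eq. (13)
    i = np.arange(levels)
    contrast = np.sum(i ** 2 * pdf)
    asm = np.sum(pdf ** 2)                               # angular second moment
    entropy = -np.sum(pdf[pdf > 0] * np.log2(pdf[pdf > 0]))
    return np.array([contrast, asm, entropy])

def pca_project(X, k=1):
    """PCA: zero-mean the columns, eigendecompose the covariance, project."""
    Z = X - X.mean(axis=0)                               # eqs. (14)-(15)
    C = np.cov(Z, rowvar=False)                          # eq. (16)
    vals, vecs = np.linalg.eigh(C)
    top = vecs[:, np.argsort(vals)[::-1][:k]]            # eq. (17): top eigenvectors
    return Z @ top                                       # eq. (18): projected features

img = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 0, 3, 3]])
feats = gldm_features(img)
X = np.vstack([feats + 0.1 * t for t in range(5)])       # toy stack of feature vectors
proj = pca_project(X, k=1)
```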

CNN-LSTM classifier construction
In this section we briefly describe the CNN and LSTM models and present a combined architecture for classifier construction. Current advancements in the machine learning field have reported the importance of deep learning approaches, where CNNs have been widely adopted to achieve efficient classification accuracy [5]. CNN architectures are based on a feedforward neural network process and are capable of extracting robust features through convolution structures. In this work, we adopt a one-dimensional convolutional layer with a given number of filters and kernel size. For each input signal, the corresponding attributes are fed into the corresponding layer, and the obtained output is then processed through a bidirectional long short-term memory (BiLSTM) block to obtain the final classification.
The LSTM architecture consists of three gates: input gate i_t, forget gate f_t, and output gate o_t. Moreover, it contains a dedicated memory cell c_t, which helps to learn long-term dependencies in the given sequence, and a hidden state h_t. These components of the LSTM are represented as (19):

i_t = σ(W_i x_t + U_i h_(t−1) + b_i)
f_t = σ(W_f x_t + U_f h_(t−1) + b_f)
o_t = σ(W_o x_t + U_o h_(t−1) + b_o)
c_t = f_t ⊙ c_(t−1) + i_t ⊙ tanh(W_c x_t + U_c h_(t−1) + b_c)
h_t = o_t ⊙ tanh(c_t)    (19)

where W_i, W_f, and W_o represent the weights of the neurons, b_i, b_f, and b_o represent the biases, σ denotes the sigmoid function, ⊙ represents elementwise multiplication, and tanh(·) is the hyperbolic tangent function. The traditional LSTM model is extended to formulate the bidirectional LSTM, where two LSTMs operate simultaneously in the forward and backward directions: the forward LSTM captures the past (preceding) context, whereas the backward LSTM captures the future (succeeding) context. The final hidden state is the concatenation of the two, expressed as (20):

h_t = [h_t(forward) ; h_t(backward)]    (20)
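The gate equations (19) and the bidirectional concatenation (20) can be sketched in plain NumPy (a didactic forward pass with random toy weights, not the trained classifier; frameworks such as Keras or PyTorch would be used in practice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM step implementing the gate equations of (19)."""
    W, U, b = params  # stacked weights for input, forget, output, cell paths
    i = sigmoid(W[0] @ x + U[0] @ h_prev + b[0])                    # input gate
    f = sigmoid(W[1] @ x + U[1] @ h_prev + b[1])                    # forget gate
    o = sigmoid(W[2] @ x + U[2] @ h_prev + b[2])                    # output gate
    c = f * c_prev + i * np.tanh(W[3] @ x + U[3] @ h_prev + b[3])   # memory cell
    h = o * np.tanh(c)                                              # hidden state
    return h, c

def make_params(rng, dim, in_dim):
    return (rng.standard_normal((4, dim, in_dim)) * 0.1,
            rng.standard_normal((4, dim, dim)) * 0.1,
            np.zeros((4, dim)))

def bilstm(xs, dim, rng):
    """Run forward and backward LSTMs and concatenate their states, as in (20)."""
    fwd_p = make_params(rng, dim, xs.shape[1])
    bwd_p = make_params(rng, dim, xs.shape[1])
    def run(seq, p):
        h, c, hs = np.zeros(dim), np.zeros(dim), []
        for x in seq:
            h, c = lstm_step(x, h, c, p)
            hs.append(h)
        return hs
    fwd = run(xs, fwd_p)
    bwd = run(xs[::-1], bwd_p)[::-1]
    return [np.concatenate([hf, hb]) for hf, hb in zip(fwd, bwd)]

rng = np.random.default_rng(0)
xs = rng.standard_normal((6, 3))   # toy sequence of six 3-dimensional feature vectors
hs = bilstm(xs, dim=4, rng=rng)
```

Each output state has twice the hidden dimension because the forward and backward states are concatenated, and its values are bounded since h_t = o_t ⊙ tanh(c_t).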

RESULTS AND DISCUSSION
In the previous section, we described the complete proposed approach for processing the ECG and iris data to formulate a robust multimodal authentication system. In this section, we present the experimental analysis of the proposed approach and compare its performance with existing schemes. In this experiment, we considered two datasets: the IIT Delhi Iris Database and the MIT-BIH dataset from PhysioNet.

Dataset details
IIT Delhi Database: this dataset is considered for iris image analysis. It was acquired in the Biometrics Research Laboratory using a JIRIS JPC1000 digital CMOS camera. All images in the currently available database are in bitmap (*.bmp) format and were obtained from 224 users. The database covers 176 men and 48 women, all falling in the age range of 14 to 55. A total of 224 separate folders, each linked to an integer identity, make up the database's 1,120 images. All images in this collection have a resolution of 320 x 240 pixels and were captured indoors. Figure 8 depicts some sample images from this dataset.
MIT-BIH dataset: the MIT-BIH dataset encompasses 48 half-hour recordings of two-channel ambulatory ECG signals obtained from 47 subjects. Of these, 23 recordings were chosen from 24-hour ambulatory signals covering a mixed population of inpatients (60%) and outpatients (40%); the remaining 25 recordings were selected from the same set to include clinically significant arrhythmias. The signals were digitized at 360 samples per second per channel with 11-bit resolution over a 10 mV range, and were annotated by expert cardiologists; the entire dataset contains around 110,000 annotations. The dataset is further divided such that 70% of the data is used for training and 30% for testing. We randomly shuffle the dataset to ensure that the data points are in a random order, which helps prevent any ordering biases that might exist in the data.
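The shuffled 70/30 split described above can be sketched as follows (array names are hypothetical):

```python
import numpy as np

def shuffled_split(X, y, train_frac=0.7, seed=42):
    """Random shuffle then 70/30 split, removing any ordering bias in the data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))       # random order of sample indices
    cut = int(train_frac * len(X))
    tr, te = idx[:cut], idx[cut:]
    return X[tr], y[tr], X[te], y[te]

X = np.arange(20).reshape(10, 2)   # toy feature matrix: 10 samples
y = np.arange(10)                  # toy labels
Xtr, ytr, Xte, yte = shuffled_split(X, y)
```

Fixing the seed makes the split reproducible across runs, which matters when comparing classifiers on the same partition.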

Performance measurement parameters
The performance of the proposed approach is measured using a confusion matrix, which is generated from the true positive (TP), false positive (FP), false negative (FN), and true negative (TN) counts. Table 3 shows a sample representation of the confusion matrix. Based on this confusion matrix, we measure several statistical performance parameters, such as accuracy, precision, and F1-score. Accuracy is the proportion of correctly classified instances out of the total number of instances:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
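The confusion-matrix statistics used throughout the evaluation can be computed as follows (illustrative counts, not the paper's results):

```python
def metrics(tp, fp, fn, tn):
    """Accuracy, precision, sensitivity, specificity, and F1 from TP/FP/FN/TN."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1

acc, prec, sens, spec, f1 = metrics(tp=45, fp=5, fn=5, tn=45)
```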

Comparative analysis with combined feature extraction process
With the help of the aforementioned performance measurement parameters, we measure the performance of the proposed approach. To classify the ECG and iris datasets, we consider 50 samples, where the data samples are stored in folders allocated to their corresponding classes. We measure the performance in terms of precision, sensitivity, specificity, F1-score, and accuracy for different classifiers. Moreover, we evaluate the performance for ECG and iris separately, and then measure the performance of the combined feature extraction. Table 4 shows the obtained classification performance.

Comparative analysis with different attacks on the input data
To show the robustness of the proposed approach, we applied several attacks to the original data and measured the performance of the proposed model. For the ECG signal, we consider three types of noise: baseline wander, muscle artifacts, and power-line interference. For the iris data, we consider attacks such as cropping, rotation, added noise, and image blurring. Figure 9 depicts the different noise types applied to the ECG signal, while Figure 10 depicts the different attacks applied to the iris image data. For this experiment, we measure the outcome of the proposed approach and compare it with existing classification schemes. Table 5 shows the comparative performance in terms of precision and F1-score for the cropping, rotation, noise, and blurring attacks.
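The attack simulations can be sketched in NumPy; this is an illustrative version with hypothetical parameters (the source does not specify attack intensities), using a 90-degree rotation and a box blur as simple stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.random((32, 32))                      # stand-in for an iris image
ecg = np.sin(np.linspace(0, 8 * np.pi, 720))    # stand-in for an ECG segment

def crop(im, margin=4):
    """Cropping attack: keep only the central region."""
    return im[margin:-margin, margin:-margin]

def rotate90(im):
    """Rotation attack (90 degrees; arbitrary angles would need interpolation)."""
    return np.rot90(im)

def add_noise(im, sigma=0.05):
    """Additive Gaussian noise attack."""
    return im + rng.normal(0.0, sigma, im.shape)

def box_blur(im, k=3):
    """Blurring attack via a simple k x k mean filter (interior pixels only)."""
    out = np.copy(im)
    r = k // 2
    for y in range(r, im.shape[0] - r):
        for x in range(r, im.shape[1] - r):
            out[y, x] = im[y - r:y + r + 1, x - r:x + r + 1].mean()
    return out

def baseline_wander(sig, fs=360, amp=0.2, f=0.3):
    """Baseline wander: a slow sinusoidal drift added to the ECG signal."""
    t = np.arange(len(sig)) / fs
    return sig + amp * np.sin(2 * np.pi * f * t)

attacked = [crop(img), rotate90(img), add_noise(img), box_blur(img)]
drifted = baseline_wander(ecg)
```

Feeding such perturbed inputs through the trained pipeline is how the robustness figures in Tables 5-7 would be produced.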
Similarly, we have measured the performance in terms of sensitivity and specificity under the aforementioned attacks; the obtained performance is presented in Table 6. Finally, we present the outcome of the proposed approach in terms of classification accuracy and compare it with traditional classifiers. Table 7 shows the obtained performance and indicates that the proposed approach achieves the desired accuracy for the four different types of attacks and outperforms traditional classification methods.

CONCLUSION
In this work, we have focused on the development of a multimodal biometric authentication system with improved accuracy and robustness. The proposed approach considers ECG and iris data due to their significant advantages as physiological biometric traits. The proposed model performs deep learning-based segmentation for R-peak detection from ECG signals, and different morphological features are then extracted with the help of this peak data. Similarly, wavelet, GLCM, and GLDM feature extraction processes are applied to extract robust features from iris images. Finally, a CNN-LSTM based hybrid classifier is used to learn these patterns and classify users as genuine or impostor. The comparative study shows that the proposed approach achieves average performance of 0.962, 0.975, 0.978, 0.971, and 0.985 in terms of precision, F1-score, sensitivity, specificity, and accuracy, respectively. However, this process has been tested on a limited number of attacks (cropping, rotation, noise, and blurring) with low attack intensity, whereas real-time attacks could be more intense. Moreover, this work can be extended by incorporating additional modalities into the multimodal authentication system along with liveness detection of subjects.

Figure 1. Normal ECG signal [15]
Figure 3 depicts the complete process of the proposed approach. The proposed ECG feature extraction phase includes deep learning-based R-peak detection, and several morphological features [25] are then obtained based on the R peaks. The iris feature extraction stage includes Gabor-wavelet, GLCM, GLDM, and PCA based feature extraction. These features are further processed through the CNN-LSTM classifier module to authenticate the users.

Figure 3 .
Figure 3. Complete architecture of proposed authentication system

Figure 5 .
Figure 5. Inner and outer boundary representation, (a) original image and (b) boundary detection

Figures 9(a)-9(f) depict samples of the original signal, the different noise types, and the combined signal with the different noises.

Figure 8. Sample images from the IITD Iris dataset

Precision is computed as the ratio of true positives to the sum of true and false positives: Precision = TP / (TP + FP). Finally, we compute the F-measure based on the sensitivity and precision values: F1 = 2 x (Precision x Sensitivity) / (Precision + Sensitivity). Figure 10(a) shows the original image, Figure 10(b) depicts the image with the cropping attack, Figure 10(c) the image after the rotation attack, Figure 10(d) the noisy image, and Figure 10(e) the image after the motion blur attack.

Table 1 .
Feature computation formulas

The model training uses certain hyperparameters to improve the learning process. The complete process is performed with k-fold cross-validation (k = 5). These hyperparameters are demonstrated in Table 2.

Table 2. Hyperparameters

Table 4 .
Classification performance for different classifiers

Table 5 .
Precision and F1-score performance for different attacks

Table 6 .
Sensitivity and Specificity performance for different attacks