Image-based Gramian angular field processing for pedestrian stride-length estimation using convolutional neural network

ABSTRACT


INTRODUCTION
Pedestrian dead reckoning (PDR) is a method for estimating a person's location without the support of any external infrastructure in the environment. This technique uses only inertial measurement unit (IMU) sensors (namely, accelerometer, gyroscope, and sometimes magnetometer), which are attached to or carried by the user. To obtain the user's relative position, three values must be extracted: step events, stride length, and heading. Among these tasks, stride length estimation (SLE) has attracted attention from many researchers because this information is valuable not only in positioning but also in activity monitoring and gait analysis [1].
The simplest SLE method assumes that a person's average stride length can be represented by a constant. This approach is not accurate because different people have different stride lengths. Many studies on SLE have used more advanced techniques and models. Díez in [2] surveyed them and divided the approaches into two classes: direct methods and indirect methods. In the scope of this paper, we examine three main approaches: biomechanical methods, integration methods, and adaptive methods.
Int J Artif Intell, Vol. 10, No. 4, December 2021: 997-1008, ISSN: 2252-8938
Biomechanical methods build gait analysis into their models. Miyazaki [3] used a gyroscope attached to the subject's thigh to measure the angles formed by the lower limbs during walking. When evaluating the step length, the author assumed that the length of the subject's legs is already known and that the two steps in the same stride are equal. Potential errors from these assumptions are corrected by taking advantage of the relationship between stride length and walking velocity. Zijiska in [4] proposed a method to calculate stride length using the lower limb length and the change in height of the center of mass (COM). Another approach was proposed by Weinberg [5], who used vertical acceleration to estimate the stride length. Kang in [6] attempted to improve Weinberg's equation by combining it with an additional logarithm-based formula and some added constraints.
The double integration model can be implemented as a strap-down inertial navigation system (INS). Li and Young in [7] used a 2-axis accelerometer and a 1-axis gyroscope placed on a subject's shank to record movements. The walking motion is then segmented and converted into a world coordinate frame using the angle calculated from the gyroscope readings. Kose and colleagues in [8] used a wavelet-based decomposition method to detect and separate the steps of each leg, then applied a Kalman filter and reverse integration to compute step length. Error in the method is compensated by removing the pelvic rotation from the model. Another way to correct sensor error is to use the zero-velocity update (ZUPT) to reset the integration, as can be found in [9]-[12].
There are two types of models in the adaptive approach: parametric and non-parametric models. Kim in [13] performed experiments to determine the correlation between stride length and the mean of the accelerometer signal from the same stride. The method proposed in [14] focuses on the importance of frequency and its linear relationship with the stride length. A similar approach with more features added can also be seen in [15]. Methods that use the variance of accelerometer signals in their models can be found in [16]-[18]. Besides the linear model, Zihajehzadeh in [19] uses Gaussian process regression (GPR) to achieve a better result. Much recent research uses non-parametric models. The method in [20] uses a neural network with three values computed from the maxima and minima in each stride as features. Hannink in [21] used a CNN, with the accelerometer and gyroscope signals normalized to 256 samples per stride. Gu in [22] trained a stacked autoencoder to learn important features from the input data, which are then fed to a regression layer for stride estimation.
Although much progress has been made in estimating a person's stride length, existing methods still have limitations. The drawback of the biomechanical methods is that some parameters must be known beforehand and might not be available. For double integration, the sensor position plays a major role, so smartphones or other handheld electronic devices may not be suitable. Finally, with adaptive models, feature selection is crucial because it has a great influence on the performance of the model. Once the relative position of a user is available, it can be combined with indoor positioning methods to make absolute position prediction more accurate [23], [24].
We take a different approach to the problems mentioned above and present a new method to estimate stride length. First, we use only accelerometer data from the dataset collected by Wang in [25]. Second, the method does not require any information about the user's height or leg length. Third, to reduce the effort of feature selection and of determining the relationships between features, the data is preprocessed and converted to images using the GAF algorithm [26], which has been successfully applied as a time series encoder in [27], [28]. Finally, for the learning task, we use a CNN model due to its flexibility and accuracy.

RESEARCH METHOD
2.1. GAF algorithm
In our research, we focus on the accelerometer because it captures data related to the user's walking motion. The raw output of an accelerometer can be described as a time series, and its patterns can be extracted to estimate the subject's stride length. To retain the features of the data, we represent the information using the GAF algorithm proposed by Wang in [26]. Wang's algorithm converts one-dimensional time series data into a two-dimensional array, which can also be interpreted as an image. The method is briefly described as follows.
Suppose that our accelerometer data is in the form of a time series $X = \{x_1, x_2, \ldots, x_n\}$, where $n$ is the size of $X$. First, we rescale the data to the range $[-1, 1]$ as (1):

$$\tilde{x}_i = \frac{(x_i - \max(X)) + (x_i - \min(X))}{\max(X) - \min(X)} \quad (1)$$

where $\tilde{x}_i$ is the normalized value of $x_i$, and $\max(X)$ and $\min(X)$ are the maximum and minimum values of $X$, respectively. The rescaled data can be expressed in a polar coordinate system by using the following transformation (2):

$$\phi_i = \arccos(\tilde{x}_i), \; -1 \le \tilde{x}_i \le 1; \qquad r_i = \frac{t_i}{N}, \; t_i \in \mathbb{N} \quad (2)$$

where $\phi_i$ is the angle, $r_i$ is the radius, $t_i$ is the time stamp, and $N$ is the number of data points. For input values in the range $[-1, 1]$, the resulting angles fall in $[0, \pi]$. This representation gives us another way to gain insight into time-series data: we can calculate the trigonometric sum or difference between sampling points to determine the temporal correlation between them. The Gramian angular summation field (GASF) and Gramian angular difference field (GADF) are defined as (3), (4):

$$\mathrm{GASF}_{ij} = \cos(\phi_i + \phi_j) \quad (3)$$

$$\mathrm{GADF}_{ij} = \sin(\phi_i - \phi_j) \quad (4)$$

We use this algorithm to transform sensor data into images, and the procedure is illustrated in Figure 1.
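The GAF encoding can be sketched in a few lines of NumPy. This is a minimal illustration assuming the standard GAF formulation (min-max rescaling to [-1, 1], arccos, then pairwise cosine of sums or sine of differences); the function name `gramian_angular_field` and its `method` argument are hypothetical and not taken from the paper.

```python
import numpy as np

def gramian_angular_field(series, method="summation"):
    """Encode a 1-D series as a GASF ("summation") or GADF ("difference") matrix."""
    x = np.asarray(series, dtype=float)
    x_min, x_max = x.min(), x.max()
    if x_max == x_min:
        raise ValueError("a constant series cannot be rescaled to [-1, 1]")
    # Rescale to [-1, 1].
    x_scaled = ((x - x_max) + (x - x_min)) / (x_max - x_min)
    # Clipping guards against tiny floating-point overshoot before arccos.
    phi = np.arccos(np.clip(x_scaled, -1.0, 1.0))
    if method == "summation":
        # Entry (i, j) is cos(phi_i + phi_j).
        return np.cos(phi[:, None] + phi[None, :])
    if method == "difference":
        # Entry (i, j) is sin(phi_i - phi_j).
        return np.sin(phi[:, None] - phi[None, :])
    raise ValueError("method must be 'summation' or 'difference'")
```

A series of n samples yields an n×n matrix. The summation field is symmetric, while the difference field is antisymmetric with a zero diagonal.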

Proposed stride length estimation method
2.2.1. Overall architecture of the method
We propose a method for stride length estimation that consists of three phases, as shown in Figure 2. The first phase is data preprocessing, which handles raw data from the accelerometer through filtering, segmentation, and conversion of the signal to images. Data preprocessing contains a module called time series to image conversion. Its task is to rescale the data, represent the data in polar coordinates, and then construct a GASF or GADF matrix. The input to the CNN is normalized by resizing the GASF matrix to a fixed size (128×128). The second phase is training the CNN model using the images and labels extracted from the training dataset. Details of the model are described in a later section. In the third phase, after training, we use the model to predict values from the testing dataset.

Data preprocessing
Raw accelerometer data is subject to noise from shaking during the user's motion. To reduce the noise, we apply a Butterworth low-pass filter with a cutoff frequency of 5 Hz and an order of 5. After the accelerometer readings are filtered, they need to be divided into smaller segments. Most of the time, this task is performed by a step detector or step counter. To simplify this requirement, we assume that the data has already been divided and that each segment represents one stride, as can be seen in Figure 3. After segmentation, the filtered data from each axis is converted to an image using the GAF algorithm described in the previous section. The procedure can be seen in Figure 4. As stated in the dataset [25], the sampling rate is 100 Hz and each stride contains about 120 samples, so we chose an image size of 128×128 for each axis to retain the features inside.
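The filtering step can be sketched with SciPy as follows. The cutoff, order, and sampling rate come from the text; the zero-phase application via `filtfilt` and the linear resampling of each stride to 128 samples are assumptions made for illustration, since the paper does not say how the filter is applied or how the fixed image size is reached. The helper names `lowpass` and `resample_stride` are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 100.0      # sampling rate stated for the dataset
CUTOFF_HZ = 5.0    # cutoff frequency from the text
ORDER = 5          # filter order from the text
IMAGE_SIZE = 128   # side length of the GAF image

def lowpass(signal, fs=FS_HZ, cutoff=CUTOFF_HZ, order=ORDER):
    """Apply a Butterworth low-pass filter to one accelerometer axis."""
    b, a = butter(order, cutoff, btype="low", fs=fs)
    # Assumption: forward-backward (zero-phase) filtering.
    return filtfilt(b, a, np.asarray(signal, dtype=float))

def resample_stride(stride, size=IMAGE_SIZE):
    """Linearly resample one stride segment to a fixed number of samples."""
    stride = np.asarray(stride, dtype=float)
    old_t = np.linspace(0.0, 1.0, num=len(stride))
    new_t = np.linspace(0.0, 1.0, num=size)
    # Assumption: resampling the series before encoding gives a 128x128 image.
    return np.interp(new_t, old_t, stride)
```

Resampling the series first is only one way to obtain a fixed-size image; resizing the GAF matrix itself with an image interpolation routine would be an equally plausible reading of the text.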

CNN architecture
Most CNN tasks on images involve classification. In our case, however, the desired output is the stride length, so the CNN is treated as a regression model. We designed a simple CNN model that consists of 7 layers. First, we apply a convolutional layer (with ReLU activation) to create the feature map of the features detected in the input image. Then, to prevent overfitting, we use a dropout layer (with a rate of 0.3). Before the features are flattened, we normalize them using a BatchNormalization layer. After that, 2 fully connected layers are used, followed by a single neuron with a linear activation function at the end of the model. Except for the last neuron, all layers use rectified linear units as their activation function. For better illustration, the CNN architecture of each layer is shown in Figure 5.
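A Keras sketch of a seven-layer regression CNN in this spirit is given below. The layer order follows the description above, and the Huber loss and Adam optimizer are the ones the paper reports using. The number of filters, kernel size, dense layer widths, and the three-channel input shape are not stated in the text, so the values here are placeholders chosen only for illustration, as is the function name `build_stride_cnn`.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_stride_cnn(input_shape=(128, 128, 3), filters=16,
                     kernel_size=3, dense_units=(64, 32)):
    """Build a small CNN that regresses one stride length per image."""
    model = keras.Sequential([
        keras.Input(shape=input_shape),
        # Placeholder filter count and kernel size (not given in the text).
        layers.Conv2D(filters, kernel_size, activation="relu"),
        layers.Dropout(0.3),              # dropout rate from the text
        layers.BatchNormalization(),
        layers.Flatten(),
        # Placeholder widths for the two fully connected layers.
        layers.Dense(dense_units[0], activation="relu"),
        layers.Dense(dense_units[1], activation="relu"),
        layers.Dense(1, activation="linear"),  # stride length output
    ])
    model.compile(optimizer=keras.optimizers.Adam(),
                  loss=keras.losses.Huber(),
                  metrics=["mae"])
    return model
```

Calling `build_stride_cnn().summary()` prints the layer shapes and parameter counts, which is a quick way to check a chosen configuration against Figure 5.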

EXPERIMENTS AND EVALUATION
3.1. Distance estimation
To evaluate the performance when the subject travels over a long distance, we need to calculate the accumulated walking distance. The accumulated distance of the subject is computed as (5):

$$\tilde{D} = \sum_{i=1}^{N} \tilde{s}_i \quad (5)$$

where $\tilde{D}$ is the total traveled distance, $\tilde{s}_i$ is the estimate of the $i$-th stride, and $N$ is the number of strides.

Error evaluation metrics
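To keep the error metrics consistent, we adopted the evaluation metrics from the dataset [25]. The relative stride error is calculated as (6):

$$E_s = \frac{1}{N} \sum_{i=1}^{N} \frac{|s_i - \tilde{s}_i|}{s_i} \times 100\% \quad (6)$$

where $E_s$ denotes the stride length relative error; $s_i$ and $\tilde{s}_i$ are the actual stride length and the estimated stride length of the $i$-th stride, respectively. The relative distance error is computed as (7):

$$E_d = \frac{\left| \sum_{i=1}^{N} s_i - \sum_{i=1}^{N} \tilde{s}_i \right|}{\sum_{i=1}^{N} s_i} \times 100\% \quad (7)$$

where $E_d$ denotes the walking distance relative error; $s_i$ and $\tilde{s}_i$ are the actual stride length and the estimated stride length of the $i$-th stride, respectively.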
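The accumulated distance and the two error metrics can be computed as in the sketch below. It assumes that the relative stride error is the mean of the per-stride relative errors and that the relative distance error compares the summed actual and estimated stride lengths; the function names are hypothetical.

```python
def accumulated_distance(stride_lengths):
    """Total walking distance as the sum of stride lengths (meters)."""
    return sum(stride_lengths)

def relative_stride_error(actual, estimated):
    """Mean per-stride relative error, in percent (assumed definition)."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [abs(s - e) / s for s, e in zip(actual, estimated)]
    return 100.0 * sum(errors) / len(errors)

def relative_distance_error(actual, estimated):
    """Relative error of the accumulated distance, in percent."""
    true_distance = accumulated_distance(actual)
    estimated_distance = accumulated_distance(estimated)
    return 100.0 * abs(true_distance - estimated_distance) / true_distance
```

Because over- and underestimates of individual strides partly cancel in the sum, the distance error is typically smaller than the stride error for the same predictions.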

Dataset
The dataset we chose for training and evaluation was created and presented by Qu Wang in [25]. In his dataset, 10000 strides and their parameters were recorded including readings from accelerometer, gyroscope, and magnetometer. To better illustrate the dataset, we analyzed the stride length-frequency distribution of the whole dataset in Figure 6(a).
From Figure 6(a), we observe that most strides fall in the range from 0.2 meters to 3.5 meters, which is reasonable because the subjects walk at different velocities. However, there are cases in which the measured stride length is above 3.5 meters and reaches nearly 30 meters. This happens because Wang's dataset covers several special scenarios, for example when users are on escalators or in elevators. If we keep such unusual data in the dataset, it could create a false pattern that degrades the model. Preventing this properly would require an activity recognition algorithm to distinguish between different motion patterns and scenarios. However, the dataset does not provide labels for the movement type or the walking environment, so it is not possible to classify these special cases. To simplify the problem, we filter out all the data that is not in the [0.2, 3.5] meter range. After filtering, the dataset has 7998 strides left, and the distribution is shown in Figure 6.
Next, we use the stride number to segment the dataset into a series of strides. Each stride is labeled using the stride-length column provided in the dataset. As the accelerometer is our main concern, only signals from the accelerometer are used. To provide data for the training and evaluation phases, we split the data into a training set, a validation set, and an evaluation set. We use 5612 strides for training, 1403 strides for validation, and the remainder for evaluation.
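The range filter and the three-way split can be sketched as follows. The bounds come from the text; the helper names are hypothetical, and splitting in recording order without shuffling is an assumption, since the paper does not describe how strides are assigned to each set.

```python
def filter_strides(stride_lengths, low=0.2, high=3.5):
    """Return indices of strides whose length lies within [low, high] meters."""
    return [i for i, s in enumerate(stride_lengths) if low <= s <= high]

def split_indices(indices, n_train, n_val):
    """Split indices into training, validation, and evaluation parts, in order."""
    if n_train + n_val > len(indices):
        raise ValueError("not enough strides for the requested split")
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    evaluation = indices[n_train + n_val:]  # whatever remains
    return train, val, evaluation
```

With the sizes reported in the text, the call would be `split_indices(kept, 5612, 1403)`, where `kept` is the output of `filter_strides`.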

Experimental result and analysis
3.4.1. Model hyperparameters and the performance evaluation
Our model was built using the Keras library. We use Huber as the loss function of the model because it is more robust to outliers than other loss functions. For the optimization task, we tried several optimizers and found that the Adam optimizer is the best fit for our model. In addition, early stopping was used to prevent overfitting. The model hyperparameters are summarized in Table 1.
Figure 7 illustrates the mean absolute error (MAE) and loss during the training process. The error and loss decrease rapidly in the first couple of epochs. As the iterations increase, the MAE becomes stable after 60 epochs, while the loss needs only 20 epochs to reach that state. The model reaches its optimal performance after 92 epochs, with training loss, validation loss, training MAE, and validation MAE equal to 0.00381, 0.00141, 0.0515, and 0.05431, respectively. We evaluate the performance of our model using the prepared test set and plot the comparison of the estimated stride length and the actual value in Figure 8. Figure 9 shows the results for some concrete strides, from raw signals to intermediate GAF images and the corresponding stride-length prediction. From the figure, we can see that the predictions of our proposed method are close to the actual values.
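The robustness of the Huber loss to outliers can be seen from its definition: it is quadratic for small errors and linear for large ones. The sketch below compares it with a squared loss; the threshold `delta=1.0` is the Keras default and is an assumption here, since the paper does not report the value it used.

```python
def huber(error, delta=1.0):
    """Huber loss for one prediction error."""
    abs_error = abs(error)
    if abs_error <= delta:
        # Quadratic region: behaves like half the squared error.
        return 0.5 * error * error
    # Linear region: large errors grow linearly, not quadratically.
    return delta * (abs_error - 0.5 * delta)

def half_squared(error):
    """Half squared error, shown for comparison."""
    return 0.5 * error * error
```

For an error of 3 meters, for instance, the half squared error is 4.5 while the Huber loss is 2.5, so a single mislabeled stride pulls the fit less strongly.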

Comparison with other models
From the raw signals, we calculate the magnitude (the square root of the sum of squares) of ax, ay, and az from the accelerometer sensor, and then apply the low-pass filter before feeding the signal to the GAF transformation.

$$a = \sqrt{a_x^2 + a_y^2 + a_z^2}$$
For comparison, we implemented four models, from Kim [13], Yao [16], Shin [17], and Weinberg [5]. These models can be briefly described as (8), whose terms are the maximum and minimum acceleration values, the acceleration value at each stride, the starting and ending moments of time of each step, the stride frequency, and the acceleration variance of the step. The model coefficients are identified during the training process. The Shin, Yao, Weinberg, and Kim methods are evaluated using the testing dataset prepared earlier, which consists of 973 steps. We can see in Figure 10 that the estimates of Shin and Yao are scattered around the actual values, while the methods of Weinberg and Kim tend to overestimate. Details of the error over the walking distance for our proposed method and the others are shown in Table 2. Over a distance of 1300.5799 m, the relative stride error and relative distance error of our proposed method are only 4.4378% and 3.1756%, the smallest among all compared methods. This indicates that our model has the best performance both for the error of each stride and over a long distance. From Figure 11, it is obvious that our proposed method achieves 80% of strides with the
Figure 11. CDF of proposed method and others
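As an illustration of how one of the baseline models is fitted, the sketch below implements the commonly cited form of the Weinberg model, in which the stride length is a coefficient K times the fourth root of the acceleration range within the stride. The closed-form least-squares fit of K and all function names are assumptions made for this example, not the authors' implementation.

```python
def weinberg_feature(accel):
    """Fourth root of the acceleration range within one stride."""
    return (max(accel) - min(accel)) ** 0.25

def fit_weinberg_k(strides, stride_lengths):
    """Least-squares estimate of K in: length = K * feature."""
    features = [weinberg_feature(s) for s in strides]
    numerator = sum(f * y for f, y in zip(features, stride_lengths))
    denominator = sum(f * f for f in features)
    return numerator / denominator

def predict_stride_length(k, accel):
    """Estimate one stride length from its acceleration samples."""
    return k * weinberg_feature(accel)
```

Because the model has a single coefficient, it cannot adapt to users whose gait differs from the training data, which is one plausible reason for systematic overestimation on a mixed dataset.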

DISCUSSION
From the indoor positioning perspective, an accurate estimate of stride length and travel distance opens up new possibilities, as many tracking systems rely on SLE. By combining our proposed method with state-of-the-art techniques for step detection and heading estimation, we can minimize the error during the process and obtain a highly accurate position for the user. Furthermore, this study could also be used in the field of gait analysis and health monitoring, as the stride length of a person is a valuable parameter for predicting an impaired gait. The main limitation of our proposed method is that it depends on heavy computation. Converting the accelerometer data from time series to images and passing them through the CNN model takes a considerable amount of time, which makes it difficult for mobile device hardware to handle the workload. A better idea is to place the system on a centralized server to harness its processing power and reduce the load on mobile devices.
In the future, studies can be done on how to reduce the computational time of the proposed method to support real-time tracking applications. The relationship between stride length and data from other sensors, such as the gyroscope and magnetometer, could be investigated to further improve accuracy. Finally, the lack of dataset labels for training should also be addressed, since inaccurate data could result in the model learning false patterns. Sensor data collection procedures for stride length therefore need to be rigorously examined so that the model can distinguish special moving patterns from normal walking.

CONCLUSION
In this paper, we have proposed a new method for stride length estimation. By using the GAF algorithm, we transformed the accelerometer time-series data into images. A CNN model was then designed to estimate stride length given these images as its input. We trained and evaluated our model using a public dataset created by Qu Wang. Although this dataset did not satisfy our requirements in labeling, it provided an indicator of how the model performs. Experiments were conducted to measure the performance of our model compared to the Kim, Yao, Shin, and Weinberg models. The experimental results show that the proposed method outperforms the other models. Our model achieved 4.4378% relative stride error and 3.1756% relative distance error, compared with 8.7553% and 4.6804%, respectively, for the closest competing methods.