Pedestrian detection using Doppler radar and LSTM neural network

Received Feb 20, 2020; Revised Apr 25, 2020; Accepted May 7, 2020

The integration of radar systems as the primary sensor with deep learning algorithms in driver assist systems is still limited. Such an implementation would greatly help in continuously monitoring visual blind spots for incoming pedestrians. Hence, this study proposes a single-input single-output Doppler radar combined with a long short-term memory (LSTM) neural network for pedestrian detection. The radar is placed in a monostatic configuration at an angle of 45 degrees from the line of sight. A continuous wave with a frequency of 1.9 GHz is transmitted from the antenna. The returning signal from approaching subjects is characterized by branching peaks at frequencies higher than the transmitted frequency. A total of 1108 spectrum traces with Doppler shift characteristics were acquired from eight volunteers. Another 1108 spectrum traces without Doppler shifts are used as the control. The traces are then fed to the LSTM neural network for training, validation and testing. Overall, the proposed method detected pedestrians with 88.9% accuracy for training and 87.3% accuracy for testing.


INTRODUCTION
Pedestrian detection is among the most vital safety elements in increasingly complex driver assist systems [1][2]. It ensures that the vehicle detects pedestrians in blind spots and performs evasive maneuvers when required, whether through warning systems [3] or emergency braking mechanisms [4]. Thus far, numerous sensing approaches have been tested, including computer vision [5], laser scanner [6] and automotive radar [7] technologies. Each of these methods has unique capabilities that enable the vehicle to detect pending collisions with pedestrians [8]. For example, radar sensors enable the use of Doppler and micro-Doppler information obtained from body movements to identify and discriminate between signals reflected from pedestrians and other targets [9][10], making them a suitable candidate for this purpose. Hybrid systems have also been proposed to improve pedestrian detection capabilities [11][12]; however, radar sensors remain an attractive choice for their ability to obtain unique signatures from reflected signals.
Embedded in these innovative sensing approaches are intelligent algorithms designed to automatically detect pedestrians and perform evasive maneuvers [13]. Thus far, two-dimensional image data have also been tested with advanced artificial intelligence models [14]. Despite the implementation of LSTM in convolutional neural network (CNN) architectures, these systems still rely on two-dimensional image inputs, which result in high computational requirements [8]. Two major issues have been identified. 1) The use of radar as the primary sensing element is limited since it relies only on the reflected time-domain signals from targets [15]. Current technology only adopts it as support for computer vision and laser scanning systems. 2) With radar as the primary input, an LSTM neural network is well suited, as the architecture is capable of extracting common features of approaching pedestrians from sequential information [16][17]. These, however, remain untested.
To address the aforementioned problems, the following objectives are outlined. 1) The study proposes a relatively simple continuous-wave Doppler radar to distinguish between approaching pedestrians and the controlled condition. 2) The returning signals deflected off the subjects will be used as input to train, validate and test the LSTM recurrent neural network architecture. This paper is structured as follows. Section 2 describes the data collection and intelligent classification method used for the study. Subsequently, Section 3 discusses the spectral trace characteristics and subject detection using the LSTM neural network. Finally, Section 4 summarizes the contribution of the study and its prospective application for driver assist technology.

METHODOLOGY

Experimental setup and acquisition protocol
Data collection was performed at the Microwave Research Institute, Universiti Teknologi MARA. The equipment used includes an Agilent MXG Analog Signal Generator, a KeySight FieldFox Microwave Analyzer, and a transmitter and a receiver antenna. As shown in Figure 1, the radar system is placed in a monostatic configuration. An absorber is positioned between the transmitter and receiver to minimize spectral leakage. A 10 dBm continuous wave with a frequency of 1.9 GHz from the signal generator is transmitted by a Vivaldi antenna. The returning waves deflected from the approaching subjects are captured by the receiver, and the information is converted to spectrum traces by the microwave analyzer. The spectral resolution is set to 500 Hz to allow observable Doppler shift signatures as subjects move closer to the radar setup. Eight volunteers participated in this study. Subjects were required to walk along the specified path from Point A to Point B at a moderate pace. The system captures the returning signal of approaching subjects until they pass directly in front of the assumed vehicle's line of sight, thus simulating a situation that would probably result in a collision. Each subject repeated the trial twenty times, and every trial produced between six and eight spectrum traces.
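As a quick consistency check on the protocol above, the number of traces it can produce is bounded by the subject, trial and per-trial trace counts. The sketch below only does that arithmetic; the function and constant names are illustrative and not part of the study.

```python
# Bounds on the number of spectrum traces implied by the acquisition protocol.
SUBJECTS = 8                # volunteers in the study
TRIALS_PER_SUBJECT = 20     # repetitions per volunteer
TRACES_PER_TRIAL = (6, 8)   # minimum and maximum traces per trial
REPORTED_TRACES = 1108      # traces with Doppler shift reported in the paper


def trace_count_bounds(subjects, trials, traces_per_trial):
    """Return the (lowest, highest) total trace count the protocol allows."""
    low, high = traces_per_trial
    total_trials = subjects * trials
    return total_trials * low, total_trials * high


low, high = trace_count_bounds(SUBJECTS, TRIALS_PER_SUBJECT, TRACES_PER_TRIAL)
print(f"protocol allows {low} to {high} traces")
print("reported count within bounds:", low <= REPORTED_TRACES <= high)
```

The reported 1108 traces fall between the two bounds, which corresponds to an average of roughly seven traces per trial.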

Pedestrian detection using LSTM neural network
LSTM is an improvement on the recurrent neural network (RNN) used for modelling sequential data. Figure 2 shows the theoretical architecture of an RNN with the recurrent layer unfolded into a network [18][19]. $U$, $V$ and $W$ are the weight matrices of the different network layers, $x$ is the input and $h$ is the hidden state that gives the network its memory. The different time instances are indicated by $t-1$, $t$ and $t+1$. Through the activation function $\Gamma_1$, the output of the hidden layer with present information is transferred to the hidden layer of the next time instance as part of its input. This feedback preserves the information of the preceding time instance to retain data dependency, thus improving learning and abstraction from the sequential data [20][21]. However, the vanishing gradient problem that arises during back-propagation adversely affects the amount of distant memory that can be transferred. This restricts the capability of the RNN for modelling long-dependency sequential information and makes it unsuitable for this study [22].

To solve the vanishing gradient issue, the LSTM neural network has been proposed. A standard LSTM block, shown in Figure 3, comprises a memory cell state, a forget gate, an input gate and an output gate. The memory cell state plays a defining role throughout the entire chain, with relevant information selectively added to or removed from it through the three-gate system [23]. Initially, as shown by (1), the cell state, $c_t$, decides which information should be discarded from the previous cell state, $c_{t-1}$, through the forget gate, $f_t$.

$$f_t = \Gamma_2(W_f x_t + R_f h_{t-1} + b_f) \tag{1}$$
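Subsequently, as expressed by (2), the input gate, $i_t$, identifies the information from the input, $x_t$, that should be stored in the cell state, $c_t$. The input information and the candidate cell, $\tilde{c}_t$, are then updated through (3).

$$i_t = \Gamma_2(W_i x_t + R_i h_{t-1} + b_i) \tag{2}$$

$$\tilde{c}_t = \Gamma_1(W_c x_t + R_c h_{t-1} + b_c) \tag{3}$$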
Subsequently, as shown by (4), the candidate memory, $\tilde{c}_t$, and the long-term memory from $c_{t-1}$ are combined to update the cell state, $c_t$.

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{4}$$
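The output at the present time instant, $h_t$, is then computed by considering both the output information, $o_t$, and the cell state, $c_t$. These are mathematically expressed by (5) and (6).

$$o_t = \Gamma_2(W_o x_t + R_o h_{t-1} + b_o) \tag{5}$$

$$h_t = o_t \odot \Gamma_1(c_t) \tag{6}$$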
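The gate updates described above can be sketched in a few lines of NumPy. This is a minimal illustration of a standard LSTM step and not the implementation used in the study. It rests on several assumptions: the dictionary keys `f`, `i`, `c`, `o`, the helper names `init_params`, `lstm_step` and `run_sequence`, the hidden size of 4, and the treatment of each point of a trace as one time step with a single feature. The trace itself is synthetic random data.

```python
import numpy as np

GATES = ("f", "i", "c", "o")  # forget gate, input gate, candidate cell, output gate


def sigmoid(z):
    """Logistic sigmoid, squashing values into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def init_params(input_size, hidden_size, rng):
    """Small random input weights W, recurrent weights R and zero biases b."""
    W = {g: 0.1 * rng.standard_normal((hidden_size, input_size)) for g in GATES}
    R = {g: 0.1 * rng.standard_normal((hidden_size, hidden_size)) for g in GATES}
    b = {g: np.zeros(hidden_size) for g in GATES}
    return W, R, b


def lstm_step(x_t, h_prev, c_prev, W, R, b):
    """Advance the LSTM block by one time step and return (h_t, c_t)."""
    f_t = sigmoid(W["f"] @ x_t + R["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(W["i"] @ x_t + R["i"] @ h_prev + b["i"])      # input gate
    c_tilde = np.tanh(W["c"] @ x_t + R["c"] @ h_prev + b["c"])  # candidate cell
    c_t = f_t * c_prev + i_t * c_tilde                          # cell state update
    o_t = sigmoid(W["o"] @ x_t + R["o"] @ h_prev + b["o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                    # hidden state
    return h_t, c_t


def run_sequence(xs, W, R, b):
    """Feed a sequence of input vectors through the block, starting from zeros."""
    hidden_size = b["f"].shape[0]
    h = np.zeros(hidden_size)
    c = np.zeros(hidden_size)
    for x_t in xs:
        h, c = lstm_step(x_t, h, c, W, R, b)
    return h, c


rng = np.random.default_rng(0)
W, R, b = init_params(input_size=1, hidden_size=4, rng=rng)
trace = rng.standard_normal((16, 1))  # synthetic stand-in, not measured data
h_final, c_final = run_sequence(trace, W, R, b)
print(h_final)
```

In a classifier, the final hidden state would typically be passed to a fully connected layer and a softmax to produce the two class scores; that stage is omitted here.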
Based on the aforementioned equations, $f_t$, $i_t$ and $o_t$ represent the forget gate, input gate and output gate, respectively. $W$ are the input weights, $R$ are the recurrent weights, and $b$ are the biases for the respective gates and cell states. $\Gamma_1$ is the hyperbolic tangent and $\Gamma_2$ is the sigmoid function. Both activation functions introduce non-linearity into the network and can be expressed as $\Gamma_1(z) = (e^{z} - e^{-z})/(e^{z} + e^{-z})$ and $\Gamma_2(z) = 1/(1 + e^{-z})$.

The LSTM architecture, which incorporates memory cells regulated by the gating mechanism, provides a solution to the vanishing gradient problem of the RNN. Thus, the improved network structure is capable of extracting historical information and predicting future trends for long-term dependencies in sequential data. In this study, the input to the LSTM neural network is the spectral traces obtained from the spectrum analyzer. The output classes from the hidden states are defined as indexes for the pedestrian and the controlled condition. 70% of the data is used for training, 15% for validation, and the remaining 15% for testing [24].
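The 70/15/15 partition can be illustrated with a short index-splitting sketch. The text does not state how traces were assigned to each subset, so the random shuffle, the fixed seed and the function name `split_indices` are assumptions made purely for illustration. The total of 2216 traces is the sum of the 1108 pedestrian and 1108 control traces reported in the study.

```python
import numpy as np


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train, validation and test sets.

    The test set receives whatever remains after the first two cuts, so the
    three subsets always cover every sample exactly once.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(train_frac * n_samples))
    n_val = int(round(val_frac * n_samples))
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]
    return train, val, test


N_TRACES = 1108 + 1108  # pedestrian traces plus control traces
train_idx, val_idx, test_idx = split_indices(N_TRACES)
print(len(train_idx), len(val_idx), len(test_idx))
```

A purely random split can place traces from the same volunteer in both the training and test subsets. Splitting by subject instead would be a stricter test of generalization, but whether the study did so is not stated.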
The performance of the LSTM recurrent neural network for pedestrian detection is assessed in terms of accuracy (Acc), positive predictivity (Pp), and sensitivity (Se). Acc is the ability of the system to correctly differentiate between approaching subjects and the control condition. Se is the ability of the system to correctly identify approaching pedestrians. Pp, on the other hand, is the probability that, following a positive detection, the subject is within the line of sight of the vehicle. These parameters are expressed by (7), (8) and (9), where TP is the number of true positive, TN true negative, FP false positive and FN false negative classifications [25].

$$Acc = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

$$Pp = \frac{TP}{TP + FP} \tag{8}$$

$$Se = \frac{TP}{TP + FN} \tag{9}$$
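The three measures follow directly from the four confusion-matrix counts, as the sketch below shows using their standard definitions. The function name `detection_metrics` and the example counts are hypothetical; the counts are not taken from the study's results.

```python
def detection_metrics(tp, tn, fp, fn):
    """Compute accuracy, sensitivity and positive predictivity from counts.

    Acc = (TP + TN) / (TP + TN + FP + FN)
    Se  = TP / (TP + FN)
    Pp  = TP / (TP + FP)
    """
    total = tp + tn + fp + fn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts must give non-zero denominators")
    return {
        "Acc": (tp + tn) / total,
        "Se": tp / (tp + fn),
        "Pp": tp / (tp + fp),
    }


# Hypothetical counts chosen only to exercise the formulas.
example = detection_metrics(tp=90, tn=85, fp=15, fn=10)
for name, value in example.items():
    print(f"{name}: {100 * value:.1f}%")
```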

Spectral profiling using Doppler shift signature
In theory, the signal deflected off subjects moving away from the Doppler radar will exhibit a longer wavelength than the transmitted signal. This should be reflected in the presence of a secondary peak at a frequency lower than 1.9 GHz. In contrast, the signal deflected off subjects moving towards the Doppler radar will exhibit a shorter wavelength than the transmitted signal. This could be characterized by the presence of a secondary peak at a frequency higher than 1.9 GHz. Figure 4 shows a sample of the spectrum trace obtained from the KeySight FieldFox Microwave Analyzer. The Doppler shift is visible at frequencies higher than 1.9 GHz. The result is thus valid, as the deflected waves conform to the characteristics of an approaching pedestrian.

To further confirm the collective pattern of the acquired data, results from each sample are combined to form a composite display of spectrum traces. Figure 5 shows the overall spectrum traces for the controlled condition. The results show a consistent pattern with a dominant peak at a frequency of 1.9 GHz. The results were also compared with the collective pattern of spectrum traces for approaching subjects. As shown in Figure 6, an increase in spectrum activity is detected at frequencies higher than 1.9 GHz. These observations confirm that the Doppler radar captures the expected information from the deflected signals.

Figure 6. Composite spectrum traces for approaching subjects (N = 1108 samples)
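The direction of the frequency shift described above follows from the usual monostatic continuous-wave relation $f_d = 2 v_r f_0 / c$, where $v_r$ is the radial speed of the target. The sketch below evaluates it for the 1.9 GHz carrier. The walking speed of 1.4 m/s and the reading of the 45 degree placement as the angle between the walking path and the radar beam are assumptions, since the text only specifies a moderate pace and does not give the exact geometry. The function names are illustrative.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def doppler_shift_hz(radial_speed_mps, carrier_hz):
    """Two-way Doppler shift magnitude for a monostatic CW radar."""
    return 2.0 * radial_speed_mps * carrier_hz / C


def received_frequency_hz(speed_mps, carrier_hz, angle_deg=0.0, approaching=True):
    """Frequency of the return from a target moving at speed_mps.

    angle_deg is the angle between the target's velocity and the radar beam;
    only the radial component of the velocity contributes to the shift.
    """
    radial_speed = speed_mps * math.cos(math.radians(angle_deg))
    shift = doppler_shift_hz(radial_speed, carrier_hz)
    return carrier_hz + shift if approaching else carrier_hz - shift


CARRIER_HZ = 1.9e9     # transmitted frequency from the study
WALKING_SPEED = 1.4    # m/s, assumed moderate walking pace
ANGLE_DEG = 45.0       # assumed angle between walking path and beam

towards = received_frequency_hz(WALKING_SPEED, CARRIER_HZ, ANGLE_DEG, approaching=True)
away = received_frequency_hz(WALKING_SPEED, CARRIER_HZ, ANGLE_DEG, approaching=False)
print(f"approaching: {towards - CARRIER_HZ:+.2f} Hz relative to carrier")
print(f"receding:    {away - CARRIER_HZ:+.2f} Hz relative to carrier")
```

An approaching subject yields a return above the carrier and a receding subject a return below it, matching the sign convention in the text.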

Pedestrian detection using LSTM recurrent neural network
The spectrum traces, which are treated as sequential information, are subsequently fed as input to the LSTM neural network. As shown in Table 1, satisfactory results have been obtained, with 88.9% Acc for training, 88.9% Acc for validation, and 87.3% Acc for testing. It is also worth noting that both the Se and Pp measures range from 76.8% to 99.4% when distinguishing between the subject and the controlled condition. These indicate that the model is capable of detecting approaching subjects by extracting the Doppler shift information from the respective spectrum traces.

CONCLUSION
The study initially set out to 1) implement Doppler radar as the primary sensing element for detecting approaching pedestrians, and 2) assess the performance of the LSTM neural network in extracting sequential information from spectrum traces to distinguish between the incoming subject and the controlled condition. Through a relatively simple experimental setup, the study was able to produce satisfactory results. First, the adopted radar system was capable of capturing Doppler shift signatures through the spectrum analyzer. Second, the LSTM neural network proved capable of extracting the required information for detecting approaching pedestrians. While the overall detection accuracy is satisfactory, there is still opportunity for improvement. Based on observation of the spectrum traces, there are samples in which the Doppler shift is not prominent. These appear as outliers within the broad range of samples. Furthermore, the network had to rely on a relatively small sample size to capture the relevant information. To overcome these problems, a larger pool of samples is recommended. This is to ensure that the LSTM architecture can extract the long-term dependency characteristics of the sequential information and successfully generalize the features of incoming pedestrians.