Ensemble learning model for WiFi indoor positioning systems

Received Aug 24, 2020; Revised Jan 19, 2021; Accepted Feb 20, 2021

WiFi indoor positioning has received much attention from researchers recently. In this research, we study the performance of indoor positioning systems that use our newly proposed ensemble machine learning model. The model uses several base models for normal data training and position prediction. It then uses the validation data, together with the prediction errors of the trained models on those data, as input to train an intermediate classification model that determines which position prediction model is the best match for each set of WiFi received signal strength indicator (RSSI) values. The experimental results show that our proposed ensemble model outperforms the other compared models.


INTRODUCTION
Location-aware services need position information to carry out a specific task. In outdoor environments, one can use devices equipped with the global positioning system (GPS) to get position information, but GPS is hard to use indoors. Recent research has shown that WiFi indoor positioning systems (WiFi-IPSs) are very promising for those services. Many WiFi-IPS algorithms and systems have been proposed so far, and they can be categorized as range-based or range-free [1]. WiFi RSSI fingerprinting systems (or range-free systems) use WiFi signals from the surrounding wireless access points (APs) to provide the object's location information. This method is easy to deploy at low cost and requires no extra infrastructure. Research has shown that WiFi fingerprinting based on the signal strengths received from WiFi access points is a very promising method for indoor positioning [2]. However, the method faces many difficulties because WiFi RSSI suffers from multipath and shadowing interference in dynamic indoor environments [3].
Therefore, the measured RSSI value is not stable and depends heavily on the measuring environment and surrounding objects. RSSI-based position estimates also have relatively low accuracy and security [4]. In particular, when predicting the movement trajectory of a person or device in an indoor environment, the higher the mobility, the larger the error. Many methods have been proposed to overcome this limitation. One example is to average a number of selected maximum RSSI observations [5]; that method uses a smoothness index to test the quality of the RSSI and to select an appropriate number of observations. A multi-point fingerprint matching algorithm has also been proposed, in which the common single-point matching procedure is expanded to multiple points [6].

Traditional ensemble methods combine the predictions of several base estimators to improve generalizability and robustness over a single estimator. In averaging methods, the principle is to build several estimators independently and then average their predictions. In boosting methods, by contrast, base estimators are built sequentially, and each one tries to reduce the bias of the combined estimator. Singh et al. [7] propose a method that applies an ensemble of classifiers to weighted averages of WiFi RSSI values within a time window to localize a user in an apartment. Akram et al. [8] combine Gaussian mixture model (GMM)-based soft clustering and random decision forest (RDF) ensembles on WiFi fingerprints to solve indoor localization at both the room level and the latitude-longitude level. Lee et al. [9] employ the random forest ensemble learning method on WiFi RSSI to locate a user. Some authors combine particle swarm optimization with ensemble learning to solve indoor localization using ultra-wideband (UWB) signals [10]. Other researchers modify traditional ensemble learning models or combine them in different ways to find the position of a user in indoor environments based on WiFi RSSI signals [11]-[16].
This research proposes a new ensemble model. In this model, an intermediate classifier selects, for each test point, the best base model and therefore the prediction to use. The best model is defined as the one that gives the smallest prediction error. This allows the new model to choose the best estimator flexibly for each test data point.
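As a minimal sketch of the averaging principle described above, the following Python snippet fits two base regressors independently and averages their predictions by hand, then checks that scikit-learn's VotingRegressor produces the same result. The synthetic data and the choice of base estimators are assumptions made purely for illustration; they are not taken from the cited works.

```python
import numpy as np
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Synthetic regression data (illustrative assumption, not real RSSI measurements).
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=200)

# Build the base estimators independently ...
knn = KNeighborsRegressor(n_neighbors=5).fit(X, y)
lin = LinearRegression().fit(X, y)

# ... then average their predictions by hand.
manual_average = (knn.predict(X) + lin.predict(X)) / 2.0

# VotingRegressor fits clones of the same estimators and averages their outputs.
voting = VotingRegressor([
    ("knn", KNeighborsRegressor(n_neighbors=5)),
    ("lin", LinearRegression()),
]).fit(X, y)
voting_average = voting.predict(X)

# Both routes should give the same averaged prediction.
same = np.allclose(manual_average, voting_average)
```

Boosting differs in that each new estimator is fitted after, and depends on, the ones before it, so it cannot be reproduced by averaging independently trained models in this way.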

RESEARCH METHOD
2.1. WiFi fingerprint positioning
Suppose that each location has a particular set of WiFi signal strengths from the APs, measured in the offline phase and called an offline fingerprint. The fingerprint obtained during the online phase is then compared to the offline fingerprints stored in the database to estimate the position of an object. In the offline phase, each reference point includes the signal strengths measured from all accessible APs together with its known 2D coordinates. When an object enters the region, the system compares its currently measured RSSI data with the stored offline data to infer the position of the object [2], [17]-[20].
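The offline/online matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the database values, the three-AP layout, the function name `match_fingerprint`, and the use of Euclidean distance in signal space are all hypothetical and not taken from the source.

```python
import numpy as np

# Offline phase: a hypothetical fingerprint database.
# Each row of `offline_rssi` holds the RSSI (dBm) from 3 APs at one reference point;
# the matching row of `offline_xy` holds that reference point's known 2D coordinates.
offline_rssi = np.array([
    [-40.0, -70.0, -80.0],
    [-70.0, -45.0, -75.0],
    [-80.0, -72.0, -42.0],
])
offline_xy = np.array([[0.0, 0.0], [5.0, 0.0], [5.0, 5.0]])

def match_fingerprint(online_rssi, db_rssi, db_xy):
    """Return the coordinates of the reference point whose stored fingerprint
    is closest to the online fingerprint (Euclidean distance in signal space)."""
    distances = np.linalg.norm(db_rssi - online_rssi, axis=1)
    return db_xy[np.argmin(distances)]

# Online phase: a newly scanned fingerprint at an unknown location.
online = np.array([-68.0, -47.0, -77.0])
estimated_xy = match_fingerprint(online, offline_rssi, offline_xy)
```

In this toy database the online fingerprint is closest to the second stored fingerprint, so the estimate is that reference point's position, (5.0, 0.0).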
WiFi RSSI fingerprint data from the surrounding access points form a map of an area, with some probability distribution of RSSI values at each given location (x, y). In most methods, RSSI values from the online phase are compared with those stored in the database during the surveying phase to find the closest match, and the position (x, y) is then predicted based on this match [21]. A fingerprint is a set of signal strengths from the surrounding access points over time at a given location (called a reference point). These fingerprints are related to the locations associated with them, so they can be used to distinguish those locations.
When applying machine learning models to this kind of problem, most methods perform two separate phases. In the first phase, the training phase, multiple WiFi RSSI fingerprints scanned at each reference point, together with its coordinates, are used to train the learning model, and they are also recorded in the database for future use. In the second phase, the online phase, a newly scanned RSSI fingerprint at the unknown location is forwarded to the trained model to predict the unknown position (x, y). The most commonly used estimation method is K-nearest neighbors (KNN). More complicated methods, including support vector machines (SVM), deep neural networks (DNN), hidden Markov models (HMM), and Gaussian process-assisted methods, have also been implemented [2], [22]-[25].
Now, let us formalize the methods using a mathematical model. Assume that the signal strengths at N points in a room are measured together with their respective coordinates (the points may lie on a square grid or be chosen at random). Normally, 80% of that data set is used for training and the remaining 20% for testing. For the points $P_i$ in Figure 1, the coordinates of the point $P'$ are predicted based on the K nearest points.
More complicated algorithms such as deep neural networks (DNN), support vector regression (SVR), and other modern learning models can be applied to give better accuracy, but at the cost of more intensive computation.
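The 80/20 split and the KNN prediction step can be illustrated with a short, hedged Python sketch. The data here are simulated: the four-AP layout, the helper `simulate_rssi`, and the log-distance path-loss model with Gaussian noise are assumptions for illustration only. The KNN settings (`n_neighbors=7`, `weights='distance'`) mirror the configuration reported in the experiments.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Hypothetical setup: 4 APs at the corners of a 10 m x 10 m room.
ap_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])

def simulate_rssi(points):
    """Toy log-distance path-loss model with Gaussian noise (an assumption)."""
    d = np.linalg.norm(points[:, None, :] - ap_xy[None, :, :], axis=2)
    return -30.0 - 25.0 * np.log10(np.maximum(d, 0.1)) + rng.normal(0.0, 2.0, d.shape)

# N measured points with known coordinates and their RSSI fingerprints.
coords = rng.uniform(0.0, 10.0, size=(500, 2))
rssi = simulate_rssi(coords)

# 80% of the data for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    rssi, coords, test_size=0.2, random_state=0)

# Distance-weighted KNN: the K closest training fingerprints vote on (x, y),
# and closer fingerprints contribute more to the estimate.
knn = KNeighborsRegressor(n_neighbors=7, weights="distance").fit(X_train, y_train)
pred = knn.predict(X_test)

# Mean Euclidean positioning error on the held-out points, in meters.
mean_error = np.linalg.norm(pred - y_test, axis=1).mean()
```

Because each prediction is a weighted average of training coordinates, the estimates always stay inside the surveyed area; the actual error depends entirely on the simulated noise level and is not comparable to the measured results reported in this paper.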

Proposed ensemble method
After training the base models (KNN, DNN, and random forest (RF)), we use them to predict the coordinates of the test points. We observe that the prediction errors of different models for the same point differ, which means that each model is good for only a subset of the test points. Therefore, we build an intermediate classifier that decides which base model (KNN, DNN, or RF) best fits each point. Our proposed ensemble model is illustrated in Figure 2. In its final step, the selected best base model is used to make the prediction and obtain the final predicted coordinates for each test data point. Table 1 shows some sample validation data points that are forwarded to the base models. For each validation point, the base model with the minimum error is used as the label, and the labeled validation data are used to train the intermediate classifier (SVC).
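The selection scheme described above can be sketched as follows. This is a hedged illustration, not the authors' implementation: the data are simulated with a toy path-loss model, the base-model hyperparameters are reduced for speed, and the classifier is assumed to take only the raw RSSI fingerprint as input.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data (assumption): 4 APs, log-distance path loss, Gaussian noise.
ap_xy = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
coords = rng.uniform(0.0, 10.0, size=(600, 2))
dist = np.linalg.norm(coords[:, None, :] - ap_xy[None, :, :], axis=2)
rssi = -30.0 - 25.0 * np.log10(np.maximum(dist, 0.1)) + rng.normal(0.0, 3.0, dist.shape)

# Split into training, validation, and test sets (360 / 120 / 120 points).
X_tmp, X_test, y_tmp, y_test = train_test_split(rssi, coords, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_tmp, y_tmp, test_size=0.25, random_state=0)

# Step 1: train the base models on the training set.
base_models = [
    KNeighborsRegressor(n_neighbors=7, weights="distance"),
    RandomForestRegressor(n_estimators=50, random_state=0),
    MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
]
for model in base_models:
    model.fit(X_train, y_train)

def point_errors(models, X, y):
    """Euclidean error of every model on every point, shape (n_points, n_models)."""
    return np.stack([np.linalg.norm(m.predict(X) - y, axis=1) for m in models], axis=1)

# Step 2: label each validation point with the index of its lowest-error model.
val_labels = point_errors(base_models, X_val, y_val).argmin(axis=1)

# Step 3: train the intermediate classifier on (fingerprint, best-model label).
selector = SVC(random_state=0).fit(X_val, val_labels)

# Step 4: for each test point, keep the prediction of the selected base model.
chosen = selector.predict(X_test)
all_preds = np.stack([m.predict(X_test) for m in base_models], axis=1)  # (n, 3, 2)
final_pred = all_preds[np.arange(len(X_test)), chosen]

test_errors = point_errors(base_models, X_test, y_test)
ensemble_error = np.linalg.norm(final_pred - y_test, axis=1)
```

By construction, the ensemble's error on each test point equals the error of whichever base model the classifier picked, so it can never beat the per-point best base model; how close it gets depends on how well the classifier learns the selection.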

RESULTS AND DISCUSSION
The experiments are conducted in our lab room, which measures 10 by 10 meters. The room is divided into a grid of 9 by 9 points at which the WiFi RSSI fingerprints of the surrounding access points are measured in the surveying phase. The room contains tables, chairs, computers, and other networking devices, as well as people working in the room. In this specific scenario, we use RSSI values from 9 access points surrounding the room and in nearby rooms (there are about 600 measured points). The validation dataset is extracted from the measured training points, and those points are excluded from the training dataset. For the testing data, we further measure RSSI fingerprints at random points in the room.
To evaluate the performance of our proposed algorithm, the Euclidean distance between the estimated and true locations is used to measure the error. Let $(x_i, y_i)$ be the true 2-D physical coordinates and $(\tilde{x}_i, \tilde{y}_i)$ be the estimated location of point $P_i$; then the distance error $Err_d$ is computed as
$$Err_d = \sqrt{(x_i - \tilde{x}_i)^2 + (y_i - \tilde{y}_i)^2}$$
For the experiments, we use the Python sklearn library with three base models configured with the following hyperparameters: RandomForestRegressor(random_state=0, n_estimators=300), KNeighborsRegressor(n_neighbors=7, weights='distance'), and MLPRegressor(hidden_layer_sizes=(128,64,32,16), activation='relu', solver='adam', batch_size=3, max_iter=300, random_state=0). For the intermediate classifier, we use SVC(random_state=0). The VotingRegressor model uses the same three base models as our proposed ensemble model.
Figure 3 shows the mean error for each base model and the new ensemble model (the data are also shown in Table 2). The figure shows that the new ensemble model achieves better accuracy than each base model used. Figure 4 presents a numerical analysis of the base models and the new ensemble model (the data are also shown in Table 3). For each base model, the percentage shows the rate at which that model gives the smallest error among the three base models on the test data.
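As a small worked example of the two statistics discussed above, the following Python snippet computes the Euclidean distance error per point, the mean error per model, and the rate at which each base model gives the smallest error. All coordinates and predictions are made-up numbers chosen so the arithmetic is easy to follow; they are not the paper's measurements.

```python
import numpy as np

# Hypothetical true coordinates of four test points (meters).
true_xy = np.array([[1.0, 1.0], [2.0, 3.0], [4.0, 4.0], [6.0, 2.0]])

# Hypothetical predictions of three base models for the same points.
preds = {
    "KNN": np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 7.0], [6.0, 4.0]]),
    "RF":  np.array([[4.0, 5.0], [2.0, 4.0], [4.0, 5.0], [9.0, 6.0]]),
    "DNN": np.array([[1.0, 3.0], [5.0, 7.0], [4.0, 6.0], [6.0, 3.0]]),
}
names = list(preds)

# Euclidean distance error for every point and model: shape (n_points, n_models).
errors = np.stack([np.linalg.norm(preds[n] - true_xy, axis=1) for n in names], axis=1)

# Mean error per model.
mean_error = {n: float(e) for n, e in zip(names, errors.mean(axis=0))}

# Rate at which each model gives the smallest error among the base models.
best = errors.argmin(axis=1)
win_rate = {n: float(np.mean(best == i)) for i, n in enumerate(names)}
```

With these made-up numbers, KNN has the lowest mean error (1.5 m, against 3.0 m for RF and 2.5 m for DNN) and gives the smallest error on half of the points, while RF and DNN each win on one point in four.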
The percentage for the new ensemble model shows the rate at which the intermediate classification model (SVC) correctly identifies the base model that gives the smallest error. In this specific scenario, it achieves a rate of 60.38%, which is higher than that of each base model. The cumulative error distribution for each base model and the new ensemble model is illustrated in Figure 5. We also compared our proposed ensemble model with other ensemble learning models such as VotingRegressor and ExtraTreeRegressor; the comparison results are presented in Figure 6 (the data are also shown in Table 4). The figure shows that our proposed ensemble model is more accurate than the other ensemble learning models on this specific WiFi RSSI dataset. From the experiments, we also realized that the validation dataset should be large enough to thoroughly train the

CONCLUSION
This paper has presented and analyzed our proposed new ensemble learning model. Unlike traditional ensemble models, our proposed model trains an intermediate classification model on a validation data set, and the trained classifier is then used to select the best base model for each test data point. The experiments have confirmed that our model has better accuracy than traditional ensemble models as well as the base models.