Classification of customer churn prediction model for telecommunication industry using analysis of variance

ABSTRACT


INTRODUCTION
In the telecommunications industry, data mining techniques have been used to classify and predict client attrition. In the realm of e-commerce, customer churn is a significant issue. Due to cost constraints, conventional churn prediction models are simple, relying on whatever data is available from an e-commerce platform [1], [2].
Customers who have not logged in or made any purchases are deemed lost. Customers who are at risk of being lost can be identified precisely through data mining and analysis, and they can be won back in a timely manner using efficient marketing methods. Although customer acquisition is unpredictable, a customer churn prediction model can help e-commerce enterprises determine what caused the loss and how to prevent it in the future. The most extensively used customer churn prediction model is the recency, frequency, and monetary value (RFM) approach [3], [4]. The model divides customers into groups characterized by three index values: recency (how recently the customer made the latest purchase), frequency (how often the customer makes transactions), and monetary value (how much the customer spends on each purchase), and then gives e-commerce businesses the operational measures they need to retain potential customers and enhance profitability. This way of categorizing, however, lags significantly behind other types of client information, especially data on browsing habits [5], [6]. Furthermore, different clients have different repurchase cycles. As a result, the RFM model's analysis is too simplistic.
The term "churn" was coined by combining the English terms "change" and "turn" to represent the phenomenon of a consumer leaving. In the mobile telecommunications business, it refers to the transfer of subscribers from one provider to another (also known as customer attrition or subscriber churning). It is measured by the churn rate, which is an essential metric for businesses. Churn usually occurs because a competitor offers better rates or services, or offers various benefits when signing up [7], [8].
To thrive in today's telecommunications industry, a company must be able to identify consumers who are likely to migrate to a competitor. Customer churn prediction is a method of dealing with this problem, and it has become a significant concern in the telecommunications business. In such a competitive industry, an accurate method of predicting clients' future behavior would be invaluable. Telecom technologies are typically based on statistical pattern-recognition algorithms. These sophisticated solutions dispute the common misconception that client churn is only a data-mining exercise [9], [10]. The machine learning approach, on the other hand, is a rapidly expanding field of study. As a result, this study predicts that the telecommunications industry will catch up with the integration of nonparametric approaches and machine learning models in the coming years.
Furthermore, a number of experts have proposed machine learning methodologies for predicting customer attrition in the telecom industry. A large proportion of these techniques use a classification procedure [11], [12]. Customer churn prediction, however, is a challenging task because of the imbalanced distribution of its classes, with churners often being significantly fewer than non-churners. Because of this issue, most individual machine learning approaches are ineffective at recognizing patterns [13], [14]. As a result, several researchers have proposed techniques for determining customer churn that combine two or more methods, one of which is used for data pre-processing prior to the classification [15], [16]. Some articles suggest clustering the data into similar groups and then deleting some of them as a way to filter out unrepresentative data [17], [18]. In this study, the support vector machine (SVM) is presented as a classifier to predict customer churn in the telecommunication sector, and the analysis of variance (ANOVA) feature selection method is used to extract useful information from the telecommunications data.

METHOD
This study was conducted in stages: data pre-processing, feature selection, and classification. It uses an open-source telecom customer churn dataset from Duke University hosted in the Kaggle repository; it consists of 51,047 instances and 53 attributes, including Customer ID, Churn, Monthly Revenue, Monthly Minutes, charges, calls, and roaming, among others. Data cleaning (to remove noise and inconsistency), data integration (to merge several sources of data), data selection (to extract the data relevant to the analytical objective), and data transformation (in which summary or aggregation operations convert and consolidate the data into forms suitable for mining) were all performed on the dataset. In this study, ANOVA is used to select relevant features from the dataset, and SVM is used as a classifier whose results are assessed in terms of evaluation measures. Figure 1 shows the proposed model. A total of 10,149 characteristics were chosen. ANOVA improves performance and scalability by removing portions of the features at a time while affecting the validity of the reduced feature set as little as possible, using approaches from artificial reinforcement of the procedure [19], [20]. ANOVA feature selection is "greedy": the features of the high-dimensional telecommunications data are picked by identifying the best group of attributes, so once a feature has been removed, it cannot be restored [21]-[23]. The goal was to use a one-way ANOVA F-test to check whether the mean of a feature X is the same across all classes of the target Y [24].
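The one-way ANOVA F-test described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the MATLAB implementation used in the study; the function names, the feature names, and the six-row toy sample are assumptions made for the example and are not taken from the dataset.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F statistic for one feature X grouped by class label Y."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2:
        raise ValueError("the F-test needs at least two classes")
    grand = mean(values)
    # Variation of the class means around the overall mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_features(columns, labels):
    """Return feature names sorted from highest to lowest F statistic."""
    scores = {name: anova_f(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy sample: three churners and three non-churners.
labels = ["churn", "churn", "churn", "stay", "stay", "stay"]
columns = {
    "monthly_minutes": [100, 120, 110, 300, 320, 310],  # class means differ
    "roaming_calls": [5, 1, 3, 2, 4, 3],                # class means are equal
}
ranking = rank_features(columns, labels)
print(ranking)
```

A feature whose class means differ strongly relative to its within-class spread receives a large F value and is ranked first; a feature with identical class means receives an F value of zero and is a candidate for removal.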

Classification
Cortes and Vapnik (1995) introduced the SVM, a prominent machine learning approach that is widely used across many applied research areas. It has been used extensively in credit scoring because of its strong performance and comparable scalability, especially in comparison with its close predecessor, the artificial neural network (ANN), and with other classification methods. The purpose of the SVM, which is based on the risk minimization principle, is to lower the upper bound on the misclassification error. The SVM uses training examples to approximate a function for classification [25].
Its key premise is to map the input data into a high-dimensional feature space and then construct a separating hyperplane, defined by the support vectors, that achieves the largest possible margin between the two classes. The support vectors can then be used to predict the labels of new input samples. SVM uses a variety of functions (called kernels) to map input data into the high-dimensional feature space, such as linear, radial basis function (RBF), polynomial, and sigmoid kernels [26].
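The four kernel families named above can be written out as a short sketch in their common textbook forms. The parameter names and default values (gamma, degree, coef0) are illustrative assumptions; the study does not report the kernel parameters it used.

```python
import math


def dot(x, z):
    """Inner product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def linear_kernel(x, z):
    return dot(x, z)


def rbf_kernel(x, z, gamma=0.5):
    # exp(-gamma * squared Euclidean distance); gamma is an assumed value.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def polynomial_kernel(x, z, degree=3, gamma=1.0, coef0=1.0):
    return (gamma * dot(x, z) + coef0) ** degree


def sigmoid_kernel(x, z, gamma=0.01, coef0=0.0):
    return math.tanh(gamma * dot(x, z) + coef0)


# Two hypothetical customer feature vectors.
x, z = [1.0, 2.0], [3.0, 4.0]
print(linear_kernel(x, z), rbf_kernel(x, z),
      polynomial_kernel(x, z, degree=2), sigmoid_kernel(x, z))
```

Each function returns a similarity score between two samples; the SVM only ever needs these pairwise scores, which is what allows it to work in the high-dimensional space without computing the mapping explicitly.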
The discovery of interesting patterns and information from massive amounts of data is part of the classification stage. The features selected by ANOVA were passed to the SVM, which was used to further analyze the data and identify patterns in the dataset, which was divided into churners and non-churners. Classification accuracy, F-measure, sensitivity, specificity, recall, and precision were all used to assess the technique's performance.

RESULTS AND DISCUSSION
Microsoft Excel was used to compile the churn data. The model was created in MATLAB 2015a, a fourth-generation programming language with object-oriented features. The dataset tested comprised 53 telecom attributes from Duke University, totalling 51,047 instances. An ANOVA feature selection algorithm was used in this study to retrieve specific features from the telecom dataset, which is hosted on the Kaggle site, resulting in a total of 10,149 features. In the ANOVA feature selection technique, the p-value is the probability, if the null hypothesis is true, of seeing an F statistic at least as large as the one produced in the experiment (F0). Low p-values suggest that the null hypothesis is likely untrue. The significance level used was 0.05. The analysis showed a statistically significant difference in group means. The selected features were then passed on for classification.
The result of the classification using ANOVA-SVM is depicted in the scatter plot of Figure 2. In addition, Figures 3 and 4 show the findings as a confusion matrix and a receiver operating characteristic (ROC) curve, respectively. The confusion matrix reports the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts produced by the SVM classifier, from which performance metrics such as accuracy, sensitivity, specificity, precision, recall, and F-measure are derived. Figure 3 shows the confusion matrix with the predicted class (output class) in the rows and the true class (target class) in the columns. The diagonal cells correspond to observations that have been correctly classified. The observations that were incorrectly classified are represented by the off-diagonal cells. Each cell displays the number of observations as well as the percentage of the total number of observations.
The far-right column of the plot shows the percentages of all the examples predicted to belong to each class that are correctly and incorrectly classified. These measures are often called precision (or positive predictive value) and false discovery rate. The row at the bottom of the plot shows the percentages of all the examples belonging to each class that are correctly and incorrectly classified. These measures are often called recall (or true positive rate) and false negative rate. The cell at the bottom right of the plot displays the overall accuracy.
The following values were obtained from the classification process: TP=39, TN=18, FP=3, and FN=0. These were used to calculate the performance metrics. Figure 4 shows the receiver operating characteristic curve for ANOVA-SVM. The SVM-RBF was trained on data that had been reduced to the features selected by the algorithm. In terms of prediction accuracy, the ANOVA-based model performed well, with 95% accuracy on the test data. It was found that using ANOVA for feature selection significantly increases the efficiency of the classifier without lowering classification accuracy. In domains with a huge number of features, such as telecom data, the feature selection procedure is therefore made considerably more practical. As the number of available samples grows, this improvement becomes even more important. Table 1 summarizes the results of the ANOVA-SVM-RBF classification algorithm applied to churned customer data in the telecommunications industry.
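As a worked check, the standard metric definitions can be applied to the confusion-matrix counts stated above. This is a sketch using the usual textbook formulas; the function name is hypothetical, and apart from the 95% accuracy quoted in the text, the outputs are computed here from the counts rather than copied from Table 1.

```python
def churn_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)  # recall is the same quantity as sensitivity
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# Counts reported in the text: TP=39, TN=18, FP=3, FN=0.
metrics = churn_metrics(tp=39, tn=18, fp=3, fn=0)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts, accuracy is 57/60 = 0.95, which agrees with the 95% reported in the text, and sensitivity is 1.0 because no churner was missed (FN=0).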
According to the findings of this study, the performance assessment of the SVM-RBF classifier on the telecom dataset shows that the ANOVA feature selection approach achieved high values on performance parameters such as accuracy, timing, sensitivity, specificity, and precision. Feature selection using ANOVA is very useful when the dataset has a large number of dimensions. It improves the performance of the feature selection step as well as of the SVM classifier in terms of accuracy, sensitivity, specificity, and precision.

CONCLUSION
Complexity is a major challenge in the telecommunications domain. The expansion of telecom data and the development of statistical methods have opened up new options for analyzing churners and non-churners alike. The basic technologies of client retention are feature selection and classification. Both are important for recognition and prediction. Because of the characteristics of telecom data, many common solutions in this field still require further attention to overcome their drawbacks. The key characteristics of telecom data, as well as the main problems for researchers conducting telecom data analysis, are small sample size, high dimensionality, and class imbalance. Researchers in this sector rarely look at class imbalance when pre-processing datasets. This study addresses these problems using the ANOVA feature selection method. For classification, SVM-RBF was used, which worked effectively by reducing unnecessary processing costs for large-scale linearly separable data such as telecom data. This research can be expanded in the future to include more feature extraction algorithms and classifiers.

Table 1. Analysis of the classification of ANOVA-SVM-RBF for telecommunication industry

Figure 4. ROC curve for ANOVA-SVM-RBF classifier

ISSN: 2252-8938
Classification of customer churn prediction model for telecommunication industry … (Ronke Babatunde)