A novel approach to optimizing customer profiles in relation to business metrics

ABSTRACT


INTRODUCTION
Businesses are inextricably linked to their customers. Every customer has data, and this data serves a purpose in policy implementation [1]. Understanding customer data is critical for running a business. A customer profile is a set of features or variables that identifies a customer; these can be either psychographic or demographic in nature [2]. Examples include age, gender, occupation, place of residence, social class, and purchasing behavior. Profiles help businesses learn more about their customers, such as who they are, what they purchase, and where they purchase [3].
Companies can plan their marketing mix with this knowledge in mind. Businesses must create profiles in order to identify and measure the habits and characteristics of consumers in the target market. The majority of marketing strategies fail because they target no one in particular while attempting to appeal to everyone. As a result, the company sells customers unnecessary products. Profile knowledge is used to segment the market. Businesses can use segmentation to match their products and services to the needs of their customers, thereby enhancing the success of their marketing strategies [4].

ISSN: 2252-8938

Business intelligence (BI) is defined as a collection of techniques and tools for gathering raw data and transforming it into meaningful information for business analysis [5]. The digital era has truly changed society, as evidenced by e-metrics business designs that make it easier for users (customers) to act electronically and that provide information that can be applied [6]. However, issues frequently arise when attempting to predict behavior, particularly when working with large amounts of data. In the context of BI, the most important consideration is the sheer volume of data. To produce predictive patterns of customer behavior, the concept of data mining must be investigated [7]. Because there is so much data to manage, predictions will be made using robust non-parametric regression.
The method is used to track the origin of information or to predict social behavior by analyzing customer profiles based on their degree of behavioral similarity [8]. Prediction optimization refers to the process of systematically predicting what will happen based on data from the past and present. This connection is used to build models and forecast future events [9]. Some of the regression assumptions are violated by the nonparametric regression model, resulting in a less accurate predictive value. Multivariate adaptive regression splines (MARS) is used in regression analysis to handle outlier data and to model the relationships among the multiple variables involved [10]. Robust optimization is a method for solving optimization problems with uncertain data that are only known to belong to one of several uncertainty sets in the presence of outliers [11]. Its goal is to find optimal or near-optimal solutions for any given realization of uncertain scenarios [12]. The goal of this research is to develop an optimization model for predicting customer profiles based on multiple predictor variables, which competitive organizations can use to make decisions.

METHOD
Mobile wallets generate data collections based on registered customers [13]. Processing was completed in several stages. First, the data were processed using the multivariate adaptive regression spline (MARS) method [14]; MARS identified customer behavior as a determining factor in reducing data deviation. Robust optimization then detects and resolves outliers using deviations before the data are optimized. The resulting model is used to generate predictions, and the model is validated using the confusion matrix [15]. During testing, the confusion matrix is used to assess the accuracy of the calculations [16]. Further tests were conducted with various data compositions for training and testing, namely 55%:45%, 65%:35%, 80%:20%, and 90%:10%, in order to obtain a more accurate model comparison. Figure 1 depicts the research steps.
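The four data compositions listed above can be illustrated with a short sketch. This is a minimal illustration only, not the procedure used in the study: the function name `split_indices`, the random seed, and the choice to round the cut point down are assumptions, and the sample count of 1383 is simply the sum of the two subset sizes (485 and 898) reported in the Results section.

```python
import numpy as np

def split_indices(n_samples, first_fraction, seed=0):
    """Shuffle row indices and cut them into two parts (hypothetical helper).

    The first part receives floor(n_samples * first_fraction) rows;
    the second part receives the remainder.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    cut = int(n_samples * first_fraction)  # rounding down is an assumption
    return order[:cut], order[cut:]

# Compositions named in the text: 55%:45%, 65%:35%, 80%:20%, 90%:10%.
compositions = [0.55, 0.65, 0.80, 0.90]
n_samples = 1383  # 485 + 898, the subset sizes reported in the Results section

sizes = {}
for fraction in compositions:
    first, second = split_indices(n_samples, fraction)
    sizes[fraction] = (len(first), len(second))
    print(f"{fraction:.0%}:{1 - fraction:.0%} -> {len(first)} / {len(second)} rows")
```

Each composition would then be used to fit and score the model separately, so that the accuracy values can be compared across splits.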

E-Metrics
E-metrics refers to customer behavior that is tracked electronically (e-customer behavior) [17]. The technological era has brought about significant changes in society: everything can be presented electronically, which makes it convenient to observe how a customer behaves electronically and provides information that can be applied [18]. The number and frequency of merchants and their variants vary according to the type of business actor, especially in the digital space [19]. In e-metrics, financial technology is used to track the behavior and activities of each merchant's users. Figure 2 depicts the e-metrics ecosystem.

Customer lifecycle
The customer lifecycle is a method of managing the various stages of the customer interaction process with the system in a detailed and comprehensive manner [20]; that is, the customer experience is stored and managed from start to finish, as shown in Figure 3. This is required as a foundation for the system to learn about customer habits. Figure 3 shows that customers register and then request the desired product, after which the data is investigated. Data investigation is intended to sort the data and determine whether or not it is appropriate. Customers who pass the investigation can place product orders, make transactions, and make payments. The system then records all customer habits as variables for data processing.

A novel approach to optimizing customer profiles in relation to business metrics (Marischa Elveny) 443

Multivariate adaptive regression spline (MARS)
Multivariate adaptive regression splines (MARS) is a high-dimensional data modeling technique. The model takes the form of an expansion in product spline basis functions, with the data determining the number of basis functions and the parameters associated with each one. The procedure is motivated by recursive partitioning and is capable of capturing high-order interactions, or interactions that involve only a few variables, while generating a continuous model. Furthermore, the model can be represented in a form that separately identifies the additive contributions and those associated with multivariable interactions. MARS employs a piecewise linear expansion in which, for each predictor variable, a reflected pair of basis functions is constructed to describe the general relationship between the predictor and the response variable. In this study, the MARS method was used to analyze the data [21]. To get the best model, follow these steps:
− Ascertain that each variable (response and predictor variables) has the same value scale.
− As a first step in determining the relationship pattern between the variables, plot each predictor variable against the response variable.
− Determine the maximum number of basis functions (BF), which should be two to four times the number of predictor variables. Due to the use of 9 predictor variables, the maximum number of BFs in this study were 14, 21, and 28.
− Determine the maximum number of interactions (MI). In this study, the values one, two, and three are used, because more than three interactions result in a model that is extremely complex to interpret.
− Determine the minimum number of observations between nodes (MO) by trial and error, because there is no fixed basis or boundary for choosing it.
− Select the best model using the minimum generalized cross-validation (GCV) value from all possible combinations of basis functions (BF), maximum interaction (MI), and minimum observation (MO) values. GCV can be used to indicate lack of fit of the MARS model.
− Test the significance of the MARS model, either jointly with the F test or partially with the t test on the regression coefficients.
− Interpret the MARS model and the variables that affect it.
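The reflected pair and the GCV criterion used in the steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the Salford MARS implementation used in the study: the hinge form max(0, x − t) and max(0, t − x) is the standard MARS reflected pair, while the function names, the synthetic data, and the complexity charge (number of terms plus a penalty of 3 per knot) are assumptions chosen for the example.

```python
import numpy as np

def hinge_pair(x, knot):
    """Reflected pair of hinge functions at a knot: max(0, x - t), max(0, t - x)."""
    x = np.asarray(x, dtype=float)
    return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

def fit_single_knot(x, y, knot):
    """Least-squares fit of intercept + reflected pair for one predictor."""
    pos, neg = hinge_pair(x, knot)
    basis = np.column_stack([np.ones_like(pos), pos, neg])
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return coef, basis @ coef

def gcv(y, y_hat, n_terms, n_knots, penalty=3.0):
    """Generalized cross-validation score (lower is better).

    Effective parameters = n_terms + penalty * n_knots; the penalty
    value is an assumption, not a setting taken from the study.
    """
    n = len(y)
    effective = n_terms + penalty * n_knots
    if effective >= n:
        raise ValueError("model is too complex for the sample size")
    mse = np.mean((np.asarray(y) - np.asarray(y_hat)) ** 2)
    return float(mse / (1.0 - effective / n) ** 2)

# Synthetic data with a true kink at x = 5 (hypothetical, for illustration).
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * np.maximum(0.0, x - 5.0)

# Score three candidate knots and keep the one with the minimum GCV.
scores = {}
for knot in (2.0, 5.0, 8.0):
    _, y_hat = fit_single_knot(x, y, knot)
    scores[knot] = gcv(y, y_hat, n_terms=3, n_knots=1)
best_knot = min(scores, key=scores.get)
print(best_knot, scores)
```

In a full MARS run the same comparison is made over every combination of BF, MI, and MO, and the combination with the smallest GCV is retained.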

Function base (BF)
To estimate the response variable in this study, 9 predictor variables were used. The maximum number of interactions (MI) can be set to one, two, or three, because a model with more than three interactions is very complex to interpret. We chose minimum observations (MO) of 0, 1, 2, 3, 4, and 5 [22]. This study makes use of a nonparametric model, given in (4), where h denotes the response variable, x denotes a constant, and $(a_1, a_2, \ldots, a_p)^T$ is the vector of predictor variables; the model also contains an additive stochastic component with zero mean and finite variance. The data are then analyzed to determine the number of subregions and the functions that best fit the realization in each subregion [23]. This study can solve the optimization problem using node values as data points, but because the values are so close for each variable, the best basis function model is required to find competitive predictions, as shown in (4) and (5).

Specificity is the proportion of true negatives, i.e., the proportion of class 0 that is correctly identified [25]. Sensitivity is the proportion of true positives, i.e., the proportion of class 1 that is correctly identified. The specificity and sensitivity equations are:

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

There is an error in the training data when calculating the sensitivity of the transaction value to the merchant, as shown in Table 2, and the results of the error calculation are shown in Table 3. In forecasting, the mean squared error (MSE) is used to double-check the estimate of the error value. A low MSE, one that is close to zero, indicates that the forecasting results are consistent with the actual data and can be used to forecast the future [26]. The MSE obtained here is close to zero. The mean absolute deviation (MAD) measures the average absolute error: it determines forecast accuracy by averaging the absolute value of each estimated error. The root mean square error (RMSE) measures the difference between the values predicted by a model and the observed values.
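The measures named in this section (sensitivity, specificity, MSE, MAD, and RMSE) can be computed as in the sketch below. The function names and the small actual/forecast series are hypothetical and are not taken from Table 2 or Table 3; the formulas are the standard definitions of each measure.

```python
import numpy as np

def sensitivity(tp, fn):
    """Proportion of class 1 (positives) that is correctly identified."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of class 0 (negatives) that is correctly identified."""
    return tn / (tn + fp)

def error_measures(actual, forecast):
    """Return MSE, MAD, and RMSE for two equally long series."""
    errors = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    mse = float(np.mean(errors ** 2))       # mean squared error
    mad = float(np.mean(np.abs(errors)))    # mean absolute deviation
    rmse = float(np.sqrt(mse))              # root mean square error
    return {"MSE": mse, "MAD": mad, "RMSE": rmse}

# Hypothetical values, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
forecast = [11.0, 12.0, 13.0, 18.0]
measures = error_measures(actual, forecast)
print(measures)
print(sensitivity(tp=40, fn=10), specificity(tn=45, fp=5))
```

All three error measures share the same units problem in different ways: MSE is in squared units, while MAD and RMSE are in the units of the response, which makes the latter two easier to compare against the transaction values themselves.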

Robust
According to (6), R represents the random response variable and the predictor variables form a random vector, all of which contain "noise" [27]. To avoid increasing the complexity of the optimization problem, the uncertainties U1 and U2 in the robust data model are set as polyhedral sets for the input and output, respectively. U1 and U2 are determined by the set [28]. The term "robust" is defined in (7). U1 is a polytope with $2^{N \cdot M_{\max}}$ vertices $W^1, W^2, \ldots, W^{2^{N \cdot M_{\max}}}$. Based on the MARS error calculation results, outliers are sought from the uncertainty. To begin, make sure that the input and output variables are distributed evenly [29]. The model was then built using Salford MARS Version 3. The noise (uncertainty) was then incorporated into the real input data in order to apply a robust optimization technique to the MARS model [30]. General Algebraic Modeling System (GAMS) Studio 37 is a powerful optimization solver.
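The idea of a polyhedral uncertainty set with a finite number of vertices can be illustrated with a small sketch. It is not the robust formulation of (6) and (7): it assumes a simple box around each input (a special case of a polytope), a hypothetical linear predictor, and hypothetical names throughout. Because the absolute error of a linear predictor is convex in the input, its worst case over the box is reached at one of the vertices, so enumerating the vertices is enough.

```python
import itertools
import numpy as np

def box_vertices(center, half_width):
    """Enumerate the 2**n vertices of the box center +/- half_width."""
    center = np.asarray(center, dtype=float)
    half_width = np.asarray(half_width, dtype=float)
    signs = itertools.product((-1.0, 1.0), repeat=len(center))
    return np.array([center + np.array(s) * half_width for s in signs])

def worst_case_abs_error(coef, intercept, center, half_width, target):
    """Largest |prediction - target| over all vertices of the uncertainty box."""
    vertices = box_vertices(center, half_width)
    predictions = vertices @ np.asarray(coef, dtype=float) + intercept
    return float(np.max(np.abs(predictions - target)))

# Hypothetical two-input example: nominal input (1, 2), noise of +/- 0.5 each.
vertices = box_vertices([1.0, 2.0], [0.5, 0.5])
worst = worst_case_abs_error(
    coef=[2.0, -1.0], intercept=0.0,
    center=[1.0, 2.0], half_width=[0.5, 0.5], target=0.0,
)
print(len(vertices), worst)
```

The number of vertices doubles with every uncertain input, which is why the text restricts the uncertainty sets to polyhedral form to keep the optimization problem manageable.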

Contribution model
The forward stepwise part of the MARS algorithm determines the node points between data points to obtain BFs. As the number of data points increases, so does the number of candidate node points, and complications arise. Therefore, using clustering theory, statistics, and optimization theory, it is possible to determine the node points based on the data set. Accordingly, the model is built with the following considerations:
− Increasing the flexibility index in order to improve customer habits.
− Using learning phenomena in the manufacturing process, such as discounts at specific times, so that process enhancements result in an increase in customers over a specific time period.
− Taking features into account in order to maximize flexibility while minimizing risk.
Equation (8) is a mathematical representation of the first objective function, which examines behavior through transactions. The model's second objective, shown in (9), is to minimize the function for time or period management, taking into account the use of discounts at specific times. The third objective, shown in (10), is to maximize the flexibility of all customer activities, including the time, type of transaction, type of product, and transaction costs. The fourth objective function, in (11), is intended to reduce risk based on the distance between the customer and the destination merchant [31], [32].

Model implementation
Table 4 shows the number of products purchased from January to December. The table shows that products were purchased most from January to June, while sales decreased from July to December. This can be used as a guideline when deciding how to increase sales during these months. Table 5 shows that transactions are carried out between 07.00 and 22.30 WIB. This is needed to observe the habits of customers who make transactions at certain hours, which merchants can use as a reference for opening and closing their shops.

Analysis of user habits based on indicators
Analyzing user behavior based on transaction and merchant indicators is accomplished by examining user habits when transacting with a merchant [33]. The study was conducted based on the estimated maximum limit. Looking at the number of variables that appear, there are 448.
The rupiah currency was used in this study. According to Figure 5, the number of transactions made by customers ranges from 0 to 120, and the total payment made ranges from 0 to 80,000 rupiah. Figure 5 shows that there were more than 80 transactions with prices ranging from 70,000 to 80,000 rupiah. It can be concluded that price is an important indicator in determining customer behavior patterns.

Results
After the model is trained, performance testing is carried out on the data. The analysis aims to find out how the ratio between testing data and training data affects the results. Testing is done with a split of 65% test data and 35% training data. To validate the model developed in this study, the confusion matrix technique was used. By calculating accuracy, the confusion matrix is used to assess the suitability of the data testing process.

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{12}$$

where:
− TP = True Positive, a customer profile correctly predicted to transact
− TN = True Negative, a customer profile correctly predicted not to transact
− FP = False Positive, a registered customer profile predicted to transact that in fact does not transact
− FN = False Negative, a registered customer profile predicted not to transact that in fact does transact

The confusion matrix uses measurements based on precision, recall, F1-score, and accuracy [34]. The results show that the training data numbers 485 and the test data 898, a division of 65%:35%. This division gives the best results, with a total accuracy of 84.5%. The findings of this study can be seen in Table 6, which presents the findings of the analysis based on the four objective function models implemented on the data. The results show that the M3 and M4 functions have a higher level, where the value is close to 0.
The M3 function concerns time management (period): for example, the length of merchant opening hours and the existence of discounts at specific times have a significant impact on the first objective function, which can increase the number of transactions. Meanwhile, the M4 function, i.e., the distance between the customer and the merchant, also has a significant impact. Customers look for nearby merchants because of the efficiency in time and transportation costs. These findings can be used to help manage new businesses where innovation emerges from interactions between customers and merchants. This viewpoint is important because ongoing marketing innovations may require a rethink of competition and collaboration between business groups. Figure 6 shows that customer and merchant relationships are based on flexibility, customer habits, processing time, discounts and rewards, as well as flexibility in minimizing risk, in terms of:
− Customer interface: how downstream customer relationships are structured and managed
− Financial model: costs and benefits and their distribution among stakeholders in the business model

Figure 6. Analysis for business model innovation
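The confusion-matrix measures used for validation in this section (accuracy, precision, recall, and F1-score) can be sketched as follows. The labels are hypothetical and do not reproduce the study's data or its 84.5% accuracy; class 1 is assumed to mean "the customer transacts" and class 0 "the customer does not transact", and the function names are illustrative.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels (1 = transacts, 0 = does not)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    return tp, tn, fp, fn

def classification_scores(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from the four counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels, for illustration only.
actual = [1, 1, 1, 0, 0, 1, 0, 1]
predicted = [1, 0, 1, 0, 1, 1, 0, 1]

counts = confusion_counts(actual, predicted)
scores = classification_scores(*counts)
print(counts, scores)
```

Accuracy alone can hide an imbalance between the two classes, which is why precision, recall, and F1-score are reported alongside it.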

CONCLUSION
The study's findings were obtained through various studies of information poured into the knowledge base. Specifically, the MARS model was generated using basis functions to find maximum values with upper, middle, and lower limits, as well as interactions that serve as a match between input variables until simplification is achieved. Robust optimization can handle outliers in the data, resulting in new models. The new model was built by considering four main functions: transaction behavior; time or period management; maximizing the flexibility of all customer activities, including time, transaction type, product type, and transaction costs; and risk reduction based on the distance between the customer and the destination merchant. Dividing the data 65%:35% gives the best results, reflected in a total accuracy of 84.5%. The model shows that improvements to time management (period), such as the length of merchant opening hours or the availability of discounts at certain times, have a significant impact. The distance between customers and merchants also has a significant impact: customers look for the nearest merchant because of the efficiency in time and transportation costs. We undertook this work to increase the conceptual clarity around what a business model and an innovation model mean. The main contribution of this paper is to pave the way for a shared understanding of how to innovate, and for research that will support academics, industry practitioners, and business people in making decisions by improving customer profiles.

Figure 1. State of the art

Figure 4 depicts the number of transactions per time division: 07.00-11.59 WIB, 12.00-16.59 WIB, and 17.00-22.30 WIB. The display shows the number of transactions in each time division. The highest number of transactions occurs at 12.00-16.59 WIB.

Figure 4. Number of transactions per time
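The grouping of transactions into the three time divisions of Figure 4 can be sketched as below. The division boundaries come from the text; the function names and the sample transaction times are hypothetical, and comparing at minute resolution is an assumption.

```python
from collections import Counter
from datetime import time

# Time divisions named in the text (WIB), inclusive at both ends.
DIVISIONS = [
    ("07.00-11.59", time(7, 0), time(11, 59)),
    ("12.00-16.59", time(12, 0), time(16, 59)),
    ("17.00-22.30", time(17, 0), time(22, 30)),
]

def division_of(moment):
    """Return the label of the division containing a time, or None."""
    moment = moment.replace(second=0, microsecond=0)  # minute resolution
    for label, start, end in DIVISIONS:
        if start <= moment <= end:
            return label
    return None

def count_per_division(moments):
    """Count transactions in each division; times outside all divisions are skipped."""
    counts = Counter({label: 0 for label, _, _ in DIVISIONS})
    for moment in moments:
        label = division_of(moment)
        if label is not None:
            counts[label] += 1
    return dict(counts)

# Hypothetical transaction times, for illustration only.
sample = [time(7, 15), time(12, 0), time(13, 45),
          time(16, 59, 30), time(21, 10), time(23, 0)]
per_division = count_per_division(sample)
print(per_division)
```

A merchant could use counts of this kind as a reference when deciding opening and closing hours.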

Figure 5. User behavior based on transactions

Table 1. Function base value

Table 2 displays the sensitivity and specificity values generated by the model's interpretation. Table 2 is generated based on

Table 2. Sensitivity and specificity

Table 3. Results of the error calculation for the MARS training data model

Table 4. Grouping by product

Table 5. Number of transactions per time

Table 6. Optimal sum of each objective function with problem solving