A prediction model for benefitting e-commerce through usage of regional data: A new framework

Received Feb 25, 2021; Revised Sep 6, 2021; Accepted Sep 28, 2021

Today, during COVID-19, people are more inclined towards online shopping. In general practice, a customer's browsing history and micro-behaviour in online shopping have been analysed to generate future suggestions. As a result, the predictions suffer from an over-similarity problem, and the user finds no novelty in the recommended items. E-shopping quality can therefore be enhanced by adding a factor other than similarity. The current research suggests and advertises products that belong to a person's region. For this work, data has been collected area-wise, for example through country-based segregation. The dataset considered here belongs to one country, India: its culture, its handicrafts and its citizens. Datasets and their combinations based on multiple attributes are the input to the proposed predictive system. Existing data is also used to collect customers' demographic details, which are then mapped to the area-wise dataset. In addition, a framework is proposed that takes the database and the user query as input to its predictive system in order to generate default suggestions for the user beyond the submitted query.


INTRODUCTION
Consumers and their purchasing styles have changed over time. As web usage has increased, consumers increasingly prefer online shopping for well-known reasons such as time saving, brand visibility, offers, messaging, personalised recommendations and many more. Business personnel work in parallel to keep up with customer choices and with the recommendations made by the system running in the backend. It is the responsibility of businesses to widen consumer choice beyond recommendations based on browsing history. Doan [1] has suggested guidelines to improve online purchase intention, such as valuing social influence, a user-friendly website, and precise processing speed.
Business threats exist for both retailers and pure e-commerce dealers, and there can be many reasons behind them. Dahiya [2] has outlined a few reasons for e-commerce dealers, such as the customer's concern about misuse of personal information and the initial unease customers feel when selecting a product without seeing and touching it. Not every website appeals to every customer, because tastes differ. In clothing, for example, customers vary in what they prefer to wear. The same applies to the person who hosts a website and recommends products to customers: it is quite tedious to come up with the right results for each customer's choice. The sale

LITERATURE REVIEW
E-commerce industries are highly popular and are becoming a part of everyday lifestyle. In metro cities, most people use them because of shortage of time and the easy availability of services. The concern of the current research, however, is business in rural areas and people who are not familiar with the e-commerce industry. Kalambe [7] has discussed the challenges faced by urban and rural areas in India in the context of e-commerce. Companies must try to reach rural areas, not only for product sales but also to utilize the skills of rural businessmen. Mukherjee [8] has reported suicides by weavers in Andhra Pradesh due to low business income, as handicraft products are not reaching customers. Such scenarios make e-business a necessity for villagers. Kshetri [9] has analysed the factors behind rural-urban differences in e-commerce. Wright [10] has discussed Anglo-Indians who were left behind in rural areas and face problems in maintaining their culture and identity. It is therefore true that developing countries face issues with e-commerce for people in rural areas. Research in this direction may help both people and industry.
Customer reviews are very significant in analysing online business. Ratings help in further analysing the products a customer is likely to purchase. Jain and Kumar [11] identified the characteristics of customer reviews to help businessmen improve marketing strategies. They applied analytics to social media data to identify demographic, psychographic and lifestyle information about customers. Obal and Lv [12] promoted banner ads with high visual complexity and calculated the effective cost per activity using predictive modelling methods. Zhang et al. [13] discussed online product recommendations (OPR) and proposed a model to find the positive and negative factors of OPR. Xu [14] analysed the effect of online customer reviews and the parameters that justify satisfaction. Huang et al. [15] proposed a graph model for e-commerce recommender systems. All this research shows that predictions can be made using customers' information and their liking or disliking of a product. This is one of the research gaps: product recommendation is done using only user feedback, which creates over-similarity in recommendations, so the user sees no variety in the recommended items.
Singhal et al. [16] discussed machine learning techniques, which today serve applications in many different domains. These techniques help in data extraction. Castelli [17] discussed knowledge extraction from customer reviews using the amazon.com database, including prediction of successfully available products on Amazon. Eslami et al. [18] found that lengthy reviews give the most helpful idea about products and reviews. Chou and Chuang [19] performed feature extraction in an online restaurant reservation system where predictive modelling was applied. Liébana-Cabanillas and Lara-Rubio [20] explored mobile payment systems from the merchant's point of view, which indicated that m-payment behaviour is not applicable globally. Ge [21] developed a distributed prediction modelling framework for plant-wide industrial processes, in which reasons for low performance were diagnosed. The TE benchmark process was also used to evaluate the performance of the proposed model.
ISSN: 2252-8938 A prediction model for benefitting e-commerce through usage of… (Shefali Singhal) 1011
Chin et al. [22] compared traditional partial least squares structural equation modelling (PLS-SEM) with other methods such as PLSpredict, CVPAT and model selection criteria (i.e. Bayesian information criterion (BIC), BIC weight, Geweke-Meese criterion (GM), GM weight, HQ and HQC), and found the traditional method to perform poorly. Min et al. [23] used predictive modelling in a case study on chronic obstructive pulmonary disease (COPD), where prediction performance was analysed with respect to parameters such as knowledge-driven features, data-driven features and the patient's one-year history before discharge. Kim et al. [24] described the benefits and challenges of multi-omics predictive analytics and also classified omics. Singhal et al. [25] analysed classification algorithms suitable for the e-commerce domain. Malhotra et al. [26] reviewed search-based techniques for prediction and effort estimation using predictive modelling. Wazurkar et al. [27] suggested a decision-making process that handles large datasets with minimum errors and failures using predictive analysis. Tax et al. [28] found that a long short-term memory (LSTM) neural network performs better at predicting the next event of a running activity along with its duration. Metzger et al. [29] applied ensemble prediction techniques to an industry dataset and found a positive effect on cost estimation for proactive business process adaptation. Kang et al. [30] developed a predictive model for environmentally sustainable product purchases, which predicted better than the theories applied earlier. For current purchases, they found that personal norms are better predictors for those who take risks. Mehdizadeh et al. [31] reviewed descriptive and predictive modelling techniques and their multiple applications in road safety.
Busari et al. [32] developed a predictive model of trip patterns for a low-density area. The study found that car ownership greatly increases the number of daily trips, and that transportation planning and proper infrastructure may add to them. Yang et al. [33] surveyed social networks using predictive analytics and identified a few basic tasks that can be performed with social networks, such as link prediction, group formation, decision support, risk analysis and planning. They also studied social-network applications that predict which persons are likely to follow other persons. Francescomarino et al. [34] combined recurrent neural networks with long short-term memory cells, which resulted in better performance when predicting the future activities of an ongoing activity. Harl et al. [35] used a gated graph neural network for decision making and found that the different activities included in the process affected the prediction. Singhal et al. [36] proposed a logical structure to improve the presentation of a web page, making it more user-friendly to promote e-business. Musso et al. [37] used machine learning predictive models to predict low and high levels of performance in subjects in educational systems. Márquez-Chamorro et al. [38] summarised the basic concepts of predictive monitoring in business processes.
Overall, predictive modelling plays a vital role in business-oriented applications and in decision support systems. It is a way to overcome unwanted delays and flaws in a system while optimizing cost. One more research gap is also observed: the predictive systems developed so far cluster users, whereas the current research clusters products to make recommendations.

RESEARCH METHODOLOGY
3.1. Data-set preparation phase
In earlier research, datasets based on online customer reviews have already been prepared and collected; for example, Hou et al. [5] analysed online review data collected from Amazon.com, which is a large volume of data. Two things are different here. Firstly, customer ratings collected from browsing history or any other feedback mode are not considered. More precisely, sentiment analysis is not part of this research, as it offers no novelty. Secondly, the dataset is specific to India and Indian citizens.

Data model
The proposed data model is inspired by the demographic and other social details of a citizen. Three different databases are considered here: i) region-wise data, ii) existing user data, and iii) user-product mapping.
− Region-wise data
India is a culturally rich country where every region is identified by its culture and practices. As discussed in the introduction, there are countless domains that define the skill sets of Indians belonging to different regions. These domains vary in the sense that they promote different business sectors. Every village, city and town in a state is associated with a different form of product. These products are made by local people who are trained in making them, an art these skilled persons have learned from their ancestors. Unfortunately, these skilled persons are often not well educated. Due to low literacy, they could not keep up with the fast pace of online business applications and their practice [37]. Hence, they are lagging behind financially. In rural areas there are skilled persons, but they face unemployment and poverty. One may not be able to reach them directly to help them financially, but the usage of their skill-based products can be gradually increased. Online customers are willing to buy products that they have seen in their homes since childhood, but they cannot find them easily during online shopping. These observations can be formulated as records, which can be used to create a product- and region-based database.
Considering the above scenario, the region-wise database has been prepared. The data has been collected from various authorised resources. Maps of India were also consulted during data collection [39]; these maps indicate the products of every state in symbolic form. There are 28 states in India as of 2021, and the database has been created for all of them. While preparing the state-wise database, many categories of products such as wood work, glass beads, zari, metal work and many more were observed. These domains have therefore been categorised and then included in the database. Figure 1 displays a snapshot of the created database, showing data for a few states to give a glimpse. This database helps once the region of a user has been identified.
− Existing user data
Everywhere in the world there is a system of keeping records about people. These records can have different bases: some contain a person's personal details, some official details, some citizenship details, and so on. Such data is considered raw data. It is general practice to refine raw data before using it in any application. This is where data mining comes in. Data mining is used to extract data from existing data as per requirement. One can say that data mining turns raw data into informed data.
In a DBMS, there is the concept of a relational database. For a particular requirement there can be multiple tables containing different attributes, a few of which may be common. The join operation combines multiple tables into a single table containing most of the required attributes. Using these techniques, one can create a database of user details that may support online business. An existing user database has been created for the current research. The citizen details that already exist for a user (belonging to a country) can be utilized for the current model. This database has been created from various datasets collected from authorized sources. Multiple datasets here means that, for example, the customer dataset of the Amazon online shopping site is available in many forms for the same Indian customer. The difference may lie in the number of attributes, the type of attributes or the data values.
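The join described above can be illustrated with a minimal sketch. It assumes two small tables held as lists of dictionaries that share a user id; the rows, the column names and the helper `inner_join` are hypothetical and are not taken from the paper's datasets.

```python
# Hypothetical sample rows; only the idea of a shared user id comes from the text.
profile_rows = [
    {"user_id": 1, "gender": "F", "age": 34},
    {"user_id": 2, "gender": "M", "age": 27},
]
address_rows = [
    {"user_id": 1, "state": "Gujarat", "pincode": "380001"},
    {"user_id": 2, "state": "Assam", "pincode": "781001"},
    {"user_id": 3, "state": "Kerala", "pincode": "682001"},
]


def inner_join(left, right, key):
    """Inner-join two lists of row dicts on a shared key.

    Assumes the key is unique within `right`, as a primary key would be.
    """
    right_by_key = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the attributes of both rows into one combined record.
            joined.append({**row, **match})
    return joined


user_table = inner_join(profile_rows, address_rows, "user_id")
```

Only users present in both tables survive the join, so the combined table holds one record per user with the attributes of both sources.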
Table 1 shows a sample of the existing user data. This table contains User id, Gender, Age, State, District, Tehsil, Occupation and Pincode as attributes. User id is an attribute generally used in databases to identify a customer uniquely. When joining multiple tables containing the same user's data, this user id is used as the primary key. State, District, Tehsil and Pincode together compile the address details through which a region in India can be matched. The attribute 'occupation' helps in creating further combinations of customers for clustering.
− User-product mapping
E-commerce databases always involve some type of mapping, and the current research is no exception. Mapping means taking attributes from different tables and arranging them in a matrix structure. Here users and products are the attributes that are cross-checked to generate values. Table 2 is a sample table in which the values are either 0 or 1: if a user has previously selected or purchased a product, the value is 1; otherwise it is 0. From this, similarity among users can be calculated, and similar users can be assigned to a specific group or cluster of users. The mapping thus plays a very significant role in providing a knowledge-rich input to the proposed predictive system framework.
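A minimal sketch of such a 0/1 user-product matrix and a user-to-user similarity is given below. The paper does not name a similarity measure, so the Jaccard coefficient used here is an assumption, and the users and purchases are hypothetical; only the product category names appear in the text.

```python
# Hypothetical purchase history; product names reuse categories mentioned in the text.
purchases = {
    "u1": {"zari", "wood work"},
    "u2": {"zari", "glass beads"},
    "u3": {"metal work"},
}

# Fixed column order for the matrix.
products = sorted({p for items in purchases.values() for p in items})

# 1 if the user selected or purchased the product earlier, otherwise 0.
matrix = {
    user: [1 if p in items else 0 for p in products]
    for user, items in purchases.items()
}


def jaccard(a, b):
    """Jaccard similarity of two 0/1 rows (an assumed measure, not the paper's)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


sim_u1_u2 = jaccard(matrix["u1"], matrix["u2"])
sim_u1_u3 = jaccard(matrix["u1"], matrix["u3"])
```

Users whose rows overlap, such as u1 and u2 here, receive a non-zero score and become candidates for the same cluster, while users with no product in common score zero.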

Proposed framework for the predictive system
Earlier, when business was based entirely on manual procedures and calculations, it was very difficult to analyse and keep track of the state of one's business. The major drawback was that, even after much hard work, it was very tough to analyse the business and make any assured decision about its future. Estimating business performance is another issue that is not easy to resolve, because there are various parameters for performance analysis: the performance of the same business may differ depending on the parameter against which it is checked. All such issues can be resolved with the use of technology. As per the state of the art, in online business, or more precisely the e-commerce sector, most tasks are sequenced and executed on the basis of technologies such as predictive systems, business analytics, supply chain management, big data analytics and many more. The current research involves the concept of predictive modelling.
As per Heo et al. [40], predictive modelling is a technique that uses current and historical data to make predictions about the most probable future results. This makes clear that a predictive system requires data as input. Accordingly, datasets have been created in the current research, as discussed above. In Figure 2, at the top of the framework, there is a block named 'User'. To the right of the 'User' block is a connected database named 'Existing user data'. This is the database that already exists for every citizen of a country; how and from where it was gathered has been discussed above. Next come the blocks that represent the data pre-processing phases in sequence. Alexandropoulos et al. [41] have discussed data pre-processing in detail; in the current research, only the few required phases are considered. To solve any prediction problem, the parameters and attributes that are mandatory inputs must be found in the databases. This is done through data extraction. Vyas et al. [42] discussed the ETL process, that is, extraction-transformation-loading, of which data extraction is generally the first step. Data extraction is performed first on the existing dataset. It simply means fetching or segregating those attributes, along with their values, that are actually needed for data analysis. Data is often very poorly organised. It is also collected from various sources, so there can be different types of data, such as continuous data, batch data, and many more. Data extraction addresses these shortcomings by combining the heterogeneous data. It also identifies the relevant data within the bulk data and removes duplicates. Overall, it refines the data for business intelligence.
Hence, the resulting data is mined data. In Figure 2, the data resulting from data extraction is named 'extracted user data'. This is a relevant database to which further processes can be applied. After extraction, transformation is required. Vyas et al. [42] and Runtuwene et al. [43] have discussed the transformation process. Transformation cleanses and aggregates data into a form that can be used for analysis. Cleaning means applying operations customized to the condition of the extracted data. These operations may include:
− Providing values for blank places, such as replacing NULL values with zero.
− Converting text as required with respect to data type, units, date, time and so on.
− Applying data integrity rules to remove issues such as the same data appearing under different names and vice versa.
− Joining or splitting tables or table columns using key attributes.
− Applying business-based functions or rules to generate new calculated values from the extracted data.
In Figure 2, data transformation is applied to the extracted user data, resulting in 'transformed data'. Transformed data is better organized and formatted than extracted data, and it is compatible with the application and system. Hence analytics can be applied to transformed data. This data then needs to be loaded.
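Some of the cleaning operations listed above can be sketched as follows. This is only an illustration under stated assumptions: the field names, the `joined` date column, its day/month/year format and the `STATE_ALIASES` table are hypothetical and do not come from the paper's dataset.

```python
from datetime import datetime

# Hypothetical integrity rule: the same state written under different names.
STATE_ALIASES = {"tamilnadu": "Tamil Nadu", "tamil nadu": "Tamil Nadu"}


def transform(rows):
    """Clean extracted rows: fill blanks, convert types, unify names, drop repeats."""
    cleaned, seen = [], set()
    for row in rows:
        state = (row.get("state") or "").strip()
        record = {
            # Convert text to the required data type.
            "user_id": int(row["user_id"]),
            # Replace a missing (NULL) age with zero.
            "age": int(row["age"]) if row.get("age") not in (None, "") else 0,
            # Map different spellings of the same state to one name.
            "state": STATE_ALIASES.get(state.lower(), state),
            # Convert an assumed dd/mm/yyyy date to ISO format.
            "joined": datetime.strptime(row["joined"], "%d/%m/%Y").date().isoformat(),
        }
        # Keep only the first record for each user id.
        if record["user_id"] not in seen:
            seen.add(record["user_id"])
            cleaned.append(record)
    return cleaned


raw = [
    {"user_id": "1", "age": "34", "state": "Tamilnadu", "joined": "25/02/2021"},
    {"user_id": "2", "age": None, "state": " Assam ", "joined": "06/09/2021"},
    {"user_id": "1", "age": "34", "state": "Tamil Nadu", "joined": "25/02/2021"},
]
transformed = transform(raw)
```

After normalisation the third raw row turns out to repeat the first, which is the kind of "same data with different names" issue the integrity rule is meant to catch.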
'Data loading' is done after transforming the data. In this phase, the transformed data is loaded into a warehouse, cloud or other storage system. This helps in streamlining the data for future requirements. High efficiency and flexibility play a very important role as data grows day by day, which makes this a major requirement for many e-commerce companies. In Figure 2, 'Loaded data' is obtained after the data loading operation. Finally, the loaded data passes through the last phase, verification.
'Data verification' is an operation applied to the loaded data. Although the data has already passed the transformation stage, verification is still required to validate it. Data validation ensures high accuracy in results. During data verification, aspects such as data content, data format, data duplication and other such database issues are cross-checked. After data verification, verified user data, i.e., the 'Knowledge base', is obtained, as represented in Figure 2.
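A minimal sketch of such verification checks is shown below, covering content (required fields present), format and duplication. The specific rules are assumptions chosen for illustration: the required field list, the six-digit PIN code pattern and the helper `verify` are hypothetical, not the paper's procedure.

```python
import re

# Assumed checks; the paper only names content, format and duplication in general.
REQUIRED = ("user_id", "state", "pincode")
PINCODE = re.compile(r"^[1-9][0-9]{5}$")  # six digits, first digit non-zero


def verify(rows):
    """Return a list of (row index, problem) pairs found in the loaded data."""
    problems, seen = [], set()
    for i, row in enumerate(rows):
        # Content check: every required attribute must have a value.
        missing = [f for f in REQUIRED if row.get(f) in (None, "")]
        if missing:
            problems.append((i, "missing: " + ", ".join(missing)))
            continue
        # Format check on the PIN code.
        if not PINCODE.match(str(row["pincode"])):
            problems.append((i, "bad pincode format"))
        # Duplication check on the user id.
        if row["user_id"] in seen:
            problems.append((i, "duplicate user_id"))
        seen.add(row["user_id"])
    return problems


loaded = [
    {"user_id": 1, "state": "Gujarat", "pincode": "380001"},
    {"user_id": 2, "state": "Assam", "pincode": "7810"},
    {"user_id": 1, "state": "Gujarat", "pincode": "380001"},
    {"user_id": 3, "state": "", "pincode": "682001"},
]
problems = verify(loaded)
```

Rows that produce no entry in `problems` would be the ones admitted to the knowledge base in this sketch.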
As discussed earlier in the database section, there is a 'region-wise database', collected by performing a 'region-wise survey' as shown in Figure 2. Both the region-wise database and the existing database are used in the matching operation, shown as the block 'Matching with regional database' in Figure 2. Matching is done on the basis of the user's region contained in the knowledge base. When the region values from the regional database and the knowledge base match, the user profile can be prepared.
The 'Region based user data' block is developed by receiving region values from both types of databases. A join operation on the tables performs the matching and segregates a single database per user. This part is very important for the current research, as it is the key element of novelty for the e-commerce sector: the concept of region-based products. Using this data, an 'Individual user profile' is created, as shown in Figure 2. This user profile helps in identifying users on the internet. For any e-commerce website, the user is as important as the products in the context of sales; hence tracking the user is a mandatory task in online business. Once the user profile has been created, it is used as input for further processing. In the proposed work, a predictive system has been framed, and the user details can be used as its input values. A predictive system always provides some probable outcomes that can relate to and affect future decisions. The predictive system proposed in this research is shown in Figure 2.
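The region matching and profile creation described above might look like the following sketch. The regional table is a placeholder: only silk weaving in Assam and Gujarat is mentioned in the paper, while the other state-product pairing, the user rows and the helper `build_profile` are hypothetical.

```python
# Placeholder regional table. "silk weaving" for Assam and Gujarat is stated in
# the paper; the Kerala entry is a hypothetical pairing for illustration only.
regional_products = {
    "Assam": ["silk weaving"],
    "Gujarat": ["silk weaving"],
    "Kerala": ["wood work"],
}

# Hypothetical knowledge-base rows (verified user data).
knowledge_base = [
    {"user_id": 1, "state": "Gujarat", "occupation": "teacher"},
    {"user_id": 2, "state": "Unknown", "occupation": "farmer"},
]


def build_profile(user, regional):
    """Match the user's region against the regional table and build a profile.

    Returns None when the user's region has no entry in the regional table.
    """
    products = regional.get(user["state"])
    if products is None:
        return None
    return {
        "user_id": user["user_id"],
        "region": user["state"],
        "occupation": user["occupation"],
        "regional_products": list(products),
    }


profiles = [build_profile(u, regional_products) for u in knowledge_base]
```

A profile is produced only when the region values from both sources match, which mirrors the matching condition stated in the text.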
The predictive system takes the user query as input, where the query may contain information about a product or type of product. The internal structure of the proposed predictive system contains three main modules: 'tokenize and feature extraction', 'similarity evaluation' and 'similarity detection'. First, the 'tokenize and feature extraction' module takes the user query as input and performs tokenization. Tokenization means splitting the input text into a set of meaningful words, called tokens. These tokens are then used for feature extraction. Feature extraction means reducing the existing dataset or word content in such a way that the resulting text set summarizes the maximum information. After feature extraction, the data is in a state from which further predictions are possible. For example, using feature extraction one can predict whether the fabric of a garment is cotton or not. Hence, this module helps add meaning to the user query.
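A very small sketch of this module is given below. The paper does not specify a tokenizer or a feature-extraction method, so the regular-expression tokenizer, the stop-word list and the function names are assumptions made for illustration.

```python
import re

# Assumed stop-word list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "for", "of", "in", "with", "me", "show", "i", "want"}


def tokenize(query):
    """Split a query into lower-case word tokens."""
    return re.findall(r"[a-z0-9]+", query.lower())


def extract_features(tokens):
    """Keep informative tokens only, preserving order and dropping repeats."""
    features = []
    for token in tokens:
        if token not in STOPWORDS and token not in features:
            features.append(token)
    return features


tokens = tokenize("Show me a cotton kurta for the festival")
features = extract_features(tokens)
is_cotton = "cotton" in features
```

The reduced feature list is what lets a later step answer questions such as whether the requested fabric is cotton.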
After feature extraction, similarity evaluation is performed. To find similarity, the similarity evaluation module takes the user profile, the knowledge base and the output of the previous module (tokenize and feature extraction) as input values, as shown in Figure 2. A similarity check is performed here to find how similar the user's product choices are to those of other users. Region-wise similarity is also checked to match the current user with other similar users. This module plays a significant role in the research, as the prediction results are based on it. Finally, the results of the similarity evaluation module are passed to the similarity detection module, which checks whether similarity exists or not. The percentage of similarity also matters here, because one can never predict a yes or no with one hundred percent certainty; the percentage decides whether similarity exists. For example, if a person asks about chikan kurta fabric, there is a high probability that the person will like other chikan fabric-based suits. Thus, the prediction system may predict optimized results if similarity exists. If the similarity percentage is very low, the model itself suggests an alternative plan on the basis of similar user clusters, as shown in Figure 2.
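The decision made by the similarity detection module can be sketched as a simple threshold test. The paper gives no cut-off value, so the 60 percent threshold, the function name and the item lists below are hypothetical.

```python
def detect_similarity(similarity_pct, similar_items, cluster_items, threshold_pct=60.0):
    """Choose suggestions from the similarity percentage.

    threshold_pct is an assumed cut-off; the paper does not state one.
    """
    if similarity_pct >= threshold_pct:
        # Similarity exists: suggest items close to the query.
        return "similar", similar_items
    # Similarity is too low: fall back to items from similar user clusters.
    return "cluster_fallback", cluster_items


high = detect_similarity(75.0, ["cotton kurta"], ["regional handicraft"])
low = detect_similarity(20.0, ["cotton kurta"], ["regional handicraft"])
```

Keeping the threshold as a parameter makes it easy to tune how readily the system switches to cluster-based suggestions.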

CONCLUSION
The world is currently facing the COVID-19 problem to its maximum extent. As people are not in a position to visit crowded places, they have to remain in their houses, and in this situation they are trying to purchase things online. The current research work focuses on skilled businessmen in rural areas who have very little finance. They are not very literate and, because of the e-commerce industry, they are facing low sales and low income. In this situation they can join hands with e-commerce business holders and supply their products, or they can start selling online themselves. To support these people, a framework has been proposed here that emphasises increasing the sale of regional products. The model takes the region of a person or customer as input and then predicts the products to be advertised on that person's screen. This will promote regional business to some extent and help poor regional businessmen sell their products. Customer satisfaction also improves, as customers are able to find and select products that belong to their region. This is likely to be very fruitful among Indians, as every religion is full of festivals during which people follow the ancestral practice of using things produced in their area.

FUTURE SCOPE
The current research work generates many parameters and factors for further research. Researchers can work in several other directions using this work as a baseline. It has been observed that a few skills are common to more than one state in India; for example, silk weaving is done in Telangana, Tamil Nadu, Maharashtra, Gujarat, and Assam. Such combinations can be used in future for further clustering. Another direction is that, although the proposed database has been created for India, a similar database can be collected for any other country to promote its e-commerce industries. A further direction concerns the 'Individual user profile' created from this data, as shown in Figure 2. As this profile plays a role in identifying users on the internet, the data can be used for different types of customer-based clustering. Finally, many other parameters can be added to find similarity from the databases. These suggestions offer researchers several avenues for further work in this area.
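As a rough illustration of the skill-based clustering suggested above, states can be grouped by the skills they share. The sketch below uses only the silk-weaving example from the text; the data structure and variable names are assumptions.

```python
from collections import defaultdict

# State-to-skill entries taken from the silk-weaving example in the text.
state_skills = {
    "Telangana": {"silk weaving"},
    "Tamil Nadu": {"silk weaving"},
    "Maharashtra": {"silk weaving"},
    "Gujarat": {"silk weaving"},
    "Assam": {"silk weaving"},
}

# Invert the mapping: for each skill, collect the states that practise it.
skill_to_states = defaultdict(set)
for state, skills in state_skills.items():
    for skill in skills:
        skill_to_states[skill].add(state)

# A skill shared by more than one state defines a candidate cluster.
shared_clusters = {
    skill: sorted(states)
    for skill, states in skill_to_states.items()
    if len(states) > 1
}
```

Extending `state_skills` with further regional data would yield additional clusters in the same way.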