Human behavior scoring in credit card fraud detection

Received Dec 19, 2020; Revised May 6, 2021; Accepted May 21, 2021

Nowadays, the analysis of cardholder behavior is one of the important fields in electronic payment. This kind of analysis helps to extract behavioral and transaction profile patterns that can help financial systems better protect their customers. In this paper, we propose an intelligent machine learning (ML) system for rule generation. It is based on a hybrid approach using rough set theory for feature selection, and fuzzy logic and association rules for rule generation. A score function is defined and computed for each transaction based on the number of rules that make this transaction suspicious. This score is a kind of risk factor used to measure the level of alert attached to the transaction and to improve a card fraud detection system in general. The behavior analysis level is part of a complete financial fraud detection system, where it is combined with intelligent classification to improve fraud detection. In this work, we also propose an implementation of this system integrating the behavioral layer. The results obtained are very convincing, and the time consumed by our system per transaction was 6 ms, which shows that our system is able to handle real-time processing.


INTRODUCTION
In the virtual world, such as that of banking transactions, identifying the user of a card comes down to an authentication step, a code or a phone number, or a combination of these. However, each person has habits, preferences or even limits in the use of a credit card. For this reason, much research focuses on studying the behavior of the client or consumer in order to establish a known profile. In the field of fraud detection, the use of machine learning (ML) techniques is attractive for many reasons. First, they allow the discovery of patterns in large data streams, i.e. transactions arrive as a continuous stream and each transaction is defined by many variables. Second, fraudulent transactions are often correlated both in time and in space. For example, scammers usually attempt to commit fraud in the same store with several cards within a short period of time. Third, machine learning techniques can be used to detect and model existing fraudulent strategies and to identify new strategies associated with cardholder behaviors.
In a credit card fraud detection (CCFD) system, it is important when analyzing a transaction to compute its risk factor in order to know which kind of analysis to carry out, whether deep or light. In a previous work we proposed an architecture for a credit card fraud detection system together with a multilevel strategy for transaction classification [1]. Notably, we demonstrated the performance of the support vector machine (SVM) and bidirectional GRU (BiGRU) models [2], [3] at the classification level. The problems of unbalanced data were also raised and dealt with in another work. Building on the scoring results obtained with rough-set and associative classification rules in [15], we aim to combine rough set, fuzzy and association rules for behavior scoring of credit card transactions. Based on the generated rules, our CCFD system will assign a score to transactions before classifying them as fraudulent or not.

Reference | Technique | Dataset | Reported result
--- | --- | --- | ---
 | Neuro-fuzzy expert system | Generated | -
[6] | Fuzzy-iterative dichotomiser 3 (ID3) | - | 89.00%
[7] | Confidently | Particular bank | 97.58%
[8] | Behavior-cluster based imbalanced classification method | Financial institution and 18 UCI data sets | 98.50%
[9] | Fuzzy association rules | Retail companies | -
[10] | Life tables | European financial services data | -
[11] | Multiple linear regression | Particular bank | -
[12] | Ontology + fuzzification of the ontology + fuzzy reasoning | http://www.duslab.de/cosdeo/ | -
[13] | Fuzzy cognitive maps + PSO | Particular data | -
[14] | ARJDKM: association rules (AR) + Jaccard distance (ARJD) + K-mode clustering (KM) | Tencent advertising algorithm competition | -
[15] | Rough-set + associative classification rules | KDD-96 UC-Census dataset | 83.91%
[16] | Rare association rules mining | Particular data | -
[17] | NAR-Miner: negative association rules | Private data | -

BACKGROUND
In this section, we briefly introduce the techniques used in our approach to credit card fraud detection based on cardholder behavior. We emphasize that our approach operates in a continuous learning mode in order to discover new fraud patterns.

Credit card fraud detection system
This work is part of a project to build a credit card fraud detection system [1]. This system is structured in four levels:
- authentication level: executes all system controls and creates a profile for the incoming transaction and the cardholder;
- behavioral level: computes the risk factor, which is the scope of this paper;
- smart level: classifies the transaction with either the SVM or the BiGRU model, depending on the transaction risk;
- background processing level: a transverse level that keeps the system up to date and guarantees its evolution.
In our previous work [2], [3] we have shown the efficiency of the proposed SVM and BiGRU models for the classification of transactions. By analyzing behavior, we aim to improve the performance of our system by incorporating a behavioral layer that integrates business expertise and the power of machine learning, which first makes it possible to assess the severity of transactions before their classification.

Association rules
By definition, association rules are defined on transaction sets. Given that it is more common to work with tuples than with transactions in a database, various solutions have been proposed to this problem. When working with relational databases, it is usual to consider each item to be an (attribute, value) pair and each transaction to be a tuple in a table. Given an item set $I$ and a transaction set $T$, where each transaction is a subset of $I$, an association rule, as introduced by Agrawal et al. [18], is an "implication" of the form $A \Rightarrow C$ denoting the joint presence of the item sets $A$ and $C$ in some of the transactions of $T$, assuming that $A, C \subset I$, $A \cap C = \emptyset$, and $A, C \neq \emptyset$.

In fuzzy logic, membership in a set is a matter of degree. Writing $\mu_A(e)$ for the membership value of an element $e$ in a set $A$, a crisp value $\mu_A(e) = 1$ means that $e$ is 100% a member of $A$ and $\mu_A(e) = 0$ means that $e$ is 100% not a member of $A$, whereas in fuzzy logic $0 \le \mu_A(e) \le 1$, which means that $e$ can be partially a member of $A$. Hence, as the membership value gets closer to 1, the membership of $e$ in $A$ becomes stronger.
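To make the rule notation concrete, the following minimal sketch computes the two standard quality measures of a rule $A \Rightarrow C$, support and confidence, over a toy transaction set. The item strings (for example `channel=ATM`) and the four transactions are illustrative assumptions, not data from this study.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[Transaction]) -> float:
    """support(A ∪ C) / support(A) for the rule A => C."""
    a, c = frozenset(antecedent), frozenset(consequent)
    # A and C must be non-empty and disjoint, as in the definition above.
    if not a or not c or a & c:
        raise ValueError("A and C must be non-empty and disjoint")
    sup_a = support(a, transactions)
    if sup_a == 0:
        return 0.0
    return support(a | c, transactions) / sup_a


# Hypothetical (attribute=value) items, one frozenset per transaction.
toy_transactions = [
    frozenset({"channel=ATM", "type=international", "time=evening"}),
    frozenset({"channel=ATM", "type=international", "time=weekend"}),
    frozenset({"channel=POS", "type=national", "time=evening"}),
    frozenset({"channel=ATM", "type=national", "time=evening"}),
]

sup_atm = support({"channel=ATM"}, toy_transactions)
conf_rule = confidence({"channel=ATM"}, {"type=international"}, toy_transactions)
```

Here three of the four transactions contain `channel=ATM`, and two of those three are international, so the rule holds in two thirds of the cases where its antecedent applies.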

Rough set theory
Rough set theory, proposed by Pawlak [20], [21], is a mathematical approach to imperfect knowledge and presents yet another attempt at this problem. Rough sets have been proposed for a very wide variety of applications. In particular, the rough set approach appears to be important for artificial intelligence and cognitive science, especially in machine learning, knowledge discovery, data mining, expert systems, approximate reasoning, and pattern recognition. The concept of a rough set can be defined by means of topological operations, interior and closure, called approximations.
Let $U$ be the universe of objects, $R$ an equivalence relation on $U$ (the indiscernibility relation), and $X$ a subset of $U$, i.e. $X \subseteq U$. Our goal is to characterize the set $X$ with respect to $R$. To do this, we need some additional notation and some basic concepts of rough set theory, which are presented below. By $R(x)$ we denote the equivalence class of $R$ determined by the element $x$. The indiscernibility relation $R$ describes, in a sense, our ignorance of the universe $U$. The equivalence classes of the relation $R$, called granules, represent the elementary portions of knowledge that we are able to perceive thanks to $R$. Using only the indiscernibility relation, we are in general not able to observe individual objects from $U$ but only the accessible granules of knowledge described by this relation. The set approximations can be expressed in terms of granules of knowledge as follows. The lower approximation of a set is the union of all the granules that are fully included in the set; the upper approximation is the union of all the granules that have a non-empty intersection with the set; the boundary region of a set is the difference between the upper and the lower approximation of the set [22].
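The three definitions above translate directly into set operations. The sketch below is a minimal illustration, assuming that two objects are indiscernible when they share the same attribute tuple; the six objects and their attribute values are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, List, Set, Tuple


def granules(universe: Iterable[Hashable],
             key: Callable[[Hashable], Hashable]) -> List[Set[Hashable]]:
    """Equivalence classes of R: objects with the same key(x) are indiscernible."""
    classes = defaultdict(set)
    for x in universe:
        classes[key(x)].add(x)
    return list(classes.values())


def approximations(universe: Iterable[Hashable],
                   key: Callable[[Hashable], Hashable],
                   target: Iterable[Hashable]) -> Tuple[set, set, set]:
    """Return (lower, upper, boundary) approximations of `target`."""
    x_set = set(target)
    lower, upper = set(), set()
    for g in granules(universe, key):
        if g <= x_set:   # granule fully included in X
            lower |= g
        if g & x_set:    # granule intersects X
            upper |= g
    return lower, upper, upper - lower


# Hypothetical objects: id -> (channel, type). Objects with identical
# attribute tuples cannot be told apart and fall into the same granule.
attributes = {
    1: ("ATM", "international"), 2: ("ATM", "international"),
    3: ("POS", "national"),      4: ("POS", "national"),
    5: ("ATM", "national"),      6: ("ATM", "national"),
}
X = {1, 2, 3}  # the set to characterize

lower, upper, boundary = approximations(attributes, attributes.get, X)
```

In this example the granule {1, 2} lies entirely inside X, the granule {3, 4} only overlaps it, and {5, 6} is disjoint from it, so the lower approximation is {1, 2}, the upper approximation is {1, 2, 3, 4}, and the boundary region is {3, 4}.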

PROPOSED APPROACH
In this paper, our main goal is to propose an approach to behavior scoring for credit card fraud detection. The principle of scoring is to provide an evaluation of the risk of the transaction. Through our state-of-the-art review [23], we noticed that fuzzy association rules are more suitable for the behavior layer than other techniques. This was confirmed in our previous study and supports our choice [24]. Using a rough set theory model together with the fuzzy association rules technique is an additional advantage, since in other contexts it improves the detection rate [15]. Figure 1 describes the processing flow for rule generation. First, we use feature engineering to complete the cardholder profile information, such as the frequency of purchases, their timing, and the merchant type. Second, we apply feature selection with rough set theory to select the best and most significant features for rule generation. Then, we apply fuzzy logic to obtain a fuzzification of our chosen dataset. After this, we use an association rules algorithm to generate rules and store them in the rules database. For this purpose, we chose the Apriori algorithm. The last component is a rule scoring function, which is described in the next subsection.
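As an illustration of the rule generation step, the following is a minimal pure-Python sketch of Apriori (level-wise frequent itemset mining followed by rule extraction). It is not the implementation used in this work; the toy transactions and the thresholds `min_support=0.4` and `min_confidence=0.9` are assumptions chosen for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Itemset], min_support: float) -> Dict[Itemset, float]:
    """Return every itemset whose support is at least `min_support`."""
    n = len(transactions)
    candidates = {frozenset([i]) for t in transactions for i in t}
    frequent: Dict[Itemset, float] = {}
    k = 1
    while candidates:
        # Count each candidate and keep the frequent ones of size k.
        level = {}
        for c in candidates:
            sup = sum(1 for t in transactions if c <= t) / n
            if sup >= min_support:
                level[c] = sup
        frequent.update(level)
        # Join step: build (k+1)-candidates; prune step: drop any candidate
        # that has an infrequent k-subset (downward closure property).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            cand = a | b
            if len(cand) == k and all(
                frozenset(s) in level for s in combinations(cand, k - 1)
            ):
                candidates.add(cand)
    return frequent


def generate_rules(frequent: Dict[Itemset, float],
                   min_confidence: float) -> List[Tuple[Itemset, Itemset, float]]:
    """Return (antecedent, consequent, confidence) for each confident rule."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(itemset, r):
                a = frozenset(ante)
                conf = sup / frequent[a]  # subsets of a frequent set are frequent
                if conf >= min_confidence:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical fuzzified transactions (items that are "on" in each one).
toy_transactions = [
    frozenset({"ATM", "international", "evening"}),
    frozenset({"ATM", "international", "weekend"}),
    frozenset({"ATM", "international", "evening"}),
    frozenset({"POS", "national", "evening"}),
    frozenset({"POS", "national", "weekend"}),
]

frequent_itemsets = apriori(toy_transactions, min_support=0.4)
found_rules = generate_rules(frequent_itemsets, min_confidence=0.9)
```

On this toy data, `{ATM, international}` appears in three of five transactions and every ATM transaction is international, so `ATM => international` is among the rules returned. Library implementations are preferable on real data volumes.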

Behaviour scoring
To ensure good behavior scoring, we analyze the user profile. Feature engineering defines the client profile through the client's card transaction habits. Therefore, for each client we have the frequency of transactions by type, the time range of purchases, the number of transactions and the usual inter-transaction time gap. All of this information is extracted from the system database and stored in a duplicate database to be used in our behavior analysis. The goal is to check whether the user's profile is compatible with the behavior rules already stored in the rules database. For example, if the user has never been abroad and we receive a transaction from an automated teller machine (ATM) in a foreign country, perhaps with an unexpected amount, we check the rules of our database and label this transaction as suspicious. We note that the stored rules concern suspicious transaction behavior. For each incoming transaction, we check all stored rules, and a counter is incremented for every rule that the transaction respects, each of which indicates a suspicious transaction. The score is therefore expressed as:

$$\text{Score} = \frac{\text{number of rules respected}}{\text{number of all rules}}$$

If the score is equal to 0, the risk of the transaction behavior is null; if the score reaches 1, the transaction behavior has a high risk of being fraudulent.
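The score defined above can be sketched in a few lines. The representation is an assumption made for illustration: each stored rule is a set of (attribute=value) conditions, and a transaction respects a rule when it satisfies all of them. The rules and transactions below are hypothetical.

```python
from typing import FrozenSet, List

Rule = FrozenSet[str]


def behaviour_score(transaction: FrozenSet[str], rules: List[Rule]) -> float:
    """Number of suspicious-behaviour rules respected / number of all rules."""
    if not rules:
        return 0.0  # no stored rules: nothing marks the transaction as risky
    respected = sum(1 for rule in rules if rule <= transaction)
    return respected / len(rules)


# Hypothetical suspicious-behaviour rules from the rules database.
stored_rules = [
    frozenset({"channel=ATM", "type=international"}),
    frozenset({"time=evening", "type=international"}),
    frozenset({"channel=E-commerce", "time=holiday"}),
    frozenset({"type=international"}),
]

foreign_atm = frozenset({"channel=ATM", "type=international", "time=weekend"})
local_pos = frozenset({"channel=POS", "type=national", "time=other"})

score_foreign = behaviour_score(foreign_atm, stored_rules)
score_local = behaviour_score(local_pos, stored_rules)
```

The foreign ATM withdrawal respects two of the four rules and scores 0.5, while the local point-of-sale purchase respects none and scores 0.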

EXPERIMENTS

Dataset
This study is based on a generated dataset composed of 60,000 transactions across 12 attributes, as described in Table 2. The attributes include transaction and cardholder information. Table 3 shows the distribution of legitimate and fraudulent transactions in our chosen dataset from Kaggle. To construct this dataset, we randomly selected 200 transactions covering the two transaction statuses, genuine and fraudulent, including data likely to correspond to fraudulent transactions. The rest of the dataset was generated with the legitimate status only.

Method
First, we pre-process our dataset so that association rules can be generated. We start with the selection of attributes using rough set theory, which helps define the client's profile and bring out the rules to be mined. Then, a data fuzzification step is performed to prepare the data for rule generation with the Apriori algorithm. The features chosen for this study, as selected by the rough set selector, are:
- Transaction channel: ATM, E-commerce, POS.
- Transaction type: National, International, E-commerce.
- Time range of purchase: weekend, evening, holiday, other.
After fuzzification, the dataset has eight variables instead of only three, each taking the value 0 or 1: automated teller machine (ATM), point of sale (POS), electronic commerce (Ecommerce), national, international, weekend, evening and holiday. Our dataset is thus built and ready to be used for the generation of association rules with Apriori, which are then stored in the rules dataset. For each incoming transaction, the behaviour scoring component is responsible for checking the stored rules and returning a behaviour score. Recall that it is the background processing, using a database view, that generates these rules. A counter is incremented for every rule the transaction fails to respect, so the score is expressed as:

$$\text{Score} = \frac{\text{number of rules not respected}}{\text{number of all rules}}$$

If the score is equal to 0, the risk of the transaction behaviour is null; if the score is equal to 1 or approaching 1, the transaction behaviour has a high risk of being fraudulent. After the behaviour score is calculated, the transaction goes to a prediction function that decides whether it is considered fraudulent or genuine.
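The expansion from three categorical features to eight binary variables described in the Method can be sketched as follows. The column names follow the text; treating the single `Ecommerce` column as shared by the channel and the type, and encoding the time range "other" as all three time flags set to 0, are assumptions made for this illustration.

```python
from typing import Dict

# The eight binary variables listed in the Method section.
COLUMNS = ["ATM", "POS", "Ecommerce", "national", "international",
           "weekend", "evening", "holiday"]


def _normalize(value: str) -> str:
    """Lower-case and drop hyphens so 'E-commerce' matches 'Ecommerce'."""
    return value.lower().replace("-", "")


def encode(channel: str, tx_type: str, time_range: str) -> Dict[str, int]:
    """Map the three selected features to eight 0/1 variables."""
    active = {_normalize(channel), _normalize(tx_type), _normalize(time_range)}
    # A time range of 'other' matches no column, so all time flags stay 0.
    return {col: int(_normalize(col) in active) for col in COLUMNS}


row_atm = encode("ATM", "International", "evening")
row_web = encode("E-commerce", "E-commerce", "other")
```

Each encoded row can then be turned into the item set expected by an Apriori implementation by keeping the names of the columns whose value is 1.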

Results
In this part we present the findings of the proposed approach. We consider transactions from our dataset. Feature selection was done in the data preprocessing step, which is handled by the background level when the dataset is constructed. We then calculate the score and the risk for given parameters, for simulation purposes, and pass each transaction to the classification algorithm based on these rules. The rules are generated as described before by the background processing part. Table 4 presents the results obtained by applying our selected approach, a hybrid solution that combines three well-known methods (rough set, fuzzy logic and association rules), together with a comparison against other baseline approaches. The results cover different implementations of association rules: the efficient-apriori application programming interface (API), an efficient pure Python implementation of the Apriori algorithm; the machine learning extensions (MLxtend) API, a Python library of useful tools for day-to-day data science tasks; and the pyfpgrowth API, a Python implementation of the frequent pattern growth algorithm. As we can see, our approach outperforms the other methods in the number of generated rules and in the detection rate. We note that the number of rules is not large and that the detection rate is low. This is due to the nature of fraudulent transactions, which are few compared to non-fraudulent ones. Rules for fraudulent transactions are rare, and fraud detection datasets always suffer from the unbalanced data problem. That is why the generated rules are not quite appropriate for the test subset, but in a standard environment this concern will be resolved because of the volume of the data and their resemblance.

CCFD FRAMEWORK IMPLEMENTATION
In this section, we propose a global evaluation of our CCFD system by integrating the different levels, and in particular the behavioural level, as shown in Figure 2. Note that the proposed system is kept up to date and is able to cope with real-time credit card fraud detection. This is ensured by a complementary background processing. This process discovers new association rules that emerge from new data when the database is updated from the financial system database. In addition, it is responsible for pre-processing data before the generation of rules (balancing data and feature selection) and for updating the latest status of previously treated transactions. All of these tasks are performed periodically. This processing is thus a part of our approach to cardholder behaviour analysis for credit card fraud detection: it analyses the user's profile rules stored in the database in order to check the behaviour of this user and report any deviation from normal habits.

Dataset experiment process
In this experiment, the background process first balances the data, splits it for cross-validation and trains our two models (SVM and BiGRU); this layer also generates new rules and stores them in the rules database. Second, the authentication layer constructs the transaction and cardholder profile for each transaction in the dataset. Then, the behavior layer checks the rules for the cardholder and calculates a score based on the stored rules. Finally, the smart layer calculates the risk given by the transaction profile and decides which model to use. This decision is based on the sum of the behavior score and the transaction risk. This test was applied to a generated dataset composed of 14924 transactions across 13 features. The lack of clear real data pushed us to use a generated dataset. Our previous work [2], [3] gives results on the well-known Kaggle dataset [25], but for the behavior part we need clear data in order to analyze each cardholder profile.
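The model selection step of the smart layer can be sketched as below. Only the fact that the decision uses the sum of the behavior score and the transaction risk comes from the text. The threshold value of 1.0, the rule that higher combined risk goes to the deeper BiGRU model, and the function name `choose_model` are assumptions made for illustration.

```python
def choose_model(behaviour_score: float, transaction_risk: float,
                 threshold: float = 1.0) -> str:
    """Pick the classifier from the combined risk (hypothetical policy).

    Assumption: both inputs lie in [0, 1], and a combined value at or above
    `threshold` calls for the deep analysis (BiGRU), otherwise the light
    one (SVM). The threshold is illustrative, not a value from the paper.
    """
    for name, value in (("behaviour_score", behaviour_score),
                        ("transaction_risk", transaction_risk)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    combined = behaviour_score + transaction_risk
    return "BiGRU" if combined >= threshold else "SVM"


low_risk_choice = choose_model(behaviour_score=0.0, transaction_risk=0.2)
high_risk_choice = choose_model(behaviour_score=0.75, transaction_risk=0.5)
```

Keeping the threshold as a parameter lets the background processing level retune the split between the two models as the rule base evolves.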

Performance metric
For performance metrics, several commonly used classification performance measures based on the confusion matrix are employed in this paper to evaluate the performance of the fraud detection architecture. The risk equation, as described in our previous work [2], uses the logistic model. To synthesize our findings, we display in Figure 3 (see appendix) the number of transactions classified by BiGRU or SVM, the time consumed for the whole transaction treatment, and the performance metrics obtained with the classification report function. By using the transaction score, our system classified 27.25% of transactions with SVM and 82.75% with BiGRU. The time consumed by the smart layer to treat 14924 transactions was 90.10 s, as described in the classification report of Table 5, an average of 6 ms per transaction, which is quite good. These results show that our system meets real-time processing requirements, and the background processing in the fourth level of our CCFD system guarantees its adaptability. In addition, the system's performance is very promising on our generated dataset, and the results obtained in our recent work [2] showed that the BiGRU deep neural network classifier achieved an accuracy rate of 97.16%. Moreover, the results on a standard Kaggle dataset with real transactions are even better and exceed all those reported in previous works.
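For reference, the usual confusion-matrix measures (accuracy, precision, recall and F1-score, which are also what a typical classification report lists) can be computed as in the sketch below. The confusion-matrix counts are hypothetical and are not results from this paper; only the timing figures (90.10 s for 14924 transactions) are taken from the text.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard measures derived from a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, with fraud as the positive class.
metrics = classification_metrics(tp=40, fp=10, tn=940, fn=10)

# Average processing time per transaction from the figures in the text.
avg_ms_per_transaction = 90.10 / 14924 * 1000
```

The last line reproduces the arithmetic behind the reported average of about 6 ms per transaction.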

CONCLUSION
Fraud analysis is of critical importance in the banking industry, and the biggest challenge remains the cost of fraud, whether to analyze it, detect it or prevent it. Since transactions take place in real time, they require a process that consumes little time and is as efficient as the size and infrastructure of the financial institution adopting it allow. In this paper, we presented our behavioral analysis for credit card fraud detection based on a hybrid method using Apriori, rough set and fuzzy techniques, which gave promising results. The comparative study showed that our approach is the best combination for generating rules in a context where fraud remains rare compared to legitimate transactions. We also proposed an implementation of the whole CCFD system and gave results of transaction classification based on the score given by the behavior layer. Even if the classification results do not reach those obtained with SVM and BiGRU in our last paper, we consider that the behavior layer can improve the financial fraud detection system, not only by generating rules but also by benefiting from human expertise to integrate new rules in this layer. We also showed that our system is able to work in real time; the average time consumed per transaction was 6 ms, which is a very satisfying running time. In our future work, we will focus on the impact of dataset quality on classification and on improving the confidence of the generated rules in order to improve the performance of the whole CCFD system.

Figure 3. CCFD processing flow