Empowering anomaly detection algorithm: a review

ABSTRACT


INTRODUCTION
Technological advancement and the Internet can significantly affect human activities [1]. As such, the sustainability of the modern industrial era has created urban areas that require camera surveillance systems to secure targeted public places [2], the installation of internet of things (IoT) sensors, and the use of smart devices. Streaming data produced daily [1] are stored as big data [3]. Since big data is characterized by volume, variety, and velocity, data have to be autonomously processed into information and knowledge [4] to benefit the users. Most surveillance camera or sensor data are classified as normal data behavior, while abnormal data in some situations could provide the user with information to solve problems related to the case.
Detecting anomalies or unidentified events is crucial and tedious since big data gets too big. Hence, the extraction of wrong data produces faulty information. However, faster and more efficient data processing is needed for real-time data. Therefore, anomaly detection is significant in solving abnormal behaviors in streaming or real-time data. Many researchers have begun expanding their research on inventing new algorithms for anomaly detection. For instance, Rettig et al. [5] created an online anomaly detection method for big data, Costa et al. [6] created a recursive, memory-efficient fault detection method, Bose et al. [7] detected anomalies using driving patterns, Dharmadhikari and Kolhe [8] used heterogeneous anomaly detectors with association rules, while Ali and Angelov [9] used heterogeneous data to detect abnormality.
Due to time restrictions, only algorithms that were proposed between 2010 and 2022 were utilized in this study. Then, the suitable algorithms were selected randomly. Hence, many other studies on

METHODOLOGY
This section details the steps employed in conducting this review. These steps start with the research questions, which consist of several problems. Then, the keywords and literature are searched according to the needs of the research questions. Finally, the knowledge from each piece of literature is extracted and differentiated to understand the gap between the algorithms. Figure 1 illustrates the methodology. It consists of five essential steps. Each step is explained in the following subsections.

Research questions
Since there are many types of anomalies [10], it is important to draft research questions to aid with the methodology. So, how can anomalies be detected when complicated anomalies are involved? Machine learning algorithms are widely used in different scopes. Krammer [4] proposed an algorithm to detect abnormality in a communication platform. Hence, the first question revolves around the types of algorithms used to detect anomalies. Considering several related algorithms, a key question is which criteria are used to detect the abnormality. Thus, the second question in this study considers the criteria required by an anomaly detector. The final question relates to determining the best algorithm to detect all the anomalies in the data world. The differences between the algorithms must be identified to determine the best among the selected algorithms. Thus, the following research questions were drafted in this study:
− Which algorithms were used to detect anomalies?
− What are the criteria an anomaly detector needs?
− What are the differences between the algorithms invented to detect anomalies?

Keyword and literature search
The databases used for this purpose include IEEE and Science Direct. Most of the research papers were retrieved from the IEEE database. Some journals provided more information than proceeding papers [13]. The following two criteria were used to filter the selected papers in the database:
− The algorithm was developed between 2010 and 2022. The algorithm must be new and unique. If the proposed algorithms were manipulated and differed from other invented anomaly detection algorithms, they were considered in this study.
− The research article only described a newly invented anomaly detection algorithm and did not describe the application or mechanism of previously developed anomaly detection algorithms.
Figure 2 shows the evolution of keywords. It shows the keywords used with their respective results, which consist of selected research papers. These keywords include anomaly detection, heterogeneous detector, and automatic detection.

Figure 2. Evolution of keywords used
From the tree in Figure 2, ten papers can be considered using the anomaly detection keyword. Research paper [c] [9] is not considered for this review since it does not introduce any new algorithm. However, the title of manuscript [c] (Anomalous behaviour detection based on heterogeneous data and data fusion) is notable since the term heterogeneous data is used. Thus, the keyword was changed to "heterogeneous detector" to broaden the algorithm search. Paper [L] [8] was found using this keyword. In [c], the term automatic is also considered, which describes how to detect anomalous data without human intervention. Hence, following [c], paper [M] [14] was used. A further search on automatic detection found [N] [15].
Several articles were removed from the list as they did not meet the requirements. Among the reasons for rejection were: i) no new algorithm was found [7], [11], [13]; ii) empirical data analysis was used [9], [16]; iii) a previous anomaly detection model and algorithm were used [17], [18]; iv) a previous algorithm was used to detect anomalies without manipulation [5]; and v) recursive density estimation, which was introduced before 2010, was used [3]. Thus, only thirteen algorithms were selected to be included in this study.

Knowledge extraction and knowledge differentiation
The following two steps in this review ("Knowledge Extraction" and "Knowledge Differentiation") were based not only on the understanding of the proposed algorithms in each of the papers (Section 3) but also on explaining each of the identified unique features. The information (e.g., assumptions, recursive mechanism, automation, learning type) that was extracted from the different algorithms (Section 4) was influential in differentiating the advantages and disadvantages of the algorithms (Section 5). Finally, the best algorithm was chosen as part of this review.

ANOMALY DETECTION ALGORITHMS

Incremental spatio-temporal learner (ISTL)
ISTL [2] is an algorithm used specifically for real-time surveillance cameras. Figure 3 represents the ISTL approach. Firstly, surveillance video representing normal behavior is fed into the ISTL as input. Next, the trained model acts as an anomaly detector and localizes anomalies in the current input stream. As depicted in Figure 3, active learning is used to train the model continuously with the user input. A fuzzy aggregation model supports it to retain the stability of the iterations during learning. ISTL comprises a spatiotemporal autoencoder that learns the motion in the video streams. The ISTL approach is finally able to detect anomalies with their respective localization.

Anomaly detection in online detection
This algorithm was used to detect anomalies in communication platforms by differentiating unwanted users in the platforms [4]. The algorithm consists of three phases: multiple canopy clustering, cluster membership analysis, and classification model training. An anomaly is declared when the number of records in a cluster is less than the anomaly threshold. Four types of thresholds are used: max_density, which represents the maximum density a cluster can have; seed_count, which limits how many times a cluster can repeat; significant_ratio, which limits the amount of data in a cluster; and anomaly_thresholds, which is used to declare an anomaly. For example, if the parameter is less than this threshold, it will be declared an anomaly. If both max_density and seed_count are set at the maximum level, the stability of the method can be increased, but more computational time is required.
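The anomaly declaration rule above can be illustrated with a minimal sketch. It assumes that cluster labels have already been produced by some clustering step (canopy clustering itself is not implemented here), and the function name `flag_small_clusters` and the sample labels are hypothetical, not taken from [4].

```python
from collections import Counter


def flag_small_clusters(cluster_labels, anomaly_threshold):
    """Flag each record whose cluster holds fewer records than the threshold.

    cluster_labels: one (hypothetical) cluster label per record.
    Returns a list of booleans aligned with the input records.
    """
    sizes = Counter(cluster_labels)  # number of records per cluster
    return [sizes[label] < anomaly_threshold for label in cluster_labels]


# Illustrative labels: cluster "b" contains a single record.
labels = ["a", "a", "a", "b", "a", "c", "c", "c"]
flags = flag_small_clusters(labels, anomaly_threshold=2)
print(flags)
```

Only the record in the single-member cluster is flagged, since the other clusters meet the threshold.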

Anomaly extraction using association rule
This algorithm was built to detect anomalies in network traffic; however, the architecture can also be adapted to suit other domains [8]. Anomaly extraction using association rules is performed to detect frequent patterns and create rules between them. The first step involves a pre-filter that determines suspicious flows by removing the maximum fraction of normal flows. Next, association rules are mined using the Apriori algorithm. Finally, the heterogeneous detector is used to identify the anomaly.
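To make the frequent pattern step more concrete, the following is a minimal sketch of Apriori-style frequent itemset counting, limited to itemsets of size one and two. The flow attributes (`dstPort=80`, `proto=TCP`) and the function name `frequent_itemsets` are illustrative assumptions and do not come from [8]; the pre-filter and the heterogeneous detector are not modelled.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets of size 1 and 2 meeting min_support."""
    # Level 1: count single items.
    counts = Counter()
    for t in transactions:
        for item in set(t):
            counts[frozenset([item])] += 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}

    # Apriori pruning: a pair can only be frequent if both of its items are.
    frequent_items = {next(iter(s)) for s in frequent}
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        for pair in combinations(kept, 2):
            pair_counts[frozenset(pair)] += 1
    frequent.update({s: c for s, c in pair_counts.items() if c >= min_support})
    return frequent


# Hypothetical suspicious flows that survived a pre-filter.
flows = [
    {"dstPort=80", "proto=TCP"},
    {"dstPort=80", "proto=TCP"},
    {"dstPort=80", "proto=TCP"},
    {"dstPort=53", "proto=UDP"},
]
result = frequent_itemsets(flows, min_support=3)
print(result)
```

In this toy input the combination of destination port 80 and TCP is reported as a frequent pattern, while the single DNS flow is discarded.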

Eccentricity analysis
Typicality and eccentricity data analytics (TEDA) was built to address the problem that traditional statistical methods are often inapplicable in the real world [12]. This algorithm was proposed [12] and published [19] to build an anomaly detector based on the TEDA mechanism, which can be used in any domain. Moreover, TEDA does not use any prior assumptions [12]. Meanwhile, the first time assumption is realistic for a purely random process but not for a real-world process [12]. Even though there is a rationale for using thresholds, they also have many disadvantages. The disadvantages are described in Section 5.
The authors proposed an nσ rule for anomaly detection, where n is commonly set to 3, which is applicable in both TEDA and statistical analysis. In the traditional nσ method, the mean and average represent all other data samples. A σ gap appears when data eccentricity becomes higher, declaring the presence of an anomaly, while the "ε vicinity" defines an anomaly in a stream where the noise is smaller and the anomaly forms abnormal points. The proposed gap accumulates proximities and analyzes the two pairs of suspected regions. Meanwhile, the traditional method only uses average proximity and does not include the distance between the outlier data points. The steps of the method used to detect anomalies are:
− The normalized eccentricity of the data point is calculated.
− If the normalized eccentricity is greater than the threshold, the data point is declared an anomaly.
− Otherwise, continue to check all the data for anomalies.
− End.
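The steps above can be sketched as follows for a stream of scalar values. The recursive mean and variance updates and the threshold (n² + 1)/(2k) follow the formulation commonly cited for TEDA in the literature; they are not stated in this review, so they should be read as assumptions, and the function name `teda_anomalies` and the sample data are hypothetical.

```python
def teda_anomalies(stream, n=3.0):
    """Yield (k, x, is_anomaly) for each scalar sample of a stream.

    Assumed formulation (not given in the text above):
      mean_k = (k-1)/k * mean_{k-1} + x_k / k
      var_k  = (k-1)/k * var_{k-1} + (x_k - mean_k)^2 / (k-1)
      ecc_k  = 1/k + (mean_k - x_k)^2 / (k * var_k)
      anomaly if ecc_k / 2 > (n^2 + 1) / (2k)
    """
    mean = 0.0
    var = 0.0
    for k, x in enumerate(stream, start=1):
        if k == 1:
            mean, var = x, 0.0
            yield k, x, False
            continue
        mean = (k - 1) / k * mean + x / k
        var = (k - 1) / k * var + (x - mean) ** 2 / (k - 1)
        if var == 0.0:
            # All samples identical so far: eccentricity is undefined.
            yield k, x, False
            continue
        eccentricity = 1.0 / k + (mean - x) ** 2 / (k * var)
        normalized = eccentricity / 2.0
        threshold = (n ** 2 + 1) / (2 * k)
        yield k, x, normalized > threshold


data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 10.1, 9.9, 10.0, 10.1, 10.0, 10.0, 25.0]
flags = [is_anomaly for _, _, is_anomaly in teda_anomalies(data)]
print(flags)
```

Only the state (k, mean, variance) is kept between samples, so no past data has to be stored. Under this formulation the threshold is at least 0.5 for the first ten samples when n = 3, so no sample can be flagged until enough data has been observed.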

Abnormal human events on train platforms
This algorithm is built to detect abnormal human events on train platforms using a surveillance camera [14]. However, detecting anomalous behavior using surveillance cameras is challenging since this event is probable. All information on size and shape is used to classify different objects and events. The algorithm used to identify the train status is as follows:
a. Estimate motion vectors in the train bed area.
b. If the motion is parallel to the edge of the bed track, then:
− Analyze the motion vector to estimate the speed.
− Check if the speed is more than the threshold.
− Update the train status: the train is arriving, the train is departing, the train is discharging passengers, or the train bed is clear.
The line differentiating platforms from the train bed is set manually. The mean of the background image vectors includes unwanted noise viewed by camera sensors and other sources. Hence, the background image is assumed to be more probable than the foreground pixel.
The displacement of vectors in the image indicates the train speed. A high-speed train yields a high displacement value and a low-speed train yields a low displacement value, whereas a halted train has a zero displacement value. The detection of a moving object (anomaly) is turned on when no objects or trains are in the track bed. The entire video frame is utilized to determine any blob in the foreground area, which is assumed to be an object. The alarm is raised when the detected object moves across the track bed, indicating a suspicious event.

Dangerous motion detector in human crowds
This is an algorithm to detect dangerous motions in crowded places [15]. It is crucial to detect abnormal events such as stampedes. The algorithm includes the following steps:
a. Calculate the dense optical flow and the corresponding two-dimensional histogram of motion direction and magnitude.
b. Average the histogram over a short time interval.
c. An increase in lateral oscillations denotes congestion when using the front view camera, as follows:
− The histogram acts as a congestion indicator, showing motion that goes to the left and right, which indicates oscillations.
− The histogram indicates a high degree of symmetry by marking the area as highly dense.
− The low value of symmetry indicates the area is congested.
d. The alarm is triggered if the result is more than the threshold.
e. Thresholds, θ, are calculated based on the targeted area's current condition, which makes them data-driven thresholds.
Shock waves (i.e., anomalies) may occur during congestion. A single person's movement propagates and makes others in the region move. The shock wave is dangerous because people begin to follow it as they cannot control the movement, causing some to fall and get crushed. The shock wave is defined as a sudden increase in the magnitude of optical flow. The standard deviation of direction involving the vicinity

Transfer deep learning for hyperspectral image
This algorithm is used on images and involves a convolutional neural network-based detector (CNND) that evaluates the same class for similarity and a different class for dissimilarity [20]. The proposed detection involves three steps: i) learning a deep CNN on reference data; ii) measuring the similarity between test data and training data; and iii) averaging the detection product as an outcome. Firstly, reference data with ground truth is used to generate differences (0 for similarity, 1 for dissimilarity) between pixels. When the detection output is compared with the set threshold, an anomaly is declared if the output exceeds the threshold.

Improved RX with CNN framework
The Reed-Xiaoli (RX) algorithm is used on images and uses the Mahalanobis distance, which considers the mean and covariance matrix [21]. It assumes the background points are normal. Meanwhile, the subspace RX (SSRX) algorithm deletes the background subspace and applies a detector on the target subspace. Hence, higher results are chosen in the algorithm to detect the target. However, anomaly detection becomes harder for small targets due to complicated backgrounds. Also, the RX algorithm can improve its performance by increasing its peak value. The three steps involved in detecting an anomaly include: i) learning using chosen images from the airborne visible/infrared imaging spectrometer (AVIRIS) hyperspectral algorithm; ii) comparing the test pixel with its surroundings using a CNN to generate an approximation score; and iii) adding the approximation score to the central pixel and transferring it to the RX algorithm for improvement and detection.
Then, the algorithm produces a training dataset. The following steps are needed to produce a training dataset: i) existing categories act as training sets that involve manual selection of background and target, for example, road and vehicle, respectively; ii) background and target class samples are paired and subtracted.
An evaluation score between -1 and 1 is obtained, indicating the degree of similarity between the background and the target. The further the evaluation score is above 0, the closer the pixel is to abnormal targets, and vice versa.
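As background for the RX part of this approach, the sketch below computes the classic RX score, that is, the Mahalanobis distance of each pixel from the global mean and covariance. It covers only the baseline detector, not the CNN-based improvement of [21], and it is restricted to two spectral bands so that the covariance matrix can be inverted in closed form. The function name `rx_scores` and the pixel values are illustrative assumptions.

```python
def rx_scores(pixels):
    """Return the squared Mahalanobis distance of each 2-band pixel.

    Background statistics (mean, sample covariance) are estimated from
    all pixels, which is the global form of the RX detector.
    """
    n = len(pixels)
    m0 = sum(p[0] for p in pixels) / n
    m1 = sum(p[1] for p in pixels) / n

    # Sample covariance matrix [[c00, c01], [c01, c11]].
    c00 = sum((p[0] - m0) ** 2 for p in pixels) / (n - 1)
    c11 = sum((p[1] - m1) ** 2 for p in pixels) / (n - 1)
    c01 = sum((p[0] - m0) * (p[1] - m1) for p in pixels) / (n - 1)

    det = c00 * c11 - c01 * c01
    if det == 0:
        raise ValueError("covariance matrix is singular")

    # Closed-form inverse of a symmetric 2x2 matrix.
    i00, i01, i11 = c11 / det, -c01 / det, c00 / det

    scores = []
    for p in pixels:
        d0, d1 = p[0] - m0, p[1] - m1
        scores.append(d0 * (i00 * d0 + i01 * d1) + d1 * (i01 * d0 + i11 * d1))
    return scores


# Six background-like pixels and one spectrally different pixel (last).
pixels = [(1.0, 2.0), (1.1, 2.1), (0.9, 1.9), (1.0, 2.1),
          (1.1, 1.9), (0.9, 2.0), (5.0, 0.5)]
scores = rx_scores(pixels)
most_anomalous = scores.index(max(scores))
print(most_anomalous, scores)
```

The pixel with the largest score is the candidate anomaly; a threshold on the score would turn this ranking into a detection decision.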

Autonomous anomaly detection
The empirical data analysis (EDA) proposed by Angelov et al. [1] is an extended version of TEDA [22]. Data in the real world is unknown and probably not labeled. EDA is based on observed data, and it accumulates properties without making any prior assumption. Based on EDA, autonomous anomaly detection (AAD) was created and can be used in any domain. Firstly, assume a dataset $\{x_1, x_2, \ldots, x_K\}$, where $x_i$ is the $i$-th data sample and $K$ is the number of data samples. The same data value may occur more than once in this dataset, denoted by $\exists\, i \neq j \mid x_i = x_j$. The dataset therefore has a set of unique data samples $u_i$, each with a corresponding frequency of occurrence. The algorithm works as follows:
iii. The product is ranked in ascending order, $\{D^{MM}(x)\}$.
iv. The first half of the $1/n^2$ smallest values from (ii) is selected and declared as the first set of potential anomalies. The value of n is the same as in eccentricity analysis, where it is set to 3.
v. $D^{MM}$ is less sensitive to local sparsity. Hence, considering the data $\{x_1, x_2, \ldots, x_K\}$, the Euclidean distance between them is calculated using (7).
vi. Data located inside this hypersphere are known as neighbors.
vii. Consider the neighbors of $u_i$. The unimodal value can be determined using (8) from the mean and the average scalar product of the neighbors of $u_i$.
viii. Next, the unimodal density is weighted by its frequency using (9), where $N_i$ represents the cardinality of the set of neighbors of $u_i$. The weighted unimodal densities are then arranged in ascending order, $\{D^{WL}(x)\}$. The second smallest value selected among them is declared as the second potential anomaly detected.
After that, the algorithm determines whether the detected potential anomalies can form data clouds. This is done using the autonomous data partitioning (ADP) algorithm introduced by Gu et al. [23]. In the final stage, the algorithm confirms whether each potential anomaly is an actual anomaly. A potential anomaly is declared an anomaly if it cannot form any data cloud. In (11), if the support of a data cloud is less than the average support, then the cloud is formed by anomalies. In both equations, S represents the support, or number of members, of a data cloud, and $c_i$ represents the $i$-th data cloud.

Hierarchical pattern matching
Hierarchical pattern matching (HPM) for anomaly detection uses a piecewise linear function to detect abnormal line fitting in streaming patterns [24]. This algorithm uses a recursive mechanism, which is the same as in isolation forest. It is memory efficient since it stores each unique data pattern only once, avoiding data redundancy. Firstly, the algorithm extracts patterns from the streaming data. The size of the scrolling window is predefined before the algorithm is executed. For each window, the best fitting line is found by using a piecewise linear function.
Furthermore, after finding the best fitting line, the algorithm goes deeper to find the best fit again by using the previous piecewise linear function. As a result, a hierarchical tree is formed. In the anomaly detection phase, assume that the streaming data is running. The algorithm then chooses a time window which contains a specific amount of data from the time series. The data is compared with the previous hierarchical tree. If a new pattern is found, the alarm is raised, which means an anomalous pattern is detected.

Multi-aspect data stream anomaly detection
Multi-aspect data stream anomaly detection (MDS_AD) solves issues found in many state-of-the-art algorithms [25]. For example, current anomaly detection algorithms overlook the relationship between attributes and the dynamic nature of data in a streaming environment. The salient issue is that current state-of-the-art algorithms do not fulfill multi-aspect requirements. Multi-aspect means each record has multiple types of data, such as categorical and numerical.
Firstly, it uses principal component analysis (PCA) to reduce dimensionality while preserving the correlations of each attribute. Then, it combines categorical and numerical data using one of the locality-sensitive hashing (LSH) functions called Record Hash. Record Hash can work on streaming data, enabling it to update the model faster. Then, the isolation forest algorithm is applied, creating multiple trees. After model construction, the algorithm receives online data.
In the online anomaly detection phase, data enters the algorithm over time. Firstly, the dimensionality of the data entries is reduced using PCA. Then, the output from PCA traverses the modeled trees. Along the way, the path length is calculated. The shorter the path length, the higher the anomaly score. The anomaly score is between 0 and 1, and a score closer to one indicates an anomaly. After that, the model update is conducted. The model is updated after a certain amount of online data is stored. After the model is updated, the new model is used to detect anomalies in the online stage. The stored data used to update the model is emptied. Then, the process is repeated, making the model evolve along with the online dynamic environment.
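The relationship between path length and anomaly score can be illustrated with the normalization used by the standard isolation forest, s = 2^(-E[h]/c(n)), where E[h] is the mean path length over the trees and c(n) is the average path length of an unsuccessful search in a binary search tree built from n samples. Whether MDS_AD uses exactly this normalization is an assumption; the function names and the subsample size of 256 are illustrative.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def average_path_length(n):
    """Average path length c(n) used to normalize isolation forest scores."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def anomaly_score(mean_path_length, n):
    """Map a mean path length over all trees to a score in (0, 1]."""
    return 2.0 ** (-mean_path_length / average_path_length(n))


# A record isolated after 2 splits versus one that needs 12 splits,
# with trees built from a hypothetical subsample of 256 records.
short_path_score = anomaly_score(2.0, 256)
long_path_score = anomaly_score(12.0, 256)
print(short_path_score, long_path_score)
```

A record whose mean path length equals c(n) receives a score of 0.5, so scores well above 0.5 point to records that are isolated unusually quickly.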

Anomaly detection using an array of sliding windows and PDDs
This algorithm runs unsupervised in the streaming environment [26]. Firstly, the algorithm assumes the size of the sliding window, the size of the sub window, and the number of targets. Using the probability density function (PDF), probability density-based descriptors (PDDs) are obtained for each sub window. Targets are selected by partitioning the range between the maximum and minimum windows. The midpoint of each interval is obtained. Using the midpoints, a set of PDDs is obtained for each window.
Then, the distance between PDDs is calculated. The larger the distance, the more abnormal the window is. Then, the expanded maximum distance is also calculated to determine how far the PDD of a sub window can go. If there are three PDDs, denote them fw1, fw2, and fw3. If the distance between fw1 and fw2 is more than the expanded maximum distance, and the distance between fw2 and fw3 is less than or equal to the average distance, the corresponding window is declared an anomaly. Otherwise, it is normal.
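The decision rule for three consecutive descriptors can be written as a short sketch. The distance measure between PDDs is not specified above, so the Euclidean distance is used here as an assumption; the function name, the descriptor values, and the two threshold values are likewise hypothetical.

```python
import math


def is_anomalous_window(fw1, fw2, fw3, expanded_max_distance, average_distance):
    """Apply the three-descriptor rule to consecutive sub-window PDDs.

    The window is flagged when fw2 jumps away from fw1 by more than the
    expanded maximum distance and fw3 then stays close to fw2.
    """
    d12 = math.dist(fw1, fw2)  # Euclidean distance (assumed measure)
    d23 = math.dist(fw2, fw3)
    return d12 > expanded_max_distance and d23 <= average_distance


# Descriptors evaluated at three hypothetical target points.
fw1 = (0.10, 0.20, 0.70)
fw2 = (0.60, 0.30, 0.10)
fw3 = (0.58, 0.32, 0.10)

abrupt_change = is_anomalous_window(fw1, fw2, fw3,
                                    expanded_max_distance=0.5,
                                    average_distance=0.1)
steady_state = is_anomalous_window(fw2, fw3, fw3,
                                   expanded_max_distance=0.5,
                                   average_distance=0.1)
print(abrupt_change, steady_state)
```

The first call flags the window because the density profile shifts abruptly and then settles, whereas the second call sees only small changes between descriptors.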

Correlated anomaly detection from large streaming data
This approach is called correlated anomaly detection (CAD) [27]. It detects correlated anomalies, in which the data have a stronger correlation with each other. Meanwhile, normal data do not correlate with each other. The approach consists of two new algorithms and a framework. It solves principal component degeneration, which occurs, for example, when normal data outnumber anomalous data, making the anomalous data hard to detect. The two algorithms are the randomized principal score (rPS), which detects suspicious anomalies, and the generative principal score (gPS), which detects suspicious and core anomalies.
The principal score is denoted as ρ(X), where X is a sequence of large data. If ρ(X) > ρ̃, there is a possibility of correlated or anomalous data, which can trigger human attention. Decreasing the threshold ρ̃ can cause false positives. The threshold is said to be set to 0.7, which is the most ideal value. There are also additional thresholds used in rPS and gPS. For instance, rPS uses P to control sampling quantity and correlation sensitivity. The gPS uses α, which should not be far from ρ̃.
Several assumptions are made to build the algorithms. Assumption one is that normal data entries are weakly correlated, and if ρ(X) is closer to one, then X contains anomalous data. Assumption two is that there are many correlated normal data which would be anomalies but are lower than the threshold ρ̃. Assumption three is that a significant quantity of the data vectors is correlated, making them anomalies.
An example is a botnet case, where a large number of clients try to enter the server to trigger a distributed denial-of-service (DDoS) attack. The gPS is introduced to tackle assumption four, where anomalous behavior tries to camouflage itself. Assumption four is that each anomaly set has higher internal correlations than external correlations. The gPS algorithm can detect anomalous sets that are weakly correlated. It addresses a weakness of the rPS algorithm, which can raise false alarms. Then, rPS and gPS form a framework that accepts entries from large data streams.

ANOMALY DETECTION CRITERION
In this section, the criteria required by each algorithm are further explained to differentiate between the thirteen algorithms. These six criteria are very important in any anomaly detection algorithm. In addition, these criteria help to detect anomalies in streaming data since it is dynamic or unknown [12], [19]. As a result, these criteria are believed to help researchers invent the best anomaly detector. These criteria include: i) no assumptions; ii) fast computational time; iii) memory efficiency; iv) automation; v) type of learning: semi-supervised and unsupervised; and vi) ability to detect all types of anomalies in the data world.
Most of the assumptions in traditional statistical methods are impractical [1]. The assumption for the first time is realistic for a purely random process but not for a real-world process [12]. The assumptions widely used in artificial intelligence algorithms are also known as threshold values. For example, in deep learning, the linear perceptron is built on assumptions, but the data labels are only approximated [28]. Hence, assumptions only provide approximate values and not exact values. Therefore, for a particular problem that requires thresholds and parameters, especially in industrial applications, assumptions are not suitable [6].
Int J Artif Intell, ISSN: 2252-8938. Empowering anomaly detection algorithm: a review (Muhammad Yunus Iqbal Basheer)
On the other hand, fast computational time is significant since the algorithm needs to act whenever an anomaly is detected in the data. Recursive storage and update enable the system to operate faster and keep a large dataset suitable for online streaming data [6]. Hence, whenever the algorithm uses recursive calculation, it is known to have these criteria [16]:
− Computationally efficient.
− Prevent vigorous usage of memory, which is not needed.
− Reuse and update important information in fast computational time.
In other words, recursive calculation can also reduce memory usage, allowing better use of memory [1]. Memory efficiency is an important criterion for an anomaly detection algorithm. Streaming data involves incoming data that cannot all be stored due to the limited memory of the computer, and the data cannot simply be processed since it comes in various forms [11]. Automation can reduce human intervention in decision making. The urban area is transforming into autonomous machinery where human intervention is not required [2].
For instance, human experts cannot detect all anomalies in a specific video stream [2]. Since a lot of data arrives every millisecond, autonomous anomaly detection can help reduce this dimensionality by focusing on small data consisting only of rare events, compared to human expertise [9]. This does not mean having no human intervention at all, because every piece of machinery needs a human touch to decide. This aspect is related to the prevention of mistakes and the making of intelligent machinery.
To make an algorithm more intelligent, learning is required. There are three types of learning in artificial intelligence, namely supervised learning, semi-supervised learning, and unsupervised learning. Supervised learning is more accurate and effective than statistical methods when dealing with anomalous data [22]. However, in an anomalous world, data could sometimes be unknown or not labeled [22]. So, to detect an anomaly, only semi-supervised and unsupervised learning can be used, because supervised learning, like classification, needs labeled data [16]. The supervised and unsupervised method is usable in the semi-automated identification of potential threats [4].
As mentioned before, a fully autonomous algorithm does not require any assumptions or training dataset [29]. In short, it does not need to learn anomalous data; instead, it only captures normal data patterns to differentiate them. Since abnormal data does not fit in with normal data [11], it deviates from normal data to form suspicious abnormal data [8]. Sometimes, normal data also contains anomalous data that is undetectable. Besides, the definition of normal behavior is currently hard to capture, which is one constraint in developing anomaly detection algorithms [2], [3]. When unexplainable anomalous data is found, the old normal situation becomes wholly different [10]. Therefore, one cannot understand the types of anomalies without referring to the structure of the data [10]. Hence, an anomaly detection algorithm developer needs to understand the data structure to ensure the algorithm's performance.
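The memory benefit of recursive calculation can be shown with a minimal sketch that keeps only a sample counter, the running mean, and the running average scalar product, regardless of how many samples have arrived. The update formulas are exact rearrangements of the batch definitions; the class name `RecursiveStats` and the sample values are hypothetical.

```python
class RecursiveStats:
    """Running mean and average scalar product of a vector stream.

    Memory use is constant: no past samples are stored.
    """

    def __init__(self, dim):
        self.k = 0                   # number of samples seen so far
        self.mean = [0.0] * dim      # running mean vector
        self.scalar_product = 0.0    # running average of ||x||^2

    def update(self, x):
        self.k += 1
        w = (self.k - 1) / self.k    # weight of the previous estimate
        self.mean = [w * m + xi / self.k for m, xi in zip(self.mean, x)]
        self.scalar_product = (w * self.scalar_product
                               + sum(xi * xi for xi in x) / self.k)

    def variance(self):
        """Total variance: average scalar product minus squared norm of the mean."""
        return self.scalar_product - sum(m * m for m in self.mean)


stats = RecursiveStats(dim=2)
for sample in [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]:
    stats.update(sample)
print(stats.mean, stats.scalar_product, stats.variance())
```

Each update costs a fixed number of operations per dimension, which is why recursive detectors suit streams whose full history cannot be stored.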

EVALUATION
All thirteen algorithms reviewed in Section 3 were evaluated using the criteria discussed in Section 4. Identifying whether all the algorithms can detect all types of anomalies is difficult since each algorithm needs to be tested first. However, it is believed that EDA [1], which was further upgraded into the anomaly detector [22], can detect all anomalies [12]. Instead of the algorithm's speed and memory, the use of recursive calculation was used to differentiate the algorithms, as shown in Table 1. It is hard to evaluate speed and memory consumption for each algorithm since the authors did not mention them, while recursive storage and update enable a system to operate faster and store large datasets suitable for online streaming data [6].
Prior assumptions cannot be used to close the differentiation gap between all thirteen algorithms and to create a robust anomaly detection algorithm. Since the anomaly is unknown, one cannot simply assume or draw the line between anomalous and normal data. Besides that, supervised learning is inapplicable in this case. For example, human behavior is unknown and hard to predict, and it sometimes changes according to the person's goal [30]. Hence, in this case, semi-supervised learning is the best learning method. Furthermore, autonomous anomaly detection is better since it does not require human expertise, and it could ease human life and prevent human errors.
To be considered autonomous, an algorithm should not have any assumptions or training datasets [29]. Based on Table 1, there are various ways to determine whether an algorithm is automatic. The first is to use the title. If the title contains the word automatic, the algorithm is considered automatic. This includes autonomous anomaly detection [22], automatic detection of human events on train platforms [14], and automatic detection of dangerous motion behavior in human crowds [15].
Then, an algorithm can also be considered automatic if the paper introducing it describes it as automatic. For example, ISTL [2] describes automating anomaly detection using deep learning. It uses spatio-temporal learning with anomaly detection and localization. The transferred deep learning algorithm uses CNND [20], which finds similarities and dissimilarities of images on its own to represent abnormality, making it an automatic algorithm. Eccentricity analysis [19] is entirely based on data and their distribution, with no user-specific thresholds and no kernel required, which makes it suitable for autonomous fault detection. Meanwhile, the RX algorithm [21] produces its own training dataset, making it an algorithm that does not need human intervention to add more data. Then, [24], [27] use no training data, and no human intervention is necessary during operation.

Algorithm | Assumption | Recursive mechanism | Automatic | Learning type
Autonomous anomaly detection [22] | | | | US
Eccentricity analysis [12] | | | | US
Anomaly detection in online detection [4] | | | | SS
Abnormal human events on train platforms [14] | | | | US
Incremental spatio-temporal learner [2] | | | | SS
Transfer deep learning for hyperspectral image [20] | | | | SS
Dangerous motion detector in human crowds [15] | | | | US
Anomaly extraction using association rule [8] | | | | US
Improved RX with CNN framework [21] | | | | SS
Hierarchical pattern matching [24] | | | | US
Multi-aspect data stream anomaly detection [25] | | | | SS
Anomaly detection using an array of sliding windows and PDDs [26] | | | | US
Correlated anomaly detection from large streaming data [27] | | | | US
*Legend: US, unsupervised; SS, semi-supervised

Results
In this subsection, the result based on Table 1 is further explained.Firstly, autonomous anomaly detection does not need any prior assumptions.It brought EDA characteristics which utilize recursive updates such as in mean, average scalar product, and data density calculation.It is an automatic algorithm and unsupervised algorithm which does not need training or labeled data to detect anomaly.Eccentricity analysis was also implemented in streaming data [31].It used a recursive update.Furthermore, it is automatic and uses unsupervised learning.Unfortunately, it needs assumptions or a threshold which, if the calculated normalized eccentricity is bigger than the calculated threshold, then it is an anomaly [31].
Anomaly detection in online detection was applied to social chat data with the aid of four thresholds. It does not utilize recursive updates and is not automatic. It labels the detected data as normal or abnormal and injects the labels into the machine learning algorithm; therefore, it uses semi-supervised learning. Abnormal human event detection on train platforms is not generic, as it is used only on train platforms. It requires an assumption: the algorithm checks for abnormal events using the speed of the train in the train bed and a set threshold. It does not use any recursive method; hence, it is assumed to be less speed- and memory-efficient than algorithms that do. However, it is automatic and uses unsupervised learning.
ISTL is used with surveillance cameras and runs automatically. It uses a semi-supervised learning method, relying on data validated by experts, meaning that the data are labeled. No recursive calculation is used, and it requires assumptions. For example, two thresholds are used in evaluation: an anomaly threshold and a temporal threshold. Transferred deep learning for hyperspectral images is applied to images using a convolutional-based detector (CNND). It uses a threshold to declare whether a section of pixels in an image is an anomaly. It is semi-supervised, as it uses reference data to generate ground truth. It does not have any recursive calculation and is automatic.
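To make the role of the two ISTL thresholds concrete, the following sketch shows one plausible way an anomaly threshold and a temporal threshold can interact. It assumes that each frame already has an anomaly score and that an event is reported only when the score stays above the anomaly threshold for a minimum number of consecutive frames. The function name, the score values, and this exact combination rule are illustrative assumptions rather than the procedure of [2].

```python
def detect_events(scores, anomaly_threshold, temporal_threshold):
    """Return (start, end) frame indices of sustained high-score runs.

    A run is reported only if the score exceeds anomaly_threshold for at
    least temporal_threshold consecutive frames.
    """
    events = []
    run_start = None
    for i, score in enumerate(scores):
        if score > anomaly_threshold:
            if run_start is None:
                run_start = i  # a new candidate run begins
        else:
            if run_start is not None and i - run_start >= temporal_threshold:
                events.append((run_start, i - 1))
            run_start = None
    # Close a run that lasts until the final frame.
    if run_start is not None and len(scores) - run_start >= temporal_threshold:
        events.append((run_start, len(scores) - 1))
    return events


frame_scores = [0.1, 0.9, 0.2, 0.8, 0.9, 0.95, 0.1, 0.7, 0.8]
events = detect_events(frame_scores, anomaly_threshold=0.5, temporal_threshold=3)
# Only the three-frame run at indices 3 to 5 is long enough to be reported.
```

Both thresholds must be chosen by the user, which is why such a scheme counts as an assumption under the criteria of this review.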
Dangerous motion detectors in human crowds are used to avoid stampedes and other dangerous events. The algorithm relies on an assumption: for example, an alarm is raised if the histogram of the dense flow exceeds a set threshold. It does not have a recursive update. However, it is fully automatic, reducing human intervention, and uses unsupervised learning.
Meanwhile, anomaly extraction using the association rule is not automatic. It is built specifically for detecting anomalous events in network pipelines. It uses a heterogeneous detector without any recursive calculation and requires assumptions to detect an anomaly. However, it learns in an unsupervised manner, without the aid of labeled data from experts.
Improved RX with a CNN framework was used to detect anomalies in images. It uses a threshold, and no recursive mechanism is found in the algorithm. It is automatic and requires no human intervention. However, it uses semi-supervised learning, generating many training datasets for use in the algorithm.
The HPM algorithm uses thresholds such as a predefined amount of data in a window. It uses a recursive mechanism, the same as the isolation forest [32]. It is an automatic, unsupervised algorithm that does not require training data.
The MDS_AD algorithm uses assumptions to determine the degree of the anomaly score. It uses the isolation forest [32], which relies on a recursive mechanism. Unfortunately, the model keeps evolving over time. It is not automatic and is a semi-supervised algorithm.
Anomaly detection using an array of sliding windows and PDDs uses three assumptions. Firstly, increasing the number of windows affects the true and false positive scores. Secondly, increasing the number of sub-windows increases true and false positives. Finally, increasing the number of targets has less effect on the algorithm's performance. Therefore, the assumptions affect the algorithm's performance. There is no recursive mechanism, and the algorithm is not automatic. It is also an unsupervised algorithm.
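The recursive mechanism that HPM and MDS_AD share with the isolation forest [32] can be sketched as repeated random partitioning: a point that is separated from the rest after few splits is more likely to be anomalous. The code below is a simplified, one-dimensional illustration only. It omits the subsampling and path-length normalization of the full isolation forest, and the function names, the number of trees, and the depth limit are assumptions made for this example.

```python
import random


def path_length(x, data, rng, depth=0, max_depth=10):
    """Recursively split the data at random until x is isolated."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    low, high = min(data), max(data)
    if low == high:
        return depth  # all remaining values are identical
    split = rng.uniform(low, high)
    # Keep only the side of the split that contains x, then recurse.
    if x < split:
        side = [v for v in data if v < split]
    else:
        side = [v for v in data if v >= split]
    return path_length(x, side, rng, depth + 1, max_depth)


def mean_path_length(x, data, n_trees=200, seed=0):
    """Average the isolation depth of x over many random partitionings."""
    rng = random.Random(seed)
    total = sum(path_length(x, data, rng) for _ in range(n_trees))
    return total / n_trees


readings = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02, 0.98, 1.08, 10.0]
outlier_depth = mean_path_length(10.0, readings)
inlier_depth = mean_path_length(1.0, readings)
# The isolated reading (10.0) needs fewer splits on average than a typical one.
```

Turning the average depth into a decision still requires a cutoff on the score, which corresponds to the assumption noted for MDS_AD above.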
Finally, correlated anomaly detection from large streaming data uses assumptions that can affect the algorithm's performance. Furthermore, it is built around assumed problems. In the future, the existence of anomalies may not be known in advance, which would make this algorithm fail. For example, a botnet may be modified to attack normal users accessing the server. The normal user may then be marked as an anomaly, whereas the access actually originates from another user trying to freeze the server operations. The algorithm does not use any recursive mechanism and is automatic. It is also an unsupervised algorithm.

Discussion
Based on Table 1, autonomous anomaly detection [22] best fulfills the requirements for an anomaly detector. It is the only algorithm that does not use any assumptions. Meanwhile, eccentricity analysis [12] uses a comparison threshold to differentiate normal and abnormal states [31]. It also has a recursive, unsupervised, and fully automatic mechanism that detects anomalous data without human intervention.
Apart from the reviewed algorithms, many additional algorithms could help close this gap. For example, autoencoders could provide accurate input [33], and CNN-based features are preferred over hand-crafted ones [34]. Additionally, explainable deep neural networks (xDNN) can upgrade an anomaly detection algorithm by combining reasoning and learning in a synergistic way [35]. Besides the training algorithm, the training data also matter: normal and abnormal data need to be balanced.
However, obtaining balanced data in the real world is difficult, although some anomaly detection algorithms can be used [36] to address this issue. Therefore, a hybrid anomaly detection algorithm can be more powerful than a single anomaly detection algorithm and can help close this gap quickly. For example, a hybrid algorithm can ease the burden of collecting balanced data, which makes the training of new anomaly detection algorithms fairer. In other words, combining additional algorithms makes anomaly detection algorithms more reliable.
Finally, an anomaly detection algorithm can be improved by implementing all the criteria mentioned in this paper. As cybersecurity and IoT development thrive, anomaly detection is needed, especially for high-speed data, to make sure that an anomaly is detected at the time it arrives. An autonomous system, which learns by itself [37] the dynamic nature of the data [12], can help cybersecurity and IoT applications detect suspicious data.

CONCLUSION
This study reviewed algorithms related to anomaly detection, focusing on those developed from 2010 to 2022. Although other related algorithms were designed during that period, six criteria were considered and discussed to select the appropriate algorithms. Hence, this study conducted a literature review of thirteen algorithms, along with the criteria each anomaly detection algorithm needs to be applicable in the real world. In answer to the three research questions presented in this review, six criteria were presented to ensure the efficacy of an anomaly detection algorithm. In this sense, AAD was the only algorithm that made no assumptions, in contrast to the other algorithms. This unique characteristic of EDA makes it suitable for implementation on streaming data. As a recommendation, implementing an anomaly detection algorithm directly in devices would make it much easier to detect unknown anomalies. It is also recommended that anomaly detection algorithms be built based on the six criteria mentioned in this review. Consequently, this could reduce human intervention by detecting all possible anomalies instantly.


ISSN: 2252-8938 Int J Artif Intell, Vol. 13, No. 1, March 2024: 9-22

then validate the trained algorithm before being aggregated into training data through the fuzzy aggregation method. Finally, the training data is used for other stream surveillance inputs.


becomes smaller when the shock wave is triggered. Shock waves are detected by comparing the current magnitude with the previous magnitude.