(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 14, No. 8, 2023

An Empirical Internet Protocol Network Intrusion Detection using Isolation Forest and One-Class Support Vector Machines

Gerard Shu Fuhnwi1, Victoria Adedoyin2, Janet O. Agbaje3
Gianforte School of Computing, Montana State University, Montana 59715, USA1
Department of Chemistry, Montana State University, Montana 59715, USA2
Department of Mathematical Sciences, Montana Technological University, Montana 59701, USA3

Abstract—With the increasing reliance on web-based applications and services, network intrusion detection has become a critical aspect of maintaining the security and integrity of computer networks. This study empirically investigates internet protocol network intrusion detection using two machine learning techniques, Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), combined with ANOVA F-test feature selection, and compares their effectiveness in detecting network intrusions that use web services. The study used the NSL-KDD dataset, encompassing hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP) web services attacks and normal traffic patterns, to comprehensively evaluate the algorithms. The performance of the algorithms is evaluated based on several metrics, such as the F1-score, detection rate (recall), precision, false alarm rate (FAR), and Area Under the Receiver Operating Characteristic (AUCROC) curve. Additionally, the study investigates the impact of different hyperparameters on the performance of both algorithms. Our empirical results demonstrate that while both IF and OC-SVM exhibit high efficacy in detecting network intrusion attacks using web services of type HTTP, SMTP, and FTP, the One-Class Support Vector Machines outperform the Isolation Forest in terms of F1-score (SMTP), detection rate (HTTP, SMTP, and FTP), AUCROC, and a consistently low false alarm rate (HTTP). We used the t-test to determine that OC-SVM statistically outperforms IF on DR and FAR.

Keywords—HTTP; SMTP; FTP; ANOVA F-test; AUCROC; OC-SVMs; FAR; DR; IF

I. INTRODUCTION

Network intrusion can be referred to as an unauthorized penetration of a computer in an establishment or an address in one’s assigned domain [1]. The nature and types of network intrusion have evolved over the years and become more rampant in recent years [2].

An intrusion can be passive or active. In passive intrusion, the penetration is gained stealthily and without detection, while in active intrusion, changes to network resources are effected. Intrusion can come from either an insider or an outsider. By insider, we mean an employee, customer, or business partner. Outsider means someone not connected to the organization. Network intrusions can occur in different ways. Some announce their presence by defacing the website, while others are malicious, with the goal of siphoning off data until the intrusion is discovered. Some redirect unaware users away from a website by cracking passwords or mimicking the website [1]. Sometimes, intruders absorb network resources intended for other uses or users, which can lead to a denial of service [3]. These unauthorized penetrations of the digital network often imperil the security of networks and their data [4].

Network security breaches are rapidly increasing and result in significant losses to organizations, and they often lead to a loss of confidence among unaware customers who have fallen victim. The IBM report shows that the average cost of a data breach has risen 12 percent over the past five years to 3.92 million dollars per incident [5]. This is more than the cost of a breach caused by a system glitch or human error.

Many researchers have carried out research and projects on network intrusion detection [6], [7], [8]. Wang and Battiti identified intrusions in computer networks with principal component analysis [9]. Liao and Vemuri used a k-nearest neighbour classifier for intrusion detection [10]. Gaffney and Ulvila evaluated intrusion detectors using a decision theory approach [11]. However, this area still needs more work because of the rapid rise in network intrusion. Therefore, we need to design an efficient algorithm that can successfully defend against network intrusions in an ever-evolving threat landscape.

To achieve proactive security control, organizations must put in place a good network security infrastructure and leverage the potential of machine learning, which is capable of automatically and continuously detecting network intrusions. This will help block intruders and prevent them from achieving their goals. The remainder of this paper is organized as follows: In Section II, we briefly review some related work in anomaly detection based Network Intrusion Detection. Section III gives a description of the algorithms used in this paper. Section IV presents the empirical evaluation, where we review the data sets used, the evaluation metrics, the results, and the discussion of results. Section V covers the conclusion.

II. RELATED WORK

Liu et al. [23] focused on using an Isolation Forest to detect anomalies, which has many applications in the areas of fraud detection, network intrusion, medical and public health, industrial damage detection, and so on. The goal here is to build a tree-based structure that isolates anomalies rather than profiling normal instances as in previous methods such as classification-based methods [12] and clustering-based methods [13]. Their proposed method, called Isolation Forest, builds a collection of individual tree structures that recursively partition a given data set, where anomalies are instances with a short path length on the trees. The anomaly score is used to determine which instances are anomalies and takes values between 0 and 1, with a score close to 1 indicating an anomaly and vice versa. The authors compared their results with other anomaly detection techniques [14] like ORCA, LOF, and RF on real-world data sets with high dimensions and large data sizes using the metric AUC (Area Under the Curve) and run times. Mulay et al. [15] proposed a hybrid of SVM and decision trees for classifying attacks of different forms of intrusion in the knowledge discovery and data mining 1999 (KDDCUP99) data.

In [16], Sarumi et al. compared SVM and Apriori using Network Security Laboratory Knowledge Discovery and Data Mining (NSL-KDD) data and the University of South Wales NB 2015 (UNSW NB-15) dataset. From their results, they concluded that SVM outperformed Apriori in terms of accuracy, while Apriori showed a better performance in terms of speed. In [17], Farnaaz and Jabbar proposed an intrusion detection system using random forest. Experiments were conducted on the NSL-KDD dataset. Empirical results show that the proposed model achieved a low false alarm rate and a high recall. Similarly, [18], [19], [20], and [21] applied machine learning techniques for network intrusion detection systems.

All the above-mentioned papers discuss intrusion detection methods without any statistical comparison of their results, without considering attacks that use web services, and without user guidance for applying the proposed algorithms. To overcome this, one can examine the statistical significance of the various evaluation metrics for the different machine learning algorithms and also vary the parameters of those algorithms to observe their performance.

This paper compares the performance of One-Class SVM and Isolation Forest machine learning algorithms in network intrusion detection using a two-sample t-test and parameter variation to provide some guidance on these algorithms’ usage to new researchers in this field. Our approach can also guide the evaluation and analysis of these techniques in solving intrusion detection problems. In addition, this method can address one of the main challenges of intrusion detection techniques: the lack of accurate, representative labels for normal and abnormal instances, which is a significant concern. To overcome this challenge in most intrusion detection problems, our approach can be used as a pre-labeling technique, followed by supervised anomaly detection techniques to solve intrusion detection problems. Overall, our empirical results demonstrate the potential of Isolation Forest and One-Class SVM and provide valuable insights for future research in this field.

III. METHODS

This section presents the intrusion detection approaches used in this paper. These approaches include the ANOVA F-test, the Isolation Forest, the One-Class Support Vector Machines, and the two-sample t-test.

A. ANOVA F-test

The ANOVA F-test, or Analysis of Variance F-test, is a statistical technique used to compare the means of two or more groups to determine whether significant differences exist. It is commonly employed in feature selection or variable ranking tasks, where the goal is to identify the most relevant features or variables for a particular analysis or model.

Applying the ANOVA F-test to a dataset can rank features based on their F-statistic or p-value. Features with high F-statistic values or low p-values are considered more relevant, as they exhibit significant differences between the groups or classes. These relevant features can then be selected for further analysis or modeling, while less informative features can be discarded to reduce dimensionality and improve computational efficiency. In the case of web network intrusion detection, the ANOVA F-test can be used to identify the most discriminative features that differentiate between normal network traffic and malicious intrusion attempts. By selecting the most significant features, it is possible to improve the performance and efficiency of intrusion detection systems by focusing on the most relevant information and reducing noise or irrelevant variables.

B. Isolation Forest (IF)

IF has been applied in different scenarios. Isolation Forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies, instead of the more common technique of profiling normal points [22], [23]. It is different from other distance and density based algorithms (see Fig. 1). The underlying assumption for this algorithm is that fewer instances of anomalies result in a smaller number of partitions (shorter path length) and that instances with distinguishable attribute values are more likely to be separated in early partitioning [24]. This implies that data points that have a shorter path length are likely to be anomalies. The necessary input parameters for building the Isolation Forest algorithm are the subsampling size, the number of trees, and the height of the tree [24]. A smaller subsampling size was suggested so that the algorithm runs faster and yields a better detection result [25]. The depth of trees needed can be obtained as log base 2 of the number of data points, and the path lengths usually converge before t = 100 trees [25].

Fig. 1. Algorithm 1.

C. One-Class Support Vector Machines

One-Class Support Vector Machines (OC-SVMs) [26] are a natural extension of SVMs. One-Class SVM is an unsupervised learning technique capable of differentiating test samples of a particular class from other classes. The One-Class SVM works on the basis of minimizing the hypersphere of one class in the training set and then considers every other class not within the hypersphere as anomalies or outliers. In order to identify suspicious observations, an OC-SVM estimates a distribution that encompasses most of the observations and then labels as “suspicious” those that lie far from it with respect to a suitable metric. This model can use different kernel functions: linear, radial basis, sigmoid, and polynomial.

D. Two Sample t-test

The two-sample t-test, also known as the independent samples t-test or unpaired t-test, is a statistical hypothesis test used to compare the means of two independent groups to determine if there is a significant difference between them. The test assumes that the data is normally distributed and that the variances of the two groups are equal (although there are modifications available if this assumption does not hold). In order to compare the performance of IF and OC-SVM with the ANOVA F-test, we used a two-sample t-test to test whether there is a significant difference between the mean performances of DR, FAR, F1 score, AUCROC, and precision. The null hypothesis (H0) for the two-sample t-test states that there is no significant difference between the mean performances of the two models, while the alternative hypothesis (H1) states that there is a significant difference, unlikely to have occurred by chance, between the mean performances of the two models.

IV. EMPIRICAL EVALUATION

A. Data Description

The NSL-KDD dataset is an improved version of the KDD99 dataset, in which a large amount of data redundancy has been removed [27]. This dataset has the same attributes as KDD99, with 41 features and records that are labeled as either normal or attacks using different web services (http, smtp, ftp, etc.). The NSL-KDD dataset repository has two files: KDDTrain.txt and KDDTest.txt. Table I shows the attack categories using different services and the number of data points per category in the NSL-KDD train and test datasets. The NSL-KDD dataset has 125973 data points in the training dataset and 22544 in the testing dataset.

TABLE I. THE ATTACK TYPES (CLASS) USING DIFFERENT INTERNET PROTOCOLS (HTTP, SMTP AND FTP), THE NUMBER OF RECORDS IN THE NSL-KDD TRAINING AND TESTING DATASET

Attacks using different   No. of records   No. of records   Attack Types (class)
Internet Protocol         (Training)       (Testing)
Normal                    45,078           7,291            Normal traffic data
HTTP                      2,289            1,180            Worm, Land, Smurf, Udpstorm, Teardrop, Pod, Mailbomb, Neptune, Process table, Apache2, Back
SMTP                      284              316              Ipsweep, Nmap, Satan, Portsweep, Mscan, Saint
FTP                       648              48               WarezClient, Worm, SnmpGetAttack, WarezMaster, Imap, SnmpGuess, Named, MultiHop, Phf, Spy, Sendmail, Ftp Write, Xsnoop, Xlock, Guess Password

B. Data Pre-processing

The NSL-KDD dataset has 41 features per record, along with the attack type described in Section IV-A and an attack category (class feature). These features are both numeric (38 features) and categorical (3 features). The categorical features are protocol type (3 types), service (70 types), and flag (11 types), which need to be converted to numeric features. We want to extract the most popular attacks caused by using different internet protocols (the service feature). These widespread attacks use web services (internet protocols) such as hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP). After extracting the various internet protocols, the attack types (class) feature is given a numeric label: Normal is labeled 0 and the different attack types are labeled 1.

Using ANOVA F-test feature elimination, the most relevant features with the highest F-statistic values in the dataset are identified, eliminating the least important features. These features are src_bytes (number of data bytes transferred from source to destination in a single connection), dst_bytes (number of data bytes transferred from destination to source in a single connection), and duration.

C. Confusion Matrix

The performance of machine learning techniques can be evaluated using different parameters. These parameters are calculated using True Positive (TP), False Negative (FN), False Positive (FP), and True Negative (TN) as shown in the confusion matrix [28] in Table II. The following parameters are used to evaluate our proposed approach.

TABLE II. CONFUSION MATRIX: A CONTINGENCY TABLE CONTAINING FOUR METRICS, TRUE POSITIVE (TP), TRUE NEGATIVE (TN), FALSE POSITIVE (FP), AND FALSE NEGATIVE (FN).

                          Predicted Class
Attack                    Yes      No
Actual class     Yes      TP       FN
                 No       FP       TN

1) Detection Rate (DR): It is the ratio between the total number of attacks detected by the NIDS and the total number of attacks present in the dataset [17], which can be calculated using the formula:

$$\mathrm{DR} = \frac{TP}{TP + FN}$$

2) Precision: This measures the fraction of examples predicted as attacks that turned out to be attacks, which can be calculated using the formula:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

3) F1 Score: It is the harmonic mean of precision (the fraction of examples predicted as attacks that turned out to be attacks) and the detection rate (the ratio between the total number of attacks detected by the NIDS and the total number of attacks present in the dataset), which can be calculated using the formula:

$$F_1\ \mathrm{Score} = \frac{2 \cdot TP}{2 \cdot TP + FN + FP}$$

4) False Alarm Rate (FAR): It is the fraction of non-attacks that are misclassified as attacks, which can be calculated using the formula:

$$\mathrm{FAR} = \frac{FP}{FP + TN}$$

5) Receiver Operating Characteristic (ROC) Curve: The Receiver Operating Characteristic (ROC) curve is a graphical representation used to evaluate the performance of binary classification models in machine learning. It is created by plotting the detection rate (the ratio between the total number of attacks detected by the NIDS and the total number of attacks present in the dataset) against the False Alarm Rate (the fraction of non-attacks that are misclassified as attacks) at various classification threshold levels. The area under the curve (AUC) of the ROC quantifies the overall performance of the classification model. AUC values range from 0 to 1, with a value of 0.5 representing a random classifier and a value of 1 indicating a perfect classifier. A higher AUC value suggests a better-performing classification model.

A good NIDS should have high detection rates, precision, AUCROC, and F1 score but low FAR.

Most machine learning algorithms are evaluated using predictive accuracy, but this is not appropriate for network intrusion detection because it mostly involves imbalanced data. By imbalanced data, we mean that the proportion of data points in each class is not approximately equal. The evaluation metrics adopted in this paper for comparing our models therefore include the standard AUC (area under the curve). The area under the receiver operating curve gives an average measure of performance across all possible classification thresholds.

D. Experimental Results

All experiments were performed in Python with varying parameters for Isolation Forest (Sklearn) and One-Class Support Vector Machines (Sklearn) using an Intel(R) Core(TM) i7-10510U CPU at 1.80 GHz and 2.30 GHz with 16 GB of RAM. Training and testing of the Isolation Forest took four seconds, while it took 40 seconds to train the One-Class Support Vector Machine model on the three selected features from the NSL-KDD dataset. The experimental results for One-Class Support Vector Machines and Isolation Forest on different performance metrics are shown in Table III and Table IV respectively.

TABLE III. ONE-CLASS SVM PERFORMANCE MEASURE ON NSL-KDD TEST

Attacks using different   Kernel       Gamma     DR       F1 Score   AUCROC   Precision   FAR
Internet Protocol
HTTP                      Linear       0.00005   0.9802   0.9303     0.6308   0.8852      0.0198
HTTP                      Sigmoid      0.00005   0.9969   0.9386     0.8867   0.6383      0.0031
HTTP                      Polynomial   0.00005   0.9969   0.9385     0.6378   0.8866      0.0031
SMTP                      Linear       0.00005   0.8398   0.7228     0.4468   0.6345      0.1602
SMTP                      Sigmoid      0.00005   0.9806   0.7953     0.5156   0.6689      0.0194
SMTP                      Polynomial   0.00005   0.9838   0.7969     0.5172   0.6696      0.0162
FTP                       Linear       0.00005   0.3333   0.5000     0.6667   1.0000      0.6667
FTP                       Sigmoid      0.00005   0.5208   0.6848     0.7604   1.0000      0.4791
FTP                       Polynomial   0.00005   0.3333   0.5000     0.6667   1.0000      0.6667

TABLE IV. ISOLATION FOREST PERFORMANCE MEASURE ON NSL-KDD TEST

Attacks using different   Estimators   Maximum samples   DR       F1 Score   AUCROC   Precision   FAR
Internet Protocol
HTTP                      50           256               0.9618   0.9563     0.8402   0.9508      0.0382
HTTP                      100          256               0.9631   0.9570     0.8409   0.9509      0.0369
HTTP                      200          256               0.9619   0.9563     0.8399   0.9507      0.0381
SMTP                      50           256               0.9725   0.7846     0.6697   0.9725      0.0275
SMTP                      100          256               0.9709   0.7838     0.6697   0.9709      0.0291
SMTP                      200          256               0.9693   0.7835     0.6699   0.9693      0.0307
FTP                       50           256               0.2917   0.3043     0.6225   0.3182      0.7083
FTP                       100          256               0.2083   0.2273     0.5809   0.2500      0.7916
FTP                       200          256               0.2708   0.2857     0.6121   0.3023      0.7292

E. Discussion of Results

In Table III, the polynomial kernel outperformed the other kernels on the HTTP and SMTP subsets with high DR, F1 Score, AUCROC, Precision, and low FAR. On the other hand, in Table III, the sigmoid kernel performed much better than the other kernels on the FTP subset.

In Table IV, Isolation Forest with 100 estimators on the HTTP subset achieved the highest DR, F1 Score, AUCROC, and Precision, and a low FAR. The SMTP and FTP subsets perform best on the evaluation metrics with 50 estimators. Generally, both Isolation Forest and One-Class Support Vector Machines did not perform well on the FTP subset, having very high FAR and low DR, F1 Score, AUCROC, and Precision. It is evident in Tables III and IV that the One-Class Support Vector Machines outperform Isolation Forest on all subsets, that is, HTTP, SMTP, and FTP, having high DR, F1 Score, and low FAR. Statistical analysis of the overall performance of the One-Class Support Vector Machines and Isolation Forest results used a two-sample t-test with two-tailed probability to determine if each model’s DR and FAR scores on the test data yielded statistically significant differences (p < 0.05). On the HTTP, SMTP, and FTP subsets, the One-Class Support Vector Machines had significantly different DR and FAR scores (p < 0.001), so the null hypothesis of no difference is rejected and our hypothesis is supported.

Our approach improved detection capabilities on selected attack types when compared to other models and benchmarks in related work. The ANOVA F-test feature selection process identified several key features highly relevant to network intrusion detection. These features align with our expectations and prior research [18], [20], and [25], confirming the importance of specific traffic characteristics in detecting malicious activities. The combination of IF, OC-SVM and ANOVA F-test not only improved the model’s performance but also reduced the complexity of the model by eliminating redundant and irrelevant features.

The practical implications of our findings are significant for the field of network intrusion detection. The improved detection rates offered by our approach can help security practitioners identify and respond to cyber threats more effectively. Additionally, reducing false positives and negatives can minimize the operational overhead of manually investigating false alarms. Furthermore, our approach demonstrates potential scalability and adaptability for different network environments and evolving cyber threats.

V. CONCLUSION AND FUTURE WORK

The experiments performed on the NSL-KDD network intrusion data show that One-Class Support Vector Machines had the overall best performance in terms of DR and FAR scores over the Isolation Forest, with the best performance obtained by tuning the default parameters in both algorithms. Also, the performance of Isolation Forest is comparable across the number of estimators, with 100 and 50 estimators outperforming 200 estimators.

Therefore, One-Class SVM is a good model for network intrusion detection when the default parameters in Sklearn are changed. Also, polynomial or sigmoid kernel functions could be the best kernels to choose when using One-Class SVM on network intrusion data. Because of the usage of feature selection, the computational cost decreases (four seconds for Isolation Forest and forty seconds for One-Class SVM), and our experimental results indicate that our proposed approach increases the DR, F1 score, AUCROC, and precision and decreases FAR for three types of attacks. We also compared One-Class Support Vector Machines and Isolation Forest on the selected attack types using a two-sample t-test and found that our proposed approach (with fewer features) is promising. For future work, we will experiment with deep learning approaches like GANs and autoencoders, since they are capable of handling data of higher dimensions, and also evaluate One-Class Support Vector Machines and Isolation Forest against supervised learning methods like random forest, Xgboost, and cost-sensitive support vector machines.

REFERENCES

[1] Michael West: Chapter 2 - Preventing System Intrusions. Network and System Security (Second Edition), pp.29–56. Syngress, Boston, 2014. doi:10.1016/B978-0-12-416689-9.00002-2.
[2] Thomas M. Chen and Patrick J. Walsh: Chapter 3 - Guarding Against Network Intrusions. Network and System Security (Second Edition), pp.57–82. Syngress, Boston, 2014. doi:10.1016/B978-0-12-416689-9.00003-4.
[3] Robert Moskowitz: Network Intrusion: Methods of Attack, 2014.
[4] Isabell Gaylord: Network Intrusion: How to Detect and Prevent It. Reducing Risk, United States Cybersecurity Magazine, 2020. https://www.uscybersecurity.net/network-intrusion/.
[5] David Bisson: How to Foil the 6 Stages of a Network Intrusion, Tripwire State of Security news, 2019. https://www.tripwire.com/state-of-security/security-data-protection/security-hardening/6-stages-of-network-intrusion-and-how-to-defend-against-them/.
[6] Kumar, Amit and Maurya, Harish Chandra and Misra, Rahul: A research paper on hybrid intrusion detection system, International Journal of Engineering and Advanced Technology (IJEAT), vol.2, no.4, pp.294–297, Citeseer, 2013.
[7] Li Tian and Wang Jianwen: Research on Network Intrusion Detection System Based on Improved K-means Clustering Algorithm, International Forum on Computer Science-Technology and Applications, vol.1, pp.76–79, 2009. doi:10.1109/IFCSTA.2009.25.
[8] Dikshant Gupta and Suhani Singhal and Shamita Malik and Archana Singh: Network intrusion detection system using various data mining technique, International Conference on Research Advances in Integrated Navigation Systems (RAINS), pp.1–6, 2016. doi:10.1109/RAINS.2016.7764418.
[9] Wang, Wei and Battiti, Roberto: Identifying intrusions in computer networks based on principal component analysis, University of Trento, 2005.
[10] Liao, Yihua and Vemuri, V Rao: Use of k-nearest neighbor classifier for intrusion detection, Computers & Security, vol.21, no.5, pp.439–448, Elsevier, 2002.
[11] Ulvila, Jacob W and Gaffney Jr, John E: Evaluation of intrusion detection systems, Journal of Research of the National Institute of Standards and Technology, vol.108, no.6, pp.453, 2003.
[12] Abe, Naoki and Zadrozny, Bianca and Langford, John: Outlier detection by active learning, Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.504–509, 2006.
[13] He, Zengyou and Xu, Xiaofei and Deng, Shengchun: Discovering cluster-based local outliers, Pattern Recognition Letters, vol.24, no.9–10, pp.1641–1650, Elsevier, 2003.
[14] Fuhnwi, Gerard Shu and Agbaje, Janet O and Oshinubi, Kayode and Peter, Olumuyiwa James: An Empirical Study on Anomaly Detection Using Density-based and Representative-based Clustering Algorithms, Journal of the Nigerian Society of Physical Sciences, pp.1364–1364, 2023.
[15] Mulay, Snehal A and Devale, PR and Garje, GV: Intrusion detection system using support vector machine and decision tree, International Journal of Computer Applications, vol.3, no.3, pp.40–43, Citeseer, 2010.
[16] Sarumi, Oluwafemi A and Adetunmbi, Adebayo O and Adetoye, Fadekemi A: Discovering computer networks intrusion using data analytics and machine intelligence, Scientific African, vol.9, pp.e00500, Elsevier, 2020.
[17] Farnaaz, Nabila and Jabbar, MA: Random forest modeling for network intrusion detection system, Procedia Computer Science, pp.213–217, Elsevier, 2016.
[18] WS, Jenif D Souza and Parvathavarthini, B: Machine learning based intrusion detection framework using recursive feature elimination method, 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), pp.1–4, IEEE, 2020.
[19] Ingre, Bhupendra and Yadav, Anamika: Performance analysis of NSL-KDD dataset using ANN, 2015 international conference on signal processing and communication engineering systems, pp.92–96, IEEE, 2015.
[20] Aljawarneh, Shadi and Aldwairi, Monther and Yassein, Muneer Bani: Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model, Journal of Computational Science, pp.152–160, Elsevier, 2018.
[21] Hamed, Tarfa and Dara, Rozita and Kremer, Stefan C: Network intrusion detection system based on recursive feature addition and bigram technique, Computers & Security, pp.137–155, Elsevier, 2018.
[22] Wikipedia contributors: Isolation forest — Wikipedia, The Free Encyclopedia, 2020. https://en.wikipedia.org/w/index.php?title=Isolation_forest&oldid=985700362.
[23] Liu, Fei Tony and Ting, Kai Ming and Zhou, Zhi-Hua: Isolation-based anomaly detection, Eighth IEEE International Conference on Data Mining, pp.413–422, IEEE, 2008.
[24] Arunraj, Nari S and Hable, Robert and Fernandes, Michael and Leidl, Karl and Heigl, Michael: Comparison of supervised, semi-supervised and unsupervised learning methods in network intrusion detection system (NIDS) application, Anwendungen und Konzepte der Wirtschaftsinformatik, no.6, 2017.
[25] Liu, Fei Tony and Ting, Kai Ming and Zhou, Zhi-Hua: Isolation-based anomaly detection, ACM Transactions on Knowledge Discovery from Data (TKDD), vol.6, no.1, pp.1–39, ACM, New York, NY, USA, 2012.
[26] Larry M. Manevitz and Malik Yousef: One-Class SVMs for Document Classification, Journal of Machine Learning Research 2, pp.139–154, 2001.
[27] Ring, Markus and Wunderlich, Sarah and Scheuring, Deniz and Landes, Dieter and Hotho, Andreas: A survey of network-based intrusion detection data sets, Computers & Security, vol.86, pp.147–167, Elsevier, 2019.
[28] Beauxis-Aussalet, Emma and Hardman, Lynda: IEEE Conference on Visual Analytics Science and Technology (VAST)-Poster Proceedings, pp.1–2, 2014.
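As a hedged illustration of the evaluation metrics (DR, precision, F1 score, FAR) and of the two-sample t-test described in this paper, the following plain-Python sketch computes the four confusion-matrix metrics and a pooled-variance t statistic. The function and class names, the handling of empty denominators, and the sample labels and detection rates are assumptions made for illustration only; this is not the authors' code or data, and the reported experiments were run with Sklearn. Labels follow the encoding given in the pre-processing description (0 = normal, 1 = attack).

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean, variance


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix counts, with 'attack' as the positive class."""
    tp: int
    fn: int
    fp: int
    tn: int


def count_confusion(y_true, y_pred):
    """Count TP, FN, FP, TN for labels encoded as 1 = attack, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    return ConfusionCounts(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
    )


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 if the denominator is zero
    (an illustrative convention, not something the paper specifies)."""
    return numerator / denominator if denominator else 0.0


def detection_rate(c):
    return _ratio(c.tp, c.tp + c.fn)            # DR = TP / (TP + FN)


def precision(c):
    return _ratio(c.tp, c.tp + c.fp)            # Precision = TP / (TP + FP)


def f1_score(c):
    return _ratio(2 * c.tp, 2 * c.tp + c.fn + c.fp)  # 2TP / (2TP + FN + FP)


def false_alarm_rate(c):
    return _ratio(c.fp, c.fp + c.tn)            # FAR = FP / (FP + TN)


def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming equal variances in both groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))


if __name__ == "__main__":
    # Hypothetical labels, not taken from NSL-KDD.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
    c = count_confusion(y_true, y_pred)
    print(c)
    print(detection_rate(c), precision(c), f1_score(c), false_alarm_rate(c))

    # Hypothetical per-run detection rates for two models.
    print(pooled_t_statistic([0.98, 0.99, 0.97], [0.96, 0.95, 0.97]))
```

The p-value is left out because it requires the t distribution. Assuming SciPy is available, `scipy.stats.ttest_ind(a, b)` returns both the statistic and a two-tailed p-value under the same equal-variance assumption.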