(IJACSA) International Journal of Advanced Computer Science and Applications,
Vol. 14, No. 8, 2023
An Empirical Internet Protocol Network Intrusion
Detection using Isolation Forest and One-Class
Support Vector Machines
Gerard Shu Fuhnwi1, Victoria Adedoyin2, Janet O. Agbaje3
Gianforte School of Computing, Montana State University, Montana 59715, USA1
Department of Chemistry, Montana State University, Montana 59715, USA2
Department of Mathematical Sciences, Montana Technological University, Montana 59701, USA3
Abstract—With the increasing reliance on web-based applications and services, network intrusion detection has become a critical aspect of maintaining the security and integrity of computer networks. This paper presents an empirical study comparing the effectiveness of two machine learning algorithms, Isolation Forest (IF) and One-Class Support Vector Machines (OC-SVM), combined with ANOVA F-test feature selection, in detecting internet protocol network intrusions that use web services. The study used the NSL-KDD dataset, encompassing hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP) web service attacks and normal traffic patterns, to comprehensively evaluate the algorithms. The performance of the algorithms is evaluated on several metrics: the F1-score, detection rate (recall), precision, false alarm rate (FAR), and Area Under the Receiver Operating Characteristic curve (AUCROC). Additionally, the study investigates the impact of different hyperparameters on the performance of both algorithms. Our empirical results demonstrate that while both IF and OC-SVM exhibit high efficacy in detecting network intrusion attacks using web services of type HTTP, SMTP, and FTP, OC-SVM outperforms IF in terms of F1-score (SMTP), detection rate (HTTP, SMTP, and FTP), AUCROC, and a consistently low false alarm rate (HTTP). We used the t-test to determine that OC-SVM statistically outperforms IF on DR and FAR.

Keywords—HTTP; SMTP; FTP; ANOVA F-test; AUCROC; OC-SVMs; FAR; DR; IF

I. INTRODUCTION

Network intrusion can be described as the unauthorized penetration of a computer in an establishment or of an address in one's assigned domain [1]. The nature and types of network intrusion have evolved over the years and have become more rampant in recent years [2].

An intrusion can be passive or active. In a passive intrusion, the penetration is gained stealthily and without detection, while in an active intrusion, changes to network resources are effected. An intrusion can come from either an insider or an outsider. By insider, we mean an employee, customer, or business partner; an outsider is someone not connected to the organization. Network intrusions can occur in different ways. Some intruders announce their presence by defacing the website, while others are malicious, with the goal of siphoning off data until the intrusion is discovered. Some redirect unsuspecting users away from a website by cracking passwords or mimicking the website [1]. Sometimes, intruders absorb network resources intended for other uses or users, which can lead to a denial of service [3]. These unauthorized penetrations of the digital network imperil, on many occasions, the security of networks and their data [4].

Network security breaches are rapidly increasing. They result in significant losses to organizations and often lead to a loss of confidence among unsuspecting customers who have fallen victim. An IBM report shows that the average cost of a data breach has risen 12 percent over the past five years, to 3.92 million dollars per incident [5]. This is more than the cost of a breach caused by a system glitch or human error.

Many researchers have worked on network intrusion detection [6], [7], [8]. Wang and Battiti identified intrusions in computer networks with principal component analysis [9]. Liao and Vemuri used a k-nearest neighbour classifier for intrusion detection [10]. Gaffney and Ulvila evaluated intrusion detectors using a decision theory approach [11]. However, this area still calls for more work because of the rapid rise in network intrusions. Therefore, we need to design an efficient algorithm that can successfully defend against network intrusions in an ever-evolving threat landscape. To achieve proactive security control, organizations must put in place a good network security infrastructure and leverage the potential of machine learning, which is capable of automatically and continuously detecting network intrusions. This will help block intruders and prevent them from achieving their goals.

The remainder of this paper is organized as follows. Section II briefly reviews related work on anomaly-detection-based network intrusion detection. Section III describes the algorithms used in this paper. Section IV presents the empirical evaluation, covering the data sets used, the evaluation metrics, the results, and their discussion. Section V concludes the paper.

II. RELATED WORK

Liu, Ting, and Zhou [23] focused on using an Isolation Forest to detect anomalies, a task with many applications in areas such as fraud detection, network intrusion, medical and public health, and industrial damage detection. The goal here is to build a tree-based structure that isolates anomalies
www.ijacsa.thesai.org 1 | P a g e
rather than profiles them, as earlier approaches such as classification-based methods [12] and clustering-based methods [13] do. Their proposed method, called Isolation Forest, builds a collection of individual tree structures that recursively partition a given data set, where anomalies are instances with a short path length on the trees. An anomaly score between 0 and 1 is used to identify anomalous instances, with a score close to 1 indicating an anomaly and a lower score indicating a normal instance. The authors compared their method with other anomaly detection techniques [14] such as ORCA, LOF, and RF on real-world data sets with high dimensionality and large size, using AUC (Area Under the Curve) and run time as metrics. Mulay et al. [15] proposed a hybrid of SVM and decision trees for classifying different forms of intrusion attacks in the Knowledge Discovery and Data Mining 1999 (KDDCUP99) data.

In [16], Sarumi et al. compared SVM and Apriori using the Network Security Laboratory Knowledge Discovery and Data Mining (NSL-KDD) dataset and the University of New South Wales NB-15 (UNSW-NB15) dataset. From their results, they concluded that SVM outperformed Apriori in terms of accuracy, while Apriori performed better in terms of speed. In [17], Farnaaz and Jabbar proposed an intrusion detection system using random forest. Experiments were conducted on the NSL-KDD dataset, and the empirical results show that the proposed model achieved a low false alarm rate and a high recall. Similarly, [18], [19], [20], and [21] applied machine learning techniques to network intrusion detection systems.

All of the above-mentioned papers discuss intrusion detection methods without statistical tests to compare their results, without considering attacks that use web services, and without user guidance on applying the proposed algorithms. To address this, one can examine the statistical significance of the evaluation metrics obtained by the different machine learning algorithms and vary the algorithms' parameters to observe their effect on performance.

This paper compares the performance of the One-Class SVM and Isolation Forest machine learning algorithms in network intrusion detection, using a two-sample t-test and parameter variation, to provide new researchers in this field with some guidance on using these algorithms. Our approach can also guide the evaluation and analysis of these techniques in solving intrusion detection problems. In addition, this method can help overcome one of the main challenges of intrusion detection techniques: obtaining accurate, representative labels for normal and abnormal instances. To address this challenge, our approach can be used as a pre-labeling technique, after which supervised anomaly detection techniques can be applied to solve intrusion detection problems. Overall, our empirical results demonstrate the potential of Isolation Forest and One-Class SVM and provide valuable insights for future research in this field.

III. METHODS

This section presents the intrusion detection approaches used in this paper: the ANOVA F-test, the Isolation Forest, the One-Class Support Vector Machines, and the two-sample t-test.

A. ANOVA F-test

The ANOVA F-test, or Analysis of Variance F-test, is a statistical technique used to compare the means of two or more groups to determine whether significant differences exist. It is commonly employed in feature selection or variable ranking tasks, where the goal is to identify the most relevant features or variables for a particular analysis or model.

Applying the ANOVA F-test to a dataset ranks features by their F-statistic or p-value. Features with high F-statistic values or low p-values are considered more relevant, as they exhibit significant differences between the groups or classes. These relevant features can then be selected for further analysis or modeling, while less informative features can be discarded to reduce dimensionality and improve computational efficiency. In the case of web network intrusion detection, the ANOVA F-test can be used to identify the most discriminative features that differentiate between normal network traffic and malicious intrusion attempts. By selecting the most significant features, it is possible to improve the performance and efficiency of intrusion detection systems by focusing on the most relevant information and reducing noise or irrelevant variables.

B. Isolation Forest (IF)

IF has been applied in different scenarios. Isolation Forest is an unsupervised learning algorithm for anomaly detection that works on the principle of isolating anomalies, instead of the more common technique of profiling normal points [22], [23]. It differs from other distance-based and density-based algorithms (see Fig. 1). The underlying assumption of this algorithm is that anomalies, being few, require a smaller number of partitions (a shorter path length) to isolate, and that instances with distinguishable attribute values are more likely to be separated in early partitioning [24]. This implies that data points with a shorter path length are likely to be anomalies. The necessary input parameters for building an Isolation Forest are the subsampling size, the number of trees, and the height of the trees [24]. A small subsampling size is suggested so that the algorithm runs faster and yields better detection results [25]. The required tree depth can be obtained as the base-2 logarithm of the number of data points, and the path lengths converge before the number of trees reaches t = 100 [25].

Fig. 1. Algorithm 1.

C. One-Class Support Vector Machines

One-Class Support Vector Machines (OC-SVMs) [26] are a natural extension of SVMs. One-Class SVM is an unsupervised learning technique capable of differentiating test samples of a particular class from those of other classes. The One-Class SVM works on the basis of minimizing the hypersphere of one
class in the training set and then considers every other class not within the hypersphere as anomalies or outliers. To identify suspicious observations, an OC-SVM estimates a distribution that encompasses most of the observations and then labels as "suspicious" those that lie far from it with respect to a suitable metric. This model can use different kernel functions: linear, radial basis, sigmoid, and polynomial.

D. Two Sample t-test

The two-sample t-test, also known as the independent samples t-test or unpaired t-test, is a statistical hypothesis test used to compare the means of two independent groups to determine whether there is a significant difference between them. The test assumes that the data is normally distributed and that the variances of the two groups are equal (although modifications are available if this assumption does not hold). To compare the performance of IF and OC-SVM combined with ANOVA F-test feature selection, we used a two-sample t-test to test whether there is a significant difference between their mean performances on DR, FAR, F1 score, AUCROC, and precision. The null hypothesis (H0) states that there is no significant difference between the mean performances of the two models, while the alternative hypothesis (H1) states that there is a significant difference, unlikely to have occurred by chance, between the mean performances of the two models.

IV. EMPIRICAL EVALUATION

A. Data Description

The NSL-KDD dataset is an improved version of the KDD99 dataset, from which a large amount of redundant data has been removed [27]. This dataset has the same attributes as KDD99: 41 features, with records labeled as either normal or attacks using different web services (http, smtp, ftp, etc.). The NSL-KDD dataset repository has two files: KDDTrain.txt and KDDTest.txt. Table I shows the attack categories using different services and the number of data points per category in the NSL-KDD train and test datasets. The NSL-KDD dataset has 125973 data points in the training dataset and 22544 in the testing dataset.

TABLE I. ATTACK TYPES (CLASS) USING DIFFERENT INTERNET PROTOCOLS (HTTP, SMTP, AND FTP) AND THE NUMBER OF RECORDS IN THE NSL-KDD TRAINING AND TESTING DATASETS

Internet protocol   Training records   Testing records   Attack types (class)
Normal              45,078             7,291             Normal traffic data
HTTP                2,289              1,180             Worm, Land, Smurf, Udpstorm, Teardrop, Pod, Mailbomb, Neptune, Process table, Apache2, Back
SMTP                284                316               Ipsweep, Nmap, Satan, Portsweep, Mscan, Saint
FTP                 648                48                WarezClient, Worm, SnmpGetAttack, WarezMaster, Imap, SnmpGuess, Named, MultiHop, Phf, Spy, Sendmail, Ftp Write, Xsnoop, Xlock, Guess Password

B. Data Pre-processing

The NSL-KDD dataset has 41 features per record, together with a class feature giving the attack category described in Section IV-A. The features are both numeric (38 features) and categorical (3 features). The categorical features are protocol type (3 types), service (70 types), and flag (11 types), which need to be converted to numeric features. We want to extract the most popular attacks that use different internet protocols (the service feature). These widespread attacks use web services (internet protocols) such as hypertext transfer protocol (HTTP), simple mail transfer protocol (SMTP), and file transfer protocol (FTP). After extracting the records for these internet protocols, the attack type (class) feature is encoded numerically, with Normal labeled as 0 and the different attack types labeled as 1.

Using ANOVA F-test feature selection, the most relevant features, those with the highest F-statistic values in the dataset, are identified and the least important features are eliminated. The selected features are src_bytes (number of data bytes transferred from source to destination in a single connection), dst_bytes (number of data bytes transferred from destination to source in a single connection), and duration.

C. Confusion Matrix

The performance of machine learning techniques can be evaluated using different parameters. These parameters are calculated from the True Positive (TP), False Negative (FN), False Positive (FP), and True Negative (TN) counts shown in the confusion matrix [28] in Table II. The following parameters are used to evaluate our proposed approach.

1) Detection Rate (DR): It is the ratio between the total number of attacks detected by the NIDS and the total number of attacks present in the dataset [17], which can be calculated using the formula:

$$\mathrm{DR} = \frac{TP}{TP + FN}$$

2) Precision: This measures the fraction of examples predicted as attacks that turned out to be attacks, which can be calculated using the formula:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

3) F1 Score: It is the harmonic mean of precision (the fraction of examples predicted as attacks that turned out to be attacks) and the detection rate (the ratio between the total number of attacks detected by the NIDS and the total number of attacks present in the dataset), which can be calculated using the formula:

$$F_1\ \mathrm{Score} = \frac{2 \cdot TP}{2 \cdot TP + FN + FP}$$

4) False Alarm Rate (FAR): It is the fraction of non-attacks that are misclassified as attacks, which can be calculated using the formula:

$$\mathrm{FAR} = \frac{FP}{FP + TN}$$
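The four formulas above can be checked numerically with a few lines of Python. The sketch below is illustrative only: the function name `detection_metrics` and the confusion-matrix counts are assumptions made for this example, not values from the experiments.

```python
def detection_metrics(tp, fn, fp, tn):
    """Compute DR, precision, F1 score, and FAR from confusion-matrix counts."""
    dr = tp / (tp + fn)                # detection rate (recall)
    precision = tp / (tp + fp)         # predicted attacks that are real attacks
    f1 = 2 * tp / (2 * tp + fn + fp)   # harmonic mean of precision and DR
    far = fp / (fp + tn)               # non-attacks flagged as attacks
    return {"DR": dr, "Precision": precision, "F1": f1, "FAR": far}


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = detection_metrics(tp=90, fn=10, fp=30, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the F1 score agrees with the harmonic-mean form 2 * Precision * DR / (Precision + DR), which is a quick way to confirm that the two definitions of the F1 score coincide.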
TABLE II. CONFUSION MATRIX: A CONTINGENCY TABLE CONTAINING FOUR METRICS, TRUE POSITIVE (TP), TRUE NEGATIVE (TN), FALSE POSITIVE (FP), AND FALSE NEGATIVE (FN)

                       Predicted class (attack)
                       Yes      No
Actual class    Yes    TP       FN
                No     FP       TN

5) Receiver Operating Characteristic (ROC) Curve: The Receiver Operating Characteristic (ROC) curve is a graphical representation used to evaluate the performance of binary classification models in machine learning. It is created by plotting the detection rate (the ratio of the total number of attacks detected by the NIDS to the total number of attacks present in the dataset) against the False Alarm Rate (the fraction of non-attacks that are misclassified as attacks) at various classification threshold levels. The area under the curve (AUC) of the ROC quantifies the overall performance of the classification model. AUC values range from 0 to 1, with a value of 0.5 representing a random classifier and a value of 1 indicating a perfect classifier. A higher AUC value suggests a better-performing classification model.

A good NIDS should have a high detection rate, precision, AUCROC, and F1 score but a low FAR.

Most machine learning algorithms are evaluated using predictive accuracy, but this is not appropriate for network intrusion detection because it mostly involves imbalanced data. By imbalanced data, we mean that the proportion of data points in each class is not approximately equal. The evaluation metrics adopted in this paper for evaluating and comparing our models therefore include the standard AUC (area under the curve). The area under the receiver operating characteristic curve gives an average measure of performance across all possible classification thresholds.

D. Experimental Results

All experiments were performed in Python, with varying parameters for Isolation Forest (Sklearn) and One-Class SVM (Sklearn), on an Intel(R) Core(TM) i7-10510U CPU at 1.80 GHz and 2.30 GHz with 16 GB of RAM. Training and testing of the Isolation Forest took four seconds, while it took 40 seconds to train the One-Class SVM model on the three selected features from the NSL-KDD dataset. The experimental results for One-Class SVM and Isolation Forest on the different performance metrics are shown in Table III and Table IV, respectively.

E. Discussion of Results

In Table III, the polynomial kernel outperformed the other kernels on the HTTP and SMTP subsets, with high DR, F1 Score, AUCROC, and Precision, and low FAR. On the other hand, the sigmoid kernel performed much better than the other kernels on the FTP subset.

In Table IV, Isolation Forest with 100 estimators achieved the highest DR, F1 Score, AUCROC, and Precision, and the lowest FAR, on the HTTP subset. The SMTP and FTP subsets perform best on the evaluation metrics with 50 estimators. Generally, both Isolation Forest and One-Class SVM did not perform well on the FTP subset, with very high FAR and low DR, F1 Score, AUCROC, and Precision. It is evident in Tables III and IV that One-Class SVM outperforms Isolation Forest on all subsets (HTTP, SMTP, and FTP), with high DR and F1 Score and low FAR. The statistical analysis of the overall One-Class SVM and Isolation Forest results used a two-sample t-test with two-tailed probability to determine whether each model's DR and FAR scores on the test data differed with statistical significance (p < 0.05). On HTTP, SMTP, and FTP, One-Class SVM had significantly different DR and FAR scores (p < 0.001), so the null hypothesis of equal mean performance was rejected.

Our approach improved detection capabilities on selected attack types when compared to other models and benchmarks in related work. The ANOVA F-test feature selection identified several key features highly relevant to network intrusion detection. These features align with our expectations and prior research [18], [20], and [25], confirming the importance of specific traffic characteristics in detecting malicious activities. The combination of IF, OC-SVM, and the ANOVA F-test not only improved the models' performance but also reduced their complexity by eliminating redundant and irrelevant features.

The practical implications of our findings are significant for the field of network intrusion detection. The improved detection rates offered by our approach can help security practitioners identify and respond to cyber threats more effectively. Additionally, reducing false positives and negatives can minimize the operational overhead of manually investigating false alarms. Furthermore, our approach demonstrates potential scalability and adaptability for different network environments and evolving cyber threats.

V. CONCLUSION AND FUTURE WORK

The experiments performed on the NSL-KDD network intrusion data show that One-Class SVM had the best overall performance in terms of DR and FAR scores compared with Isolation Forest, with the best performance obtained by tuning the default parameters in both algorithms. Also, Isolation Forest performed comparably across the numbers of estimators tested, with 100 and 50 estimators outperforming 200 estimators.

Therefore, One-Class SVM is a good model for network intrusion detection when the default parameters in Sklearn are changed. Also, polynomial or sigmoid kernel functions could be the best kernels to choose when using One-Class SVM on network intrusion data. Because of the use of feature selection, the computational cost decreases (four seconds for Isolation Forest and forty seconds for One-Class SVM), and our experimental results indicate that our proposed approach increases the DR, F1 score, AUCROC, and precision and decreases the FAR for three types of attacks. We also compared One-Class SVM and Isolation Forest on selected attack types using a two-sample t-test and found that our proposed approach (with fewer features) is promising. For future work, we will experiment with deep learning approaches like GANs and autoencoders since they are capable of handling data of higher
TABLE III. ONE-CLASS SVM PERFORMANCE MEASURE ON NSL-KDD TEST

Internet protocol   Kernel       Gamma     DR       F1 Score   AUCROC   Precision   FAR
HTTP                Linear       0.00005   0.9802   0.9303     0.6308   0.8852      0.0198
HTTP                Sigmoid      0.00005   0.9969   0.9386     0.8867   0.6383      0.0031
HTTP                Polynomial   0.00005   0.9969   0.9385     0.6378   0.8866      0.0031
SMTP                Linear       0.00005   0.8398   0.7228     0.4468   0.6345      0.1602
SMTP                Sigmoid      0.00005   0.9806   0.7953     0.5156   0.6689      0.0194
SMTP                Polynomial   0.00005   0.9838   0.7969     0.5172   0.6696      0.0162
FTP                 Linear       0.00005   0.3333   0.5000     0.6667   1.0000      0.6667
FTP                 Sigmoid      0.00005   0.5208   0.6848     0.7604   1.0000      0.4791
FTP                 Polynomial   0.00005   0.3333   0.5000     0.6667   1.0000      0.6667
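The configurations reported in Tables III and IV could be set up in scikit-learn roughly as follows. This is a minimal sketch and not the authors' code: the data is synthetic (a stand-in for one protocol subset), the helper `to_attack_flags` is hypothetical, the models are fitted and scored on the same array for brevity, and every parameter not listed in the tables is left at the library default.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic stand-in for one protocol subset: 6 features, 0 = normal, 1 = attack.
n_normal, n_attack = 400, 40
X = rng.normal(0.0, 1.0, size=(n_normal + n_attack, 6))
X[n_normal:, :3] += 6.0  # only the first three features separate the attacks
y = np.r_[np.zeros(n_normal, dtype=int), np.ones(n_attack, dtype=int)]

# ANOVA F-test: keep the three features with the highest F-statistic.
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
X_sel = selector.transform(X)
selected = sorted(selector.get_support(indices=True).tolist())


def to_attack_flags(pred):
    """scikit-learn returns -1 for outliers and +1 for inliers; map to 1 = attack."""
    return (pred == -1).astype(int)


# One-Class SVM with the kernels and gamma value listed in Table III.
svm_flags = {}
for kernel in ("linear", "sigmoid", "poly"):
    model = OneClassSVM(kernel=kernel, gamma=0.00005)
    svm_flags[kernel] = to_attack_flags(model.fit(X_sel).predict(X_sel))

# Isolation Forest with the estimator counts and sample size listed in Table IV.
if_flags = {}
for n_estimators in (50, 100, 200):
    model = IsolationForest(n_estimators=n_estimators, max_samples=256, random_state=0)
    if_flags[n_estimators] = to_attack_flags(model.fit(X_sel).predict(X_sel))

print("selected feature indices:", selected)
```

Both detectors are fitted without labels; the labels are used only by the ANOVA F-test step and, in a full evaluation, for computing the metrics afterwards.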
TABLE IV. ISOLATION FOREST PERFORMANCE MEASURE ON NSL-KDD TEST

Internet protocol   Estimators   Maximum samples   DR       F1 Score   AUCROC   Precision   FAR
HTTP                50           256               0.9618   0.9563     0.8402   0.9508      0.0382
HTTP                100          256               0.9631   0.9570     0.8409   0.9509      0.0369
HTTP                200          256               0.9619   0.9563     0.8399   0.9507      0.0381
SMTP                50           256               0.9725   0.7846     0.6697   0.9725      0.0275
SMTP                100          256               0.9709   0.7838     0.6697   0.9709      0.0291
SMTP                200          256               0.9693   0.7835     0.6699   0.9693      0.0307
FTP                 50           256               0.2917   0.3043     0.6225   0.3182      0.7083
FTP                 100          256               0.2083   0.2273     0.5809   0.2500      0.7916
FTP                 200          256               0.2708   0.2857     0.6121   0.3023      0.7292
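The two-sample t-test of Section III-D can be sketched with SciPy. The paper does not list the individual samples that entered its test, so the detection rates below are hypothetical placeholders rather than the reported results, and the variable names are illustrative; `equal_var=True` corresponds to the equal-variance assumption stated in Section III-D.

```python
import numpy as np
from scipy import stats

# Hypothetical detection rates from repeated runs of each model (placeholders).
dr_ocsvm = np.array([0.98, 0.99, 0.97, 0.99, 0.98])
dr_if = np.array([0.95, 0.96, 0.94, 0.96, 0.95])

# Two-sided, two-sample t-test assuming equal variances.
# H0: the two models have the same mean detection rate.
t_stat, p_value = stats.ttest_ind(dr_ocsvm, dr_if, equal_var=True)

alpha = 0.05
reject_null = p_value < alpha
print(f"t = {t_stat:.3f}, p = {p_value:.5f}, reject H0: {reject_null}")
```

If the equal-variance assumption is doubtful, passing `equal_var=False` switches to Welch's t-test, one of the modifications mentioned in Section III-D.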
dimensions, and we will also evaluate One-Class SVM and Isolation Forest using supervised learning methods like random forest, XGBoost, and cost-sensitive support vector machines.

REFERENCES

[1] Michael West: Chapter 2 - Preventing System Intrusions: Network and System Security (Second Edition), pp.29–56. Syngress, Boston, 2014. doi:10.1016/B978-0-12-416689-9.00002-2.
[2] Thomas M. Chen and Patrick J. Walsh: Chapter 3 - Guarding Against Network Intrusions. Network and System Security (Second Edition). pp.57–82. Syngress, Boston, 2014. doi:10.1016/B978-0-12-416689-9.00003-4.
[3] Robert Moskowitz: Network Intrusion: Methods of Attack, 2014.
[4] Isabell Gaylord: Network Intrusion: How to Detect and Prevent It. Reducing Risk, United States Cybersecurity Magazine, 2020. https://www.uscybersecurity.net/network-intrusion/.
[5] David Bisson: How to Foil the 6 Stages of a Network Intrusion, Tripwire State of Security news, 2019. https://www.tripwire.com/state-of-security/security-data-protection/security-hardening/6-stages-of-network-intrusion-and-how-to-defend-against-them/.
[6] Kumar, Amit and Maurya, Harish Chandra and Misra, Rahul: A research paper on hybrid intrusion detection system, International Journal of Engineering and Advanced Technology (IJEAT), vol.2, no.4, pp.294–297, Citeseer, 2013.
[7] Li Tian and Wang Jianwen: Research on Network Intrusion Detection System Based on Improved K-means Clustering Algorithm, International Forum on Computer Science-Technology and Applications, vol.1, pp.76–79, 2009. doi:10.1109/IFCSTA.2009.25.
[8] Dikshant Gupta and Suhani Singhal and Shamita Malik and Archana Singh: Network intrusion detection system using various data mining technique, International Conference on Research Advances in Integrated Navigation Systems (RAINS), pp.1–6, 2016. doi:10.1109/RAINS.2016.7764418.
[9] Wang, Wei and Battiti, Roberto: Identifying intrusions in computer networks based on principal component analysis, University of Trento, 2005.
[10] Liao, Yihua and Vemuri, V Rao: Use of k-nearest neighbor classifier for intrusion detection, Computers & Security, vol.21, no.5, pp.439–448, Elsevier, 2002.
[11] Ulvila, Jacob W and Gaffney Jr, John E: Evaluation of intrusion detection systems, Journal of Research of the National Institute of Standards and Technology, vol.108, no.6, pp.453, 2003.
[12] Abe, Naoki and Zadrozny, Bianca and Langford, John: Outlier detection by active learning, Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp.504–509, 2006.
[13] He, Zengyou and Xu, Xiaofei and Deng, Shengchun: Discovering cluster-based local outliers, Pattern Recognition Letters, vol.24, no.9–10, pp.1641–1650, Elsevier, 2003.
[14] Fuhnwi, Gerard Shu and Agbaje, Janet O and Oshinubi, Kayode and Peter, Olumuyiwa James: An Empirical Study on Anomaly Detection Using Density-based and Representative-based Clustering Algorithms, Journal of the Nigerian Society of Physical Sciences, pp.1364–1364, 2023.
[15] Mulay, Snehal A and Devale, PR and Garje, GV: Intrusion detection system using support vector machine and decision tree, International Journal of Computer Applications, vol.3, no.3, pp.40–43, Citeseer, 2010.
[16] Sarumi, Oluwafemi A and Adetunmbi, Adebayo O and Adetoye, Fadekemi A: Discovering computer networks intrusion using data analytics and machine intelligence, Scientific African, vol.9, pp.e00500, Elsevier, 2020.
[17] Farnaaz, Nabila and Jabbar, MA: Random forest modeling for network intrusion detection system, Procedia Computer Science, pp.213–217, Elsevier, 2016.
[18] WS, Jenif D Souza and Parvathavarthini, B: Machine learning based intrusion detection framework using recursive feature elimination method, 2020 International Conference on System, Computation, Automation and Networking (ICSCAN), pp.1–4, IEEE, 2020.
[19] Ingre, Bhupendra and Yadav, Anamika: Performance analysis of NSL-KDD dataset using ANN, 2015 International Conference on Signal Processing and Communication Engineering Systems, pp.92–96, IEEE, 2015.
[20] Aljawarneh, Shadi and Aldwairi, Monther and Yassein, Muneer Bani: Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model, Journal of Computational Science, pp.152–160, Elsevier, 2018.
[21] Hamed, Tarfa and Dara, Rozita and Kremer, Stefan C: Network intrusion detection system based on recursive feature addition and bigram technique, Computers & Security, pp.137–155, Elsevier, 2018.
[22] Wikipedia contributors: Isolation forest — Wikipedia, The Free Encyclopedia, 2020, https://en.wikipedia.org/w/index.php?title=Isolation_forest&oldid=985700362.
[23] Liu, Fei Tony and Ting, Kai Ming and Zhou, Zhi-Hua: Isolation-based anomaly detection, Eighth IEEE International Conference on Data Mining, pp.413–422, IEEE, 2008.
[24] Arunraj, Nari S and Hable, Robert and Fernandes, Michael and Leidl, Karl and Heigl, Michael: Comparison of supervised, semi-supervised and unsupervised learning methods in network intrusion detection system (NIDS) application, Anwendungen und Konzepte der Wirtschaftsinformatik, no.6, 2017.
[25] Liu, Fei Tony and Ting, Kai Ming and Zhou, Zhi-Hua: Isolation-based anomaly detection, ACM Transactions on Knowledge Discovery from Data (TKDD), vol.6, no.1, pp.1–39, ACM, New York, NY, USA, 2012.
[26] Larry M. Manevitz and Malik Yousef: One-Class SVMs for Document Classification, Journal of Machine Learning Research 2, pp.139–154, 2001.
[27] Ring, Markus and Wunderlich, Sarah and Scheuring, Deniz and Landes, Dieter and Hotho, Andreas: A survey of network-based intrusion detection data sets, Computers & Security, vol.86, pp.147–167, Elsevier, 2019.
[28] Beauxis-Aussalet, Emma and Hardman, Lynda: IEEE Conference on Visual Analytics Science and Technology (VAST)-Poster Proceedings, pp.1–2, 2014.