What is a network intrusion detection system (NIDS) and how does it work?
Computer security starts with security threat detection or intrusion detection, which was first introduced in the 1980s. An intrusion is a deliberate unauthorized attempt (successful or not) by an intruder to access, manipulate, or misuse a device or a network. Examples include computer viruses, trojans, denial of service, probe, and brute force attacks. An intrusion detection system (IDS) can be a device or a program that monitors a given system or network for malicious activity or policy violations. It aims to identify, as soon as possible, the threats that traditional security approaches, like firewalls, are unable to protect against. Any detected harmful activity is then reported or automatically acted upon to preserve the security and safety of the monitored system.
Various types of IDS
IDSs can work in various environments and we can commonly classify them into:
- Network Intrusion Detection Systems monitor the incoming and outgoing network traffic at a strategic point in the network. They can analyze the traffic of the entire sub-network and report intrusions without using any resources of the monitored systems. They provide real-time detection and response because they generally analyze small amounts of data (packets or flows). However, they are unable to detect attacks that happen locally on the devices or on the parts of the network from which traffic does not pass the monitoring point and are unable to analyze the encrypted packet payloads.
- Host Intrusion Detection Systems monitor a single device, where they can observe both local activity and incoming and outgoing network traffic. They can analyze the system's inner workings and network packet payloads in great detail and can provide real-time detection and response. On the other hand, they consume the monitored system's resources to function and have a limited view of the activity of the computer network to which the device is connected.
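To make the flow-level view of a NIDS more concrete, here is a minimal sketch of aggregating individual packets into flows keyed by the usual 5-tuple. The field names and sample values are invented for illustration; a real NIDS would read packets from a capture interface rather than a hard-coded list.

```python
from collections import defaultdict

# Each packet is summarized as a 5-tuple plus a byte count; these
# records are illustrative, not taken from any real capture.
packets = [
    ("10.0.0.1", "10.0.0.2", 51234, 80, "TCP", 512),
    ("10.0.0.1", "10.0.0.2", 51234, 80, "TCP", 1024),
    ("10.0.0.3", "10.0.0.2", 40000, 443, "TCP", 256),
]

def aggregate_flows(packets):
    """Group packets into flows keyed by the 5-tuple and total their traffic."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        flows[key]["packets"] += 1
        flows[key]["bytes"] += size
    return dict(flows)

flows = aggregate_flows(packets)  # two flows from the three packets above
```

Working on flows rather than raw packets is one reason a NIDS can keep up with traffic in real time: each flow summarizes many packets into a handful of counters.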
IDSs also differ in their approach to detecting intrusions:
- Signature-based Intrusion Detection Systems compare current device or network activity against a database of signatures of known attacks. If the current activity matches a stored signature, then a possible attack is detected. These approaches have excellent detection performance, however, they need frequent updates to signatures and are unable to detect new and unknown attacks. Preparing quality signatures is also time and resource-consuming and can also require human experts.
- Anomaly-based Intrusion Detection Systems build models of normal device or network activity and then flag significant deviations from these models as possible attacks. Their main advantage is that they can detect new and unknown attacks; however, they struggle with threat patterns that resemble normal activity and can misclassify changes in normal traffic as threats. These approaches can be computationally demanding because they monitor multiple points in the system and build models of activity in real time.
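The signature-based approach described above can be sketched in a few lines: a database of known attack patterns is matched against packet payloads. The signatures here are toy byte patterns invented for illustration; real signature sets (such as Snort rule files) are far richer and include protocol and port conditions.

```python
# Toy signature database: each entry maps an attack name to a byte
# pattern. These patterns are illustrative only.
SIGNATURES = {
    "sql_injection": b"' OR 1=1",
    "path_traversal": b"../../etc/passwd",
}

def match_signatures(payload: bytes):
    """Return the names of all signatures found in a packet payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]

hits = match_signatures(b"GET /index.php?id=' OR 1=1 --")
# hits == ["sql_injection"]
```

The example also makes the main limitation obvious: a payload that matches no stored pattern, such as a brand-new attack, passes through silently, which is exactly the gap anomaly-based systems try to close.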
IDSs can also be grouped by their specific detection methodology:
- Statistics-based methods analyze the activity of the system or network using complex statistical algorithms to identify threats.
- Pattern-based methods identify the characters, forms, and patterns in the data that indicate threats.
- Rule-based methods match monitored system and network activity against threat signatures to detect potential attacks.
- State-based methods examine a stream of events to identify a possible threat.
- Heuristic-based methods identify any abnormal activity that deviates from ordinary activity.
Types of anomaly-based Intrusion Detection Systems
Anomaly-based IDSs have developed considerably in recent years; however, they can generally be grouped into three categories, which can be further split into multiple subcategories:
- Statistics-based techniques in general collect and examine every data point of the monitored system to build a statistical model of normal activity and flag large deviations from this model as anomalies:
- Univariate methods create a normal profile for the activity of only a single variable and look for abnormalities in each variable.
- Multivariate methods can model two or more variables in a single model that also represents the relationships between multiple correlated variables.
- Time series model methods observe a series of observations over a certain time interval. A new observation is abnormal if its probability of occurring at that time is too low.
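A univariate statistical profile of the kind described above can be as simple as a z-score test: the baseline yields a mean and standard deviation, and new observations that fall too many standard deviations from the mean are flagged. The baseline numbers and the choice of a 3-sigma threshold are illustrative assumptions.

```python
import statistics

def zscore_anomalies(history, new_values, threshold=3.0):
    """Flag values more than `threshold` standard deviations away from
    the mean of the historical baseline (a univariate normal profile)."""
    mean = statistics.fmean(history)
    stdev = statistics.stdev(history)
    return [v for v in new_values if abs(v - mean) / stdev > threshold]

# Baseline: requests per minute under normal load (invented numbers).
baseline = [98, 102, 100, 97, 103, 99, 101, 100]
print(zscore_anomalies(baseline, [101, 95, 240]))  # only the spike to 240 is flagged
```

Multivariate and time-series methods generalize the same idea: instead of a single mean and deviation, the model captures correlations between variables or the expected value at a given time.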
- Knowledge-based techniques try to characterize legitimate activity from existing system data, such as protocol specifications or network traffic instances; any unknown activity is reported as anomalous.
- Finite State Machine methods use a computation model that represents and controls the execution flow in the form of states, transitions, and activities. Each state captures the history of the data seen so far; variations in the input and unexpected transitions can indicate an anomaly.
- Description language methods define the syntax of rules that describe the characteristics of a defined attack. These rules can be built using description languages such as N-grammars and the Unified Modeling Language.
- Expert systems comprise several rules that define attacks. The rules are usually manually defined by domain experts.
- Signature analysis relies on string matching of network packets against distinct attack signatures.
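The Finite State Machine idea above can be illustrated with a toy model of the TCP three-way handshake: any event that has no transition from the current state is reported as anomalous. The state and event names are a simplified sketch, not the full TCP specification a real knowledge-based IDS would encode.

```python
# A minimal finite state machine for the TCP three-way handshake.
# Only these transitions are considered legitimate in this toy model.
TRANSITIONS = {
    ("CLOSED", "SYN"): "SYN_SENT",
    ("SYN_SENT", "SYN_ACK"): "ESTABLISHING",
    ("ESTABLISHING", "ACK"): "ESTABLISHED",
}

def check_handshake(events):
    """Replay events through the FSM; return (final_state, anomalies),
    where anomalies lists every (state, event) pair with no valid transition."""
    state, anomalies = "CLOSED", []
    for event in events:
        nxt = TRANSITIONS.get((state, event))
        if nxt is None:
            anomalies.append((state, event))
        else:
            state = nxt
    return state, anomalies

state, anomalies = check_handshake(["SYN", "SYN_ACK", "ACK"])
# state == "ESTABLISHED", anomalies == []
```

An out-of-order sequence such as `["SYN", "ACK"]` leaves the machine stuck in `SYN_SENT` and records the invalid transition, which is how variations in transitions lead to anomaly detection.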
- Machine learning-based techniques acquire complex pattern-matching capabilities from training data. Supervised learning usually consists of two stages, namely training and testing. During training, the model learns the relevant features and classes from labeled training data. During testing, the performance of the model is evaluated against labeled test data. After that, the model can be used to classify unlabeled data. On the other hand, unsupervised learning builds models that capture representative patterns from unlabeled data.
- Decision trees comprise several decision nodes that hold an attribute test, branches that represent possible decisions based on the thresholds of the decision nodes, and leaf nodes that represent the class to which the instance belongs.
- Naive Bayes methods apply conditional probability formulae to the features, relying on the assumptions that features have different probabilities of occurring during normal activity and during an attack, and that features are conditionally independent of one another. In general, they try to answer questions such as: what is the probability that a particular attack is occurring, given the observed system activity?
- Genetic algorithms provide a heuristic approach to optimization, where each possible solution is represented as a series of features and the quality of the solutions improves over time by the application of selection and reproduction operations that favour better solutions.
- Artificial neural networks build a web of interconnected nodes in multiple layers, where the first layer represents the input and the last layer the prediction. Each node and connection has a weight that is used to model the interactions between features. During the training phase, the predictions are compared to the true outputs, and all the weights in the network are then adjusted accordingly by a procedure called backpropagation.
- Fuzzy logic is based on degrees of uncertainty, where an instance can belong, possibly partially, to multiple classes at the same time. These methods provide a clear way of arriving at a final decision based on unclear, ambiguous, noisy, inaccurate, or missing input data. They can model the difference between normal and abnormal activity in more detail than just with two classes.
- Support vector machines are discriminative classifiers defined by a separating hyperplane. They use various kernel functions to map the training data into a higher-dimensional space in which normal activity and intrusions become linearly separable.
- Hidden Markov models are statistical Markov models in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. They consist of nodes that represent class states and connections between states, both with corresponding weights that are learned during the training process.
- K-nearest neighbors methods classify the unlabeled instance to the majority class of the k nearest neighbor instances, where k represents the number of nearest neighbors.
- K-means is the most common clustering technique used today. It partitions an arbitrary number of data points into k clusters, assigning each data point to the cluster with the closest mean value. The number of clusters has to be determined in advance, which is its main drawback.
- Hierarchical clustering aims to create a hierarchy of clusters, which can be achieved in two different ways. The first is the agglomerative or bottom-up approach, where each data point starts as its own cluster and clusters are then merged level by level until a satisfactory number of clusters is reached. The second is the divisive or top-down approach, where all data points start in one cluster that is then split into two sub-clusters at each level until a satisfactory number of clusters is reached.
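To make one of the supervised methods above concrete, here is a from-scratch k-nearest neighbors classifier applied to toy connection records. The two features (duration, bytes sent) and their values are invented for illustration; a real system would use many more features extracted from traffic.

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Label `query` with the majority class among its k nearest
    training points, using Euclidean distance."""
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy training set of (duration_s, bytes_sent) feature vectors.
train = [
    ((0.1, 200), "normal"), ((0.2, 250), "normal"), ((0.15, 220), "normal"),
    ((5.0, 90000), "attack"), ((4.5, 85000), "attack"), ((6.0, 95000), "attack"),
]

print(knn_classify(train, (0.12, 210)))  # → normal
```

The same training/testing split described in the text applies here: the labeled pairs are the training data, and unlabeled queries are classified by majority vote of their k nearest neighbors.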
Some methods combine multiple approaches, such as ensemble and hybrid methods. Others combine supervised and unsupervised learning into so-called semi-supervised learning, or, in more recent years, even employ reinforcement learning.
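A minimal sketch of the ensemble idea: several weak detectors each cast a vote, and a connection is flagged only when a majority agree. The detector rules, field names, and thresholds below are all invented for illustration.

```python
# Three toy detectors; each returns True when its rule fires.
def high_rate(conn):
    """Unusually many requests per second (threshold is illustrative)."""
    return conn["req_per_s"] > 100

def bad_port(conn):
    """Destination port on a toy blocklist (e.g., telnet ports)."""
    return conn["dst_port"] in {23, 2323}

def huge_payload(conn):
    """Oversized payloads."""
    return conn["bytes"] > 1_000_000

DETECTORS = [high_rate, bad_port, huge_payload]

def ensemble_flag(conn):
    """Flag a connection when a strict majority of detectors agree."""
    votes = sum(d(conn) for d in DETECTORS)
    return votes > len(DETECTORS) // 2

print(ensemble_flag({"req_per_s": 500, "dst_port": 23, "bytes": 10}))  # True
```

Real ensembles combine much stronger base learners (e.g., trees or neural networks), but the principle is the same: aggregating several detectors tends to reduce the false positives of any single one.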