Enhancing Intrusion Detection with SecurityOnion using Machine Learning

Overview and Context of the project

The work involved setting up and configuring Security Onion as a Network Security Monitoring (NSM) solution, developing a machine learning model for alert classification, integrating that model with Security Onion to filter out false positives and reduce alert fatigue, and creating a graphical user interface (GUI) to streamline use of the solution.

This project was conducted as part of an internship at the National Agency of Cybersecurity (NACS), or Agence Nationale de la Cybersécurité (ANCS) in French, a Tunisian agency specializing in safeguarding digital infrastructures. The primary objective of this project is to enhance the efficiency of Intrusion Detection Systems (IDS) by reducing the number of false positives generated, allowing security analysts to focus on real threats.

Repository Structure

The repository is organized as follows:

App/: Contains the main application code, including the scripts for network monitoring and machine learning model integration with Security Onion.
IPython Notebook/: Includes Jupyter notebooks for machine learning model development and alert prioritization.
report_images/: Stores images used for documentation purposes.
README.md: Documentation of the project and instructions for setup.

Virtualized Architecture Setup

The virtualized architecture was established using Oracle VirtualBox. Three machines were set up:

Attacking Machine (Kali Linux): This machine was used to simulate attacks.
Victim Machine (Windows 10): This machine contained vulnerabilities to simulate attack scenarios.
Security Onion Machine (Ubuntu Server 20.04): Hosted Security Onion for monitoring and intrusion detection.

Network Configuration: All three machines were connected to the same NAT Network, establishing a controlled testing environment.

Machine Learning Model Development

Classification Model:

This phase of the project centers on the development of a machine learning classification model, trained on the UNSW-NB15 dataset, to predict the authenticity of network traffic, distinguishing between genuine threats and false positives.

UNSW-NB15 Dataset:

The UNSW-NB15 Dataset is a publicly available dataset widely used in cybersecurity to develop and test intrusion detection systems (IDS) and intrusion prevention systems (IPS). It was developed by the Australian Centre for Cyber Security (ACCS) at the University of New South Wales in Australia.

The figure below summarizes the full steps of the ML Model development:

Implementation:

The implementation for this step is documented in a Jupyter notebook, offering a step-by-step explanation of the process. Please refer to Classification model.ipynb for access to this detailed guide.

Alert Prioritization with Packet Analysis and Machine Learning

The alert prioritization process involved four essential steps arranged in a pipeline:

Elasticsearch Data Extraction
PCAP Files Fetching
Feature Extraction
Prediction
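
The four stages can be chained as plain functions, each consuming the output of the previous one. The following is a minimal sketch of that idea only: `run_pipeline` and the four stage functions are hypothetical placeholders, not the project's code, and the stage bodies return dummy data.

```python
from functools import reduce
from typing import Any, Callable, Iterable

Stage = Callable[[Any], Any]


def run_pipeline(stages: Iterable[Stage], initial: Any) -> Any:
    """Feed `initial` through each stage in order and return the final result."""
    return reduce(lambda data, stage: stage(data), stages, initial)


# Hypothetical stand-ins for the four stages. Real implementations would query
# Elasticsearch, fetch PCAPs over SSH, compute features, and call the model.
def extract_alerts(query: str) -> list:
    return [{"alert_id": 1, "query": query}]


def fetch_pcaps(alerts: list) -> list:
    return [{**a, "pcap": f"alert_{a['alert_id']}.pcap"} for a in alerts]


def extract_features(alerts: list) -> list:
    return [{**a, "features": [0.0]} for a in alerts]


def predict(alerts: list) -> list:
    return [{**a, "label": "unknown"} for a in alerts]


result = run_pipeline(
    [extract_alerts, fetch_pcaps, extract_features, predict],
    "suricata-alerts",  # illustrative query string, not a real Elasticsearch query
)
print(result)
```

Keeping each stage a separate function means any one of them can be rerun or replaced without touching the others.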

Elasticsearch Data Extraction:

The first step involves extracting Suricata alerts from Elasticsearch, identifying the associated connection information (flow info) for each alert, and saving the results in a CSV file.
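
As an illustration of the "alerts to CSV" part of this step, the sketch below flattens Elasticsearch search hits into flow rows and writes them as CSV. It assumes the hits have already been retrieved (the query itself is not shown) and that the alert documents use ECS-style field names such as `source.ip` and `destination.port`. Those field names, the column names, and the sample hit are assumptions for illustration; the project's notebook may use different ones.

```python
import csv
import io
from typing import Any, Dict, Iterable, TextIO

# Column names are illustrative, not taken from the project.
FLOW_COLUMNS = ["timestamp", "src_ip", "src_port", "dst_ip", "dst_port", "proto"]


def hit_to_row(hit: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the flow 5-tuple and timestamp out of one search hit."""
    doc = hit.get("_source", {})
    # Field paths below assume ECS-style documents.
    return {
        "timestamp": doc.get("@timestamp"),
        "src_ip": doc.get("source", {}).get("ip"),
        "src_port": doc.get("source", {}).get("port"),
        "dst_ip": doc.get("destination", {}).get("ip"),
        "dst_port": doc.get("destination", {}).get("port"),
        "proto": doc.get("network", {}).get("transport"),
    }


def hits_to_csv(hits: Iterable[Dict[str, Any]], out: TextIO) -> int:
    """Write one CSV row per hit and return the number of rows written."""
    writer = csv.DictWriter(out, fieldnames=FLOW_COLUMNS)
    writer.writeheader()
    count = 0
    for hit in hits:
        writer.writerow(hit_to_row(hit))
        count += 1
    return count


# Made-up hit in the shape of an Elasticsearch search response entry.
sample_hits = [
    {
        "_id": "example",
        "_source": {
            "@timestamp": "2023-08-01T10:00:00Z",
            "source": {"ip": "10.0.2.15", "port": 49712},
            "destination": {"ip": "10.0.2.4", "port": 445},
            "network": {"transport": "tcp"},
        },
    }
]

buffer = io.StringIO()
written = hits_to_csv(sample_hits, buffer)
print(buffer.getvalue())
```

Missing fields are written as empty cells rather than raising, so incomplete alerts still appear in the CSV and can be filtered later.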

Implementation:

The implementation for this step is documented in a Jupyter notebook, offering a step-by-step explanation of the process. Please refer to Classification model.ipynb for access to this detailed guide.

PCAP Files Fetching:

This step focuses on fetching Packet Capture (PCAP) files essential for acquiring the complete network flow associated with each alert. The process involves SSH connectivity with the Security Onion machine.
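
One way to isolate the flow for a single alert is to filter a capture with a BPF expression built from the alert's connection information. The sketch below only builds the filter and the command string that would be run over SSH; it assumes `tcpdump` is available on the remote host, and the file paths are hypothetical. The project's notebooks may retrieve PCAPs differently.

```python
import ipaddress
import shlex


def flow_bpf_filter(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
                    proto: str = "tcp") -> str:
    """Build a BPF expression matching both directions of one TCP/UDP flow."""
    if proto not in ("tcp", "udp"):
        raise ValueError(f"unsupported protocol: {proto!r}")
    # Validate inputs so nothing unexpected ends up in the remote command.
    hosts = [str(ipaddress.ip_address(ip)) for ip in (src_ip, dst_ip)]
    ports = [int(p) for p in (src_port, dst_port)]
    if any(not 0 <= p <= 65535 for p in ports):
        raise ValueError("port out of range")
    return (f"{proto} and host {hosts[0]} and host {hosts[1]} "
            f"and port {ports[0]} and port {ports[1]}")


def remote_tcpdump_command(in_pcap: str, out_pcap: str, bpf: str) -> str:
    """Return a shell-quoted tcpdump command that writes only the matching flow."""
    return shlex.join(["tcpdump", "-r", in_pcap, "-w", out_pcap, bpf])


bpf = flow_bpf_filter("10.0.2.15", 49712, "10.0.2.4", 445)
# Both paths are placeholders, not real Security Onion locations.
command = remote_tcpdump_command("/path/to/capture.pcap", "/tmp/alert_flow.pcap", bpf)
print(command)
```

The resulting string can be passed to whichever SSH client is in use; quoting it with `shlex.join` keeps the filter expression as a single argument on the remote shell.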

Implementation:

The implementation for this step is documented in a Jupyter notebook, providing a comprehensive, step-by-step explanation of the process. You can access the detailed guide in Remote PCAP Request and Retrieval (so-standalone).ipynb for the standalone node and in Remote PCAP Retrieval and Filtering (so-import).ipynb for the import node, each tailored to their respective implementations.

Feature Extraction:

In this step, the features the classification model expects, as defined in the UNSW-NB15 dataset, are computed for each alert. They are extracted from the alert's PCAP file and saved in a CSV file.
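
To make the idea concrete, the sketch below computes five UNSW-NB15-style flow features (`dur`, `spkts`, `dpkts`, `sbytes`, `dbytes`) from packets that have already been parsed out of a PCAP file. The packet tuple layout and the function name are assumptions for illustration, and the byte counts here are simple packet-length sums, which may not match exactly how the dataset's original tooling counted bytes.

```python
from typing import Dict, Iterable, Tuple

# (timestamp in seconds, sender IP, packet length in bytes); an assumed layout.
Packet = Tuple[float, str, int]


def flow_features(packets: Iterable[Packet], src_ip: str) -> Dict[str, float]:
    """Compute duration plus per-direction packet and byte counts for one flow.

    `src_ip` is the flow initiator: packets it sent count as source-to-destination,
    all others as destination-to-source.
    """
    ordered = sorted(packets, key=lambda p: p[0])
    if not ordered:
        raise ValueError("flow has no packets")
    features: Dict[str, float] = {
        "dur": ordered[-1][0] - ordered[0][0],
        "spkts": 0,
        "dpkts": 0,
        "sbytes": 0,
        "dbytes": 0,
    }
    for _, sender, length in ordered:
        if sender == src_ip:
            features["spkts"] += 1
            features["sbytes"] += length
        else:
            features["dpkts"] += 1
            features["dbytes"] += length
    return features


# Made-up packets for a single flow between two hosts.
example_packets = [
    (0.0, "10.0.2.15", 60),
    (0.5, "10.0.2.4", 1500),
    (1.25, "10.0.2.15", 40),
]
example_features = flow_features(example_packets, src_ip="10.0.2.15")
print(example_features)
```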

Prediction:

The final step involves predicting whether the alerts correspond to true attacks or false alarms, using the previously trained classification model.
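
The sketch below shows how predictions could be turned into a triage result. It accepts any object with a `predict` method, so a saved scikit-learn estimator would fit the same interface, but the stub model used here is a hypothetical stand-in and not the project's trained classifier. Mapping label 1 to "true attack" and 0 to "false positive" is an assumption based on the UNSW-NB15 convention of 1 for attack records and 0 for normal records.

```python
from typing import Dict, List, Protocol, Sequence


class Classifier(Protocol):
    """Anything exposing predict(X) that returns one label per feature row."""

    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[int]: ...


def triage(model: Classifier, alert_ids: Sequence[str],
           features: Sequence[Sequence[float]]) -> Dict[str, List[str]]:
    """Split alert IDs into predicted true attacks and predicted false positives."""
    if len(alert_ids) != len(features):
        raise ValueError("alert_ids and features must have the same length")
    labels = model.predict(features)
    result: Dict[str, List[str]] = {"true_attack": [], "false_positive": []}
    for alert_id, label in zip(alert_ids, labels):
        # Assumed convention: 1 = attack, 0 = normal traffic.
        key = "true_attack" if int(label) == 1 else "false_positive"
        result[key].append(alert_id)
    return result


class ThresholdStub:
    """Hypothetical stand-in model: flags rows whose first feature exceeds 1000."""

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [1 if row[0] > 1000 else 0 for row in X]


triaged = triage(ThresholdStub(), ["a1", "a2", "a3"], [[5000.0], [120.0], [2048.0]])
print(triaged)
```

Returning both groups, rather than discarding the predicted false positives, leaves room for an analyst to spot-check what the model filtered out.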

Implementation:

The implementation for this step is documented in a Jupyter notebook, offering a step-by-step explanation of the process. Please refer to Predictive Analysis.ipynb for access to this detailed guide.

Graphical User Interface (GUI) Development

A graphical interface was developed using CustomTkinter to facilitate interaction with the system.

Use Cases:

The diagram below offers a visual representation of the various interaction scenarios, providing a comprehensive understanding of the functionalities and use cases.

App Interfaces:

Elasticsearch Alerts Interface: Allows analysts to retrieve Suricata alerts and related flow information.

Security Onion Machine Interface: Enables packet capture retrieval and feature extraction.

Prediction Panel: Predicts whether an alert is a genuine threat or a false positive.

Automation Panel: Automates the alert retrieval and prediction process.

Conclusion

This project addressed the challenge of reducing false positives in intrusion detection systems. By integrating a machine learning classifier into the Security Onion alert workflow, it filters out alerts predicted to be false alarms, so that analysts can spend their time on genuine threats and the overall efficiency of security operations improves.
