This repository contains the source code for the Quantum Computing part of the project "Serial Code Porting on HPC & Quantum Computing". The project is dedicated to developing tools based on HPC and Quantum Computing to help with cybersecurity-related classification tasks. In the Quantum Computing part of the project, developed as a collaboration between Sogei and Politecnico di Milano, we tested Quantum Annealing Feature Selection methods, based on existing research, on a publicly available cybersecurity classification dataset.
Here we explain how to install the dependencies, download and prepare the dataset, and run the experiments included in this repository. Be aware that, due to changes in D-Wave's terms of service, we were unable to access a quantum annealer while working on this project. Therefore, you will only find classical solvers here, but you can check the previously linked repository for more options.
NOTE: This repository requires Python 3.8
It is suggested to install all the required packages into a new Python environment.
After repository checkout, enter the repository folder and create a new environment using your environment manager of choice (such as uv, conda or virtualenv).
In this repository we provide both a requirements.txt file and a pyproject.toml file for compatibility with different package managers.
The code was tested with both uv and virtualenv with pip.
We also provide onboarding scripts for both Unix-like and Windows operating systems that install uv and set up the virtual environment (setup_venv.sh for Unix-like and setup_venv.ps1 for Windows).
PyMIToolbox is a Python wrapper for the C library MIToolbox, which efficiently computes the Mutual Information required by the MIQUBO feature selection method.
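As a point of reference for what Mutual Information measures, the following is a minimal pure-Python sketch for two discrete variables. It is not the PyMIToolbox API and makes no claim about how MIToolbox implements the computation; the function name `mutual_information` and the choice of base-2 logarithms (bits) are assumptions made for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Return I(X;Y) in bits for two equal-length sequences of discrete values.

    Illustrative helper only (hypothetical name, not part of PyMIToolbox).
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("sequences must be non-empty")

    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)

    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


feature = [0, 0, 1, 1]
print(mutual_information(feature, [0, 0, 1, 1]))  # label identical to the feature
print(mutual_information(feature, [0, 1, 0, 1]))  # label independent of the feature
```

A feature that fully determines a balanced binary label carries one bit of information about it, while an independent feature carries none; feature selection methods built on Mutual Information favor the former.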
NOTE: If you set up the environment on a Unix-like OS using the provided setup_venv.sh script you can skip to the dataset section. If you set up the environment on Windows using the setup_venv.ps1 script you can skip the building section.
In order to use PyMIToolbox you first need to download and compile the MIToolbox library in the PyMIToolbox directory.
You can download the MIToolbox source code here.
Unzip the file and rename the extracted folder to MIToolbox.
Make sure to move this folder into the existing PyMIToolbox directory.
Now, go into the MIToolbox directory and compile the C library. If you are on Linux or macOS run the following command:
cd PyMIToolbox/MIToolbox/
make x64
On Windows, you need a tool that provides the make command, such as MSYS2, MinGW or WSL.
After installing one of these tools and making sure the make command is available, run the following:
cd PyMIToolbox/MIToolbox/
make x64_win
This produces a compiled library file (.so on Linux/macOS and .dll on Windows) in the PyMIToolbox/MIToolbox/ folder.
If you don't see the file, it may have been compiled to another directory and should be moved to the correct folder.
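To check whether the build produced a library in the expected place, a small sketch like the one below can help. The exact file name of the compiled library is not stated here, so the helper (a hypothetical `find_compiled_libraries` function, not part of this repository) simply lists every .so or .dll file directly inside the folder.

```python
from pathlib import Path


def find_compiled_libraries(directory="PyMIToolbox/MIToolbox"):
    """Return the sorted .so and .dll files found directly inside `directory`.

    Hypothetical helper: it matches on the file extension only, because the
    library file name is an unknown here.
    """
    folder = Path(directory)
    if not folder.is_dir():
        return []
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in {".so", ".dll"}
    )


libraries = find_compiled_libraries()
if libraries:
    for lib in libraries:
        print(f"found compiled library: {lib}")
else:
    print("no .so or .dll file found in PyMIToolbox/MIToolbox/")
```

Run it from the repository root; if nothing is found, look for the compiled file elsewhere in the MIToolbox source tree and move it into the folder.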
After installing all dependencies, you will need to download the publicly available UNSW-NB15 dataset (you can find its description here). In particular, you will need to download the following files:
NUSW-NB15_features.csv
UNSW-NB15_1.csv
UNSW-NB15_2.csv
UNSW-NB15_3.csv
UNSW-NB15_4.csv
and place them under the ./data/full/ directory.
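Before running the preprocessing, it may be worth confirming that all five files are in place. The sketch below is a minimal check, assuming it is run from the repository root; the helper name `missing_dataset_files` is hypothetical and not part of this repository.

```python
from pathlib import Path

# File names exactly as listed above (the first one really starts with "NUSW").
EXPECTED_FILES = [
    "NUSW-NB15_features.csv",
    "UNSW-NB15_1.csv",
    "UNSW-NB15_2.csv",
    "UNSW-NB15_3.csv",
    "UNSW-NB15_4.csv",
]


def missing_dataset_files(data_dir="data/full"):
    """Return the expected dataset files that are not present in `data_dir`."""
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]


missing = missing_dataset_files()
if missing:
    print("missing files in ./data/full/:", ", ".join(missing))
else:
    print("all dataset files found")
```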
Then, execute the preprocessing script by running
python preprocessing.py
after activating the previously installed virtual environment.
This preprocessing step will separate the dataset into 8 datasets containing data for different cyberattack categories, saved in the ./data/attack_cat/ folder, and a preprocessed version of the full dataset under ./data/full/.
To run the experiments you can use the quantum_feature_selection.py Python script.
That script accepts the --quboalgorithm, --qubosolver and --category parameters.
You can browse the code to see the valid values for these parameters.
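If you prefer to launch single configurations from Python, the command line can be assembled as in the sketch below. It assumes the script takes each parameter in the usual `--name value` form; the values `ALGORITHM`, `SOLVER` and `CATEGORY` are placeholders rather than valid options, and `build_command` is a hypothetical helper.

```python
import shlex


def build_command(quboalgorithm, qubosolver, category,
                  script="quantum_feature_selection.py"):
    """Build the argument list for one experiment configuration.

    Assumes the `--name value` argument form; check the script for the
    values it actually accepts.
    """
    return [
        "python", script,
        "--quboalgorithm", str(quboalgorithm),
        "--qubosolver", str(qubosolver),
        "--category", str(category),
    ]


# Placeholder values (hypothetical): replace them with values the script accepts.
command = build_command("ALGORITHM", "SOLVER", "CATEGORY")
print(shlex.join(command))
```

To actually start the run, the same list can be passed to `subprocess.run(command, check=True)` from within the activated environment.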
We also provide scripts to execute the entire set of experiments on both Unix-like and Windows operating systems (qfs.sh and qfs.ps1 respectively).
For convenience, you can edit those scripts to restrict your experiments to specific configurations.
All the results will be saved in the ./results/[category]/[quboalgorithm]/[qubosolver]/ directory.
The QUBO_result_df.csv files contain DataFrames with the selected features (and additional information) for each experiment configuration, while the sampleset_* files contain the solutions found by each QUBO solver across its runs.
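To see which configurations already have results, the directory layout described above can be walked as in this sketch. It relies only on the ./results/[category]/[quboalgorithm]/[qubosolver]/ structure and the QUBO_result_df.csv file name; `collect_result_files` is a hypothetical helper, and the sketch does not reproduce what any script in this repository does.

```python
from pathlib import Path


def collect_result_files(results_root="results"):
    """List QUBO_result_df.csv files with the configuration encoded in their path."""
    root = Path(results_root)
    found = []
    # One directory level each for category, quboalgorithm and qubosolver.
    for path in sorted(root.glob("*/*/*/QUBO_result_df.csv")):
        category, quboalgorithm, qubosolver = path.relative_to(root).parts[:3]
        found.append({
            "category": category,
            "quboalgorithm": quboalgorithm,
            "qubosolver": qubosolver,
            "path": path,
        })
    return found


for entry in collect_result_files():
    print(entry["category"], entry["quboalgorithm"], entry["qubosolver"], entry["path"])
```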
If you want to merge the separate feature selection results after running the experiments, you can execute the merge_results.py Python script, which will generate the combined ./results/QFS_features.csv DataFrame file.
NOTE: Running all the experiments on a quantum annealer may require a significant amount of QPU time, so if you have limited resources, pay attention to the number of experiments you run.