A repository containing datasets and code for a Final Year Project (FYP) on Malicious URLs Detection. Explore various versions of datasets, documentation, and implementations of traditional machine learning and neural network models for detecting malicious URLs. Includes functionality for feature extraction, model testing, and evaluation.

auth-Afham/FYP-MaliciousURLsDetection

FYP-MaliciousURLsDetection

Project Overview

This repository contains the datasets and code for the Malicious URLs Detection project, completed as part of the CGEB4323 Project 2 CS course at UNITEN.

Dataset

Version 1.00

  • malicious_phish.csv: Dataset containing information about malicious phishing URLs.

  • README.md: Information about the dataset version 1.00.

Version 1.01

  • README.md: Information about the dataset version 1.01.

  • updated_urls.csv: Dataset containing updated URLs.

Version 1.02

  • README.md: Information about the dataset version 1.02.

  • split_urls.csv: Dataset containing split URLs.
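As a rough illustration of what "split URLs" could mean, the sketch below breaks a URL into its components with the Python standard library. The function name `split_url` and the output keys (`scheme`, `host`, `path`, `query`) are assumptions made for this example; the actual columns of split_urls.csv are described in the version 1.02 README and may differ.

```python
from urllib.parse import urlsplit


def split_url(url: str) -> dict:
    """Split a URL into named components (hypothetical layout, not the dataset's)."""
    parts = urlsplit(url)
    # Entries without a scheme (e.g. "example.com/login") are parsed with an
    # empty host, so retry with a leading "//" to mark the host explicitly.
    if not parts.netloc:
        parts = urlsplit("//" + url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname or "",
        "path": parts.path,
        "query": parts.query,
    }


row = split_url("http://example.com/login.php?user=1")
bare = split_url("example.com/login")
```

Handling scheme-less entries matters because URL datasets often mix full URLs with bare host/path strings.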

Code

Functions

  • get_headers.ipynb: Jupyter Notebook for extracting headers.

  • test_sklearn_model.ipynb: Jupyter Notebook for testing the scikit-learn model.

  • test_tf_model.ipynb: Jupyter Notebook for testing the TensorFlow model.

  • test_xgb_model.ipynb: Jupyter Notebook for testing the XGBoost model.

Neural Network

  • tk_tf_nn.ipynb: Jupyter Notebook for a tokenizer combined with a TensorFlow neural network.
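The tokenization step that precedes a neural network can be sketched without TensorFlow. The example below is a pure-Python, character-level stand-in, not the notebook's code: it maps each character to an integer id and pads every URL to a fixed length, which is the kind of input an embedding layer expects. The helper names `build_vocab` and `encode`, and the choice of character-level tokens, are assumptions for illustration.

```python
def build_vocab(urls):
    """Map each character seen in the training URLs to an integer id."""
    chars = sorted({ch for url in urls for ch in url})
    # Id 0 is reserved for padding and id 1 for characters unseen at fit time.
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode(url, vocab, max_len):
    """Convert a URL to a list of exactly max_len integer ids."""
    ids = [vocab.get(ch, 1) for ch in url[:max_len]]  # truncate long URLs
    return ids + [0] * (max_len - len(ids))           # right-pad short URLs


vocab = build_vocab(["abc.com", "b.ca"])
sequence = encode("ab.x", vocab, max_len=6)
```

Reserving separate ids for padding and unknown characters keeps unseen symbols in new URLs from colliding with real vocabulary entries.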

Traditional ML

  • cv_lr.ipynb: Jupyter Notebook for count vectorizer with Logistic Regression.

  • cv_rf.ipynb: Jupyter Notebook for count vectorizer with Random Forest.

  • cv_svm.ipynb: Jupyter Notebook for count vectorizer with Support Vector Machine.

  • cv_xgb.ipynb: Jupyter Notebook for count vectorizer with XGBoost.

  • tf-idf_lr.ipynb: Jupyter Notebook for TF-IDF with Logistic Regression.

  • tf-idf_rf.ipynb: Jupyter Notebook for TF-IDF with Random Forest.

  • tf-idf_svm.ipynb: Jupyter Notebook for TF-IDF with Support Vector Machine.

  • tf-idf_xgb.ipynb: Jupyter Notebook for TF-IDF with XGBoost.
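The notebooks above pair a text vectorizer with a classifier. A minimal sketch of that pairing with scikit-learn, using Logistic Regression as the example classifier, might look like the following. The toy URLs, the labels `benign` and `phishing`, and the helper `make_model` are invented for illustration and are not taken from the project datasets or notebooks.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for this example; not from the project datasets.
urls = [
    "google.com/search?q=python",
    "wikipedia.org/wiki/URL",
    "github.com/explore",
    "secure-login.example-bank.xyz/verify/account",
    "paypa1-update.example.top/login.php",
    "free-prizes.example.click/claim?id=123",
]
labels = ["benign", "benign", "benign", "phishing", "phishing", "phishing"]


def make_model(vectorizer):
    """Chain a vectorizer and a classifier so both are fitted together."""
    return make_pipeline(vectorizer, LogisticRegression(max_iter=1000))


# Same classifier, two feature representations: raw counts vs. TF-IDF weights.
cv_lr = make_model(CountVectorizer()).fit(urls, labels)
tfidf_lr = make_model(TfidfVectorizer()).fit(urls, labels)

new_urls = ["example.org/docs", "account-verify.example.xyz/login.php"]
cv_preds = cv_lr.predict(new_urls)
tfidf_preds = tfidf_lr.predict(new_urls)
```

Swapping `LogisticRegression` for another estimator such as `RandomForestClassifier` or `SVC` would give the other vectorizer and model combinations listed above.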

Label Encoders

  • label_encoder_cv_xgb.pkl: Pickle file for the label encoder used in the count vectorizer with XGBoost pipeline.

Models

  • cv_xgb.pkl: Pickle file for the XGBoost model trained on count vectorizer features.

Vectorizers

  • vectorizer_cv_xgb.pkl: Pickle file for the count vectorizer used with XGBoost.
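The three pickle files are meant to be used together: the vectorizer turns URLs into features, the model predicts an integer class, and the label encoder maps that integer back to a class name. The sketch below shows that round trip under stated assumptions: a `LogisticRegression` stands in for the XGBoost model so the example needs only scikit-learn, the files are written to a temporary directory rather than the repository's folders, and the toy data and helper names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import LabelEncoder

FILES = ("vectorizer_cv_xgb.pkl", "label_encoder_cv_xgb.pkl", "cv_xgb.pkl")


def save_artifacts(directory, vectorizer, label_encoder, model):
    """Pickle the three fitted objects under the file names listed above."""
    for name, obj in zip(FILES, (vectorizer, label_encoder, model)):
        with open(Path(directory) / name, "wb") as f:
            pickle.dump(obj, f)


def load_and_predict(directory, urls):
    """Load the artifacts and return class names for the given URLs."""
    loaded = []
    for name in FILES:
        with open(Path(directory) / name, "rb") as f:
            loaded.append(pickle.load(f))  # only unpickle files you trust
    vectorizer, label_encoder, model = loaded
    encoded = model.predict(vectorizer.transform(urls))
    return label_encoder.inverse_transform(encoded)


# Toy data invented for this example.
urls = ["github.com/explore", "wikipedia.org/wiki/URL",
        "paypa1-update.example.top/login.php", "free-prizes.example.click/claim"]
labels = ["benign", "benign", "phishing", "phishing"]

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(labels)   # string labels -> integers
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(urls)
model = LogisticRegression(max_iter=1000).fit(X, y)  # stand-in for XGBoost

sample = ["account-verify.example.xyz/login.php"]
direct_preds = label_encoder.inverse_transform(model.predict(vectorizer.transform(sample)))
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, vectorizer, label_encoder, model)
    restored_preds = load_and_predict(tmp, sample)
```

The vectorizer and label encoder must be the same fitted objects used during training; refitting either one on new data would change the feature columns or the class-to-integer mapping and silently invalidate the saved model.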
