This repository contains the dataset and code for the Malicious URLs Detection project, developed for the CGEB4323 Project 2 CS course at UNITEN.
- malicious_phish.csv: Dataset containing information about malicious phishing URLs.
- README.md: Information about the dataset version 1.00.
- README.md: Information about the dataset version 1.01.
- updated_urls.csv: Dataset containing updated URLs.
- README.md: Information about the dataset version 1.02.
- split_urls.csv: Dataset containing split URLs.
- get_headers.ipynb: Jupyter Notebook for extracting headers.
- test_sklearn_model.ipynb: Jupyter Notebook for testing the scikit-learn model.
- test_tf_model.ipynb: Jupyter Notebook for testing the TensorFlow model.
- test_xgb_model.ipynb: Jupyter Notebook for testing the XGBoost model.
- tk_tf_nn.ipynb: Jupyter Notebook for tokenization with a TensorFlow neural network.
- cv_lr.ipynb: Jupyter Notebook for count vectorizer with Logistic Regression.
- cv_rf.ipynb: Jupyter Notebook for count vectorizer with Random Forest.
- cv_svm.ipynb: Jupyter Notebook for count vectorizer with Support Vector Machine.
- cv_xgb.ipynb: Jupyter Notebook for count vectorizer with XGBoost.
- tf-idf_lr.ipynb: Jupyter Notebook for TF-IDF with Logistic Regression.
- tf-idf_rf.ipynb: Jupyter Notebook for TF-IDF with Random Forest.
- tf-idf_svm.ipynb: Jupyter Notebook for TF-IDF with Support Vector Machine.
- tf-idf_xgb.ipynb: Jupyter Notebook for TF-IDF with XGBoost.
- label_encoder_cv_xgb.pkl: Pickle file for the label encoder used in the count vectorizer with XGBoost pipeline.
- cv_xgb.pkl: Pickle file for the XGBoost model trained on count vectorizer features.
- vectorizer_cv_xgb.pkl: Pickle file for the count vectorizer used with XGBoost.
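The three pickle files above can, in principle, be reused to classify new URLs without rerunning cv_xgb.ipynb. The following is a minimal sketch, not the project's own inference code. It assumes that the vectorizer exposes a scikit-learn-style `transform` (as `CountVectorizer` does), that the model exposes `predict` (as XGBoost's scikit-learn wrapper `XGBClassifier` does), and that the label encoder exposes `inverse_transform` (as scikit-learn's `LabelEncoder` does). The helper names `load_artifacts` and `predict_urls` are hypothetical. Unpickling also requires compatible versions of scikit-learn and XGBoost to be installed.

```python
import os
import pickle

# Filenames as listed in this README; the object types behind them are assumptions.
ARTIFACT_FILES = (
    "vectorizer_cv_xgb.pkl",     # assumed: fitted count vectorizer
    "cv_xgb.pkl",                # assumed: fitted XGBoost classifier
    "label_encoder_cv_xgb.pkl",  # assumed: fitted label encoder
)


def load_artifacts(directory="."):
    """Hypothetical helper: unpickle and return (vectorizer, model, label_encoder)."""
    loaded = []
    for name in ARTIFACT_FILES:
        with open(os.path.join(directory, name), "rb") as f:
            loaded.append(pickle.load(f))
    return tuple(loaded)


def predict_urls(urls, vectorizer, model, label_encoder):
    """Hypothetical helper: return one predicted class label per raw URL string.

    Assumes scikit-learn-style objects: vectorizer.transform,
    model.predict and label_encoder.inverse_transform.
    """
    features = vectorizer.transform(urls)   # raw URLs -> token-count features
    encoded = model.predict(features)       # integer class ids
    # Map integer ids back to the original label strings.
    return list(label_encoder.inverse_transform(encoded))
```

Run from the directory that holds the pickle files, `vec, clf, enc = load_artifacts(".")` followed by `predict_urls(["http://example.com/login"], vec, clf, enc)` would return one label per URL. Keeping loading and prediction in separate functions lets the artifacts be loaded once and reused across many calls.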