This project develops a machine learning (NLP) model to detect fake news. The dataset contains labeled news articles, which lets the model learn to distinguish real news from fake news. Several algorithms were evaluated to identify the best-performing model.

Requirements

The project requires the following Python packages:

- pandas
- numpy
- scikit-learn
- matplotlib
- seaborn

Data

The dataset is a labeled collection of news articles. It contains two columns: the article text and the label (real or fake).
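
As a rough illustration of the two-column layout described above, the sketch below loads a tiny stand-in dataset with pandas and checks the class balance. The column names `text` and `label`, the label values `real` and `fake`, and the sample rows are all assumptions made for the example; the actual file name and schema are not stated here and may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real dataset file. The column names
# "text" and "label" are assumptions, not taken from the project.
sample_csv = io.StringIO(
    "text,label\n"
    '"Scientists confirm water is wet.",real\n'
    '"Aliens endorse local mayor.",fake\n'
    '"City council approves new budget.",real\n'
)

df = pd.read_csv(sample_csv)

# Drop rows with a missing article text or label before any modelling.
df = df.dropna(subset=["text", "label"])

# Count how many articles fall into each class.
label_counts = df["label"].value_counts()
print(label_counts)
```

With the real data, the `io.StringIO` object would be replaced by the path to the dataset file.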

The repository is organized into the following directories:

- data: The dataset used for training and testing the models.
- notebooks: Jupyter notebooks for data exploration, model training, and evaluation.
- scripts: Python scripts for data preprocessing, feature extraction, and model training.
- models: Saved models and evaluation results.
- results: Results, including confusion matrices and ROC curves.
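
The feature extraction and model training steps handled by the scripts could look roughly like the sketch below. It is not the project's actual code: TF-IDF features, the choice of Multinomial Naive Bayes as a second candidate, the split parameters, and the tiny made-up corpus are all assumptions for illustration. Only Logistic Regression is named in this README.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny made-up corpus for illustration only; not the project's data.
texts = [
    "Government publishes annual budget report",
    "Local school wins regional science fair",
    "Central bank holds interest rates steady",
    "City opens new public library branch",
    "Miracle pill cures every disease overnight",
    "Celebrity secretly replaced by robot clone",
    "Moon landing filmed in hidden desert studio",
    "Drinking bleach makes you immune to flu",
]
labels = ["real"] * 4 + ["fake"] * 4

# Hold out a stratified test split so both classes appear in it.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=0, stratify=labels
)

# Candidate classifiers (assumed; the README names only Logistic Regression).
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}

scores = {}
for name, clf in candidates.items():
    # Each pipeline turns raw text into TF-IDF features, then classifies.
    model = make_pipeline(TfidfVectorizer(), clf)
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: accuracy = {acc:.2f}")
```

On a corpus this small the scores are meaningless; the point is only the shape of the comparison loop, where every candidate is trained and scored on the same split.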

Results

The Logistic Regression model achieved the highest accuracy, with the following confusion matrix:

|             | Predicted True | Predicted Fake |
|-------------|----------------|----------------|
| Actual True | 4154           | 43             |
| Actual Fake | 56             | 4685           |

The ROC curve for the Logistic Regression model shows an AUC of 1.00 (to two decimal places), indicating excellent performance.
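
The headline metrics can be derived directly from the four counts in the confusion matrix. The sketch below assumes that rows are actual classes and columns are predicted classes, and it treats "fake" as the positive class when computing precision and recall; both are readings of the matrix rather than something stated explicitly.

```python
# Counts from the confusion matrix above
# (assumed layout: rows = actual class, columns = predicted class).
true_as_true = 4154  # real articles predicted as real
true_as_fake = 43    # real articles predicted as fake
fake_as_true = 56    # fake articles predicted as real
fake_as_fake = 4685  # fake articles predicted as fake

total = true_as_true + true_as_fake + fake_as_true + fake_as_fake

# Accuracy: share of all articles that were classified correctly.
accuracy = (true_as_true + fake_as_fake) / total

# Precision and recall with "fake" treated as the positive class (assumption).
precision_fake = fake_as_fake / (fake_as_fake + true_as_fake)
recall_fake = fake_as_fake / (fake_as_fake + fake_as_true)
f1_fake = 2 * precision_fake * recall_fake / (precision_fake + recall_fake)

print(f"articles evaluated: {total}")
print(f"accuracy:           {accuracy:.4f}")
print(f"precision (fake):   {precision_fake:.4f}")
print(f"recall (fake):      {recall_fake:.4f}")
print(f"F1 (fake):          {f1_fake:.4f}")
```

Under these assumptions, 8839 of the 8938 evaluated articles are classified correctly, which is an accuracy of roughly 98.9%.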