NLP Sentiment Analysis Challenge

Overview

This repository is dedicated to the Sentiment Analysis challenge on the IMDB Dataset of 50K Movie Reviews. The objective is to apply Natural Language Processing (NLP) techniques to accurately determine the sentiment of movie reviews.

Classifiers

In this project, we compare several machine learning approaches to sentiment classification. The classifiers and supporting techniques include:

  • Random Forest
  • K-Nearest Neighbors (K-NN)
  • Multinomial Naive Bayes
  • TF-IDF vectorization (a feature extraction method, not a classifier)
  • BERT (Bidirectional Encoder Representations from Transformers), a pretrained transformer language model
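
As a rough illustration of how TF-IDF features can feed one of the classifiers listed above, here is a minimal sketch using scikit-learn. It is not taken from the notebooks: the helper name `build_tfidf_nb`, the default hyperparameters, and the four toy reviews are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def build_tfidf_nb():
    """Hypothetical helper: TF-IDF features feeding Multinomial Naive Bayes."""
    return make_pipeline(TfidfVectorizer(), MultinomialNB())


# Toy reviews invented for illustration only (not taken from the dataset).
train_texts = [
    "a wonderful film with great acting",
    "great story and wonderful cast",
    "a terrible film with awful acting",
    "awful story and terrible cast",
]
train_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

model = build_tfidf_nb()
model.fit(train_texts, train_labels)

# Predict the sentiment of two unseen toy reviews.
predictions = model.predict(["wonderful great film", "terrible awful film"])
```

Replacing `MultinomialNB()` with scikit-learn's `RandomForestClassifier()` or `KNeighborsClassifier()` would keep the same pipeline shape; BERT would need a separate fine-tuning setup that this sketch does not cover.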

Dataset

The dataset used in this challenge consists of 50,000 movie reviews from the IMDB database. Each review is labeled as positive or negative, providing a binary classification target for sentiment analysis.
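
Loading the reviews and turning the labels into a binary target might look like the sketch below. It assumes a CSV with `review` and `sentiment` columns holding the strings `positive` and `negative`; the column names, the `load_reviews` helper, and the two sample rows are assumptions for illustration, so adjust them to match the file in data/.

```python
import csv
import io

# Map the textual sentiment to a binary target.
LABELS = {"positive": 1, "negative": 0}


def load_reviews(handle):
    """Read (texts, labels) from a CSV with 'review' and 'sentiment' columns.

    The column names are an assumption about the file layout.
    """
    texts, labels = [], []
    for row in csv.DictReader(handle):
        texts.append(row["review"])
        labels.append(LABELS[row["sentiment"].strip().lower()])
    return texts, labels


# In-memory stand-in for the real file; these rows are invented examples.
sample = io.StringIO(
    "review,sentiment\n"
    '"Loved it, start to finish.",positive\n'
    '"Dull and far too long.",negative\n'
)
texts, labels = load_reviews(sample)
```

For the real data, pass an open file instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")` with `path` pointing at the CSV in data/.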

Repository Structure

  • data/: Directory containing the IMDB dataset and any additional data files used in the analyses.
  • notebooks/: Jupyter notebooks with detailed analyses and model training steps.
  • models/: Serialized versions of the trained models ready for inference.
  • reports/: Generated reports and visualizations that summarize the findings and model performances.

Results

The models are evaluated on accuracy, precision, recall, and F1-score, so that performance is not judged on a single metric. Detailed results and discussion are presented in the Jupyter notebooks in the notebooks/ directory.
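
For reference, the four metrics can be computed from true and predicted labels as in the sketch below. This is a plain-Python illustration, not the code used in the notebooks; the function name `binary_metrics` and the example labels are made up for the demonstration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    # Guard against empty denominators (e.g. no positive predictions).
    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 2 true positives, 1 false positive, 1 false negative.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
scores = binary_metrics(y_true, y_pred)
```

With these example labels, all four scores come out to 2/3. Libraries such as scikit-learn provide equivalent functions in `sklearn.metrics`.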

Contributing

Contributions to the NLP Sentiment Analysis Challenge are welcome! Please refer to CONTRIBUTING.md for guidelines on how to contribute to this project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contact

For any queries or discussions regarding the project, please open an issue in this repository.


Happy coding!