Twitter Sentiment Analysis

This project focuses on sentiment analysis of Twitter data using a dataset from Kaggle. It preprocesses the data and builds a classification model to determine whether a tweet has a positive, negative, or neutral sentiment.

Project Overview

This project uses a dataset sourced from Kaggle to analyze the sentiment of tweets. The process includes:

Installing necessary dependencies.
Fetching and extracting the dataset.
Preprocessing the text data by removing stop words and performing stemming.
Building a classification model using Logistic Regression to classify tweets as positive, negative, or neutral.

Installation

Follow these steps to set up the project:

Install Kaggle Library: The project uses the Kaggle API to download the dataset:
```
pip install kaggle
```

Configure Kaggle Credentials: Ensure you have a Kaggle API key (kaggle.json) configured:

# Copy kaggle.json to the appropriate directory
mkdir -p ~/.kaggle
cp /path/to/kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json

Download the Dataset: Use the Kaggle API to download the sentiment analysis dataset:
```
kaggle datasets download -d kazanova/sentiment140
```

Extract the Dataset: Extract the dataset from the zip file:

from zipfile import ZipFile
dataset = '/content/sentiment140.zip'
with ZipFile(dataset, 'r') as zip:
    zip.extractall()
    print("The dataset is extracted")

Usage

Preprocess the Data: The text data is cleaned and preprocessed, including the removal of stopwords and stemming using PorterStemmer from the NLTK library.

Example of importing dependencies:
```
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
```

Load the Dataset: Load the sentiment dataset into a Pandas DataFrame:

twitter_data = pd.read_csv('/content/training.1600000.processed.noemoticon.csv', encoding='ISO-8859-1')

Train the Model: Split the data into training and testing sets, then use Logistic Regression to train the sentiment classifier:

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

model = LogisticRegression()
model.fit(X_train, y_train)

Technologies Used

Python
Libraries:
- Numpy
- Pandas
- Scikit-learn
- NLTK (Natural Language Toolkit)

Dataset

The dataset used for this project is the Sentiment140 dataset from Kaggle. It contains 1.6 million tweets labeled with sentiment (positive, negative, or neutral).

Source: Sentiment140 Dataset

Results

The Logistic Regression model was trained on the dataset and evaluated using accuracy. Further results are available in the notebook.

Contributing

If you'd like to contribute, feel free to submit a pull request or raise an issue.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
README.md		README.md
Twitter_Sentiment_Analysis.ipynb		Twitter_Sentiment_Analysis.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Twitter Sentiment Analysis

Table of Contents

Project Overview

Installation

Usage

Technologies Used

Dataset

Results

Contributing

License

About

Releases

Packages

Languages

Akanksha9821/Twitter_Sentiment_Analysis

Folders and files

Latest commit

History

Repository files navigation

Twitter Sentiment Analysis

Table of Contents

Project Overview

Installation

Usage

Technologies Used

Dataset

Results

Contributing

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages