Dialogue Prediction

In the dialogue prediction task, a model is trained to predict which character spoke a given line of dialogue.

File directories

Dataset

The dataset consists of scripts from the TV series Friends, an American television sitcom that aired on NBC from September 22, 1994, to May 6, 2004.

There are 6 main characters (classes) in this show:

Ross
Rachel
Joey
Chandler
Monica
Phoebe

The scripts are gathered from Here.
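
As a rough illustration of how the six classes could be used as labels, the sketch below keeps only lines spoken by the main characters and maps each name to an integer. The column names `character` and `dialogue`, the `LABELS` mapping, and the `load_labeled_dialogues` helper are assumptions made for this example and may not match the repository's actual CSV schema or code.

```python
import csv
import io

# The six main characters listed above, used as class labels.
MAIN_CHARACTERS = ("Ross", "Rachel", "Joey", "Chandler", "Monica", "Phoebe")
LABELS = {name: index for index, name in enumerate(MAIN_CHARACTERS)}


def load_labeled_dialogues(csv_file):
    """Return (dialogue, label) pairs for main characters only.

    Assumes hypothetical columns named "character" and "dialogue".
    """
    samples = []
    for row in csv.DictReader(csv_file):
        # Normalize casing so "ROSS" and "Ross" map to the same class.
        name = row["character"].strip().title()
        if name in LABELS:
            samples.append((row["dialogue"], LABELS[name]))
    return samples


# Small in-memory example standing in for a crawled CSV file.
sample = io.StringIO(
    "character,dialogue\n"
    "Joey,How you doin\n"
    "Gunther,Hi\n"
    "ROSS,We were on a break\n"
)
samples = load_labeled_dialogues(sample)
```

Lines from any other speaker (here, Gunther) are dropped, so the classifier only sees the six target classes.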

How to run

Requirements

Install the required Python packages:

pip install -r requirements.txt

Crawler

To run the crawler and gather or update the dataset:

cd src/crawler
scrapy crawl scripts -t csv -o ../../data/raw/scripts.csv
scrapy crawl dialogues -t csv -o ../../data/raw/dialogues.csv

Preprocessing

Step 1: Remove white spaces
Step 2: Lowercase all letters
Step 3: Remove special characters
Step 4: Remove short words
Step 5: Remove stopwords

cd src/preprocessing
python preprocessor.py
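
The five steps above can be sketched as a single function. This is a minimal illustration, not the contents of `preprocessor.py`: the stopword subset, the minimum word length of 3, the reading of step 1 as collapsing extra whitespace, and the `preprocess` name are all assumptions made for the example.

```python
import re

# Illustrative stopword subset (assumption); the real list is likely larger.
STOPWORDS = {"the", "and", "you", "that", "this", "are", "for", "was", "with", "have"}
# Assumed threshold for what counts as a "short word".
MIN_WORD_LENGTH = 3


def preprocess(text: str) -> str:
    # Step 1: collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Step 2: lowercase all letters.
    text = text.lower()
    # Step 3: replace every character that is not a letter, digit, or space.
    text = re.sub(r"[^a-z0-9 ]", " ", text)
    # Step 4: drop words shorter than the assumed minimum length.
    words = [word for word in text.split() if len(word) >= MIN_WORD_LENGTH]
    # Step 5: drop stopwords.
    words = [word for word in words if word not in STOPWORDS]
    return " ".join(words)
```

The order matters in this sketch: lowercasing comes before stopword removal so that "You" and "you" are treated the same way.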

Statistics

Word count per character
Type (unique word) count per character
Word cloud per character
Histogram per character

The above metrics are computed over the entire scripts, both before and after preprocessing.

cd src/statistics
python main.py
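
For reference, the word and type counts could be computed roughly as follows. This is a sketch under assumptions, not the code in `main.py`: the `character_stats` helper and the `(character, dialogue)` input shape are hypothetical, and words are split on whitespace only.

```python
from collections import Counter, defaultdict


def character_stats(rows):
    """Compute word and type counts per character.

    rows: iterable of (character, dialogue) pairs (assumed input shape).
    """
    word_counts = defaultdict(Counter)
    for character, dialogue in rows:
        # Split on whitespace; each token counts as one word.
        word_counts[character].update(dialogue.split())
    return {
        character: {
            "words": sum(counter.values()),  # total tokens spoken
            "types": len(counter),           # distinct tokens spoken
        }
        for character, counter in word_counts.items()
    }


rows = [
    ("Joey", "how you doin"),
    ("Joey", "how you doin"),
    ("Ross", "we were on a break"),
]
stats = character_stats(rows)
```

Running the same function on raw and preprocessed dialogue gives the before and after comparison described above.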
