In the dialogue prediction task, a model is trained to predict a character's name based on the given dialogue

ali-sedaghi/Dialogue-Character-Prediction

Dialogue Prediction

In the dialogue prediction task, a model is trained to predict a character's name based on the given dialogue.

File directories

Dataset

Scripts from the Friends TV series are used as the dataset. Friends is an American television sitcom that aired on NBC from September 22, 1994, to May 6, 2004.

There are 6 main characters (classes) in this show:

  • Ross
  • Rachel
  • Joey
  • Chandler
  • Monica
  • Phoebe

The scripts are gathered from Here.
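
Most classifiers expect integer labels, so the six character names need a fixed mapping to class ids. The sketch below is one possible encoding, not necessarily the one used in this repository; the names `CHARACTERS`, `LABEL_TO_ID`, `ID_TO_LABEL`, and `encode` are hypothetical, and the id order simply follows the list above.

```python
# Assumed class order: the order in which the characters are listed in this README.
CHARACTERS = ["Ross", "Rachel", "Joey", "Chandler", "Monica", "Phoebe"]

# Forward and reverse lookup tables between names and integer class ids.
LABEL_TO_ID = {name: index for index, name in enumerate(CHARACTERS)}
ID_TO_LABEL = {index: name for name, index in LABEL_TO_ID.items()}


def encode(name: str) -> int:
    """Map a speaker name to its class id, ignoring case and surrounding spaces."""
    return LABEL_TO_ID[name.strip().title()]
```

Names outside the six main characters raise a `KeyError` here, which makes it easy to spot lines that should be filtered out before training.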

How to run

Requirements

Install the required Python packages:

pip install -r requirements.txt

Crawler

To run the crawler and gather or update the dataset:

cd src/crawler
scrapy crawl scripts -t csv -o ../../data/raw/scripts.csv
scrapy crawl dialogues -t csv -o ../../data/raw/dialogues.csv

Preprocessing

  • Step 1: Remove extra whitespace
  • Step 2: Lowercase all letters
  • Step 3: Remove special characters
  • Step 4: Remove short words
  • Step 5: Remove stopwords

cd src/preprocessing
python preprocessor.py
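
The five steps above can be sketched as a single function. This is a minimal illustration, not the code in `preprocessor.py`: the function name `preprocess`, the small `STOPWORDS` set, and the `MIN_WORD_LENGTH` threshold of 3 are assumptions made for the example.

```python
import re

# Hypothetical stopword list for illustration; the repository may use a
# different, larger list.
STOPWORDS = {"the", "and", "you", "that", "this", "for", "are", "was", "with", "have"}

# Assumed threshold: words shorter than this count as "short words".
MIN_WORD_LENGTH = 3


def preprocess(text: str) -> str:
    """Apply the five preprocessing steps to one line of dialogue."""
    # Step 1: trim the ends and collapse runs of whitespace into single spaces.
    text = " ".join(text.split())
    # Step 2: lowercase all letters.
    text = text.lower()
    # Step 3: replace everything except lowercase letters and whitespace with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    words = text.split()
    # Step 4: drop short words.
    words = [word for word in words if len(word) >= MIN_WORD_LENGTH]
    # Step 5: drop stopwords.
    words = [word for word in words if word not in STOPWORDS]
    return " ".join(words)


print(preprocess("  We WERE on a break!!  "))  # were break
```

Because step 3 replaces punctuation with a space, contractions such as "don't" split into fragments that step 4 then removes. Whether the actual preprocessor behaves this way is not stated here.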

Statistics

  • Word count per character
  • Type (unique word) count per character
  • Word cloud per character
  • Histogram per character

The metrics above are computed over the entire scripts, both before and after preprocessing.

cd src/statistics
python main.py
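
As a rough illustration of the first two metrics, the sketch below counts words (tokens) and types (distinct words) per character. It assumes the data is available as `(character, dialogue)` pairs and that words are separated by whitespace; the function name `character_stats` is hypothetical and is not taken from `main.py`.

```python
from collections import Counter, defaultdict


def character_stats(rows):
    """Return word and type counts per character.

    rows: iterable of (character, dialogue) pairs (an assumed input shape).
    """
    # One word-frequency counter per character.
    counters = defaultdict(Counter)
    for character, dialogue in rows:
        counters[character].update(dialogue.split())
    return {
        name: {"words": sum(counter.values()), "types": len(counter)}
        for name, counter in counters.items()
    }


rows = [
    ("Joey", "how you doin"),
    ("Joey", "how you doin"),
    ("Ross", "we were on a break"),
]
stats = character_stats(rows)
print(stats["Joey"])  # {'words': 6, 'types': 3}
```

The same per-character counters could also feed a word cloud or a frequency histogram, since they already hold each word's frequency.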

Out of repo resources
