Paraphrase identification in Python 3 using scikit-learn

This project is a reimplementation and extension of ASOBEK.

Currently, the extension makes only two changes: it adds one feature, the dot product of the summed word2vec vectors of the two tweets, and it replaces SVC with AdaBoost-ed decision trees.
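The added feature can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names and the tiny `TOY_VECTORS` table are hypothetical stand-ins for the real word2vec lookup, and skipping out-of-vocabulary tokens is an assumption.

```python
import numpy as np

# Hypothetical toy embeddings standing in for the pre-trained word2vec vectors.
TOY_VECTORS = {
    "cat": np.array([1.0, 0.0, 2.0]),
    "sat": np.array([0.0, 1.0, 0.0]),
    "dog": np.array([1.0, 1.0, 1.0]),
}


def sum_vector(tokens, vectors, dim):
    """Sum the vectors of all known tokens (assumption: unknown tokens are skipped)."""
    total = np.zeros(dim)
    for token in tokens:
        if token in vectors:
            total += vectors[token]
    return total


def dot_product_feature(tokens_a, tokens_b, vectors, dim):
    """Dot product of the two tweets' summed word vectors."""
    sum_a = sum_vector(tokens_a, vectors, dim)
    sum_b = sum_vector(tokens_b, vectors, dim)
    return float(np.dot(sum_a, sum_b))


# "unknown" is not in the toy table, so it contributes nothing to the sum.
feature = dot_product_feature(["cat", "sat"], ["dog", "sat", "unknown"], TOY_VECTORS, dim=3)
print(feature)
```

The resulting scalar would be appended to the other features of a tweet pair before training the classifier. Note that the raw dot product grows with tweet length, since the vectors are summed rather than averaged.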

Performance is currently unstable: the F1 score varies between runs, sometimes reaching 0.6903 (beating ASOBEK's 0.674) and sometimes falling as low as 0.63.
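One way to quantify this run-to-run variation is to repeat training with several random seeds and report the spread of F1. The sketch below is only an illustration under stated assumptions: it uses synthetic data from `make_classification` rather than the tweet features, it assumes the classifier's random seed is a source of the variation (which is not confirmed here), and it relies on scikit-learn's default AdaBoost base estimator, a depth-1 decision tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features would come from the tweet pairs.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def f1_for_seed(seed):
    """Train AdaBoost-ed decision trees with the given seed and return test F1."""
    clf = AdaBoostClassifier(n_estimators=50, random_state=seed)
    clf.fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test))


# Repeat training with different seeds and summarize the spread.
scores = np.array([f1_for_seed(seed) for seed in range(5)])
print(
    f"F1 mean={scores.mean():.4f} std={scores.std():.4f} "
    f"min={scores.min():.4f} max={scores.max():.4f}"
)
```

If the scores turn out identical across seeds, the variation has another source, such as data shuffling or feature extraction. Reporting the mean and standard deviation alongside the best score makes comparisons with a baseline easier to interpret.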

The word2vec model used is the "pre-trained vectors trained on part of Google News" release from the word2vec website.
