A solution to the SemEval paraphrase identification task.
It's a reimplementation and extension of ASOBEK.
Currently it makes only two changes: it adds one feature, the dot product of the summed word2vec vectors of the two tweets, and it replaces the SVC with AdaBoost-ed decision trees.
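
As a rough illustration of the dot-product feature, here is a minimal sketch. It is not the code from this repository: the function names, the toy 3-dimensional vectors, and the choice to skip out-of-vocabulary tokens are all assumptions made for the example. In practice the vectors would come from the pre-trained word2vec model rather than a hand-written dictionary.

```python
# Toy 3-dimensional embeddings standing in for real word2vec vectors.
# These values are made up for illustration only.
TOY_VECTORS = {
    "cat": [1.0, 0.0, 2.0],
    "dog": [0.5, 1.0, 1.5],
    "runs": [0.0, 2.0, 0.0],
    "sleeps": [1.0, 1.0, 0.0],
}


def sum_vectors(tokens, vectors, dim):
    """Sum the embeddings of all in-vocabulary tokens.

    Tokens missing from `vectors` are skipped (an assumption of this sketch).
    """
    total = [0.0] * dim
    for token in tokens:
        vec = vectors.get(token)
        if vec is None:
            continue
        for i, value in enumerate(vec):
            total[i] += value
    return total


def dot_of_sums(tokens_a, tokens_b, vectors, dim):
    """Dot product of the summed word vectors of two token lists."""
    sum_a = sum_vectors(tokens_a, vectors, dim)
    sum_b = sum_vectors(tokens_b, vectors, dim)
    return sum(x * y for x, y in zip(sum_a, sum_b))


# One scalar feature per tweet pair; "unknown" is not in the vocabulary.
feature = dot_of_sums(["cat", "sleeps"], ["dog", "runs", "unknown"], TOY_VECTORS, dim=3)
print(feature)
```

The resulting scalar would be appended to the other features of each tweet pair before training the classifier. Because the sums are not normalized, the value grows with tweet length; whether that helps or hurts is not something this sketch addresses.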
Performance is currently unstable: the F1 score is sometimes as high as 0.6903 (beating ASOBEK's 0.674) and sometimes as low as 0.63.
The word2vec vectors used are the "pre-trained vectors trained on part of Google News" from the word2vec website.