ML features
Mikhail Koltsov edited this page Nov 5, 2016 · 11 revisions
Mike:
- number of first-person/second-person words (implementation);
- number of subjective/objective terms (not implemented, because the referenced paper used a separate classifier to assess subjectivity);
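
A minimal sketch of how the first-person/second-person count could be computed. The pronoun sets and the function name `count_person_words` are illustrative assumptions, not the lists or code used in the linked implementation.

```python
import re

# Hypothetical pronoun lists (assumption); a real feature would likely
# lemmatize first or use a fuller list of inflected forms.
FIRST_PERSON = {"я", "мы", "мой", "наш", "меня", "мне", "нас", "нам"}
SECOND_PERSON = {"ты", "вы", "твой", "ваш", "тебя", "тебе", "вас", "вам"}


def count_person_words(text):
    """Return (first_person_count, second_person_count) for a review."""
    # \w matches Unicode letters in Python 3, so Cyrillic words are kept whole.
    tokens = re.findall(r"\w+", text.lower())
    first = sum(1 for t in tokens if t in FIRST_PERSON)
    second = sum(1 for t in tokens if t in SECOND_PERSON)
    return first, second
```

The two counts are returned separately so that a classifier can weight them independently.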
Boris:
- length of the review (sum of the lengths of Mystem-lemmatized words; implementation);
- number of CAPS words (implementation);
- number of contrastive markers ("а", "но", ",") (implementation);
- uni- and bigram representations of review (implementation);
- number of adjectives, pairs of adjective+noun, ... other POS-related ideas (using MyStem, implementation);
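
Several of the surface features above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the linked implementations: the function names are hypothetical, a CAPS word is assumed to be an all-uppercase token of at least two characters, and tokens are not lemmatized.

```python
import re
from collections import Counter

# Contrastive words taken from the list above; commas are counted separately.
CONTRAST_WORDS = {"а", "но"}


def tokenize(text):
    """Split text into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text)


def count_caps_words(text):
    # Assumption: single capital letters (e.g. a sentence-initial "А") do not count.
    return sum(1 for t in tokenize(text) if len(t) > 1 and t.isupper())


def count_contrast_markers(text):
    words = sum(1 for t in tokenize(text) if t.lower() in CONTRAST_WORDS)
    return words + text.count(",")


def ngram_counts(text, n):
    """Count lowercase word n-grams; keys are tuples of n tokens."""
    tokens = [t.lower() for t in tokenize(text)]
    return Counter(zip(*(tokens[i:] for i in range(n))))
```

Calling `ngram_counts(text, 1)` and `ngram_counts(text, 2)` gives the unigram and bigram counts respectively.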
Lesya:
- whether the shop has answered the review (difficult to implement, because the Y.Market API does not seem to return answers from shops);
- number of exclamation marks, smileys, ... (implementation)
- mean word length (words are normalized using Mystem; implementation);
- number of synonym words inside the review (the implementation uses a synonym dictionary found on the Internet);
- meta-information about review: author name, number of reviews by the same author, described product, distribution of votes for the same thing, ... (implemented: author name, is anonymous, shop id, number of reviews by the same author);
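
The punctuation and word-length features might look roughly like the sketch below. The smiley pattern and function names are assumptions made for illustration, and the mean word length here is computed on raw tokens rather than on Mystem-normalized words.

```python
import re

# Hypothetical smiley pattern (assumption): eyes, optional nose, one mouth character.
SMILEY_RE = re.compile(r"[:;=]-?[)(DP]")


def count_exclamations(text):
    return text.count("!")


def count_smileys(text):
    return len(SMILEY_RE.findall(text))


def mean_word_length(text):
    # Letters only, so digits and underscores do not count as words.
    words = re.findall(r"[^\W\d_]+", text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)
```

Returning `0.0` for an empty review avoids a division by zero and keeps the feature numeric.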
Misha:
- number of spelling mistakes (implemented using Enchant library);
- bag of words representation of review (implemented using sklearn and nltk);
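
To make the bag-of-words representation concrete, here is a small standard-library sketch. It is not the sklearn/nltk implementation mentioned above; the helper names and the regex tokenizer are assumptions chosen to keep the example self-contained.

```python
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"\w+", text.lower())


def build_vocabulary(reviews):
    """Map every distinct token to a column index, in sorted order."""
    vocab = sorted({tok for review in reviews for tok in tokenize(review)})
    return {tok: i for i, tok in enumerate(vocab)}


def to_vector(review, vocabulary):
    """Return token counts for one review, aligned to the vocabulary."""
    counts = Counter(tokenize(review))
    vector = [0] * len(vocabulary)
    for tok, count in counts.items():
        if tok in vocabulary:  # tokens unseen at fit time are ignored
            vector[vocabulary[tok]] = count
    return vector
```

Usage: build the vocabulary on the training reviews once, then call `to_vector` on each review to get a fixed-length count vector.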