
ML features

Mikhail Koltsov edited this page Nov 5, 2016 · 11 revisions

Machine learning features we use:

Mike:

  • number of first-person/second-person words (implementation);
  • number of subjective/objective terms (not implemented, because the referenced paper used a separate classifier to determine subjectivity);
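
As a rough illustration of the first-person/second-person count, here is a minimal sketch. It assumes the review has already been lemmatized into a list of lowercase lemmas. The pronoun sets and the function name `count_person_words` are hypothetical; the word lists the project actually uses are not given on this page.

```python
# Hypothetical lemma sets; the project's real lists may differ.
FIRST_PERSON = {"я", "мы", "мой", "наш"}
SECOND_PERSON = {"ты", "вы", "твой", "ваш"}


def count_person_words(lemmas):
    """Return (first_person_count, second_person_count) for a list of lemmas."""
    first = sum(1 for w in lemmas if w in FIRST_PERSON)
    second = sum(1 for w in lemmas if w in SECOND_PERSON)
    return first, second
```

For example, `count_person_words(["я", "купить", "ваш", "товар", "мы"])` counts two first-person lemmas and one second-person lemma.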

Boris:

  • length of review (sum of the lengths of MyStem-lemmatized words, implementation);
  • number of CAPS words (implementation);
  • number of contrast words and markers ("а" and "но", both roughly "but", and the comma ",") (implementation);
  • uni- and bigram representations of review (implementation);
  • number of adjectives, pairs of adjective+noun, ... other POS-related ideas (using MyStem, implementation);
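
The surface-level features in this list could be sketched roughly as follows. This is not the project's implementation: the tokenizer regex, the `min_len` threshold for CAPS words, and all function names are assumptions made for illustration, and lemmatization with MyStem is left out.

```python
import re

# Runs of letters (Unicode-aware); digits and underscores are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")


def count_caps_words(text, min_len=2):
    """Count words written entirely in upper case (min_len is an assumed cutoff)."""
    return sum(1 for w in WORD_RE.findall(text)
               if len(w) >= min_len and w.isupper())


def count_contrast_markers(text, markers=("а", "но")):
    """Count contrast words plus commas, following the list above."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return sum(words.count(m) for m in markers) + text.count(",")


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Unigrams and bigrams are then `ngrams(tokens, 1)` and `ngrams(tokens, 2)`; turning them into a vector representation (for example, counts per n-gram) is a separate step not shown here.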

Lesya:

  • whether the shop replied to the review (difficult to implement, because the Y.Market API does not seem to return shop replies);
  • number of exclamation marks, smileys, ... (implementation);
  • mean word length (words are normalized using MyStem, implementation);
  • number of synonymous words within the review (the implementation uses a synonym dictionary found on the Internet);
  • meta-information about the review: author name, number of reviews by the same author, described product, distribution of votes for the same thing, ... (implemented: author name, whether the author is anonymous, shop id, number of reviews by the same author);
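
The punctuation and word-length features above might look something like the sketch below. The smiley pattern is a deliberately small, hypothetical one (Western-style emoticons such as `:)` or `;-D`), and `mean_word_length` expects words that have already been normalized; neither is taken from the project's code.

```python
import re

# Hypothetical emoticon pattern: eyes, optional nose, mouth.
SMILEY_RE = re.compile(r"[:;=]-?[()DPp]")


def count_exclamations(text):
    """Count exclamation marks in the raw review text."""
    return text.count("!")


def count_smileys(text):
    """Count emoticons matching the assumed pattern above."""
    return len(SMILEY_RE.findall(text))


def mean_word_length(words):
    """Average length of the given (already normalized) words; 0.0 if empty."""
    return sum(len(w) for w in words) / len(words) if words else 0.0
```

Returning 0.0 for an empty review is one possible convention; a real pipeline might instead drop such reviews or use a missing-value marker.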

Misha:
