Tokenizer

CSchott edited this page May 9, 2016 · 1 revision

Functionality

The tokenizer creates a list of single words (tokens) from a given string. To do this, it first splits the string on white space and then removes unwanted characters such as parentheses (for the full list, see Separators below). The # and @ characters are handled differently: a word marked with one of them is added to the list both with and without that character.

Separators

The tokenizer currently uses the following separators (more to follow):

  • punctuation: . , : ; ! ?
  • parentheses: () [] {} < >
  • operators: + / *
  • quotation marks: ' "
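
The behaviour described on this page can be sketched in a few lines of Python. This is only an illustration, not the project's actual implementation: the names `SEPARATORS`, `MARKERS` and `tokenize` are hypothetical, and the sketch assumes that separator characters are deleted from each whitespace-delimited chunk and that the marked form of a # or @ word is listed before the unmarked form.

```python
# Characters from the Separators list; assumed here to be deleted from each chunk.
SEPARATORS = set(".,:;!?()[]{}<>+/*'\"")
# Characters that mark a word which is kept both with and without the marker.
MARKERS = "#@"


def tokenize(text):
    """Return a list of tokens for the given string (illustrative sketch)."""
    tokens = []
    # Step 1: split the string on white space.
    for chunk in text.split():
        # Step 2: remove unwanted characters from the chunk.
        word = "".join(ch for ch in chunk if ch not in SEPARATORS)
        if not word:
            continue  # the chunk consisted only of separator characters
        tokens.append(word)
        # Step 3: a word marked with # or @ is also added without the marker.
        if word[0] in MARKERS:
            unmarked = word.lstrip(MARKERS)
            if unmarked:
                tokens.append(unmarked)
    return tokens


print(tokenize("Hello, world! (see #python by @guido)"))
```

Under these assumptions the call above yields `Hello`, `world`, `see`, `#python`, `python`, `by`, `@guido` and `guido`, in that order.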