Tokenizer

Functionality

The tokenizer creates a list of single words (tokens) from a given string. To do this, it first splits the string at white space and then deletes unwanted characters such as parentheses (for the full list, see below). The # and @ characters are handled differently: a word marked with one of them is added to the list both with and without that character.
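A minimal Python sketch of the steps above follows. It is not the project's actual implementation. The names `tokenize`, `UNWANTED`, and `MARKERS` are hypothetical. The sketch assumes that unwanted characters are deleted anywhere in a word, that words left empty after deletion are dropped, and that # or @ counts as a marker only at the start of a word.

```python
# Minimal sketch of the tokenizer described above (assumptions noted inline).

# Hypothetical set of unwanted characters, taken from the Separators list.
UNWANTED = ".,:;!?()[]{}<>+/*'\""

# Characters that mark a word and trigger the "with AND without" handling.
MARKERS = "#@"


def tokenize(text: str, unwanted: str = UNWANTED) -> list[str]:
    """Split text at white space, delete unwanted characters, expand #/@ words."""
    # Translation table that maps every unwanted character to None (deletion).
    table = str.maketrans("", "", unwanted)
    tokens: list[str] = []
    for word in text.split():  # split() without arguments splits on any white space
        word = word.translate(table)
        if not word:
            continue  # assumption: words made only of unwanted characters are dropped
        tokens.append(word)  # the word as-is (with its marker, if any)
        # Assumption: a marker is only recognized as the first character.
        if word[0] in MARKERS and len(word) > 1:
            tokens.append(word[1:])  # the same word without the marker
    return tokens
```

For example, `tokenize("Hello, world! #python")` returns `["Hello", "world", "#python", "python"]`.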

Separators

Currently, the tokenizer uses the following separators (more will follow):

punctuation: . , : ; ! ?
parentheses: () [] {} < >
operators: + / *
quotation marks: ' "
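One way to represent the separator list in code is a mapping from category to characters, sketched below in Python. The names `SEPARATORS`, `remove_separators`, and `separator_category` are hypothetical and do not come from the project; the sketch assumes separators are deleted from each word, as described under Functionality.

```python
from typing import Optional

# The separator list above, grouped by category (assumed representation).
SEPARATORS: dict[str, str] = {
    "punctuation": ".,:;!?",
    "parentheses": "()[]{}<>",
    "operators": "+/*",
    "quotation marks": "'\"",
}

# Flatten all categories into one deletion table for str.translate.
_DELETE_TABLE = str.maketrans("", "", "".join(SEPARATORS.values()))


def remove_separators(word: str) -> str:
    """Delete every separator character from a single word."""
    return word.translate(_DELETE_TABLE)


def separator_category(char: str) -> Optional[str]:
    """Return the category a character belongs to, or None if it is not a separator."""
    for category, chars in SEPARATORS.items():
        if char in chars:
            return category
    return None
```

With this list, a hyphen is not a separator, so `remove_separators("well-known")` returns `"well-known"` unchanged, and # and @ are left in place for their special handling. Adding a new separator only requires extending the matching category string.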
