Tokenizer
is provided as a .gem
package. Simply install it via
RubyGems.
To install tokenizer
issue the following command:
$ gem install tokenizer
If you want to do a system wide installation, do this as root
(possibly using sudo
).
Alternatively use your Gemfile for dependency management.
You can use +Tokenizer+ in two ways.
- As a command line tool:
$ echo 'Hi, ich gehe in die Schule!. | tokenize
- As a library for embedded tokenization:
> require 'tokenizer'
> de_tokenizer = Tokenizer::WhitespaceTokenizer.new
> de_tokenizer.tokenize('Ich gehe in die Schule!')
> => ["Ich", "gehe", "in", "die", "Schule", "!"]
- Customizable
PRE
andPOST
list
> require 'tokenizer'
> de_tokenizer = Tokenizer::WhitespaceTokenizer.new(:de, { post: Tokenizer::Tokenizer::POST + ['|'] })
> de_tokenizer.tokenize('Ich gehe|in die Schule!')
> => ["Ich", "gehe", "|in", "die", "Schule", "!"]
See documentation in the Tokenizer::WhitespaceTokenizer
class for details
on particular methods.