From f40f123a17451552275502dc65d3ef2ec5ce12bc Mon Sep 17 00:00:00 2001
From: KG <41345727+kg583@users.noreply.github.com>
Date: Fri, 29 Dec 2023 11:52:23 -0600
Subject: [PATCH] Update tokenization section

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 131ae0a..e17e4a0 100644
--- a/README.md
+++ b/README.md
@@ -240,7 +240,7 @@ img.show()
 ### Tokenization
 
-Functions to decode and encode strings into tokens can be found in `tivars.tokenizer`. Support currently exists for all models in the 82/83/84 series as well as the TI-73; PR's concerning the sheets themselves should be directed upstream to [TI-Toolkit/tokens](https://github.com/TI-Toolkit/tokens).
+Functions to decode and encode strings into tokens can be found in `tivars.tokenizer`. These functions use the [TI-Toolkit token sheets](https://github.com/TI-Toolkit/tokens), which are kept as a submodule in `tivars.tokens`. Support currently exists for all models in the 82/83/84 series; PRs concerning the sheets themselves should be directed upstream.
 
 > [!IMPORTANT]
 > In contrast to some other tokenizers like SourceCoder, tokenization does _not_ depend on whether the content appears inside a BASIC string literal. Text is always assigned to the _longest_ permissible token.
 
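The longest-token rule described in the patched note can be sketched with a minimal greedy tokenizer. This is an illustrative standalone example, not the actual `tivars.tokenizer` implementation, and the token table and byte values below are hypothetical stand-ins for entries from the TI-Toolkit token sheets:

```python
# Hypothetical token table: maps source text to token bytes.
# The real tables are generated from the TI-Toolkit token sheets.
TOKENS = {
    "sin(": b"\x01",
    "s": b"\x02",
    "i": b"\x03",
    "n": b"\x04",
    "(": b"\x05",
}


def tokenize(text: str, tokens: dict[str, bytes]) -> bytes:
    """Greedily encode `text`, always taking the longest matching token."""
    max_len = max(map(len, tokens))
    out = bytearray()
    i = 0
    while i < len(text):
        # Try the longest possible match first, shrinking until one fits.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in tokens:
                out += tokens[chunk]
                i += length
                break
        else:
            raise ValueError(f"no token matches text at position {i}: {text[i]!r}")
    return bytes(out)


# "sin(" becomes the single sin( token, never s-i-n-( as four tokens,
# regardless of whether it appears inside a string literal.
print(tokenize("sin(", TOKENS))   # single longest token
print(tokenize("ssin(", TOKENS))  # "s" token, then the sin( token
```

Note that, matching the behavior described in the README, this sketch applies the longest-match rule unconditionally; it never inspects surrounding quotes to decide how to split the text.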