Tokenizing strings on digit/word boundaries #789

polyfloyd · 2024-11-18T15:05:44Z

Hi!

I am working on integrating Meilisearch into our product and have found that our users often search for numeric terms that are not set off by separator tokens but are embedded in larger words.

Example:

186941 should find 123XYZ186941
2110063 should find p2110063

From my understanding of Meilisearch internals, these queries do not return these results because the search term does not occur at the start of the tokens to be matched.

The solution I would propose is to treat digit/word/? boundaries as token separators. For example, 123XYZ186941 would be split into 123, XYZ, and 186941, and the last of these tokens would match the search query.
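To make the proposed behaviour concrete, here is a minimal Python sketch of splitting on digit/letter boundaries. It is only an illustration of the idea, not Meilisearch's tokenizer; the function name `split_on_digit_word_boundaries` is hypothetical, and any character that is neither a digit nor a letter is simply treated as a separator.

```python
import re

# A token is either a run of digits or a run of letters.
# [^\W\d_] means "word character that is not a digit or underscore", i.e. a letter.
_RUN = re.compile(r"\d+|[^\W\d_]+")


def split_on_digit_word_boundaries(text: str) -> list[str]:
    """Split text wherever a digit run meets a letter run (hypothetical helper)."""
    return _RUN.findall(text)


if __name__ == "__main__":
    for sample in ("123XYZ186941", "p2110063"):
        print(sample, "->", split_on_digit_word_boundaries(sample))
```

With this splitting, the query 186941 matches the start of a token of 123XYZ186941, and 2110063 matches the start of a token of p2110063.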

We are currently working around this limitation by inserting known separators into strings before sending them to Meilisearch for indexing, but this has the disadvantage that the returned highlighting information no longer matches the original text.
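The workaround and its highlighting problem can be sketched as follows. This is an assumption-laden illustration, not the actual code used in the product: `insert_separators` and `map_highlight` are hypothetical helpers, a single space is assumed as the separator, and match positions are assumed to be character offsets into the indexed string (byte offsets would need converting first).

```python
def insert_separators(text: str, sep: str = " ") -> tuple[str, list[int]]:
    """Insert sep at every digit/letter boundary (hypothetical helper).

    Returns the rewritten string and, for each of its characters, the index of
    the original character it corresponds to. Separator characters map to the
    index of the original character that follows them.
    """
    out: list[str] = []
    origin: list[int] = []
    for i, ch in enumerate(text):
        if i > 0:
            prev = text[i - 1]
            if (prev.isdigit() and ch.isalpha()) or (prev.isalpha() and ch.isdigit()):
                for s in sep:
                    out.append(s)
                    origin.append(i)
        out.append(ch)
        origin.append(i)
    return "".join(out), origin


def map_highlight(origin: list[int], start: int, length: int) -> tuple[int, int]:
    """Translate a (start, length) span in the rewritten text back to the original."""
    first = origin[start]
    last = origin[start + length - 1]
    return first, last - first + 1


if __name__ == "__main__":
    original = "123XYZ186941"
    indexed, origin = insert_separators(original)
    start = indexed.index("186941")  # where a match would be reported
    o_start, o_len = map_highlight(origin, start, len("186941"))
    print(indexed, "->", original[o_start:o_start + o_len])
```

Keeping an offset map like this on the client side is one way to realign highlights, but it has to be stored or recomputed for every document, which is part of why splitting inside the tokenizer would be preferable.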

macraig · 2024-12-18T15:35:38Z

Maintainer

Hi @polyfloyd, we have an experimental CONTAINS filter operator that would return the results you mention.

Would that solve your use case or do you need Meilisearch to somehow separate letters from numbers without a separator?
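As a rough illustration of what a CONTAINS-based query could look like, the sketch below only builds the body for a search request and does not send it. The attribute name `serial` and both helper functions are hypothetical; it also assumes that the experimental feature has been enabled on the instance, that the attribute is configured as filterable, and that filter values may be double-quoted with backslash escaping.

```python
import json


def contains_filter(attribute: str, value: str) -> str:
    """Build a filter expression of the form: attribute CONTAINS "value"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{attribute} CONTAINS "{escaped}"'


def search_body(attribute: str, value: str) -> dict:
    """Body for a search request that relies only on the filter (empty query)."""
    return {"q": "", "filter": contains_filter(attribute, value)}


if __name__ == "__main__":
    # `serial` is a made-up attribute name used only for this example.
    print(json.dumps(search_body("serial", "186941")))
```

Because this is a filter rather than a query match, it narrows the result set but would not by itself produce highlighting or relevance ranking for the matched substring.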
