Weird Korean tokenizer behavior #623
Locked
intoxicated
started this conversation in
Feedback & Feature Proposal
Replies: 1 comment
-
Hello @intoxicated, thank you for the report. Regarding the highlighting issue, could you retry with version v1.0.2? It should be fixed there. If it is not, don't hesitate to open a bug report directly on Meilisearch and ping me. Regarding Korean support, we recently changed how we segment words in this PR: meilisearch/charabia#154. I am locking this conversation in favor of the dedicated discussion below: Thanks again!
-
Is this normal behavior for the Korean tokenizer?
Searching for 감 returns two results, but searching for 감독 returns only one, even though both records returned for 감 contain 감독. (The highlighting is also inaccurate.)
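For anyone trying to reason about this symptom, here is a toy sketch of how word segmentation alone could produce it. Everything in it is an assumption made for illustration: the two documents, the hard-coded splits in `HYPOTHETICAL_SPLITS`, and the helper names `build_index` and `prefix_search` are invented, and none of it reflects the actual output of Meilisearch or charabia. It only shows that if a segmenter splits 감독 differently depending on context, a prefix match on 감 can hit both records while 감독 hits only one.

```python
from collections import defaultdict

# Hypothetical documents; not taken from the original report.
DOCS = {
    1: "영화 감독",
    2: "감독님",
}

# Hypothetical, deliberately inconsistent segmentation (NOT real charabia output):
# document 1 keeps 감독 as one token, document 2 splits it into 감 + 독님.
HYPOTHETICAL_SPLITS = {
    "영화 감독": ["영화", "감독"],
    "감독님": ["감", "독님"],
}


def build_index(docs, splits):
    """Build a token -> set of document ids inverted index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in splits[text]:
            index[token].add(doc_id)
    return index


def prefix_search(index, query):
    """Return ids of documents with at least one token starting with `query`."""
    hits = set()
    for token, doc_ids in index.items():
        if token.startswith(query):
            hits |= doc_ids
    return hits


index = build_index(DOCS, HYPOTHETICAL_SPLITS)
one_syllable_hits = prefix_search(index, "감")    # matches tokens 감독 and 감
two_syllable_hits = prefix_search(index, "감독")  # matches token 감독 only
print(sorted(one_syllable_hits), sorted(two_syllable_hits))
```

In this toy model the shorter query matches both documents and the longer one matches only document 1, even though both raw texts contain 감독. Whether this is what happened in the reported case is not something the thread confirms.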