N-gram search #82

Open
10 tasks
peterhil opened this issue Jul 15, 2021 · 0 comments

Labels
prio: incremental Low effort, low impact

peterhil commented Jul 15, 2021

Index the categories and their IDs using n-grams.
Use this index to implement autocomplete for category search.

N-gram search:

Create a separate NPM package; existing packages do not do exactly this.

  1. Tokenize the input: split on Unicode whitespace (runs of one or more characters), keeping punctuation
  2. Optionally, normalize character variants (åäáà -> a; is this possible with Asian characters?)
  3. Build the index: map n-grams of varying length (min: 1, max: ?) to IDs (or array indices)
  4. Tokenize the query: build n-grams from the search query
  5. Search: look up the query n-grams in the index and check that they are contiguous
  6. Fuzzy matching: allow incomplete matches whose match score reaches a threshold between 0 and 1.0
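A minimal sketch of steps 1-6, assuming character n-grams of length 1 to `maxN` (the issue leaves the maximum open; 3 here is an arbitrary choice) and string category IDs that are each added once. The names `tokenize`, `normalize`, `NgramIndex` and the scoring rule are hypothetical, not an existing package API. Contiguity is checked by letting every query n-gram vote for the offset at which the whole term would start inside an indexed word; a term's score is the largest fraction of its n-grams agreeing on one start, so a threshold of 1.0 only accepts categories in which every query term occurs as an exact substring of some word. Normalization uses Unicode NFD and drops combining marks, which folds å/ä/á/à to a but also strips Japanese dakuten (が becomes か), so Asian scripts would probably need their own rules.

```typescript
// Hypothetical sketch of the proposed n-gram index (not a published API).
// Assumptions: character n-grams of length 1..maxN, string IDs added once each,
// score = fraction of a query term's n-grams found at consecutive offsets.

type Id = string;
interface Posting { id: Id; token: number; offset: number }

// Step 1: split on runs of Unicode whitespace; punctuation stays in the tokens.
function tokenize(text: string): string[] {
  return text.split(/\s+/u).filter((t) => t.length > 0);
}

// Step 2: decompose (NFD), drop combining marks (å, ä, á, à -> a), lower-case.
function normalize(token: string): string {
  return token.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();
}

class NgramIndex {
  private readonly maxN: number;
  private readonly grams = new Map<string, Posting[]>();

  constructor(maxN = 3) {
    this.maxN = maxN;
  }

  // Step 3: map every n-gram (n = 1..maxN) to the word and offset it occurs at.
  add(id: Id, text: string): void {
    tokenize(text).map(normalize).forEach((word, tokenIndex) => {
      const chars = Array.from(word); // code points, not UTF-16 units
      for (let n = 1; n <= this.maxN; n++) {
        for (let offset = 0; offset + n <= chars.length; offset++) {
          const gram = chars.slice(offset, offset + n).join("");
          const postings = this.grams.get(gram) ?? [];
          postings.push({ id, token: tokenIndex, offset });
          this.grams.set(gram, postings);
        }
      }
    });
  }

  // Steps 4-6: score each query term, average over terms, apply the threshold.
  search(query: string, threshold = 1.0): Id[] {
    const terms = tokenize(query).map(normalize).filter((t) => t.length > 0);
    if (terms.length === 0) return [];
    const totals = new Map<Id, number>();
    for (const term of terms) {
      for (const [id, score] of this.scoreTerm(term)) {
        totals.set(id, (totals.get(id) ?? 0) + score);
      }
    }
    return Array.from(totals)
      .map(([id, sum]): [Id, number] => [id, sum / terms.length])
      .filter(([, score]) => score >= threshold)
      .sort((a, b) => b[1] - a[1])
      .map(([id]) => id);
  }

  // Best fraction of the term's n-grams that line up contiguously
  // (same word, consecutive offsets), per category ID.
  private scoreTerm(term: string): Map<Id, number> {
    const chars = Array.from(term);
    const n = Math.min(chars.length, this.maxN);
    const count = chars.length - n + 1;
    const votes = new Map<string, { id: Id; hits: number }>();
    for (let i = 0; i < count; i++) {
      const gram = chars.slice(i, i + n).join("");
      for (const p of this.grams.get(gram) ?? []) {
        // Candidate alignment: the term would start at offset p.offset - i.
        const key = JSON.stringify([p.id, p.token, p.offset - i]);
        const vote = votes.get(key) ?? { id: p.id, hits: 0 };
        vote.hits += 1;
        votes.set(key, vote);
      }
    }
    const best = new Map<Id, number>();
    for (const { id, hits } of votes.values()) {
      best.set(id, Math.max(best.get(id) ?? 0, hits / count));
    }
    return best;
  }
}
```

For example, with `Recipes` added under ID `"1"`, `index.search("rcipe")` returns an empty array, while `index.search("rcipe", 0.6)` returns `["1"]`, because two of the query's three trigrams (`cip`, `ipe`) line up at consecutive offsets.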

Integration:

  • Fetch all categories in the background (Issue #49)
  • Update the index on bookmark events (or flush and rebuild it)
  • Suspend index updates during bookmark import and rebuild the index afterwards
  • Replace categorySearch with n-gram search on the category index
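The integration items could be wired up roughly as follows. This is a hypothetical sketch: `IndexUpdater`, `CategoryIndex`, `Category` and the `loadAll` callback (standing in for the background fetch from Issue #49) are illustrative names, not code from this project, and `loadAll` is synchronous only to keep the example short; a real extension loader would probably be asynchronous. New categories are added incrementally, renames and removals trigger a flush and rebuild, and all updates are skipped while an import runs, followed by a single rebuild when it ends. In Chrome, the existing `chrome.bookmarks.onImportBegan` and `chrome.bookmarks.onImportEnded` events could drive the two import methods.

```typescript
// Hypothetical glue between bookmark events and a category index.
// None of these names come from the project; loadAll stands in for
// "get all categories in background" and is synchronous for brevity.

interface Category {
  id: string;
  title: string;
}

interface CategoryIndex {
  add(id: string, title: string): void;
  clear(): void;
}

class IndexUpdater {
  private readonly index: CategoryIndex;
  private readonly loadAll: () => Category[];
  private importing = false;

  constructor(index: CategoryIndex, loadAll: () => Category[]) {
    this.index = index;
    this.loadAll = loadAll;
  }

  // Flush and rebuild the whole index from the current categories.
  rebuild(): void {
    this.index.clear();
    for (const { id, title } of this.loadAll()) {
      this.index.add(id, title);
    }
  }

  // Incremental update for a newly created category; skipped during import.
  onCreated(category: Category): void {
    if (!this.importing) this.index.add(category.id, category.title);
  }

  // Renames and removals: simplest to rebuild an add-only index.
  onChangedOrRemoved(): void {
    if (!this.importing) this.rebuild();
  }

  onImportBegan(): void {
    this.importing = true;
  }

  // One rebuild after the import instead of one update per bookmark.
  onImportEnded(): void {
    this.importing = false;
    this.rebuild();
  }
}
```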
peterhil self-assigned this Jul 15, 2021
peterhil added the prio: incremental (Low effort, low impact) label Jul 15, 2021
peterhil added this to the 0.6.0 milestone Jul 15, 2021
peterhil modified the milestones: 0.6.0, 0.7.0 Jul 17, 2023
peterhil moved this to To do in Robustness Aug 6, 2024