Hello, very nice work!
I'm currently using ParadeDB for BM25 search in Postgres (pg_search), and I found this repo while looking for an alternative implementation.
You mention that creating a BM25 index from a table/column is costly.
Would it be possible to add incremental index updates, as search engines usually do?
Updating the index is challenging because every word in the vocabulary (all unique tokens across all documents) has an inverse document frequency (idf) parameter. The idf depends on the total number of documents, so it must be updated for every word even if just one document is added or removed, which means almost everything must be recalculated on INSERT or DELETE. However, we could save some time by not re-tokenizing the unchanged documents, so this could be a good optimization.
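A minimal sketch of that optimization, assuming per-document term counts are cached when a document is first indexed. The class name `IncrementalBM25`, the regex tokenizer, and the Lucene-style idf formula `log((N - df + 0.5) / (df + 0.5) + 1)` are all illustrative assumptions, not necessarily what this repo or BM25opt use. Only the inserted document is tokenized, and deletes reuse the cached counts, but idf is still recomputed for the whole vocabulary because N changes.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


class IncrementalBM25:
    """Hypothetical sketch: cache per-document term counts so unchanged
    documents are never re-tokenized. idf is still recomputed for the
    whole vocabulary whenever the document count N changes."""

    def __init__(self) -> None:
        self.doc_terms: dict[int, Counter] = {}  # doc id -> term counts
        self.df: Counter = Counter()  # word -> number of docs containing it
        self.idf: dict[str, float] = {}

    def _recompute_idf(self) -> None:
        n = len(self.doc_terms)
        # N changed, so every word's idf changes: O(vocabulary size).
        # Assumed Lucene-style idf variant (always positive).
        self.idf = {
            w: math.log((n - d + 0.5) / (d + 0.5) + 1.0)
            for w, d in self.df.items()
        }

    def insert(self, doc_id: int, text: str) -> None:
        counts = Counter(tokenize(text))  # only the new document is tokenized
        self.doc_terms[doc_id] = counts
        self.df.update(counts.keys())  # +1 df for each distinct word
        self._recompute_idf()

    def delete(self, doc_id: int) -> None:
        counts = self.doc_terms.pop(doc_id)  # cached counts, no re-tokenizing
        self.df.subtract(counts.keys())  # -1 df for each distinct word
        for w in counts:
            if self.df[w] <= 0:
                del self.df[w]  # word no longer appears in any document
        self._recompute_idf()
```

The tokenization saving is per document, while the idf pass stays proportional to the vocabulary size, which matches the cost described above.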
UPDATE (when no document is added or removed and existing documents are only modified) should recalculate the idf and wsmap scores for all the words in both the old and the new document contents.
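For the UPDATE case, a minimal sketch, assuming a document-frequency map `df` and an `idf` dict are kept in memory. The function name `update_document`, the tokenizer, and the Lucene-style idf formula are hypothetical, and the wsmap scores are not modeled here. Since N does not change, only words in the old or new contents can have a different df. Words present in both keep their df, but their term frequency in this document may change, so the function returns the full union for the caller to refresh per-document scores.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase word characters only.
    return re.findall(r"\w+", text.lower())


def update_document(df: Counter, idf: dict[str, float], n_docs: int,
                    old_text: str, new_text: str) -> set[str]:
    """Apply an UPDATE in place. N is unchanged, so only words occurring
    in the old or new contents can get a different df (and thus idf)."""
    old_words = set(tokenize(old_text))
    new_words = set(tokenize(new_text))
    for w in old_words - new_words:
        df[w] -= 1  # word disappeared from this document
    for w in new_words - old_words:
        df[w] += 1  # word newly appears in this document
    affected = old_words | new_words
    for w in affected:
        if df[w] <= 0:
            # Word no longer appears anywhere: drop it from the vocabulary.
            df.pop(w, None)
            idf.pop(w, None)
        else:
            d = df[w]
            idf[w] = math.log((n_docs - d + 0.5) / (d + 0.5) + 1.0)
    # Caller should refresh this document's per-word scores for all of these.
    return affected
```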
I'll think about this and try to implement something when I have some time, probably first in BM25opt and then port it here.