Document Relevancy Ranking and Similarity Scoring using Vector Space Model.
Supports all modes described here.
To install directly from GitHub, run:
pip install git+ssh://git@github.com/mauricesvp/vespa.git
# or
pip install git+https://git@github.com/mauricesvp/vespa.git
To install from source:
git clone git@github.com:mauricesvp/vespa.git
# or
git clone https://github.com/mauricesvp/vespa.git
cd vespa
pip install .
from vespa import Vespa
corpus = ["Example document."] # corpus: list of documents (strings)
vsm = Vespa(corpus)
results = vsm.score("Example query")
# > (0.7071067811865475, 'Example document.')
results = vsm.k_score("Example query", k=1)
# > [(0.7071067811865475, 'Example document.')]
The default mode is lnc.ltc, which means lnc is applied to each corpus document and ltc to each query.
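To make the notation concrete, here is a minimal standalone sketch of lnc.ltc scoring using the textbook SMART definitions (l = 1 + log(tf), n = no idf, t = idf = log(N / df), c = cosine normalization). It is not Vespa's implementation: the function names, the tokenizer, and the exact idf formula are assumptions made for illustration, so the values need not match what vsm.score returns.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase word tokens; Vespa's own tokenizer may differ.
    return re.findall(r"\w+", text.lower())


def cosine_normalize(weights):
    # c: divide every weight by the Euclidean length of the vector.
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def lnc(tokens):
    # l: 1 + log(tf), n: no idf, c: cosine normalization.
    tf = Counter(tokens)
    return cosine_normalize({t: 1 + math.log(c) for t, c in tf.items()})


def ltc(tokens, df, n_docs):
    # l: 1 + log(tf), t: idf = log(N / df), c: cosine normalization.
    tf = Counter(tokens)
    weights = {
        t: (1 + math.log(c)) * math.log(n_docs / df[t])
        for t, c in tf.items()
        if df.get(t)  # terms unseen in the corpus contribute nothing
    }
    return cosine_normalize(weights)


def lnc_ltc_scores(corpus, query):
    # Hypothetical helper: score every document against the query,
    # returning (score, document) tuples sorted best first.
    docs = [tokenize(d) for d in corpus]
    df = Counter(t for d in docs for t in set(d))
    q = ltc(tokenize(query), df, len(docs))
    scores = []
    for text, tokens in zip(corpus, docs):
        d = lnc(tokens)
        scores.append((sum(w * d.get(t, 0.0) for t, w in q.items()), text))
    return sorted(scores, reverse=True)


corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "birds sing at dawn",
]
ranked = lnc_ltc_scores(corpus, "cat mat")
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Because both vectors are cosine-normalized, each score is the cosine similarity between the weighted query and document vectors and falls between 0 and 1. The rarer term ("mat") dominates the query vector through its idf, which is why the first document ranks highest.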
You can supply a different mode either when initializing or directly to k_score or score (this changes the mode for subsequent calls).
If you want the score of a specific document, use the additional document argument of score:
results = vsm.score(query="Your query", document="Some document in corpus")
Documents can be added to the corpus:
vsm.add("some new document") # str or list of str
or the corpus can be rebuilt, removing all previous entries:
vsm.corpus(new_corpus) # str or list of str
All available modes are noted below (more details).
Vespa does not feature:
- Lemmatization and Stemming
- Stopword filtering
- Spelling correction
- Any kind of machine learning
For further reading, please reference: