Python: Adding MongoDB Atlas Vector Search Connector (#2818)
### Motivation and Context
Resolves: #2591 


### Description


* Added `MongoDBAtlasMemoryStore`, a memory store abstraction that
connects Semantic Kernel to a MongoDB Atlas cluster so that it can run
[Atlas Vector
Search](https://www.mongodb.com/products/platform/atlas-vector-search)
queries.

* Uses our async Python driver
[motor](https://motor.readthedocs.io/en/stable/) to stay consistent with
the asynchronous style of the existing code.
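
As a rough illustration of what a vector query against Atlas Search can look like, the sketch below builds an aggregation pipeline around the `knnBeta` operator, which queries fields indexed as `knnVector`. The helper name `build_knn_pipeline`, the `default` index name, and the `embedding` path default are assumptions made for this example; it is not necessarily how `MongoDBAtlasMemoryStore` builds its queries.

```python
from typing import Any, Dict, List, Sequence


def build_knn_pipeline(
    query_vector: Sequence[float],
    limit: int,
    index_name: str = "default",  # assumed index name, for illustration only
    path: str = "embedding",  # field that holds the stored vectors
) -> List[Dict[str, Any]]:
    """Build an Atlas Search pipeline for a k-nearest-neighbour query."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [
        {
            "$search": {
                "index": index_name,
                "knnBeta": {
                    "vector": list(query_vector),
                    "path": path,
                    "k": limit,
                },
            }
        },
        # Attach the relevance score computed by Atlas Search to each result.
        {"$set": {"score": {"$meta": "searchScore"}}},
    ]


pipeline = build_knn_pipeline([0.1, 0.2, 0.3], limit=2)
```

With motor, a pipeline like this would typically be passed to `collection.aggregate(pipeline)` and the results collected with `await cursor.to_list(length=limit)`.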

### Contribution Checklist


- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

### Callout
* While writing my tests, I noticed that other tests habitually wrap
their imports in a `try`. Grepping through those implementations, I
see that some of them [encapsulate their respective libraries in function
calls rather than importing them directly in the
module.](https://github.com/microsoft/semantic-kernel/blob/main/python/semantic_kernel/connectors/memory/chroma/chroma_memory_store.py#L55-L63)
Is this the general code pattern we should follow, or is it fine to keep
our imports at the top of the module?
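
For context, the deferred-import pattern in question can be sketched roughly as follows. The names `import_optional` and `ExampleMemoryStore` are hypothetical and not taken from the repository; the standard-library `json` module stands in for a real driver so that the sketch runs anywhere.

```python
import importlib
from types import ModuleType


def import_optional(module_name: str, install_hint: str) -> ModuleType:
    """Import an optional dependency on first use, with an actionable error."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this connector. {install_hint}"
        ) from exc


class ExampleMemoryStore:
    """Hypothetical store that defers its driver import to construction time."""

    def __init__(self, driver_module: str = "json") -> None:
        # Importing here, rather than at module top level, means that merely
        # importing this module never fails when the driver is not installed.
        self._driver = import_optional(
            driver_module, "Install the matching optional dependency."
        )
```

The trade-off is that a missing dependency surfaces when the store is constructed rather than at import time, whereas top-level imports fail earlier and keep the dependency visible at the top of the module.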

---------

Co-authored-by: Steven Silvester <steven.silvester@ieee.org>
Co-authored-by: Abby Harrison <54643756+awharrison-28@users.noreply.github.com>
Co-authored-by: Abby Harrison <abby.harrison@microsoft.com>
4 people authored Oct 2, 2023
1 parent 320e72b commit 8ce66fe
Showing 12 changed files with 866 additions and 4 deletions.
2 changes: 2 additions & 0 deletions .github/workflows/python-integration-tests.yml
@@ -90,6 +90,7 @@ jobs:
Postgres__Connectionstr: ${{secrets.POSTGRES__CONNECTIONSTR}}
AZURE_COGNITIVE_SEARCH_ADMIN_KEY: ${{secrets.AZURE_COGNITIVE_SEARCH_ADMIN_KEY}}
AZURE_COGNITIVE_SEARCH_ENDPOINT: ${{secrets.AZURE_COGNITIVE_SEARCH_ENDPOINT}}
MONGODB_ATLAS_CONNECTION_STRING: ${{secrets.MONGODB_ATLAS_CONNECTION_STRING}}
run: |
if ${{ matrix.os == 'ubuntu-latest' }}; then
docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest
@@ -152,6 +153,7 @@ jobs:
Postgres__Connectionstr: ${{secrets.POSTGRES__CONNECTIONSTR}}
AZURE_COGNITIVE_SEARCH_ADMIN_KEY: ${{secrets.AZURE_COGNITIVE_SEARCH_ADMIN_KEY}}
AZURE_COGNITIVE_SEARCH_ENDPOINT: ${{secrets.AZURE_COGNITIVE_SEARCH_ENDPOINT}}
MONGODB_ATLAS_CONNECTION_STRING: ${{secrets.MONGODB_ATLAS_CONNECTION_STRING}}
run: |
if ${{ matrix.os == 'ubuntu-latest' }}; then
docker run -d --name redis-stack-server -p 6379:6379 redis/redis-stack-server:latest
1 change: 1 addition & 0 deletions python/.env.example
@@ -5,6 +5,7 @@ AZURE_OPENAI_ENDPOINT=""
AZURE_OPENAI_API_KEY=""
AZURE_COGNITIVE_SEARCH_ENDPOINT=""
AZURE_COGNITIVE_SEARCH_ADMIN_KEY=""
MONGODB_ATLAS_CONNECTION_STRING=""
PINECONE_API_KEY=""
PINECONE_ENVIRONMENT=""
POSTGRES_CONNECTION_STRING=""
127 changes: 126 additions & 1 deletion python/poetry.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions python/pyproject.toml
@@ -16,6 +16,7 @@ regex = "^2023.6.3"
openapi_core = "^0.18.0"
prance = "^23.6.21.0"
pydantic = "<2"
motor = "^3.3.1"

[tool.poetry.group.dev.dependencies]
pre-commit = "3.3.3"
2 changes: 2 additions & 0 deletions python/semantic_kernel/__init__.py
@@ -18,6 +18,7 @@
azure_openai_settings_from_dot_env,
bing_search_settings_from_dot_env,
google_palm_settings_from_dot_env,
mongodb_atlas_settings_from_dot_env,
openai_settings_from_dot_env,
pinecone_settings_from_dot_env,
postgres_settings_from_dot_env,
@@ -32,6 +33,7 @@
"postgres_settings_from_dot_env",
"pinecone_settings_from_dot_env",
"bing_search_settings_from_dot_env",
"mongodb_atlas_settings_from_dot_env",
"google_palm_settings_from_dot_env",
"redis_settings_from_dot_env",
"PromptTemplateConfig",
52 changes: 52 additions & 0 deletions python/semantic_kernel/connectors/memory/mongodb_atlas/README.md
@@ -0,0 +1,52 @@
# microsoft.semantic_kernel.connectors.memory.mongodb_atlas

This connector uses [MongoDB Atlas Vector Search](https://www.mongodb.com/products/platform/atlas-vector-search) to implement Semantic Memory.

## Quick Start

1. Create an [Atlas cluster](https://www.mongodb.com/docs/atlas/getting-started/)

2. Create a collection

3. Create a [Vector Search Index](https://www.mongodb.com/docs/atlas/atlas-search/field-types/knn-vector/) for the collection.
The index must be defined on a field called ```embedding```. For example:
```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "embedding": {
        "dimensions": 1024,
        "similarity": "cosine",
        "type": "knnVector"
      }
    }
  }
}
```

4. Create the MongoDB memory store
```python
import semantic_kernel as sk
import semantic_kernel.connectors.ai.open_ai
from semantic_kernel.connectors.memory.mongodb_atlas import (
    MongoDBAtlasMemoryStore
)

kernel = sk.Kernel()

...

kernel.register_memory_store(memory_store=MongoDBAtlasMemoryStore(
    # connection_string = if not provided pull from .env
))
...
```
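
The fallback to `.env` for the connection string can be pictured with the sketch below. It assumes the `.env` file has already been loaded into the process environment and uses the `MONGODB_ATLAS_CONNECTION_STRING` variable name from `.env.example`; the helper `resolve_connection_string` is hypothetical and is not the connector's actual implementation.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "MONGODB_ATLAS_CONNECTION_STRING"


def resolve_connection_string(
    explicit: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Prefer an explicitly passed connection string, else fall back to the environment."""
    source = os.environ if env is None else env
    value = explicit or source.get(ENV_VAR, "")
    if not value:
        raise ValueError(
            f"No connection string was passed and {ENV_VAR} is not set."
        )
    return value
```

Failing early with a clear message avoids a harder-to-diagnose connection error later, when the first query runs.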

## Important Notes

### Vector search indexes
In this version, vector search index management is outside the scope of ```MongoDBAtlasMemoryStore```.
Users must create and maintain the indexes themselves. Note that deleting a collection
(```memory_store.delete_collection_async```) also deletes its index.
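
Because index creation is left to the user, it can help to generate the index definition from the embedding size of the model in use. The sketch below is one way to do that. The helper name `knn_vector_index_definition` is hypothetical, and the key names follow the Atlas Search `knnVector` field type, which names the size key `dimensions`.

```python
import json
from typing import Any, Dict

# Similarity functions accepted by the knnVector field type.
SUPPORTED_SIMILARITIES = {"cosine", "dotProduct", "euclidean"}


def knn_vector_index_definition(
    dimensions: int,
    similarity: str = "cosine",
    field: str = "embedding",
) -> Dict[str, Any]:
    """Return an Atlas Search index definition for a knnVector field."""
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    if similarity not in SUPPORTED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity!r}")
    return {
        "mappings": {
            "dynamic": True,
            "fields": {
                field: {
                    "dimensions": dimensions,
                    "similarity": similarity,
                    "type": "knnVector",
                }
            },
        }
    }


# Print JSON that can be pasted into the Atlas index editor.
print(json.dumps(knn_vector_index_definition(1024), indent=2))
```

The `dimensions` value has to match the length of the vectors produced by the embedding model, so deriving it from the model configuration is less error-prone than hard-coding it.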
@@ -0,0 +1,5 @@
from semantic_kernel.connectors.memory.mongodb_atlas.mongodb_atlas_memory_store import (
    MongoDBAtlasMemoryStore,
)

__all__ = ["MongoDBAtlasMemoryStore"]