Dropbox rehaul #5

Merged
merged 14 commits into from
Dec 5, 2023
2 changes: 1 addition & 1 deletion confluence/.env-template
@@ -5,4 +5,4 @@ CONFLUENCE_SPACE_NAME=
CONFLUENCE_SEARCH_LIMIT=10

# Connector Authorization
CONFLUENCE_CONNECTOR_API_KEY=
CONFLUENCE_CONNECTOR_API_KEY=
6 changes: 6 additions & 0 deletions dropbox/.env-template
@@ -0,0 +1,6 @@
DROPBOX_ACCESS_TOKEN=
DROPBOX_APP_KEY=
DROPBOX_APP_SECRET=
DROPBOX_SEARCH_LIMIT=5
DROPBOX_PATH=
DROPBOX_CONNECTOR_API_KEY=
87 changes: 87 additions & 0 deletions dropbox/README.md
@@ -0,0 +1,87 @@
# Dropbox Quick Start Connector

This package is a utility for connecting Cohere to Dropbox, featuring a simple local development setup.

## Limitations

The Dropbox connector currently searches all active files within your Dropbox instance. Newly added files need a few minutes of indexing before they become searchable; Dropbox usually takes less than 5 minutes.

## Configuration

To use the Dropbox connector, first create an app in the [Developer App Console](https://www.dropbox.com/developers/apps). Select Scoped Access, and give it the access type it needs. Note that `App folder` access will give your app access to a folder specifically created for your app, while `Full Dropbox` access will give your app access to all files and folders currently in your Dropbox instance.

Once you have created a Dropbox app, head over to the Permissions tab of your app and enable `files.metadata.read` and `files.content.read`. Then go to the Settings tab and retrieve your App key and App secret and place them into a `.env` file (see `.env-template` for reference):

```
DROPBOX_APP_KEY=xxxx
DROPBOX_APP_SECRET=xxxx
```

Optionally, you can configure the `DROPBOX_PATH` to modify the subdirectory to search in, or the `DROPBOX_SEARCH_LIMIT` to affect the max number of results returned.

## Authentication

#### Testing

To test the connection, you can generate a temporary access token from your app's settings page. Use this for the `DROPBOX_ACCESS_TOKEN` environment variable.

#### `DROPBOX_CONNECTOR_API_KEY`

The `DROPBOX_CONNECTOR_API_KEY` should contain an API key for the connector. This value must be present in the `Authorization` header for all requests to the connector.
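
As a rough illustration, a client could attach the key to a search request like this. This is only a sketch: the helper name `build_search_request` is hypothetical, and it assumes the connector's OpenAPI spec (not shown here) expects the key as a bearer token.

```python
import json
import urllib.request


def build_search_request(
    base_url: str, api_key: str, query: str
) -> urllib.request.Request:
    """Build a POST /search request that carries the connector API key."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the key is presented using the Bearer scheme.
            "Authorization": f"Bearer {api_key}",
        },
    )
```

Passing the returned request to `urllib.request.urlopen` would send it to a running connector; the value of `api_key` should match `DROPBOX_CONNECTOR_API_KEY`.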

#### OAuth

When using OAuth for authentication, the connector does not require any additional environment variables. Instead, the OAuth flow should occur outside of the Connector and Cohere's API will forward the user's access token to this connector through the `Authorization` header.

With OAuth the connector will be able to search any Dropbox folders and files that the user has access to.

To configure OAuth, follow the steps in the Configuration section to create a Dropbox app. You will also need to register `https://api.cohere.com/v1/connectors/oauth/token` as a redirect URI on that app.

You can then register the connector with Cohere's API using the following configuration. Your App key and App secret values correspond to `client_id` and `client_secret` respectively:

```bash
curl -X POST \
'https://api.cohere.ai/v1/connectors' \
--header 'Accept: */*' \
--header 'Authorization: Bearer {COHERE-API-KEY}' \
--header 'Content-Type: application/json' \
--data-raw '{
"name": "Dropbox with OAuth",
"url": "{YOUR_CONNECTOR-URL}",
"oauth": {
"client_id": "{DROPBOX-OAUTH-CLIENT-ID}",
"client_secret": "{DROPBOX-OAUTH-CLIENT-SECRET}",
"authorize_url": "https://www.dropbox.com/oauth2/authorize",
"token_url": "https://www.dropbox.com/oauth2/token"
}
}'
```

## Development

Create a virtual environment and install dependencies with poetry. We recommend using in-project virtual environments:

```bash
poetry config virtualenvs.in-project true
poetry install --no-root
```

To run the Flask server in development mode, please run:

```bash
poetry run flask --app provider --debug run
```

The Flask API will be bound to `localhost:5000`. You can test it with a request such as:

```bash
curl --request POST \
--url http://localhost:5000/search \
--header 'Content-Type: application/json' \
--data '{
"query": "charcoal"
}'
```

Alternatively, load up the Swagger UI and try out the API from a browser: http://localhost:5000/ui/
830 changes: 830 additions & 0 deletions dropbox/poetry.lock

Large diffs are not rendered by default.

33 changes: 33 additions & 0 deletions dropbox/provider/__init__.py
@@ -0,0 +1,33 @@
import logging
import os

import connexion  # type: ignore
from dotenv import load_dotenv


load_dotenv()


API_VERSION = "api.yaml"


class UpstreamProviderError(Exception):
    def __init__(self, message) -> None:
        self.message = message

    def __str__(self) -> str:
        return self.message


def create_app() -> connexion.FlaskApp:
    app = connexion.FlaskApp(__name__, specification_dir="../../.openapi")
    app.add_api(
        API_VERSION, resolver=connexion.resolver.RelativeResolver("provider.app")
    )
    logging.basicConfig(level=logging.INFO)
    flask_app = app.app
    config_prefix = os.path.split(os.getcwd())[
        1
    ].upper()  # Current directory name, upper-cased
    flask_app.config.from_prefixed_env(config_prefix)
    return flask_app
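
The prefix handling above explains why the `.env` keys are named `DROPBOX_*` while the code reads `SEARCH_LIMIT`, `PATH`, and so on. The sketch below approximates that behavior for flat keys using only the standard library. It is not Flask's implementation: the helper name `prefixed_env_to_config` is hypothetical, and it ignores details such as nested `__` keys.

```python
import json


def prefixed_env_to_config(environ: dict[str, str], prefix: str) -> dict[str, object]:
    """Approximate how prefixed environment variables become config keys."""
    config: dict[str, object] = {}
    marker = f"{prefix}_"
    for key, raw in environ.items():
        if not key.startswith(marker):
            continue  # variables for other connectors are ignored
        try:
            value = json.loads(raw)  # "5" becomes the integer 5
        except ValueError:
            value = raw  # anything that is not valid JSON stays a string
        config[key.removeprefix(marker)] = value
    return config
```

Because the prefix comes from the current directory name, this only lines up with the `DROPBOX_*` variables when the server is started from the `dropbox` directory.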
35 changes: 35 additions & 0 deletions dropbox/provider/app.py
@@ -0,0 +1,35 @@
import logging

from connexion.exceptions import Unauthorized
from flask import abort, current_app as app, request

from . import UpstreamProviderError, provider

logger = logging.getLogger(__name__)

AUTHORIZATION_HEADER = "Authorization"
BEARER_PREFIX = "Bearer "


def get_oauth_token() -> str | None:
    authorization_header = request.headers.get(AUTHORIZATION_HEADER, "")
    if authorization_header.startswith(BEARER_PREFIX):
        return authorization_header.removeprefix(BEARER_PREFIX)
    return None


def search(body):
    try:
        data = provider.search(body["query"], get_oauth_token())
    except UpstreamProviderError as error:
        logger.error(f"Upstream search error: {error.message}")
        abort(502, error.message)
    return {"results": data}


def apikey_auth(token):
    api_key = app.config.get("CONNECTOR_API_KEY", "")
    if api_key != "" and token != api_key:
        raise Unauthorized()
    # successfully authenticated
    return {}
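
One possible hardening of the key check, not part of this PR, is a constant-time comparison. The sketch below uses a hypothetical helper, `api_key_matches`, that mirrors the logic of `apikey_auth` (an empty configured key disables the check) while comparing with `hmac.compare_digest` from the standard library.

```python
import hmac


def api_key_matches(expected: str, presented: str) -> bool:
    """Return True when no key is configured or the presented key matches."""
    if expected == "":
        # Mirrors apikey_auth: an empty configured key disables the check.
        return True
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(expected.encode("utf-8"), presented.encode("utf-8"))
```

Whether this matters depends on how the connector is exposed; for a local development setup the plain comparison is likely adequate.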
57 changes: 57 additions & 0 deletions dropbox/provider/client.py
@@ -0,0 +1,57 @@
from dropbox import Dropbox
from dropbox.exceptions import AuthError
from dropbox.files import FileStatus, SearchOptions  # type: ignore
from flask import current_app as app

from . import UpstreamProviderError


class DropboxClient:
    def __init__(self, token, search_limit, path):
        self.search_limit = search_limit
        self.path = path
        self.client = Dropbox(token)

        # Test connection
        try:
            self.client.users_get_current_account()
        except AuthError:
            raise UpstreamProviderError(
                "ERROR: Invalid access token; try re-generating an "
                "access token from the app console on the web."
            )

    def search(self, query):
        results = self.client.files_search_v2(
            query,
            SearchOptions(
                file_status=FileStatus.active,
                filename_only=False,
                max_results=self.search_limit,
                path=self.path,
            ),
            include_highlights=False,
        )

        return results

    def download_file(self, path):
        metadata, file = self.client.files_download(path)

        return metadata, file


def get_client(oauth_token=None):
    search_limit = app.config.get("SEARCH_LIMIT", 5)
    path = app.config.get("PATH", "")
    env_token = app.config.get("ACCESS_TOKEN", "")
    token = None

    if env_token != "":
        token = env_token
    elif oauth_token is not None:
        token = oauth_token
    else:
        raise AssertionError("No access token or Oauth credentials provided.")

    return DropboxClient(token, search_limit, path)
28 changes: 28 additions & 0 deletions dropbox/provider/provider.py
@@ -0,0 +1,28 @@
from typing import Any

from .client import get_client


def search(query: str, oauth_token: str = None) -> list[dict[str, Any]]:
    dbx_client = get_client(oauth_token)
    dbx_results = dbx_client.search(query)

    results = []
    for dbx_result in dbx_results.matches:
        if not (metadata := dbx_result.metadata.get_metadata()):
            continue

        if not getattr(metadata, "is_downloadable", False):
            continue

        metadata, f = dbx_client.download_file(metadata.path_display)

        result = {
            "type": "file",
            "title": metadata.name,
            "text": str(f.content),
        }
        # TODO: decode file contents
        results.append(result)

    return results
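
Regarding the TODO above: calling `str()` on a bytes object yields its repr (for example `"b'hello'"`) rather than the text. A minimal sketch of one way to decode, assuming the downloaded files are UTF-8 text, could look like this. The helper `decode_file_content` is hypothetical and not part of this PR; binary formats such as PDF or DOCX would need a proper parser instead.

```python
def decode_file_content(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode downloaded bytes to text, replacing undecodable bytes."""
    # errors="replace" substitutes U+FFFD instead of raising UnicodeDecodeError.
    return raw.decode(encoding, errors="replace")
```

With such a helper, the `"text"` field could be built from `decode_file_content(f.content)` instead of `str(f.content)`.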
24 changes: 24 additions & 0 deletions dropbox/pyproject.toml
@@ -0,0 +1,24 @@
[tool.poetry]
name = "dropbox-connector"
version = "0.1.0"
description = "Search provider for connecting Cohere with Dropbox."
authors = ["Scott Mountenay <scott@lightsonsoftware.com>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.11"
flask = "2.2.5"
connexion = {extras = ["swagger-ui"], version = "^2.14.2"}
python-dotenv = "^1.0.0"
dropbox = "^11.36.2"
requests = "^2.31.0"
gunicorn = "^21.2.0"


[tool.poetry.group.development.dependencies]
black = "^23.7.0"
mypy = "^1.4.1"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"