An educational project of a web service for uploading datasets to S3-compatible storage and retrieving information about them.
- Python 3.9
- FastAPI
- Uvicorn
- PostgreSQL
- MinIO, an S3-compatible storage
- Containers
├── app # Application logic
│ ├── handlers # Request handlers (similar to controllers or views in other frameworks)
│ ├── models # SQLAlchemy table definitions and additional data types like enums
│ ├── repositories # Modules containing SQLAlchemy query expressions → DB access interface
│ ├── schemas # Pydantic schemas validating input and output data
│ ├── settings # Application settings
│ ├── utils # Helpers that don't contain any business logic and can be extracted
│ ├── application.py # FastAPI entry point and its configuration
├── main.py # Web service entry point with additional settings
├── migrations # Alembic migrations
├── pip.conf # pip config for working with a private package registry such as Artifactory
├── setup.cfg # Python environment configs like linters, mypy rules, pytest etc.
├── tests # Test suite running via Pytest
make up
cp .env.example .env
make venv
# Check PostgreSQL logs to make sure
# LOG: database system is ready to accept connections
make migrate
make serve
Open Swagger UI in your favourite browser.
- Download a few big datasets. You can find some on Kaggle. I tried allposts.csv (~47 GB) from the Nearby Social Network - All Posts dataset.
- Generate an MD5 hash for the file you want to upload:
md5 allposts.csv
MD5 (allposts.csv) = 148a68b39a273bfda5ece7d868c9c1c8
- Make a request:
curl -X 'PUT' \
'http://localhost:8000/datasets' \
-H 'accept: application/json' \
-H 'content-md5: 148a68b39a273bfda5ece7d868c9c1c8' \
-H 'Content-Type: multipart/form-data' \
-F 'dataset_name=Nearby Social Network - All Posts' \
-F 'dataset_file=@allposts.csv;type=text/csv'
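
The `md5` command shown above ships with macOS and BSD; on Linux the equivalent is `md5sum`. If neither is convenient, the same hex digest for the `content-md5` header can be computed in Python. The snippet below is a minimal sketch, not part of the project: the function name `md5_hex` and the 1 MiB chunk size are illustrative assumptions. It reads the file in chunks so that a file of tens of gigabytes never has to fit in memory.

```python
import hashlib


def md5_hex(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    md5_hex and chunk_size are hypothetical names chosen for this example.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Usage: python md5_hex.py allposts.csv
    if len(sys.argv) > 1:
        print(md5_hex(sys.argv[1]))
```

The printed value is what goes into the `content-md5` header of the request.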
Try another one to simulate simultaneous uploads. Watch some of your CPU cores pick up load (thanks to the ThreadPool in the MinIO SDK and GIL limitations). Memory consumption doesn't grow quickly, but the volume of disk reads and writes does, sometimes reaching twice the size of the uploaded file because of SpooledTemporaryFile.
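
To see why SpooledTemporaryFile keeps memory flat but adds disk traffic, the following sketch shows its rollover behaviour using only the standard library. It is an illustration under assumptions, not the project's code: the helper name `spool` and the 1024-byte threshold are made up for the example, and `_rolled` is a private CPython attribute used here only for inspection. Once the written data exceeds `max_size`, the buffer moves from memory to a real temporary file, so a large upload is presumably written to disk once and then read back once when it is forwarded to storage.

```python
import tempfile


def spool(payload, max_size):
    """Write payload to a SpooledTemporaryFile and report whether it hit disk.

    spool is a hypothetical helper for this example only.
    """
    with tempfile.SpooledTemporaryFile(max_size=max_size) as f:
        f.write(payload)
        # _rolled is a private CPython attribute: it becomes True once the
        # in-memory buffer has been replaced by a real temporary file.
        rolled = f._rolled
        # Reading the data back is the second pass over the same bytes.
        f.seek(0)
        read_back = f.read()
    return rolled, len(read_back)


small = spool(b"x" * 512, max_size=1024)   # below the threshold, stays in memory
large = spool(b"x" * 4096, max_size=1024)  # above the threshold, rolls over to disk
print(small, large)
```

The threshold here is deliberately tiny; with a multi-gigabyte upload almost the whole file ends up on disk regardless of the exact limit.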
No doubt this kind of naive testing does not show how the web service will behave in a production environment. But at least it shows how to upload large files to any S3-compatible storage from a FastAPI application.
© Andrey Krisanov, 2021