Receipt analysis of a dataset from Kaggle.
The full report (Czech only) is here
- This project has not been optimized for memory use; 32 GB of RAM is recommended to analyze 100k records.
- Analyzing more records requires either optimizing the code or more RAM.
- Python v3.11.x (used through pyenv)
- Clone the repository
- Register on Kaggle
- Download the dataset "eCommerce purchase history from electronics store"
- Place the dataset (kz.csv) into the /data directory
- Install Poetry
- Install dependencies
cd /path/to/ReceiptAnalysis && poetry install
- Run preprocessing
poetry run preprocessing
- Run clustering
poetry run clustering
- Generate association rules
poetry run associative_rules
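For readers unfamiliar with the last step, here is a minimal sketch of what association rules over receipts look like. It is not the code behind `poetry run associative_rules`; the function `pair_rules`, its thresholds, and the toy receipts are all hypothetical and only illustrate support and confidence for item pairs.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) tuples for item pairs.

    Hypothetical helper for illustration only, not part of this project.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))  # ignore duplicate items within one receipt
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of receipts containing both items
        if support < min_support:
            continue
        # Each frequent pair can yield a rule in either direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy receipts (made up for this example, not taken from kz.csv).
receipts = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["tv"],
]

rules = pair_rules(receipts, min_support=0.5, min_confidence=0.6)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On real data the number of candidate item sets grows quickly, which is one reason memory use rises with the number of records.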
- Saved numpy matrices are in /data
- Saved matplotlib figures are in /images
- Saved text outputs of the scripts are in /outputs
- Saved clusters and cluster rules are in /rules
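After a run, a quick check of the output locations listed above can be sketched as follows. It assumes the four directories are relative to the repository root and that the script is started from there; `summarize_outputs` is a hypothetical helper, not something the project provides.

```python
from pathlib import Path

# Directory names taken from the list above; assumed relative to the repo root.
OUTPUT_DIRS = ("data", "images", "outputs", "rules")


def summarize_outputs(root):
    """Map each output directory to its file count, or None if it is missing."""
    summary = {}
    for name in OUTPUT_DIRS:
        directory = Path(root) / name
        if directory.is_dir():
            # Count only regular files directly inside the directory.
            summary[name] = sum(1 for p in directory.iterdir() if p.is_file())
        else:
            summary[name] = None
    return summary


for name, count in summarize_outputs(".").items():
    status = "missing" if count is None else f"{count} file(s)"
    print(f"{name}: {status}")
```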