This project provides tools and scripts for generating JSONL files for fine-tuning GPT models. The goal is to create high-quality datasets that can be used to improve model performance on specific tasks.
-
Clone the repository:
git clone https://github.com/awaisakram64/finetune-data-generator.git cd finetune-data-generator
-
Install dependencies:
pip install -r requirements.txt
-
Preprocess raw data:
python finetune/data_preprocessing.py --input data/raw --output data/processed
-
Generate JSONL files:
python finetune/data_generation.py --input data/processed --output data/generated
Please read CONTRIBUTING.md
for details on our code of conduct and the process for submitting pull requests.
This project is licensed under the MIT License - see the LICENSE
file for details.