This repository was created as part of the Data-zoomcamp ML engineering course by Andrew Tsai. This project has been submitted as the final capstone for the course.
In today's complex and dynamic legal landscape, integrating Artificial Intelligence (AI) into legal assistance services has become increasingly important. The multifaceted nature of legal matters often poses challenges for individuals seeking clarity on applicable laws and potential sentencing outcomes. An AI legal assistant capable of predicting both the applicable laws and the potential length of imprisonment could address these challenges efficiently and accurately.
Starting from a Taiwanese legal judgment dataset that specifically targets drug-related crimes, which I crawled from an open-source website and published on Hugging Face, the goal is to have a model take a judgment as input and:
- classify the legal articles that are violated by the defendant
- predict the length of the imprisonment
Example screenshot of the Gradio interface: The predict function is deployed on a Hugging Face Space with Gradio. This endpoint will remain available until the end of the evaluation period.
git clone https://github.com/AndrewTsai0406/AI_Judge.git
I advise using a virtual environment to run this project. Below are instructions for doing so with Conda, which helps manage multiple environments.
# create virtual environment
conda create -n project-legal python=3.10
# start the virtual environment
conda activate project-legal
# install requirements
pip install -r requirements.txt
The dataset can be downloaded here on Hugging Face. One dataset is used for training: the finalized data, which should be placed under a './data' directory and is used to train the two classifiers for prediction.
This project combines two tasks: a multi-label classification problem, in which the 25 predicted classes are the legal articles violated by the defendant, and a multi-class classification problem for predicting the length of the imprisonment. All features are categorical.
Five models were tested, with their hyperparameters tuned using the Lightning Flash library.
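To make the two target formats concrete, here is a minimal sketch of how such labels could be encoded. It is only an illustration: the article names and the sentence-length bin edges below are hypothetical placeholders, not the 25 articles or the classes actually used in this project.

```python
import bisect

# Hypothetical label vocabulary (assumption): the real project has 25 article
# labels; four placeholder names are used here to keep the example short.
ARTICLES = ["article_A", "article_B", "article_C", "article_D"]
ARTICLE_INDEX = {name: i for i, name in enumerate(ARTICLES)}

# Hypothetical bin edges in months (assumption): the real class boundaries
# for the imprisonment length are not specified here.
SENTENCE_BIN_EDGES = [6, 12, 36, 84]


def encode_articles(violated):
    """Multi-label target: one 0/1 flag per article, several may be 1."""
    target = [0] * len(ARTICLES)
    for name in violated:
        target[ARTICLE_INDEX[name]] = 1
    return target


def encode_sentence(months):
    """Multi-class target: exactly one class index per judgment."""
    # bisect_right counts how many edges are <= months, so a sentence of
    # exactly 6 months falls into class 1, not class 0.
    return bisect.bisect_right(SENTENCE_BIN_EDGES, months)


if __name__ == "__main__":
    print(encode_articles(["article_A", "article_C"]))
    print(encode_sentence(18))
```

The difference between the two tasks shows up in the shape of the target: a judgment can violate several articles at once, so that target is a vector of independent flags, while the imprisonment length maps to a single class.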
All these steps are described in detail in the train_flash.ipynb notebook.
To run the training and save the models, run the train.py script with the command:
python train.py
The final models will be saved in the ./models directory.
- Exploratory Data Analysis: The characteristics of the judgments that are used to make the predictions are explored in the exploratory data analysis (EDA) part of the notebook. I ran a single notebook for the analysis; a copy of it is in the repository.
- Training: The logic for training the models is exported to a separate script, train.py, which runs the training for the final models.
- Deployment
Predict: Predictions can be run with a local Gunicorn service (the predict function can be found in app.py). To run it, type the following command in the terminal:
gunicorn app:app -b 0.0.0.0:8000 -k uvicorn.workers.UvicornWorker --timeout 300
Then open your browser and go to:
http://0.0.0.0:8000/gradio/
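If the page does not load, it can help to check from a script whether the service is reachable. The sketch below uses only the Python standard library. The helper name `service_is_up` is hypothetical and not part of this repository, and it targets `127.0.0.1` on the assumption that the service runs on the same machine: `0.0.0.0` is the address Gunicorn binds to, and the loopback address reaches the same local service.

```python
import urllib.error
import urllib.request


def service_is_up(url, timeout=5.0):
    """Return True if a GET request to `url` succeeds with a 2xx/3xx status."""
    # Bypass any configured HTTP proxy, since the target is a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    # Assumes the Gunicorn command above is running on this machine.
    url = "http://127.0.0.1:8000/gradio/"
    print("reachable" if service_is_up(url) else "not reachable")
```

A result of "not reachable" usually means the Gunicorn process is not running yet or is listening on a different port.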