Label Efficient AnnotatioN framework that allows for LearnIng From Explanations
- Added model training support for trigger explanations for the Named Entity Recognition task.
- Added support for uploading labeled datasets.
- Added a new UI for annotation.
- Added FastAPI for asynchronous model training.
- Added model training support for Natural Language Explanations for Relation Extraction and Sentiment Analysis tasks.
LEAN-LIFE is an annotation framework for Named Entity Recognition (NER), Relation Extraction (RE), and Sentiment Analysis (SA)/Multi-Class Document Classification. LEAN-LIFE also captures explanations during the annotation process and uses them for training. Explanations act as enhanced supervision: they record the reasoning behind each labeling decision, which helps models learn more from fewer labels in low-resource settings.
Our initial frontend code is based on the Doccano and AlpacaTag projects; however, we differ from them in the following ways:
- Triplet Capture: Allows building a dataset of (datapoint, label, explanation) triplets, unlike the standard (datapoint, label) tuple.
- Explanation-Supported Model Training: Training of both TriggerNER and NExT models, for both model deployment and recommendations (coming soon).
- Relation Extraction Task Supported: Using the output of the Named Entity Recognition task, our system allows for the creation of relation extraction datasets.
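To make the triplet idea concrete, here is a minimal Python sketch of the two record shapes. The class names `LabeledExample` and `AnnotationTriplet` and the helper `drop_explanations` are hypothetical; they do not reflect LEAN-LIFE's actual storage format or upload schema (see the wiki for supported data formats), and the sample sentences and explanation are made up.

```python
from typing import List, NamedTuple, Optional


class LabeledExample(NamedTuple):
    """Standard annotation record: (datapoint, label)."""
    datapoint: str
    label: str


class AnnotationTriplet(NamedTuple):
    """Explanation-augmented record: (datapoint, label, explanation)."""
    datapoint: str
    label: str
    explanation: Optional[str] = None  # an annotator may skip the explanation


def drop_explanations(triplets: List[AnnotationTriplet]) -> List[LabeledExample]:
    """Reduce triplets to standard (datapoint, label) pairs."""
    return [LabeledExample(t.datapoint, t.label) for t in triplets]


# Made-up sentiment-analysis examples, for illustration only.
triplets = [
    AnnotationTriplet(
        datapoint="The pasta was delicious and the staff were friendly.",
        label="positive",
        explanation="The words 'delicious' and 'friendly' express approval.",
    ),
    AnnotationTriplet(
        datapoint="The delivery arrived two hours late.",
        label="negative",
    ),
]

print(drop_explanations(triplets))
```

Because the explanation is an extra field, a triplet dataset can always be reduced to standard pairs, so nothing is lost for models that ignore explanations.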
Due to refactoring efforts and a desire to create a more stable framework, the following features are not supported yet but will be supported soon:
- Active intelligent recommendation: Iteratively train an appropriate backend model for the selected task using explanations and labels, both to provide enhanced annotations to annotators and to ensure annotators are not asked to annotate documents that the model already understands.
- Real-time model extraction: Users can extract the trained recommendation model at any point for deployment purposes, without having to wait until all documents are labeled.
- Model Interaction API: A separate API for model training (batch), prediction (online and batch), and extraction; this functionality will be built separately from our annotation framework.
- User Roles: Differentiating between a project creator and project annotators, allowing a creator to set up a project while annotators configure more local settings, such as what types of recommendations they would like and how often their backend model should be trained.
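The idea of not asking annotators to label documents the model already understands resembles uncertainty sampling from active learning. The sketch below shows generic least-confidence selection only; it is not LEAN-LIFE's (not yet released) recommendation logic, and the function `select_for_annotation`, its probability input, and the `threshold` and `budget` parameters are hypothetical.

```python
from typing import Dict, List


def select_for_annotation(
    predictions: Dict[str, List[float]],
    budget: int,
    threshold: float = 0.9,
) -> List[str]:
    """Pick up to `budget` document ids to send to annotators.

    `predictions` maps a document id to the model's class probabilities.
    Documents whose top probability is at least `threshold` are treated as
    already understood and skipped; the rest are ranked least confident first.
    """
    uncertain = [
        (max(probs), doc_id)
        for doc_id, probs in predictions.items()
        if max(probs) < threshold
    ]
    uncertain.sort()  # lowest confidence first; ties broken by doc id
    return [doc_id for _, doc_id in uncertain[:budget]]


predictions = {
    "doc1": [0.97, 0.02, 0.01],  # confident, so skipped
    "doc2": [0.40, 0.35, 0.25],  # very uncertain
    "doc3": [0.70, 0.20, 0.10],  # moderately uncertain
}
print(select_for_annotation(predictions, budget=2))  # prints ['doc2', 'doc3']
```

A fixed confidence threshold keeps the sketch simple; a real system might instead use margin or entropy scores, and would retrain the model between rounds.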
Please refer to our website for more information.
For information on how to use the annotation framework and on supported data formats, please see our wiki.
We strongly encourage community engagement; please refer to our contribution section for more on how to contribute!
Next release goals (4-6 weeks):
- Docker: Add Docker files for easier onboarding/setup.
- Code cleanup: Remove unused Django template files and add more inline documentation/explanations.
- Improve the CUDA management policy: add support for per-request selection of the GPU device, which is currently set using an environment variable.
Release notes can be found here.
- Install Python 3.6.5. For detailed environment setup, follow the Setup Environment step.
- Follow the FastAPI/model training installation instructions here.
- If you want to use the annotation UI, please follow the Setup Frontend step. If you only want to interact with the model training API, you can skip this step and follow the detailed guide mentioned in the next step.
- For a detailed guide on how to use the API calls, check out the Jupyter notebooks in the example_notebooks directory.
Setup Environment [Optional]
Note: All paths are relative to the parent directory of the `LEAN-LIFE` directory (i.e., just outside it). Please adjust paths accordingly.
1. Install Python 3.6.5 (if you use conda, you can skip this step).
2. Open a new terminal window after installing the above.
3. Clone this repo:
git clone git@github.com:INK-USC/LEAN-LIFE.git
4. Create a virtual environment using one of:
  - Anaconda (Anaconda doesn't have a stable 3.6.5 version, so we use 3.6):
conda create -n leanlife python=3.6
  - virtualenv (here `python3.6` is your Python 3.6.5 interpreter):
python3.6 -m pip install virtualenv
python3.6 -m venv leanlife
5. Activate your environment:
  - Anaconda:
conda activate leanlife
  - virtualenv:
source leanlife/bin/activate
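After activating the environment, one quick way to confirm that the running interpreter is the one these steps set up is to inspect `sys.version_info` and `sys.prefix` from Python. The sketch below assumes the environment is named `leanlife` as above; `environment_report` is a hypothetical helper, not part of LEAN-LIFE.

```python
import os
import sys


def environment_report(prefix=sys.prefix, version_info=sys.version_info, env_name="leanlife"):
    """Summarize whether the running interpreter matches the setup steps.

    Both `conda create -n leanlife` and `python3.6 -m venv leanlife` create an
    environment directory whose last path component is `leanlife`, and
    sys.prefix points at that directory while the environment is active.
    """
    major, minor, micro = version_info[:3]
    return {
        "python": "{}.{}.{}".format(major, minor, micro),
        "is_python_3_6": (major, minor) == (3, 6),
        "in_env": os.path.basename(os.path.normpath(prefix)) == env_name,
    }


if __name__ == "__main__":
    report = environment_report()
    print(report)
    if not (report["is_python_3_6"] and report["in_env"]):
        print("Warning: this does not look like the leanlife Python 3.6 environment.")
```

Save it as any file name you like and run it with `python` from the activated environment; the warning line suggests revisiting the steps above.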
Setup Frontend [Optional]
If you want to interact with the system through the web portal, please set up Django (for the annotation backend) and Vue.js (for the frontend). Otherwise, you can skip this step.
Potential Errors
- Wrong version of Python is being used.
  - To check: if you're getting installation errors, your machine may be running the wrong version of Python and/or the wrong installed packages. Run
which python
and make sure the returned path is inside the leanlife virtual environment folder. To check that Python is looking for packages in the right places, see this example here; again, the path should be the site-packages folder in your leanlife virtual environment.
  - To fix: re-create the virtual environment:
    - Deactivate it: deactivate (virtualenv) or conda deactivate (Anaconda)
    - Remove it: rm -rf leanlife (virtualenv) or conda env remove -n leanlife (Anaconda)
    - Make sure no other virtual environments are active: open a terminal/command prompt and check whether each line starts with parentheses, e.g. (base) user@...
    - If so, deactivate that environment too; (base) in the example above is a conda environment, so run conda deactivate
    - Go back to step 4 of the Setup Environment instructions.
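The `which python` and site-packages checks above can also be done from inside Python: `sys.executable` is the running interpreter, and `sys.path` lists where it looks for packages. The hypothetical helper below (not part of LEAN-LIFE) flags site-packages directories that sit outside the active environment, such as a user-level `~/.local` directory, which can shadow the environment's packages. It assumes the environment is active, so `sys.prefix` points at it.

```python
import os
import sys


def site_packages_outside_env(paths, env_prefix):
    """Return the site-packages entries in `paths` that are not inside `env_prefix`."""
    prefix = os.path.abspath(env_prefix)
    outside = []
    for path in paths:
        if "site-packages" not in path:
            continue  # only package directories matter for this check
        full = os.path.abspath(path)
        try:
            inside = os.path.commonpath([full, prefix]) == prefix
        except ValueError:  # e.g. paths on different drives on Windows
            inside = False
        if not inside:
            outside.append(path)
    return outside


if __name__ == "__main__":
    print("interpreter:", sys.executable)  # Python's view of `which python`
    for path in site_packages_outside_env(sys.path, sys.prefix):
        print("site-packages outside the active environment:", path)
```

Comparing with `os.path.commonpath` rather than a plain string prefix avoids treating a sibling folder such as `leanlife2` as part of the `leanlife` environment.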
Directory overview
- annotation_backend/
  - src/ - Django applications root directory
  - ui_sample_data/ - sample datasets for testing (more description here)
- example_notebooks/ - interactive Jupyter notebooks for each task, with sample data, processing steps, and explanations of the various input parameters. To access them, run
jupyter notebook
in a terminal.
- frontend/ - Vue.js frontend project directory
- model_api/ - code related to model training and FastAPI routes. More details can be found here.
We love contributions, so thank you for taking the time! Pushing changes to master is blocked, so please create a branch and make your edits on it. Once done, please create a Pull Request and ask a contributor from INK-LAB to merge your changes. You can refer to our PR guidelines and general contribution guidelines here.
Feedback is definitely encouraged; please feel free to create an issue and document what you're seeing or would like to see.
To get notifications of major updates to this project, you can join our mailing list here.
For updates on this project and other NLP projects being done at USC, please follow @nlp_usc.
Rahul Khanna, JiaMin (Jim) Gong, Dongho Lee, Jamin Chen, Seyeon Lee, Akshen Kadakia, Raghavendra Vedula
@inproceedings{
LEANLIFE2020,
title={LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation},
author={Lee, Dong-Ho and Khanna, Rahul and Lin, Bill Yuchen and Chen, Jamin and Lee, Seyeon and Ye, Qinyuan and Boschee, Elizabeth and Neves, Leonardo and Ren, Xiang},
booktitle={Proc. of ACL (Demo)},
year={2020},
url={}
}