INK-USC/LEAN-LIFE

LEAN-LIFE

Label Efficient AnnotatioN framework that allows for LearnIng From Explanations


Latest release (v1.1) Highlights

  • Added model training support for Trigger Explanation for Named Entity Recognition task.
  • Added support for uploading labeled dataset.
  • Added new UI for annotation.
  • FastAPI added for asynchronous model training.
  • Added model training support for Natural Language Explanations for Relation Extraction and Sentiment Analysis tasks.

Quick Intro:

LEAN-LIFE is an annotation framework for Named Entity Recognition (NER), Relation Extraction (RE) and Sentiment Analysis (SA)/Multi-Class Document Classification. LEAN-LIFE additionally enables the capture and use of explanations during the annotation process. Explanations can be seen as enhanced supervision, providing reasoning behind the labeling decision, and thus help speed up model training in low-resource environments.

Our initial frontend code is based on the Doccano and AlpacaTag projects; however, we differentiate ourselves in these ways:

  • Triplet Capture: Allows the building of a dataset that is (datapoint, label, explanation), unlike the standard (datapoint, label) tuple.

  • Explanation Supported Model Training: Training of both TriggerNER and NExT models, for both model deployment and recommendations (coming soon).

  • Relation Extraction Task Supported: Using the output of the Named Entity Extraction task, our system allows for the creation of relation extraction datasets.
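
As a rough illustration of the triplet idea above, the following minimal sketch models one annotated item as (datapoint, label, explanation) and shows how the conventional (datapoint, label) pairs can be recovered from it. The class name, field names, helper function, and example sentence are hypothetical and are not taken from the LEAN-LIFE codebase; the formats the framework actually accepts are described in the wiki.

```python
from typing import List, NamedTuple, Tuple


class ExplainedExample(NamedTuple):
    """Hypothetical container for one (datapoint, label, explanation) triplet."""
    datapoint: str
    label: str
    explanation: str


def to_standard_pairs(examples: List[ExplainedExample]) -> List[Tuple[str, str]]:
    """Drop the explanations to get the usual (datapoint, label) pairs."""
    return [(ex.datapoint, ex.label) for ex in examples]


# Illustrative sentiment-analysis item; the text is made up for this sketch.
examples = [
    ExplainedExample(
        datapoint="The pasta was cold and the service was slow.",
        label="negative",
        explanation="The words 'cold' and 'slow' describe the food and service.",
    ),
]

pairs = to_standard_pairs(examples)
print(pairs)
```

The explanation travels with the example as a third field, so a standard dataset is just a projection of the triplet dataset.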

Due to refactoring efforts and a desire to create a more stable framework, the following features are not yet supported but will be soon:

  • Active intelligent recommendation: Iteratively train an appropriate backend model for the selected task using explanations and labels to both provide enhanced annotations to annotators and ensure annotators are not asked to provide annotations on documents that the model already understands.

  • Real-time model extraction: Users can extract the trained recommendation model at any point for deployment purposes, without having to wait till all documents are labeled.

  • Model Interaction API: Separate API for model training (batch), prediction (online and batch), and extraction. This functionality will be built separately from our annotation framework.

  • User Roles: Differentiating between a project creator and project annotators, allowing for a creator to set up a project, while allowing annotators to configure more local settings like what types of recommendations they would like, and how often their backend model should be trained.

See our website for more information.

For information on how to use the annotation framework and supported data formats, please look at our wiki.

We strongly encourage community engagement. Please refer to our contribution section for more on how to contribute!

Release Plan:

Next Release's Goals: 4-6 Weeks

  • Docker: Add docker files for easier onboarding/setup
  • Code cleanup: Remove unneeded Django template files and add more inline documentation/explanations
  • CUDA management: Add support for changing the GPU device per request. Currently, the device is set using an environment variable.
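
To make the current environment-variable approach concrete, here is a minimal sketch of pinning a process to one GPU. It assumes the variable in question is the standard `CUDA_VISIBLE_DEVICES`; this README does not name the variable LEAN-LIFE actually reads, and `select_gpu` is a hypothetical helper written for this example.

```python
import os


def select_gpu(device_index: str) -> None:
    """Restrict this process to one GPU via CUDA_VISIBLE_DEVICES.

    Assumption: CUDA_VISIBLE_DEVICES is the relevant variable; the README
    does not say which variable the model training API uses.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = device_index


# This has to run before any CUDA-backed library initializes CUDA in the
# process, because the variable is read once at initialization time.
select_gpu("0")
print(os.environ["CUDA_VISIBLE_DEVICES"])
```

Because the variable is process-wide, it cannot vary between requests served by the same process, which is the limitation the per-request goal above is meant to address.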

Release notes can be found here.

Getting Started:

  • Install Python 3.6.5. For detailed environment setup, follow the Setup Environment step.
  • Follow FastAPI/Model training installation instructions here.
  • If you want to use the annotation UI, please follow the Setup Frontend step. If you just want to interact with the model training API, you can skip this step and follow the detailed guide mentioned in the next step.
  • For a detailed guide on how to use the API calls, check out the Jupyter notebooks in the example_notebooks directory.

Setup Environment [Optional]

Note: All paths are relative to the directory that contains the `LEAN-LIFE` directory. Please adjust paths accordingly.

  1. Please install Python 3.6.5 (if you use conda you can ignore this step)

  2. Open a new terminal window after installing the above

  3. Clone this repo: git clone git@github.com:INK-USC/LEAN-LIFE.git

  4. Create a virtual environment using:

    • Anaconda: conda create -n leanlife python=3.6 (Anaconda doesn't have a stable 3.6.5 version, so we use 3.6)
    • virtualenv:
      1. python3.6.5 -m pip install virtualenv
      2. python3.6.5 -m venv leanlife
  5. Activate your environment:

    • Anaconda: conda activate leanlife
    • virtualenv: source leanlife/bin/activate


Setup Frontend [Optional]

If you want to interact with the system through the web portal, please set up Django (for the annotation backend) and Vue.js (for the frontend). Otherwise, you can skip this step.

  • Follow Django annotation backend installation instructions here
  • Follow Vue.js frontend installation instructions here


Potential Errors

  • Wrong version of python is being used.
    • To check: If you're getting installation errors, your machine may be running the wrong version of Python and/or using the wrong installed packages. Run which python and make sure the returned path is inside the leanlife virtual environment folder. To check that Python is looking in the right places, see this example here. Again, the path should be the site-packages folder in your leanlife virtual environment.
    • To fix: Re-create the virtual environment:
      1. deactivate leanlife
      2. rm -rf leanlife
      3. Make sure no other virtual environments are active: open a terminal/command prompt and see if there are parentheses at the start of each line, e.g. (base) user@...
      4. If so, deactivate that environment with deactivate environment-name; in the above example it would be deactivate base
      5. Go to step 4 of the installation instructions
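
The interpreter check described above can also be done from inside Python. The sketch below is one way to do it using only the standard library; `interpreter_info` and `uses_env` are hypothetical helper names for this example, and the environment name `leanlife` is the one used in the setup steps.

```python
import sys
import sysconfig
from pathlib import Path


def interpreter_info() -> dict:
    """Collect the paths that show which Python installation is running."""
    return {
        "executable": sys.executable,  # same idea as `which python`
        "prefix": sys.prefix,          # root folder of the active environment
        "site_packages": sysconfig.get_paths()["purelib"],
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


def uses_env(env_name: str) -> bool:
    """Return True if a folder named env_name is part of the environment root."""
    return env_name in Path(sys.prefix).parts


info = interpreter_info()
for key, value in info.items():
    print(key, "=", value)
print("inside leanlife env:", uses_env("leanlife"))
```

If the last line prints False, or the site_packages path is outside the leanlife folder, the active interpreter is not the one from the virtual environment.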


Directory overview

  • annotation_backend/
    • src/ - django applications root directory
    • ui_sample_data/ - sample datasets for testing; more description here
  • example_notebooks/
    • Contains interactive Jupyter notebooks for each task, with sample data, processing, and explanations of the various input parameters.
    • Run jupyter notebook in a terminal to access them.
  • frontend/ - Vue.js frontend project directory
  • model_api/ - code related to model training and the FastAPI routes. More details can be found here


Contributing

We love contributions, so thank you for taking the time! Pushing changes to master is blocked, so please create a branch and make your edits on the branch. Once done, please create a Pull Request and ask a contributor from the INK-LAB to pull your changes in. You can refer to our PR guidelines and general contribution guidelines here.

Misc.

Feedback

Feedback is definitely encouraged. Please feel free to create an issue and document what you're seeing or would like to see.

Mailing List

To get notifications of major updates to this project, you can join our mailing list here

Twitter

For updates on this project and other NLP projects being done at USC, please follow @nlp_usc.

Contributors

Rahul Khanna, JiaMin (Jim) Gong, Dongho Lee, Jamin Chen, Seyeon Lee, Akshen Kadakia, Raghavendra Vedula

Citation

@inproceedings{
    LEANLIFE2020,
    title={LEAN-LIFE: A Label-Efficient Annotation Framework Towards Learning from Explanation},
    author={Lee, Dong-Ho and Khanna, Rahul and Lin, Bill Yuchen and Chen, Jamin and Lee, Seyeon and Ye, Qinyuan and Boschee, Elizabeth and Neves, Leonardo and Ren, Xiang},
    booktitle={Proc. of ACL (Demo)},
    year={2020},
    url={}
}