See our full documentation at https://yale-nlp.github.io/TAIL/.
TAIL is an automatic toolkit for building realistic evaluation benchmarks and assessing the performance of long-context LLMs. With TAIL, users can build a customized long-context, document-grounded QA benchmark and obtain visualized performance metrics for the evaluated models.
- Install the package from PyPI:
```bash
# (Recommended) Create a new conda environment.
conda create -n tail python=3.10 -y
conda activate tail

# Install tailtest
pip install tailtest
```
- Set your `OPENAI_API_KEY`:
```bash
export OPENAI_API_KEY="..."
```
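Benchmark generation calls the OpenAI API, so it is worth confirming the key is actually visible to the process before launching a long run. A minimal sanity check (this snippet is not part of TAIL itself):

```python
import os

# Fail fast if the key was not exported in this shell session.
api_key = os.environ.get("OPENAI_API_KEY")
if not api_key:
    raise RuntimeError("OPENAI_API_KEY is not set; run `export OPENAI_API_KEY=...` first.")
print(f"Found OPENAI_API_KEY ({len(api_key)} characters).")
```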
- Prepare the source document you want to generate the benchmark from, organized as a JSON list:
[{"text": "Content of your document"}]
- Generate the benchmark, specifying the target document lengths and the depths (percentage positions within the document) at which questions are grounded:
```bash
tail-cli.build --raw_document_path "/data/raw.json" \
    --QA_save_path "/data/QA.json" \
    --document_length 8000 32000 64000 \
    --depth_list 25 50 75
```
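If you need benchmarks for several corpora, the same command can be scripted. A minimal sketch that uses only the flags documented above; the job list is hypothetical:

```python
import subprocess

# Hypothetical (raw document, output benchmark) pairs.
jobs = [
    ("/data/raw_finance.json", "/data/QA_finance.json"),
    ("/data/raw_legal.json", "/data/QA_legal.json"),
]

for raw_path, qa_path in jobs:
    # Same flags as the single-corpus invocation above.
    subprocess.run(
        [
            "tail-cli.build",
            "--raw_document_path", raw_path,
            "--QA_save_path", qa_path,
            "--document_length", "8000", "32000", "64000",
            "--depth_list", "25", "50", "75",
        ],
        check=True,
    )
```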
- Evaluate a model on the chosen depths and document lengths; results (with visualizations) are saved to the given directory:
```bash
tail-cli.eval --QA_save_path "/data/QA.json" \
    --test_model_name "gpt-4o" \
    --test_depth_list 25 75 \
    --test_doc_length 8000 32000 \
    --test_result_save_dir "/data/result/"
```
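Likewise, several models can be compared on the same benchmark in one loop. A minimal sketch; the model names and per-model result directories are illustrative:

```python
import subprocess

# Hypothetical set of models to evaluate on the same QA file.
for model in ["gpt-4o", "gpt-4o-mini"]:
    subprocess.run(
        [
            "tail-cli.eval",
            "--QA_save_path", "/data/QA.json",
            "--test_model_name", model,
            "--test_depth_list", "25", "75",
            "--test_doc_length", "8000", "32000",
            # Keep each model's results in its own subdirectory.
            "--test_result_save_dir", f"/data/result/{model}/",
        ],
        check=True,
    )
```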