Automated leaderboard update
actions-user committed Oct 1, 2023
1 parent 0ac9b14 commit 3825c08
Showing 2 changed files with 2 additions and 5 deletions.
3 changes: 1 addition & 2 deletions docs/alpaca_eval_gpt4_leaderboard.csv
@@ -17,6 +17,7 @@ OpenBudddy-LLaMA2-70B-v10.1,87.67123287671232,1077.0,https://huggingface.co/Open
OpenChat V2-W 13B,87.1268656716418,1566.0,https://github.com/imoneoi/openchat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-v2-w-13b/model_outputs.json,community
OpenBuddy-LLaMA-65B-v8,86.53366583541147,1162.0,https://huggingface.co/OpenBuddy/openbuddy-llama-65b-v8-bf16,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openbuddy-llama-65b-v8/model_outputs.json,community
WizardLM 13B V1.1,86.31840796019901,1525.0,https://huggingface.co/WizardLM/WizardLM-13B-V1.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/wizardlm-13b-v1.1/model_outputs.json,community
Cohere Command,85.0560398505604,1715.0,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/cohere/model_outputs.json,community
OpenChat V2 13B,84.96894409937889,1564.0,https://github.com/imoneoi/openchat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/openchat-v2-13b/model_outputs.json,community
Humpback LLaMa 65B,83.70646766169155,1269.0,https://arxiv.org/abs/2308.06259,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/humpback-llama-65b/model_outputs.json,community
UltraLM 13B V2.0,83.60248447204968,1399.0,https://github.com/thunlp/UltraChat,,community
@@ -55,8 +56,6 @@ Falcon 40B Instruct,45.71428571428572,662.0,https://huggingface.co/tiiuae/falcon
Alpaca Farm PPO Sim (GPT-4) 7B,44.099378881987576,511.0,https://huggingface.co/tatsu-lab/alpaca-farm-ppo-sim-gpt4-20k-wdiff,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/alpaca-farm-ppo-sim-gpt4-20k/model_outputs.json,verified
Pythia 12B SFT,41.86335403726708,913.0,https://huggingface.co/OpenAssistant/pythia-12b-sft-v8-7k-steps,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/pythia-12b-mix-sft/model_outputs.json,verified
Alpaca Farm PPO Human 7B,41.24223602484472,803.0,https://huggingface.co/tatsu-lab/alpaca-farm-ppo-human-wdiff,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/alpaca-farm-ppo-human/model_outputs.json,minimal
Cohere Chat,29.565217391304348,779.0,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/cohere-chat/model_outputs.json,community
Cohere,28.385093167701864,682.0,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/cohere/model_outputs.json,community
Alpaca 7B,26.459627329192543,396.0,https://huggingface.co/tatsu-lab/alpaca-7b-wdiff,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/alpaca-7b/model_outputs.json,minimal
Pythia 12B OASST SFT,25.962732919254663,726.0,https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/oasst-sft-pythia-12b/model_outputs.json,verified
Falcon 7B Instruct,23.60248447204969,478.0,https://huggingface.co/tiiuae/falcon-7b-instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/falcon-7b-instruct/model_outputs.json,verified
4 changes: 1 addition & 3 deletions docs/claude_leaderboard.csv
@@ -13,8 +13,8 @@ Guanaco 65B,62.60869565217392,1249,https://huggingface.co/timdettmers/guanaco-65
Vicuna 7B v1.3,62.54658385093168,1110,https://huggingface.co/lmsys/vicuna-7b-v1.3,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-7b-v1.3/model_outputs.json,verified
Nous Hermes 13B,60.86956521739131,844,https://huggingface.co/NousResearch/Nous-Hermes-13b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/nous-hermes-13b/model_outputs.json,verified
Guanaco 33B,57.88819875776397,1311,https://huggingface.co/timdettmers/guanaco-33b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/guanaco-33b/model_outputs.json,verified
Vicuna 7B,57.329192546583855,1044,https://huggingface.co/lmsys/vicuna-7b-delta-v1.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-7b/model_outputs.json,verified
LLaMA 33B OASST RLHF,57.329192546583855,1079,https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/oasst-rlhf-llama-33b/model_outputs.json,minimal
Vicuna 7B,57.329192546583855,1044,https://huggingface.co/lmsys/vicuna-7b-delta-v1.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/vicuna-7b/model_outputs.json,verified
LLaMA2 Chat 13B,56.14906832298136,1513,https://ai.meta.com/llama/,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-13b-chat-hf/model_outputs.json,minimal
Guanaco 13B,53.36239103362392,1774,https://huggingface.co/timdettmers/guanaco-13b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/guanaco-13b/model_outputs.json,verified
LLaMA2 Chat 7B,51.98757763975155,1479,https://ai.meta.com/llama/,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/llama-2-7b-chat-hf/model_outputs.json,minimal
@@ -26,8 +26,6 @@ Falcon 40B Instruct,46.70807453416149,662,https://huggingface.co/tiiuae/falcon-4
Alpaca Farm PPO Human 7B,46.45962732919255,803,https://huggingface.co/tatsu-lab/alpaca-farm-ppo-human-wdiff,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/alpaca-farm-ppo-human/model_outputs.json,minimal
Pythia 12B SFT,43.22981366459627,913,https://huggingface.co/OpenAssistant/pythia-12b-sft-v8-7k-steps,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/pythia-12b-mix-sft/model_outputs.json,verified
Pythia 12B OASST SFT,32.79503105590062,726,https://huggingface.co/OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/oasst-sft-pythia-12b/model_outputs.json,verified
Cohere Chat,32.79503105590062,779,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/cohere-chat/model_outputs.json,community
Cohere,32.608695652173914,682,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/cohere/model_outputs.json,community
Alpaca 7B,32.298136645962735,396,https://huggingface.co/tatsu-lab/alpaca-7b-wdiff,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/alpaca-7b/model_outputs.json,minimal
Falcon 7B Instruct,29.565217391304348,478,https://huggingface.co/tiiuae/falcon-7b-instruct,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/falcon-7b-instruct/model_outputs.json,verified
Davinci001,21.490683229813666,296,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/text_davinci_001/model_outputs.json,minimal
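Rows in both leaderboard files share one comma-separated layout, and some fields (the model link for the Cohere rows, the outputs link for UltraLM 13B V2.0) are empty. Below is a minimal Python sketch of reading such rows. It assumes the header, which this diff does not show, is `name,win_rate,avg_length,link,samples,filter`. The column names and the `parse_rows` helper are illustrative assumptions, not part of the alpaca_eval code.

```python
import csv
import io

# Assumed column order; the CSV header row is not visible in this diff.
FIELDS = ["name", "win_rate", "avg_length", "link", "samples", "filter"]


def parse_rows(text):
    """Parse header-less leaderboard rows into dicts with numeric fields converted."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), fieldnames=FIELDS):
        record["win_rate"] = float(record["win_rate"])      # appears to be a percentage
        record["avg_length"] = float(record["avg_length"])  # e.g. "1715.0" or "1110"
        # Empty link fields become None.
        record["link"] = record["link"] or None
        record["samples"] = record["samples"] or None
        rows.append(record)
    return rows


sample = (
    "Cohere Command,85.0560398505604,1715.0,,"
    "https://github.com/tatsu-lab/alpaca_eval/blob/main/results/cohere/model_outputs.json,community\n"
    "UltraLM 13B V2.0,83.60248447204968,1399.0,https://github.com/thunlp/UltraChat,,community\n"
)

for row in parse_rows(sample):
    print(f'{row["name"]}: {row["win_rate"]:.2f}% (avg length {row["avg_length"]:.0f}), filter={row["filter"]}')
```

Run as-is, this prints one summary line per row, e.g. `Cohere Command: 85.06% (avg length 1715), filter=community`.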