Commit

YannDubs committed Mar 5, 2024
2 parents 7a7782f + 63123c1 commit af0e9a9
Showing 9 changed files with 81,875 additions and 1 deletion.
1 change: 1 addition & 0 deletions docs/data_AlpacaEval/alpaca_eval_gpt4_leaderboard.csv
@@ -8,6 +8,7 @@ GPT-4,95.27950311,1365,,https://github.com/tatsu-lab/alpaca_eval/blob/main/resul
Tulu 2+DPO 70B,95.03105590062113,1418,https://huggingface.co/allenai/tulu-2-dpo-70b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/tulu-2-dpo-70b/model_outputs.json,community
GPT-4 0314,94.78260869565216,1371,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_0314/model_outputs.json,verified
Mixtral 8x7B v0.1,94.78260869565216,1465,https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Mixtral-8x7B-Instruct-v0.1/model_outputs.json,minimal
+Mistral-7B-ReMax-v0.1,94.39601494396015,1478,https://huggingface.co/ziniuli/Mistral-7B-ReMax-v0.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Mistral-7B-ReMax-v0.1/model_outputs.json,community
Yi 34B Chat,94.08468244084682,2123,https://huggingface.co/01-ai/Yi-34B-Chat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Yi-34B-Chat/model_outputs.json,verified
GPT-4 0613,93.78109452736318,1140,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_0613/model_outputs.json,verified
GPT 3.5 Turbo 0613,93.41614906832298,1328,,,verified
Original file line number Diff line number Diff line change
@@ -5,7 +5,7 @@ Snorkel (Mistral-PairRM-DPO+best-of-16),34.8601328912795,2616,https://huggingfac
PairRM 0.4B+Yi-34B-Chat (best-of-16),31.24128294682124,2195,https://huggingface.co/llm-blender/PairRM,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/pairrm-Yi-34B-Chat/model_outputs.json,community
Snorkel (Mistral-PairRM-DPO),30.2200527006216,2736,https://huggingface.co/snorkelai/Snorkel-Mistral-PairRM-DPO,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Snorkel-Mistral-PairRM-DPO/model_outputs.json,community
Yi 34B Chat,29.65994671879504,2123,https://huggingface.co/01-ai/Yi-34B-Chat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Yi-34B-Chat/model_outputs.json,minimal
-Qwen1.5 72B Chat,26.49828339573589,1549,https://huggingface.co/collections/Qwen/qwen15-65c0a2f577b1ecb76d786524,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Qwen1.5-72B-Chat/model_outputs.json,community
+Qwen1.5 72B Chat,26.49828339573589,1549,https://huggingface.co/Qwen/Qwen1.5-72B-Chat,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Qwen1.5-72B-Chat/model_outputs.json,community
Mixtral 8x7B v0.1 (verbose),24.61406305014672,2083,https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Mixtral-8x7B-Instruct-v0.1_verbose/model_outputs.json,dev
Claude 2.1 (verbose),24.354071090158502,1414,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2.1_verbose/model_outputs.json,dev
GPT-4,23.576789314782605,1365,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4/model_outputs.json,minimal
@@ -23,6 +23,7 @@ XwinLM 13b V0.1,17.427934750214753,1894,https://github.com/Xwin-LM/Xwin-LM,https
Claude 2,17.188240356594065,1069,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2/model_outputs.json,minimal
Claude,16.98534361252407,1082,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude/model_outputs.json,verified
Claude Instant 1.2,16.127399621587912,1112,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-instant-1.2/model_outputs.json,community
+Mistral-7B-ReMax-v0.1,15.999331369052298,1478,https://huggingface.co/ziniuli/Mistral-7B-ReMax-v0.1,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/Mistral-7B-ReMax-v0.1/model_outputs.json,community
Tulu 2+DPO 70B,15.982854374136648,1418,https://huggingface.co/allenai/tulu-2-dpo-70b,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/tulu-2-dpo-70b/model_outputs.json,verified
GPT-4 0613,15.755038087701964,1140,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/gpt4_0613/model_outputs.json,verified
Claude 2.1,15.733506736409938,1096,,https://github.com/tatsu-lab/alpaca_eval/blob/main/results/claude-2.1/model_outputs.json,verified
11,272 changes: 11,272 additions & 0 deletions results/Mistral-7B-ReMax-v0.1/alpaca_eval_gpt4_turbo_fn/annotations.json

Large diffs are not rendered by default.

5,637 changes: 5,637 additions & 0 deletions results/Mistral-7B-ReMax-v0.1/model_outputs.json

Large diffs are not rendered by default.

64,947 changes: 64,947 additions & 0 deletions results/Mistral-7B-ReMax-v0.1/weighted_alpaca_eval_gpt4_turbo/annotations.json

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
@@ -8,6 +8,7 @@ gpt4,95.27950311,0.71628144,761,32,12,805,minimal,1365,
tulu-2-dpo-70b,95.03105590062113,0.7613100978662208,764,39,2,805,community,1418,
gpt4_0314,94.78260869565216,0.7489957601246771,756,35,14,805,verified,1371,94.78260869565216
Mixtral-8x7B-Instruct-v0.1,94.78260869565216,0.7793245403322182,762,41,2,805,minimal,1465,94.78260869565216
+Mistral-7B-ReMax-v0.1,94.39601494396015,0.8121535187540114,758,45,0,803,community,1478,94.39601494396015
Yi-34B-Chat,94.08468244084682,0.8260116588728516,754,46,3,803,verified,2123,
gpt4_0613,93.78109452736318,0.8338571422122372,750,46,8,804,verified,1140,93.78109452736318
gpt-3.5-turbo-16k-0613,93.41614906832298,0.847714896903792,746,47,12,805,verified,1328,93.41614906832298
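The counts in the rows of this hunk are consistent with a win rate in which each draw counts as half a win. Below is a minimal sketch of that arithmetic. It assumes the columns after the model name are win rate, standard error, wins, baseline wins, draws, and total comparisons; the column names are not shown in this diff, so that reading is an assumption, and `win_rate_percent` is a hypothetical helper rather than part of alpaca_eval.

```python
def win_rate_percent(n_wins: int, n_draws: int, n_total: int) -> float:
    """Win rate in percent, counting each draw as half a win."""
    return 100.0 * (n_wins + 0.5 * n_draws) / n_total


# (wins, draws, total, reported win rate), copied from the rows in this hunk.
# The meaning of each column is an assumption; the CSV header is not shown.
rows = {
    "Mistral-7B-ReMax-v0.1": (758, 0, 803, 94.39601494396015),
    "tulu-2-dpo-70b": (764, 2, 805, 95.03105590062113),
    "gpt4_0613": (750, 8, 804, 93.78109452736318),
}

for name, (wins, draws, total, reported) in rows.items():
    computed = win_rate_percent(wins, draws, total)
    # Compare with a tolerance because the CSV values are floating-point.
    assert abs(computed - reported) < 1e-9, name
    print(f"{name}: {computed:.4f}")
```

The totals differ between rows (803, 804, 805), so the denominator has to be taken from each row rather than assumed to be constant.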
Original file line number Diff line number Diff line change
@@ -23,6 +23,7 @@ xwinlm-13b-v0.1,17.427934750214753,1.1450161466911328,129,672,4,805,16.273291925
claude-2,17.188240356594065,1.1748282561506294,131,673,1,805,16.335403726708076,minimal,1069
claude,16.98534361252407,1.168795979299477,129,676,0,805,16.024844720496894,verified,1082
claude-instant-1.2,16.127399621587912,1.1341036838299487,120,682,3,805,15.093167701863356,community,1112
+Mistral-7B-ReMax-v0.1,15.999331369052298,1.1288683901419234,120,683,2,805,15.031055900621118,community,1478
tulu-2-dpo-70b,15.982854374136648,1.1457861368237434,119,683,3,805,14.96894409937888,verified,1418
gpt4_0613,15.755038087701964,1.0754642482299672,117,684,4,805,14.782608695652174,verified,1140
claude-2.1,15.733506736409938,1.120315865445773,115,688,2,805,14.409937888198757,verified,1096
13 changes: 13 additions & 0 deletions src/alpaca_eval/models_configs/Mistral-7B-ReMax-v0.1/configs.yaml
@@ -0,0 +1,13 @@
Mistral-7B-ReMax-v0.1:
  prompt_template: "Mistral-7B-ReMax-v0.1/prompt.txt"
  fn_completions: "huggingface_local_completions"
  completions_kwargs:
    model_name: "./Mistral-7B-ReMax-v0.1" # local path
    model_kwargs:
      torch_dtype: 'bfloat16'
    max_new_tokens: 2048
    temperature: 0.7
    top_p: 0.9
    do_sample: True
  pretty_name: "Mistral-7B-ReMax-v0.1"
  link: "https://huggingface.co/ziniuli/Mistral-7B-ReMax-v0.1"
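For orientation, here is a rough sketch of what these settings would correspond to if passed directly to the Hugging Face `transformers` API. It assumes, without confirmation from this diff, that `huggingface_local_completions` forwards `model_kwargs` to model loading and the remaining keys to generation. `generate_one` is a hypothetical helper and is not part of alpaca_eval.

```python
# Values copied from configs.yaml above.
MODEL_PATH = "./Mistral-7B-ReMax-v0.1"  # local path, as in the config
GENERATION_KWARGS = {
    "max_new_tokens": 2048,
    "temperature": 0.7,
    "top_p": 0.9,
    "do_sample": True,
}


def generate_one(prompt: str) -> str:
    """Generate one completion (hypothetical helper; needs torch and transformers)."""
    # Imported lazily so the constants above can be inspected without these packages.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
    # torch_dtype mirrors the 'bfloat16' entry under model_kwargs.
    model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.bfloat16)
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, **GENERATION_KWARGS)
    # Drop the prompt tokens so only the newly generated text is returned.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because `do_sample` is true with a temperature of 0.7, repeated runs would generally produce different completions unless a random seed is fixed.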
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
[INST] {instruction} [/INST]
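The template above contains a single `{instruction}` placeholder. A minimal sketch of how such a template could be filled for one evaluation example is shown here, using plain Python string formatting. `build_prompt` is a hypothetical helper, and whether alpaca_eval performs the substitution this way is an assumption.

```python
# Template text copied from the prompt file in this commit.
PROMPT_TEMPLATE = "[INST] {instruction} [/INST]"


def build_prompt(instruction: str) -> str:
    """Substitute one instruction into the template (hypothetical helper)."""
    return PROMPT_TEMPLATE.format(instruction=instruction)


print(build_prompt("Name three primary colors."))
# prints: [INST] Name three primary colors. [/INST]
```

Only the template is parsed for placeholders, so an instruction that itself contains braces is inserted unchanged.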
