Actions: tatsu-lab/alpaca_eval

test format leaderboard

249 workflow runs

[NOTEBOOK] add length-corrected GLM
test format leaderboard #99: Pull request #237 synchronize by YannDubs
February 19, 2024 12:02 2m 14s yann/add_arena_openai
[NOTEBOOK] add length-corrected GLM
test format leaderboard #98: Pull request #237 opened by YannDubs
February 19, 2024 12:01 1m 58s yann/add_arena_openai
[DATA] add results from the Arena openai models
test format leaderboard #97: Pull request #234 opened by YannDubs
February 12, 2024 21:32 1m 51s yann/add_arena_openai
[DEV] Analyzing length-controlled metrics.
test format leaderboard #96: Pull request #231 synchronize by YannDubs
February 11, 2024 05:22 1m 40s yann/length_control
[DEV] Analyzing length-controlled metrics.
test format leaderboard #95: Pull request #231 synchronize by YannDubs
February 11, 2024 05:15 1m 43s yann/length_control
[DEV] Analyzing length-controlled metrics.
test format leaderboard #94: Pull request #231 opened by YannDubs
February 11, 2024 05:14 1m 45s yann/length_control
[DATA] Adding annotations for the arena models
test format leaderboard #93: Pull request #229 opened by YannDubs
February 7, 2024 22:32 2m 11s yann/models_arena_2
Add Qwen1.5-72B-Chat to AlpacaEval
test format leaderboard #90: Pull request #226 synchronize by Lukeming-tsinghua
February 7, 2024 05:26 1m 44s Lukeming-tsinghua:main
Add Qwen1.5-72B-Chat to AlpacaEval
test format leaderboard #89: Pull request #226 opened by Lukeming-tsinghua
February 6, 2024 07:53 1m 53s Lukeming-tsinghua:main
Add xwinlm-70b-v0.3 to AlpacaEval
test format leaderboard #88: Pull request #221 opened by nbl97
January 29, 2024 12:17 1m 53s nbl97:main
[RES] add 3 models for arena correlations
test format leaderboard #87: Pull request #218 opened by YannDubs
January 25, 2024 11:33 2m 9s yann/arena_eval
Add Snorkel-Mistral-PairRM-DPO (best-of-16) to Alpaca Eval 2.0
test format leaderboard #85: Pull request #215 opened by viethoangtranduong
January 23, 2024 04:34 1m 44s viethoangtranduong:main
Add PairRM 0.4B + Yi-34B-Chat to AlpacaEval 2.0
test format leaderboard #84: Pull request #210 opened by jdf-prog
January 17, 2024 21:14 1m 40s jdf-prog:main
[ENH] add outputs & configs form dolphin 2.2.1
test format leaderboard #83: Pull request #209 synchronize by YannDubs
January 16, 2024 19:31 1m 45s yann/dolphin_221
[ENH] add outputs & configs form dolphin 2.2.1
test format leaderboard #82: Pull request #209 opened by YannDubs
January 16, 2024 18:57 1m 40s yann/dolphin_221
[ENH] add internlm2-chat-20b-ppo
test format leaderboard #81: Pull request #207 synchronize by C1rN09
January 16, 2024 08:47 1m 41s C1rN09:add_internlm2
[ENH] add internlm2-chat-20b-ppo
test format leaderboard #80: Pull request #207 synchronize by C1rN09
January 16, 2024 06:18 1m 45s C1rN09:add_internlm2
[ENH] add internlm2-chat-20b-ppo
test format leaderboard #79: Pull request #207 opened by C1rN09
January 16, 2024 06:15 2m 3s C1rN09:add_internlm2
[ENH] add mistral-medium
test format leaderboard #78: Pull request #205 opened by YannDubs
January 11, 2024 20:01 1m 43s yann/mistral_medium
[ENH] add OpenHermes
test format leaderboard #77: Pull request #203 opened by YannDubs
January 10, 2024 23:12 1m 39s yann/openhermes
[WIP] precompute all leaderboard for AE2
test format leaderboard #76: Pull request #199 synchronize by YannDubs
January 10, 2024 21:02 1m 8s yann/alpaca_eval_2_all
[WIP] precompute all leaderboard for AE2
test format leaderboard #75: Pull request #199 synchronize by YannDubs
January 10, 2024 21:01 1m 12s yann/alpaca_eval_2_all