Support phi2, Support SOLAR 10.7B LMCocktail #183
Conversation
The purpose of submitting phi2 here is that the HF Open LLM Leaderboard does not accept community models that require trust_remote_code (though some corporate submissions like Qwen, which also need remote code, would pass, which is a bit unfair). I am also fine-tuning phi2 on the alpaca gpt4 en dataset using llama_factory; here is the resulting LoRA, which looks good.
Here is the model output produced with an updated instruction prompt that mitigates some known phi2 issues:
But phi2 still ranks below alpaca 7B
The next step for me is to evaluate my phi2 alpaca gpt4.
Here is the result for chatgpt_fn on my phi2 alpaca gpt4, which finally surpasses the davinci_003 baseline! A milestone for a 2.7B small model.
Here is the model output for phi2 alpaca gpt4.
Hi, is this PR stable now / ready for review?
prompt_template: "phi-2/prompt.txt"
fn_completions: "vllm_local_completions"
completions_kwargs:
  model_name: "/media/hangyu5/Home/Documents/Hugging-Face/phi-2"
For merging into the main repo it would be nice to have a more generic folder directory, e.g.
alpaca_eval/src/alpaca_eval/models_configs/openbuddy-llama2-70b-v10.1/configs.yaml, line 5 in b7bcc50:
model_name: "./openbuddy-llama2-70b-v10.1-bf16" # local path
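Following that suggestion, a generic config might look like the sketch below (the folder name and the use of the HF hub id instead of an absolute local path are illustrative; the repo's existing `models_configs` entries are the authoritative template):

```yaml
# src/alpaca_eval/models_configs/phi-2/configs.yaml (illustrative layout)
phi-2:
  prompt_template: "phi-2/prompt.txt"
  fn_completions: "vllm_local_completions"
  completions_kwargs:
    model_name: "microsoft/phi-2"  # hub id or repo-relative path, not a machine-specific one
```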
Thanks @yhyu13 for your contributions 💯 I left some comments. Note that you should also push the outputs / annotations / leaderboard results so that the results show in the main leaderboard. More details here: https://github.com/tatsu-lab/alpaca_eval?tab=readme-ov-file#contributing-a-model Let me know if you have any questions!
Sure, let me run with gpt4 annotation instead.
In addition to Phi-2, there is another model, https://huggingface.co/Yhyu13/LMCocktail-10.7B-v1, derived from SOLAR 10.7B, which achieves a pretty good score. I am also adding it to the leaderboard. Both models are in this commit: 857a817
One thing to note about my evaluation of phi-2 using vllm: I changed the vocab size from 52100 to 50257 in phi-2's config.json, because I hit this error: vllm-project/vllm#340 The number 50257 comes from counting the total number of tokens in phi-2's vocab.json file: https://huggingface.co/microsoft/phi-2/blob/main/vocab.json
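The vocab-count check and config fix described above can be sketched as follows (a minimal sketch; `vocab_size_from_file` and `patch_vocab_size` are hypothetical helpers, not part of alpaca_eval, vllm, or transformers):

```python
import json

def vocab_size_from_file(vocab_path):
    """Count entries in a Hugging Face vocab.json (a token -> id mapping)."""
    with open(vocab_path, encoding="utf-8") as f:
        return len(json.load(f))

def patch_vocab_size(config_path, new_size):
    """Overwrite vocab_size in a model's config.json, editing the file in place."""
    with open(config_path, encoding="utf-8") as f:
        cfg = json.load(f)
    cfg["vocab_size"] = new_size
    with open(config_path, "w", encoding="utf-8") as f:
        json.dump(cfg, f, indent=2)
```

For phi-2 this would count the entries in its vocab.json and write that number back into config.json, matching the manual 52100 → 50257 edit.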
Great, thanks @yhyu13 ! |
Changes made:
Evaluation result:
In my first run I used a chat prompt, with which phi2 sometimes continues the conversation on its own, inventing user follow-ups. For example:
The final score annotated by chatgpt is also not ideal. I am not sure whether it's an issue with chatgpt being a worse evaluator than gpt4 (I guess not):
Here is the output json from my first run with the chat prompt, FYI:
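One common mitigation for this runaway-chat behavior is to truncate each completion at the first follow-up marker before annotation. A minimal sketch (the helper name and the default marker strings are hypothetical and depend on the prompt template used):

```python
def truncate_at_markers(text, markers=("### Instruction:", "\nUser:", "\nHuman:")):
    """Cut a completion at the earliest occurrence of any stop marker."""
    cut = len(text)
    for marker in markers:
        idx = text.find(marker)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()
```

Applied to each entry of model_outputs.json before running the annotator, this strips the invented follow-up turns while leaving well-behaved completions untouched.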
model_outputs.json
I am now re-running the evaluation using chatgpt_fn with a better instruction prompt.