How do I plug my trained reward model into the PPO pipeline? #6385

Open
1 task done
chcoo opened this issue Dec 19, 2024 · 0 comments
Labels
pending This problem is yet to be addressed

chcoo commented Dec 19, 2024

Reminder

  • I have read the README and searched the existing issues.

System Info

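I trained a reward model with LLaMA-Factory, but I cannot load its path in the RLHF settings: the path list only shows the models I previously fine-tuned with SFT. I don't understand how to plug my trained reward model into the PPO pipeline. If I can't, the dataset I prepared for the reward model goes unused as well.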

Reproduction

RLHF training in the web UI

a3edd3d5b5f6434a0f734ff7dd520af

Expected behavior

No response

Others

No response

@github-actions github-actions bot added the pending This problem is yet to be addressed label Dec 19, 2024