Reminder
System Info
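Why can't the reward model I trained with LLaMA-Factory be loaded in the RLHF settings? The path selector there only lists the models I previously fine-tuned with SFT. I don't understand how my trained reward model is supposed to be plugged into the PPO pipeline. If it can't be, the dataset I prepared for the reward model goes unused as well.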
Reproduction
RLHF training in the web UI
Expected behavior
No response
Others
No response