How do I plug my trained reward model into the PPO pipeline? #6385

chcoo · 2024-12-19T07:19:40Z

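I trained a reward model with LLaMA-Factory, but I can't load its path in the RLHF settings. The path list there only shows models I previously fine-tuned with SFT. I don't understand how to plug my trained reward model into the PPO pipeline. If I can't, the dataset I prepared for the reward model goes unused as well.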

WebUI RLHF training


github-actions bot added the `pending` label ("This problem is yet to be addressed") on Dec 19, 2024
