Why does the reward model [reward trainer] use AutoModelForCausalLMWithValueHead instead of AutoModelForSequenceClassification? #6455
luoqishuai started this conversation in General
There is no special logic behind it; it may be replaced in a future update.
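A minimal sketch of why the choice carries no special logic for reward modeling, under loudly stated assumptions: both heads are modeled as a single linear layer mapping a hidden state to one scalar, sequences are right-padded, and the reward is read at the last non-padding token. It is a toy illustration with plain Python lists, not the actual code of LLaMA-Factory, TRL, or Transformers. The value-head style scores every token and then picks the last real one; the sequence-classification style (num_labels=1) pools that same token first and then projects it. The helper names (`linear_head`, `value_head_reward`, `seq_cls_reward`, `pairwise_loss`) are hypothetical.

```python
import math
from typing import List

Hidden = List[List[float]]  # one hidden-state vector per token


def linear_head(hidden: List[float], weight: List[float], bias: float) -> float:
    """Hypothetical 1-output linear layer: hidden state -> scalar score."""
    return sum(h * w for h, w in zip(hidden, weight)) + bias


def last_token_index(attention_mask: List[int]) -> int:
    # Assumes right padding: real tokens first, padding (0s) at the end.
    return sum(attention_mask) - 1


def value_head_reward(hidden_states: Hidden, attention_mask: List[int],
                      weight: List[float], bias: float) -> float:
    # Value-head style: score every token, then keep the score of the
    # last non-padding token as the sequence-level reward.
    values = [linear_head(h, weight, bias) for h in hidden_states]
    return values[last_token_index(attention_mask)]


def seq_cls_reward(hidden_states: Hidden, attention_mask: List[int],
                   weight: List[float], bias: float) -> float:
    # Sequence-classification style with one label: pool the hidden state
    # of the last non-padding token, then apply the same projection.
    pooled = hidden_states[last_token_index(attention_mask)]
    return linear_head(pooled, weight, bias)


def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    # Pairwise ranking loss: -log(sigmoid(r_chosen - r_rejected)),
    # written as log1p(exp(-d)); fine for the moderate values used here.
    return math.log1p(math.exp(-(r_chosen - r_rejected)))
```

With identical head weights, both functions return the same reward, so the pairwise loss computed on top of them is also identical; under these assumptions the difference between the two model classes is packaging (per-token values versus one pooled logit), not the training objective.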
-
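@hiyouga I'm asking because the official TRL example uses AutoModelForSequenceClassification [https://github.com/huggingface/trl],
and I couldn't find anything that explains the difference.
Is there any special logic behind using AutoModelForCausalLMWithValueHead?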