The code uses a Value Head to implement the PPO critic, but the `detach_value_head` function it defines is never called. That means that during training, part of the capacity of the backbone network in front of the value head is also used for value estimation. Is this reasonable?
transformers_tasks/RLHF/trl/gpt2.py
Line 87 in 4978118
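For context, here is a minimal sketch of the detach mechanism, assuming the flag-based ValueHead used in the original trl implementation (everything except the `v_head` attribute and `detach_value_head` method is simplified for illustration):

```python
import torch.nn as nn

class ValueHead(nn.Module):
    """trl-style scalar value head (simplified sketch)."""
    def __init__(self, hidden_size):
        super().__init__()
        self.detach_head = False                 # flipped by detach_value_head()
        self.summary = nn.Linear(hidden_size, 1)

    def forward(self, hidden_states):
        # Only when the flag is set is the graph cut here, so the
        # value loss stops influencing the shared GPT-2 backbone.
        output = hidden_states.detach() if self.detach_head else hidden_states
        return self.summary(output)

class PolicyWithValueHead(nn.Module):
    """Stand-in for GPT2HeadWithValueModel, reduced to the relevant parts."""
    def __init__(self, hidden_size=768):
        super().__init__()
        self.v_head = ValueHead(hidden_size)

    def detach_value_head(self):
        # Defined in gpt2.py but never invoked in the training loop, so
        # detach_head stays False and value gradients reach the backbone.
        self.v_head.detach_head = True
```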
Could this line be replaced directly with a forward pass of a reward model?
transformers_tasks/RLHF/trl/gpt2.py
Line 120 in 4978118
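A hedged sketch of what that replacement might look like, assuming this line is the `value = self.v_head(hidden_states)` call and that a `reward_model` attribute with this call signature has been attached to the model (both are assumptions for illustration; the return value is also simplified relative to the original forward):

```python
def forward(self, input_ids, attention_mask=None):
    transformer_outputs = self.transformer(input_ids, attention_mask=attention_mask)
    hidden_states = transformer_outputs[0]
    lm_logits = self.lm_head(hidden_states)

    # Was (presumably): value = self.v_head(hidden_states).squeeze(-1)
    # An external critic has its own parameters, so the PPO value loss
    # no longer pushes gradients into the shared policy backbone.
    value = self.reward_model(input_ids, attention_mask=attention_mask).squeeze(-1)

    return lm_logits, transformer_outputs, value
```

Note that a critic still has to be trained against the PPO value targets; a frozen reward model scores completed outputs, which is not the same thing as a per-token state-value estimate.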
That is, when `GPT2HeadWithValueModel` is initialized, a reward model interface would also be passed in. Would that be more reasonable?
Line 74 in 4978118
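And the corresponding init-time wiring, following the trl-style constructor; the `reward_model` argument is a hypothetical addition, and `ValueHead` refers to the simplified head from the first sketch above:

```python
import torch.nn as nn
from transformers import GPT2Model, GPT2PreTrainedModel

class GPT2HeadWithValueModel(GPT2PreTrainedModel):
    def __init__(self, config, reward_model=None):
        super().__init__(config)
        config.num_labels = 1
        self.transformer = GPT2Model(config)
        self.lm_head = nn.Linear(config.n_embd, config.vocab_size, bias=False)
        self.v_head = ValueHead(config.n_embd)  # original value-head critic
        self.reward_model = reward_model        # hypothetical external critic
        self.init_weights()
```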