
1. Paper Information
Paper title: Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
GitHub: GitHub - anthropics/hh-rlhf: Human preference data for "Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback"
Author team:
Published: April 12, 2022, about 40 days after InstructGPT and more than half a year before ChatGPT was released
Model comparison: InstructGPT, ChatGP