Scalable agent alignment via reward modeling
Authors
Jan Leike, David Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
Abstract
This paper outlines a research direction for aligning agent behavior with user intentions in complex domains where reward functions are hard to specify. The central idea is reward modeling: learning a reward function from interaction with the user and optimizing the learned reward function with reinforcement learning. The paper discusses the key challenges expected when scaling reward modeling to complex environments, approaches to mitigating them, and ways to establish confidence in the resulting agents.
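To make the reward-modeling loop concrete, the sketch below learns a reward model from pairwise preferences (a Bradley-Terry style logistic loss) and then selects behavior that scores highest under the learned reward. It is a minimal illustration only: the toy trajectories, the linear model, and the simulated "human" preference oracle are assumptions for this example, not the paper's implementation.

```python
# Minimal sketch of the reward-modeling loop:
# (1) learn a reward model from pairwise preferences over trajectories,
# (2) optimize behavior against the learned reward.
import numpy as np

rng = np.random.default_rng(0)

# Toy "trajectories" are feature vectors; the hidden true reward is linear.
true_w = np.array([1.0, -2.0, 0.5])

def true_reward(traj):
    return traj @ true_w

# Simulated human: prefers the trajectory with higher true reward.
def human_prefers_first(traj_a, traj_b):
    return true_reward(traj_a) > true_reward(traj_b)

# Collect preference comparisons over random trajectory pairs.
pairs = [(rng.normal(size=3), rng.normal(size=3)) for _ in range(500)]
labels = np.array([1.0 if human_prefers_first(a, b) else 0.0 for a, b in pairs])

# Fit a linear reward model with the logistic preference loss:
# P(a preferred over b) = sigmoid(r(a) - r(b)).
w = np.zeros(3)
lr = 0.1
for _ in range(200):
    grad = np.zeros(3)
    for (a, b), y in zip(pairs, labels):
        p = 1.0 / (1.0 + np.exp(-(a - b) @ w))
        grad += (p - y) * (a - b)
    w -= lr * grad / len(pairs)

# Stand-in for policy optimization: pick the candidate trajectory
# that the learned reward model ranks highest.
candidates = rng.normal(size=(100, 3))
best = candidates[np.argmax(candidates @ w)]
print("learned reward weights:", np.round(w, 2))
print("chosen trajectory's true reward:", round(float(true_reward(best)), 2))
```

In the paper's framing, the last step would be a full reinforcement-learning loop trained against the learned reward model rather than a one-shot argmax over candidates; the argmax here stands in for that step only to keep the example self-contained.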
Publication Details
Published:
November 19th, 2018
Venue:
arXiv
Added to AI Safety Papers:
March 16th, 2025
Metadata
Tags:
Reward Modeling, Reinforcement Learning, AI Safety