AI Safety via Debate
Authors
Abstract
We explore debate as a training procedure for aligning AI systems with human values. In this approach, we train agents to debate about questions posed by a human, with the human judging the debate.
Publication Details
Published:
May 2nd, 2018
Venue:
arXiv
Added to AI Safety Papers:
March 16th, 2025
Metadata
Tags:
Debate, Alignment, AI Safety