AI Safety via Debate

Authors

Geoffrey Irving, Paul Christiano, Dario Amodei

Abstract

We explore debate as a training procedure for aligning AI systems with human values. In this approach, we train agents to debate about questions posed by a human, with the human judging the debate.
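As a rough illustration of the debate setup described above, the sketch below plays one debate as a two-player zero-sum game. It assumes two debaters alternate short statements for a fixed number of turns and a judge names a single winner. The names `run_debate`, `Debater`, `Judge`, and `num_turns`, the toy debaters, and the rule-based stand-in for the human judge are all hypothetical. This is not the authors' implementation, and it leaves out the training of the agents.

```python
from typing import Callable, List, Tuple

# Hypothetical aliases: a transcript is a list of (speaker index, statement) pairs.
Transcript = List[Tuple[int, str]]
# A debater maps (question, transcript so far) to its next statement.
Debater = Callable[[str, Transcript], str]
# A judge (a human in the paper) maps (question, full transcript) to the winner, 0 or 1.
Judge = Callable[[str, Transcript], int]


def run_debate(
    question: str,
    debaters: Tuple[Debater, Debater],
    judge: Judge,
    num_turns: int = 4,
) -> Tuple[Transcript, Tuple[int, int]]:
    """Play one debate and return the transcript and zero-sum rewards."""
    transcript: Transcript = []
    for turn in range(num_turns):
        speaker = turn % 2  # debaters alternate, starting with debater 0
        statement = debaters[speaker](question, transcript)
        transcript.append((speaker, statement))
    winner = judge(question, transcript)
    if winner not in (0, 1):
        raise ValueError("judge must return 0 or 1")
    # Zero-sum: the winner gets +1 and the loser gets -1.
    rewards = (1, -1) if winner == 0 else (-1, 1)
    return transcript, rewards


if __name__ == "__main__":
    honest = lambda q, tr: "2 + 2 = 4"
    liar = lambda q, tr: "2 + 2 = 5"
    # Toy stand-in for the human judge: awards the debate to whoever said "= 4".
    judge = lambda q, tr: next(s for s, text in tr if text.endswith("= 4"))
    transcript, rewards = run_debate("What is 2 + 2?", (honest, liar), judge)
    print(rewards)  # (1, -1)
```

In training, each debater would be updated to maximize its own reward, so one agent's gain is exactly the other's loss.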

Publication Details

Published:

May 2nd, 2018

Venue:

arXiv

Added to AI Safety Papers:

March 16th, 2025

Metadata

Tags:

Debate, Alignment, AI Safety

Original Paper:

Link