Risks from Learned Optimization in Advanced Machine Learning Systems
Authors
Abstract
This paper analyzes the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization.
Publication Details
Published:
June 10th, 2019
Venue:
arXiv
Added to AI Safety Papers:
March 16th, 2025
Metadata
Tags:
Mesa-optimization, AI Safety, Alignment