What Happened

Anthropic unveiled Version 3.0 of its Responsible Scaling Policy (RSP), marking the most significant revision to the company’s safety framework since its inception. The update introduces a crucial distinction between what Anthropic commits to do internally and what it believes the entire AI industry should adopt.

Under the previous RSP, Anthropic committed to implementing safety mitigations that would reduce its models’ absolute risk to acceptable levels, regardless of competitors’ actions. The new policy acknowledges that “from a societal perspective, what matters is the risk to the ecosystem as a whole.”

Jared Kaplan, Anthropic’s co-founder and Chief Science Officer, now serves as the company’s Responsible Scaling Officer, taking over from Chief Technology Officer Sam McCandlish, who held the role for the past year.

Why It Matters

This policy shift represents a fundamental change in how one of AI’s most safety-conscious companies approaches risk management. Anthropic has long positioned itself as prioritizing safety over speed in the AI development race, often contrasting its approach with that of competitors such as OpenAI and Google.

The changes suggest that even safety-focused companies are grappling with the practical challenges of unilateral safety commitments in a competitive market. By separating internal commitments from industry-wide recommendations, Anthropic appears to be acknowledging that some safety measures require coordinated industry action to be effective.

Background

Anthropic’s original Responsible Scaling Policy, introduced in 2023, was designed as a framework for safely developing increasingly powerful AI systems. The policy included specific capability thresholds and safety evaluations that would trigger enhanced security measures before training or deploying new models.

However, the company’s own reflection on the policy’s implementation revealed significant challenges. Capability thresholds “proved far more ambiguous than anticipated.” Biological risk assessments offer an example: models now pass most quick tests, but the results aren’t definitive enough to prove the risks are high.

The policy did have notable successes: it forced internal teams to treat safety as a launch requirement, competitors such as OpenAI and Google DeepMind adopted similar frameworks within months of Anthropic’s announcement, and ASL-3 safeguards were successfully activated in May 2025.

What’s Next

The new RSP introduces several key requirements that will shape Anthropic’s future development:

Frontier Safety Roadmaps: These will describe concrete plans for making progress across Security, Alignment, Safeguards, and Policy, with goals intended to be “ambitious yet achievable.”

Risk Reports with External Review: Detailed safety profiles of models will undergo external review, with critics gaining “unredacted or minimally-redacted access” so they can publicly critique Anthropic’s reasoning.

Critics have raised concerns about the updated framework, particularly noting that safety decisions are “no longer dependent on the outcomes of pre-specified evaluations, but on the personal judgment of Dario Amodei and Jared Kaplan.”

The changes come as the AI industry faces competing pressures, with some stakeholders pushing for faster development and others demanding stronger safety measures.