Anthropic’s New Claude Fable 5 Has a Powerful, Unrestricted Twin for Vetted Partners

Anthropic has released its latest flagship model, Claude Fable 5, but it’s managing the AI’s power with a new two-tiered safety system. The public-facing version comes with significant guardrails, while an unrestricted underlying model, called Claude Mythos 5, is being kept under tight control for vetted partners. The model's capabilities are already proving to be state-of-the-art, with partners reporting it can compress months of software work into days. In one case, it reportedly handled a 50-million-line codebase migration in under 24 hours.

That power is why Anthropic is splitting access. The company is betting that a tiered approach can safely deliver cutting-edge AI without handing over the keys to its most dangerous capabilities.

How the Safety System Works

The core of Anthropic’s strategy is a system of AI classifiers that monitor prompts for high-risk content. When a query gets flagged, the request isn't simply blocked. Instead, it’s rerouted from Fable 5 to the next-best model, Claude Opus 4.8, and the user is notified about the switch. The goal is to contain misuse without completely disrupting a user's workflow.

The specific domains being sandboxed are:

Cybersecurity: This guardrail covers the development of exploits and other offensive tasks like system reconnaissance and discovery.
Biology and Chemistry: To address dual-use concerns, most queries in these fields are handled by the fallback model.
Distillation: This classifier is designed to prevent users from using Fable 5 to train their own powerful, unsanctioned AI models.

This setup has major implications for developers. A flagged request returns an HTTP 200 response but with a stop_reason of "refusal". Production systems will need logic to catch this and either retry the prompt on the fallback model or handle the rejection. Anthropic is offering tools to help automate this process.

A bigger hurdle for some companies will be the mandatory 30-day data retention policy for all traffic on Fable 5 and Mythos 5. Anthropic says this data will be used exclusively for safety monitoring, like detecting new jailbreaks, and will not be used to train its models. Still, organizations with zero-data-retention policies will need a thorough legal and procurement review before integrating the model.

Red-Teaming Revealed Alarming Skills

The tiered system is a direct response to internal tests that revealed powerful dual-use capabilities. During red-teaming, the unrestricted Mythos 5 model demonstrated an ability to design adeno-associated viruses (AAVs) for gene therapy, outperforming specialized protein models using only its general knowledge of biology. This emergent skill highlighted a major biosecurity risk in the wrong hands.

Its cybersecurity skills were even more pronounced. An earlier version, Claude Mythos Preview, was able to find and exploit zero-day vulnerabilities in every major operating system. In one exercise, it autonomously developed a remote code execution exploit for a 17-year-old bug in FreeBSD, showing it could drastically lower the resources needed to weaponize a vulnerability.

Anthropic says Fable 5’s public-facing safeguards are robust. An external bug bounty program running over 1,000 hours failed to produce a universal jailbreak. While the UK’s AI Safety Institute (AISI) has reportedly made some progress, Anthropic’s goal is to make any remaining exploits so costly and slow that they can't be used at scale.

Claude Fable 5 is priced at $10 per million input tokens and $50 per million output tokens and is now available through the Claude API. Access to the more powerful Mythos 5 is expanding slowly through its Project Glasswing for cybersecurity partners and a new program for biomedical researchers, signaling a clear strategy to balance raw capability with strict, risk-based controls.