Anthropic treats safety as a model behavior problem. The last month shows it's also a product reliability, pricing, and communication problem. These aren't peripheral concerns. They are where trust actually gets spent.
April was a trust event, not just a quality event. Two visible bumps in a longer pattern set up the argument: a postmortem on Claude Code quality issues and a pricing experiment that briefly removed Claude Code from Pro plans[1].
The OpenClaw clampdown[2] had already primed developers to worry about unilateral platform changes, so the ground was soft when these landed. Hacker News sentiment moved from glowing to openly cynical. Fortune's coverage caught users using the word "gaslit"[3]. That's not a vibes problem; it's a change in what users believe Anthropic will do next time.
The Mythos contrast
Anthropic has held back broad access to Mythos due to its own safety concerns, restricting a frontier model with significant cybersecurity capabilities[4]. That's a real cost absorbed for a real reason, and it's defensible on those grounds.
It's also the wedge for everything that follows. Anthropic has demonstrated it can take an expensive, principled stand when it classifies something as a safety issue. The question is whether the classification scheme is too narrow.
The Mythos restraint is a deliberate, announced, strategic choice. The Claude Code degradation was unintentional; nobody at Anthropic shipped those bugs on purpose. So the comparison isn't "discipline versus sloppiness" as a values judgment, and it doesn't treat frontier cyber risk and a bad product rollout as equivalent harms. The comparison is about what gets classified as worth being disciplined about. Mythos cleared the bar. Production changes and a pricing experiment did not.
The post-postmortem
Anthropic's writeup is genuine and unusually transparent for the category. It's worth a read to draw your own conclusions. It identifies three changes: a default reasoning effort drop, a thinking-cache bug, and a verbosity-reducing system prompt addition. These compounded into broad, hard-to-diagnose degradation across Claude Code, the Agent SDK, and Cowork.
The parts that don't fully land in the postmortem are structural. A deliberate product decision (the effort default) is grouped with two actual bugs, which muddies the lessons: bugs are a testing-and-review story, but a default that ships and gets reverted three weeks later is a product-judgment story. A system prompt change touching three production models apparently shipped without a staged rollout; if so, that's a headline process failure filed as future remediation rather than owned up front. And the note that Opus 4.7 found the bug when Opus 4.6 couldn't reads as marketing, oddly out of place in a postmortem.
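To make "staged rollout" concrete: the standard discipline is to expose a change to a small, deterministic slice of traffic and widen only while quality metrics hold. Here's a minimal sketch in Python, with every name hypothetical and illustrative only, reflecting nothing about Anthropic's actual infrastructure:

```python
# Hypothetical staged-rollout gate for a system prompt change.
# All names here are illustrative, not Anthropic's real infrastructure.
import hashlib

STAGES = [0.01, 0.05, 0.25, 1.0]  # fraction of traffic exposed at each stage

def in_cohort(change_id: str, session_id: str, fraction: float) -> bool:
    """Deterministically bucket a session so only `fraction` of traffic sees the change."""
    digest = hashlib.sha256(f"{change_id}:{session_id}".encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # map hash to [0, 1)
    return bucket < fraction

def next_fraction(current: float, regression_detected: bool) -> float:
    """Widen exposure one stage at a time; roll back to zero on any regression."""
    if regression_detected:
        return 0.0  # a cohort rolls back, not three production models
    wider = [s for s in STAGES if s > current]
    return wider[0] if wider else 1.0
```

The point of the sketch isn't the dozen lines of code. It's that under this discipline, a change like the verbosity prompt would hit 1% of sessions first, and a regression would roll back a cohort rather than degrading everyone at once.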
I don't expect bug-free software. Each incident here is small, but the pattern isn't. The problem is that the postmortem owns the outcomes without really digging into the systemic process failures behind them.
Trust is a safety boundary
The pricing flap is the cleaner example. Amol Avasare, Anthropic's Head of Growth, framed the Pro-plan change as a 2% test for new prosumer signups. That's almost certainly true. But the public pricing page and support docs were updated globally: a contradiction Avasare later acknowledged as "a mistake" without qualification. The communication arrived screenshots-first, explanation-second, and the defensive posture afterward did more damage than the experiment itself. The Fortune piece is largely a chronicle of users moving from confused to angry across that interval.
The concern isn't consumer entitlement. It's adoption risk: companies need to know the future cost of adding people to a toolchain before they build around it. Simon Willison's post summarizes this best[5]. When pricing communication treats developers as an A/B cohort rather than a customer base making procurement decisions, the cost isn't measured in churned subscribers. It's measured in trust.
It leaves an open question: what else do they see as fair game for A/B testing? The postmortem already gestures at the answer. A verbosity-reducing system prompt that touched three production models was, in effect, a live experiment too.
This speaks to a coherence problem. Anthropic's market position, its policy advocacy, and its public identity all rest on a story about caring how AI lands in the world. The Claude Constitution has an entire section on (avoiding) deception. I'm not so naive as to believe growth should follow the constitution, but growth should be hyperaware of how much the brand promise anchors on it. Anthropic actively positions itself as the lab that takes trust seriously: not as a marketing posture, but as a product promise.
If anyone should be aware of this, it's Anthropic's Growth team.
Anthropic just shipped its Org Chart
These events point to a division between "model safety" (which research owns) and "everything downstream" (which product and GTM treat as ordinary commercial territory). If your strategic asset is trustworthiness, the safety boundary is wherever trust happens. A pricing experiment, an unstaged prompt rollout, a postmortem that doesn't dig in: all of these spend trust the same way a model behavior failure does. Users don't experience the constitution as a document. They experience the actions of the company. Despite themselves, they experience the Reddit screenshots.
The asymmetry with Mythos isn't hypocrisy. It's a category problem. Anthropic appears willing to absorb business pain when the threat model involves national security or catastrophic risk, and rather less willing to apply the same operational rigor when the cost lands on the workflows, budgets, and toolchain bets of working developers.
You can argue the two aren't comparable. I'll counter that Claude Code and its peers are upending careers. A whole profession (or two, or three) is in a tailspin at the moment. This isn't a single catastrophic event, but its impact is global, material, and significant. This product is currently the "pointiest end" of a massive economic shift.
The Mythos distinction is a judgment call Anthropic is entitled to make. But I'll argue the line isn't so neat. A developer who saw silent pricing confusion and degraded Claude Code has less reason to trust Anthropic's claims about careful access control elsewhere. Trust isn't compartmentalized. It's the same balance sheet. Spending it on one side affects what you can claim on the other.
A wider safety
Model safety viewed as an aegis that covers other behaviors is a kind of category capture: the safety brand absorbing legitimacy that wasn't earned by safety work. That turns a strength into a hazard. Evals, staged rollouts, stable pricing, and clear communication are not just product discipline. For a lab whose core values and market position depend on being the trustworthy one, they are how trust is actually exercised in the real world.
I believe Anthropic is genuine in its goals and tries to live its values. That's exactly why the framing of safety matters. The public language falls into familiar categories: dangerous outputs, misuse, jailbreaks, alignment, catastrophic risk. The implication shouldn't be that everything else is "just product."
"We are compute-constrained and we're finding solutions" is a trustworthy answer. It's certainly more trustworthy than treating product, pricing, and rollout as outside the safety conversation. Safety is not a research department value if product, growth, rollout, and communication can quietly spend the trust that research earns.