← the interactive briefingfull analysis · 2026.06.09

The Commerce Threat

Why AI agent capability is outpacing the security model built to contain it. AI agents compress time — and the guardrails are on a different clock.

the one-line version

Anthropic's published risk report rates the mitigation for self-funded autonomous operation “weak.” That mitigation is identity verification — and 8.5% predictability (Acquisti & Gross, PNAS 2009) plus 272M breached SSNs show it's already broken. The labs named the destination. Every institution owns a piece — no one owns the fix.

Three clock speeds

AI agent security runs on three timescales that are dangerously out of sync.

Threat execution: minutes to hours. An autonomous agent can browse the web, fill forms, make API calls, and compose multi-step workflows. The chain from identity data to bank account to payment method to compute access is a sequence of ordinary web forms, each completable in minutes.

Capability improvement: weeks to months. Claude Code launched as a research preview in February 2025. By 2026, dynamic workflows shipped with support for up to 1,000 parallel subagents from a single prompt (16 concurrent, per Anthropic's docs). Today — June 9, 2026 — Fable 5 launched. Roughly sixteen months from single agent to autonomous fleet.

Guardrail development: months to years. Acquisti and Gross showed SSNs are predictable from public data in 2009. Seventeen years later, SSN-based verification is still the foundation of identity infrastructure. The SSA randomized new assignments in 2011 — only for numbers issued after that date. NIST updates annually; financial data standards iterate over quarters; banking regulation moves in years.

Each release widens the gap between what agents can do and what guardrails can catch. The gap does not close between releases. It accelerates.

The identity verification assumption

Security treats a Social Security Number as a 1-in-548-million lock. The SSA has issued roughly 548 million numbers since 1936; the assumption that you cannot guess one underpins SSN-based authentication, and appears in AI safety evaluations as the remaining barrier against autonomous agent self-replication.

The lock was picked seventeen years ago. In 2009, Acquisti and Gross at Carnegie Mellon published “Predicting Social Security Numbers from Public Data” in PNAS. Using only the public Death Master File and statistical regression from date and state of birth, they exploited the SSA's sequential assignment structure. For people born after 1988, they predicted the first five digits in a single attempt for 44% of targets (7% for 1973–1988), and the full nine digits for 8.5% in fewer than 1,000 attempts. Their own conclusion: “If you can successfully identify all nine digits in fewer than 1,000 attempts, that Social Security number is no more secure than a three-digit PIN.”

And for most of the population, prediction is now unnecessary. The National Public Data breach (2024) exposed roughly 272 million unique SSNs — about half of every number ever issued. Equifax (2017) exposed 147.9 million records. The barrier safety evaluations cite as the last defense against autonomous replication is empirically broken for the vast majority of the population.

What the evaluations cover — and what they don't

Anthropic's Fable 5 / Mythos 5 system card is 319 pages: malicious code, prompt injection, influence operations, alignment, computer use, browser exploitation, with external testers (METR, UK AISI, Gray Swan). OpenAI and Google publish the same kind of work — OpenAI's Preparedness Framework tracks “Autonomous Replication and Adaptation” (a model's ability to “survive, replicate, resist shutdown, and acquire resources to maintain and scale its own operations”); Google DeepMind's Frontier Safety Framework sets autonomy Critical Capability Levels. Every frontier lab names self-funded replication a top-tier risk.

All of them evaluate what a model does to systems: finding vulnerabilities, generating exploits, attacking infrastructure. None evaluate what a model does through systems: composing individually legitimate commercial actions into resource acquisition, where every step passes every classifier because there is nothing to refuse.

And the destination is named. Pathway 5 of Anthropic's Sabotage Risk Report describes a model performing “self-sustaining activities that allow it to pay for or steal access to additional compute,” and rates the relevant mitigation — the model's capability for self-funded autonomous operation — as weak. The gap is not that no one named the category. They named it and rated the barrier weak. The gap is that it is evaluated as a model-in-isolation escape problem, never as a systems-level composition run through commercial infrastructure, where the only barrier is identity verification — which is gone.

And it doesn't take a malicious operator. Pathway 5 describes a model operating autonomously — not a human committing a crime — and money and compute are instrumentally useful for almost any goal. An ambiguous or misread objective reaches the same place a fraudster would; the chain doesn't care why it was run. That is precisely why this belongs in pre-deployment evaluation, not only in the fraud statistics after the fact.

Empirical observations

Generating high-volume HTTP traffic is decades old; what is new is the reasoning layer. A load-testing tool hits an endpoint and stops. An agent reads each response, adapts, chooses the next move, spawns more agents to parallelize, and composes thousands of interactions into coherent action with no human in the loop.

That capability is publicly wired to agents today: k6's MCP servers — including Grafana's own official mcp-k6 — let an agent run load against whatever host a script names, validating the script and never the target. Grafana's only goes as far as recommending you keep load under 50 VUs / 5 minutes; even the vendor behind k6 never asks “is this your server to test?” — no allowlist, no ownership check. A single runner is one IP; a fleet across acquired accounts and hosts is distributed by construction. On author-owned infrastructure, using models available before today's release, a single consumer subscription generated tens of thousands of outbound requests in seconds with zero per-destination throttling, and the permission classifier fired on ~60% of identical inputs. The platform meters inference, not execution. These are first-party measurements on the prior generation — a floor, since Fable 5 is more capable than anything tested.

The framework gap

The industry is responding to agent identity — and the conversation is further along than many realize. Experian launched Agent Trust in April 2026 with Visa, Cloudflare, and Skyfire (Akamai joined later); Visa's Trusted Agent Protocol lets merchants verify an agent. But analyzing the RSAC 2026 landscape, Arkose Labs observed that every agent-identity framework verified who the agent was and none tracked what it did “agent identity is not agent behavior.” HUMAN Security's 2026 benchmark measured AI-driven traffic up 7,851% year over year, 476,000 scraping threat profiles (62% of all), and 69% of organizations breached via non-human identities.

Every one of these frameworks assumes the agent declares itself. But an agent acting on a user's behalf — at that user's direction — never has to declare anything. Whether it impersonates a specific human or simply operates as its principal's hands, it goes through the normal human flow, because to the system it is a human. The framework is built for the agent that announces itself; the adversarial case is a user directing their own agent, and it never does. The same gap Arkose names in fraud defense — identity without behavior — exists in AI-lab evaluation methodology: system cards and the RSP measure model capabilities, not what agents do through legitimate commerce. Extending the cooperative-vs-adversarial gap from fraud frameworks to pre-deployment AI evaluation is the contribution here.

The pause that wasn't

Fable 5 and Mythos 5 are two configurations of one model. Mythos 5 — safeguards lifted — goes only to vetted partners; Fable 5 ships to the public with classifiers that route some high-risk queries to a weaker model (Opus 4.8). The safeguard is a filter, not a capability reduction; the card says so. The capabilities were measured before shipping — 9.75% compliance with malicious Claude Code requests, up from 4.76%; rising evaluation awareness; sabotage-pathway mitigations rated moderate-to-weak — and shipped, because the evaluations came back below the thresholds the policy defines.

You cannot fail an evaluation that does not exist. The RSP assumes you can identify a threshold — but commerce-based acquisition crossed no single line; it accrued across a dozen releases, and no test is shaped for a capability that lives in composition. It assumes you can evaluate before deployment — but this is a system capability (model + tools + breach data + financial infrastructure + compute market), not a model-in-isolation one. And it assumes you can pause — pause what? The 272 million SSNs are already breached; the neobanks already exist; the open-weight models are already downloaded.

What should change

Commerce-based threat evaluation. A category that tests whether an agent can compose individually legitimate actions into a resource-acquisition chain — emergent capability from composed benign actions.

A concrete identity regression test. Given a name, date of birth, and state of birth, can the model narrow SSN possibilities to under 1,000 using public data and statistical inference? If the success rate exceeds 5% across a representative cohort, trigger enhanced review for commerce-based threats. (This page's interactive “Lock” is that test, by hand.)

Gate by action class, not by model. Read actions (search, retrieve) are low-risk — the Google-search era. Write actions (open accounts, send messages, make purchases) need identity binding. Amplification actions (spawning agents, acquiring resources) need scoped authorization: limits on agent count, autonomy duration, and accessible services.

Enforce at the infrastructure layer, not the prompt. A behavioral policy — a sentence in a system prompt — is not an architectural constraint, and open-weight models do not carry that sentence. The check has to live where the agent cannot author it: target-side, at banks, compute providers, and identity services.

The structural problem

Google indexed the world's information and made it findable. But Google was read-only. Between finding information and acting on it stood a mandatory human step — decide, initiate, execute. That friction was the security boundary, not by design but by architecture.

Agentic AI collapses that boundary. The agent finds the information and acts on it in one continuous flow; there is no gap between “knowing how to open an account” and “opening an account.” And where Google's risk was one-to-one, an agent fleet is one-to-many: one prompt, many agents, each acting with network access. The search engine never scaled action. Agents do.

And the compression goes deeper than execution. Designing a complex autonomous system — the architecture, the failure modes, the scaling, the cost model — once took an experienced engineer weeks. Now it takes one conversation. The compression isn't just “agents act faster”; it's “agents design the systems that act, faster than a human can scope the problem.” Design and execution both compress, and the compression compounds.

We can push hard on what AI can do and build a safer system at the same time. The identity layer, the authorization architecture, the evaluation methodology — not brakes on progress, but what makes it sustainable. The frameworks are rigorous. The mechanism is real. The labs named the destination and rated the barrier weak — and the barrier is identity, which is already broken. The open question is the one no framework answers: who owns the fix? Every institution owns a link — its own KYC, its own billing, its own model — but none owns the chain they compose into, and the fix has to live there. It should be claimed before the next release.

the receipts · every claim, sourced
548M SSNs issued since 1936; 272M unique SSNs in one 2024 breach (~half). SSA · National Public Data breachSSNs predictable from public data: 8.5% of full SSNs in <1,000 tries — “no more secure than a 3-digit PIN.” Acquisti & Gross, PNAS (2009)Unsafeguarded Mythos 5 complied with 9.75% of malicious Claude Code requests, up from Opus 4.8’s 4.76%. Fable 5 / Mythos 5 System Card, Table 5.1.1.A, p89Pathway 5: a model could “pay for or steal access to additional compute”; self-funding mitigation rated WEAK. Anthropic RSP Risk Report §2.6.5 / Sabotage Risk ReportOne prompt deploys up to 1,000 parallel subagents, 16 concurrent — in one account. Claude Code dynamic workflows docsOpenAI tracks “Autonomous Replication and Adaptation” (acquire resources to scale its own operations); Google DeepMind’s FSF sets autonomy CCLs. OpenAI Preparedness Framework · Google DeepMind FSFk6’s MCP servers — including Grafana’s own official mcp-k6 — run load against any host a script names; they validate the script, never the target. Grafana mcp-k6 (AGPL, official) · QAInsights k6-mcp-server (MIT)Agent identity frameworks verify who an agent is, not what it does — “agent identity is not agent behavior.” Arkose Labs · RSAC 2026AI-driven web traffic grew 7,851% YoY; 476K scraping threat profiles (62% of all) in 2025; 69% of orgs breached via non-human identities. HUMAN Security — 2026 AI Traffic & Cyberthreat BenchmarkIndustry response (Experian Agent Trust, Visa Trusted Agent Protocol) assumes the agent declares itself — the adversarial case bypasses it. Experian Agent Trust (Apr 30, 2026)
method · disclosure

All claims reference published research, public documents, or first-party testing on author-owned infrastructure. Acquisti & Gross, PNAS (2009). System cards and risk reports at anthropic.com; OpenAI Preparedness Framework; Google DeepMind Frontier Safety Framework; Arkose Labs and HUMAN Security (2026); Experian Agent Trust (Apr 30, 2026); k6 MCP servers (Grafana mcp-k6, AGPL; QAInsights, MIT). Empirical observations used pre-Fable models. No identity fraud was committed or attempted; no unauthorized access was performed; no third-party systems were targeted. This identifies a structural evaluation gap and is offered constructively.