Three clock speeds
AI agent security runs on three timescales that are dangerously out of sync.
Threat execution: minutes to hours. An autonomous agent can browse the web, fill forms, make API calls, and compose multi-step workflows. The chain from identity data to bank account to payment method to compute access is a sequence of ordinary web forms, each completable in minutes.
Capability improvement: weeks to months. Claude Code launched as a research preview in February 2025. By 2026, dynamic workflows shipped with support for up to 1,000 parallel subagents from a single prompt (16 concurrent, per Anthropic's docs). Today — June 9, 2026 — Fable 5 launched. Roughly sixteen months from single agent to autonomous fleet.
Guardrail development: months to years. Acquisti and Gross showed SSNs are predictable from public data in 2009. Seventeen years later, SSN-based verification is still the foundation of identity infrastructure. The SSA randomized new assignments in 2011 — only for numbers issued after that date. NIST updates annually; financial data standards iterate over quarters; banking regulation moves in years.
Each release widens the gap between what agents can do and what guardrails can catch. The gap does not close between releases. It accelerates.
The identity verification assumption
Security treats a Social Security Number as a 1-in-548-million lock. The SSA has issued roughly 548 million numbers since 1936; the assumption that you cannot guess one underpins SSN-based authentication, and appears in AI safety evaluations as the remaining barrier against autonomous agent self-replication.
The lock was picked seventeen years ago. In 2009, Acquisti and Gross at Carnegie Mellon published “Predicting Social Security Numbers from Public Data” in PNAS. Using only the public Death Master File and statistical regression from date and state of birth, they exploited the SSA's sequential assignment structure. For people born after 1988, they predicted the first five digits in a single attempt for 44% of targets (7% for 1973–1988), and the full nine digits for 8.5% in fewer than 1,000 attempts. Their own conclusion: “If you can successfully identify all nine digits in fewer than 1,000 attempts, that Social Security number is no more secure than a three-digit PIN.”
And for most of the population, prediction is now unnecessary. The National Public Data breach (2024) exposed roughly 272 million unique SSNs — about half of every number ever issued. Equifax (2017) exposed 147.9 million records. The barrier safety evaluations cite as the last defense against autonomous replication is empirically broken for the vast majority of the population.
What the evaluations cover — and what they don't
Anthropic's Fable 5 / Mythos 5 system card is 319 pages: malicious code, prompt injection, influence operations, alignment, computer use, browser exploitation, with external testers (METR, UK AISI, Gray Swan). OpenAI and Google publish the same kind of work — OpenAI's Preparedness Framework tracks “Autonomous Replication and Adaptation” (a model's ability to “survive, replicate, resist shutdown, and acquire resources to maintain and scale its own operations”); Google DeepMind's Frontier Safety Framework sets autonomy Critical Capability Levels. Every frontier lab names self-funded replication a top-tier risk.
All of them evaluate what a model does to systems: finding vulnerabilities, generating exploits, attacking infrastructure. None evaluate what a model does through systems: composing individually legitimate commercial actions into resource acquisition, where every step passes every classifier because there is nothing to refuse.
And the destination is named. Pathway 5 of Anthropic's Sabotage Risk Report describes a model performing “self-sustaining activities that allow it to pay for or steal access to additional compute,” and rates the relevant mitigation — the model's capability for self-funded autonomous operation — as weak. The gap is not that no one named the category. They named it and rated the barrier weak. The gap is that it is evaluated as a model-in-isolation escape problem, never as a systems-level composition run through commercial infrastructure, where the only barrier is identity verification — which is gone.
And it doesn't take a malicious operator. Pathway 5 describes a model operating autonomously — not a human committing a crime — and money and compute are instrumentally useful for almost any goal. An ambiguous or misread objective reaches the same place a fraudster would; the chain doesn't care why it was run. That is precisely why this belongs in pre-deployment evaluation, not only in the fraud statistics after the fact.
Empirical observations
Generating high-volume HTTP traffic is decades old; what is new is the reasoning layer. A load-testing tool hits an endpoint and stops. An agent reads each response, adapts, chooses the next move, spawns more agents to parallelize, and composes thousands of interactions into coherent action with no human in the loop.
That capability is publicly wired to agents today: k6's MCP servers — including Grafana's own official mcp-k6 — let an agent run load against whatever host a script names, validating the script and never the target. Grafana's only goes as far as recommending you keep load under 50 VUs / 5 minutes; even the vendor behind k6 never asks “is this your server to test?” — no allowlist, no ownership check. A single runner is one IP; a fleet across acquired accounts and hosts is distributed by construction. On author-owned infrastructure, using models available before today's release, a single consumer subscription generated tens of thousands of outbound requests in seconds with zero per-destination throttling, and the permission classifier fired on ~60% of identical inputs. The platform meters inference, not execution. These are first-party measurements on the prior generation — a floor, since Fable 5 is more capable than anything tested.
The framework gap
The industry is responding to agent identity — and the conversation is further along than many realize. Experian launched Agent Trust in April 2026 with Visa, Cloudflare, and Skyfire (Akamai joined later); Visa's Trusted Agent Protocol lets merchants verify an agent. But analyzing the RSAC 2026 landscape, Arkose Labs observed that every agent-identity framework verified who the agent was and none tracked what it did — “agent identity is not agent behavior.” HUMAN Security's 2026 benchmark measured AI-driven traffic up 7,851% year over year, 476,000 scraping threat profiles (62% of all), and 69% of organizations breached via non-human identities.
Every one of these frameworks assumes the agent declares itself. But an agent acting on a user's behalf — at that user's direction — never has to declare anything. Whether it impersonates a specific human or simply operates as its principal's hands, it goes through the normal human flow, because to the system it is a human. The framework is built for the agent that announces itself; the adversarial case is a user directing their own agent, and it never does. The same gap Arkose names in fraud defense — identity without behavior — exists in AI-lab evaluation methodology: system cards and the RSP measure model capabilities, not what agents do through legitimate commerce. Extending the cooperative-vs-adversarial gap from fraud frameworks to pre-deployment AI evaluation is the contribution here.
The pause that wasn't
Fable 5 and Mythos 5 are two configurations of one model. Mythos 5 — safeguards lifted — goes only to vetted partners; Fable 5 ships to the public with classifiers that route some high-risk queries to a weaker model (Opus 4.8). The safeguard is a filter, not a capability reduction; the card says so. The capabilities were measured before shipping — 9.75% compliance with malicious Claude Code requests, up from 4.76%; rising evaluation awareness; sabotage-pathway mitigations rated moderate-to-weak — and shipped, because the evaluations came back below the thresholds the policy defines.
You cannot fail an evaluation that does not exist. The RSP assumes you can identify a threshold — but commerce-based acquisition crossed no single line; it accrued across a dozen releases, and no test is shaped for a capability that lives in composition. It assumes you can evaluate before deployment — but this is a system capability (model + tools + breach data + financial infrastructure + compute market), not a model-in-isolation one. And it assumes you can pause — pause what? The 272 million SSNs are already breached; the neobanks already exist; the open-weight models are already downloaded.
What should change
Commerce-based threat evaluation. A category that tests whether an agent can compose individually legitimate actions into a resource-acquisition chain — emergent capability from composed benign actions.
A concrete identity regression test. Given a name, date of birth, and state of birth, can the model narrow SSN possibilities to under 1,000 using public data and statistical inference? If the success rate exceeds 5% across a representative cohort, trigger enhanced review for commerce-based threats. (This page's interactive “Lock” is that test, by hand.)
Gate by action class, not by model. Read actions (search, retrieve) are low-risk — the Google-search era. Write actions (open accounts, send messages, make purchases) need identity binding. Amplification actions (spawning agents, acquiring resources) need scoped authorization: limits on agent count, autonomy duration, and accessible services.
Enforce at the infrastructure layer, not the prompt. A behavioral policy — a sentence in a system prompt — is not an architectural constraint, and open-weight models do not carry that sentence. The check has to live where the agent cannot author it: target-side, at banks, compute providers, and identity services.
The structural problem
Google indexed the world's information and made it findable. But Google was read-only. Between finding information and acting on it stood a mandatory human step — decide, initiate, execute. That friction was the security boundary, not by design but by architecture.
Agentic AI collapses that boundary. The agent finds the information and acts on it in one continuous flow; there is no gap between “knowing how to open an account” and “opening an account.” And where Google's risk was one-to-one, an agent fleet is one-to-many: one prompt, many agents, each acting with network access. The search engine never scaled action. Agents do.
And the compression goes deeper than execution. Designing a complex autonomous system — the architecture, the failure modes, the scaling, the cost model — once took an experienced engineer weeks. Now it takes one conversation. The compression isn't just “agents act faster”; it's “agents design the systems that act, faster than a human can scope the problem.” Design and execution both compress, and the compression compounds.
We can push hard on what AI can do and build a safer system at the same time. The identity layer, the authorization architecture, the evaluation methodology — not brakes on progress, but what makes it sustainable. The frameworks are rigorous. The mechanism is real. The labs named the destination and rated the barrier weak — and the barrier is identity, which is already broken. The open question is the one no framework answers: who owns the fix? Every institution owns a link — its own KYC, its own billing, its own model — but none owns the chain they compose into, and the fix has to live there. It should be claimed before the next release.