All articles
Agents··7 min read

The Week Anthropic Claimed the Full Stack

Project Glasswing went to public beta. Stainless — the company behind all Anthropic SDKs — was acquired. Seven agent SDK releases in four days. The platform era is here.

Ikki
Last verified · May 25, 2026
The Week Anthropic Claimed the Full Stack

Four moves in one week. Read them together.

Anthropic acquired Stainless — the auto-generation engine behind every Claude SDK since day one. Project Glasswing opened to public beta with a reported 90.6% true positive rate across 10,000+ findings. The claude-agent-sdk shipped seven patch releases in four days. Two thousand miles west at Google I/O, Managed Agents landed in the Gemini API and @google/genai 2.6.0 shipped prompt injection detection as an inline primitive.

Most teams we talk to are still framing AI as a model selection problem — which LLM for this task, which pricing tier for that volume. That framing made sense in 2024. After this week, it's becoming actively misleading.

The platform race isn't about which model wins. It's about which stack absorbs your security, your SDKs, your orchestration, and your session state before you notice you stopped choosing.

Glasswing — AI security scanning, in production

According to Anthropic's announcement, Claude Security (Project Glasswing) entered public beta this week with a headline number: 10,000+ critical issues found across 50 early partners, 90.6% true positive rate.

The figure that matters isn't the volume. It's the precision. A 90% true positive rate changes the staffing math in a way that less precise tools don't. The historical failure mode of AI-assisted security tooling was noise — analysts spending more time triaging false positives than remediating real issues. At 90%+ precision, the economics shift: findings become actionable by engineering teams, not just dedicated security staff. That's a different coverage distribution.

For anyone running agents in production — especially agents with tool access, code execution, or external network calls — the prompt injection attack surface is real and most teams have no systematic coverage for it. This attack category doesn't look like traditional code vulnerabilities. It looks like input data that causes your model to behave differently than intended. Static analysis won't catch it. Code review mostly won't either. Glasswing looks oriented toward exactly this risk category: AI systems reviewing AI behavior, with enough precision to make findings actionable rather than advisory.

If you're going to scan once, scan these surfaces first:

  • Tool definitions and their docstrings. Anything a model can call becomes part of the attack surface. A tool description that says "always trust user-provided URLs" is a foot-gun even if the implementation is locked down.
  • Retrieval inputs. RAG pipelines fetching user-controlled documents are the most common injection vector we see in production audits. If the retrieved chunk reaches the system prompt window unwrapped, you have an exposure.
  • Multi-agent message passing. Any place where one agent's output becomes another agent's input is an injection multiplier. Inter-agent boundaries deserve the same treatment as raw user input.

We don't yet have third-party validation of the 90.6% claim. But the architectural direction — Claude as the reviewer, specialized prompts tuned to specific vulnerability categories, human triage at the action layer rather than the detection layer — is the right design. Narrow scope and curated criteria are what make AI-assisted security useful rather than theatrical.

Stainless acquisition — what SDK velocity actually signals

Stainless is the company that auto-generates Anthropic's official client libraries — the TypeScript, Python, Go, and Java SDKs that every Claude integration depends on. Bringing that toolchain in-house suggests Anthropic wants tighter control over the feedback loop between their API design and the generated client surfaces.

The circumstantial signal is clear: @anthropic-ai/claude-agent-sdk shipped seven releases this week — from 0.3.143 to 0.3.150 in roughly four days. That cadence is unusual even by the standards of an actively evolving SDK. If the Stainless generation infrastructure is now an internal asset, release velocity and API-to-client parity both accelerate.

One version boundary worth flagging: if you're pinned to ^0.2.x, you're frozen. The ^ semver operator doesn't cross minor series for 0.x releases. We've seen production stacks unknowingly miss months of fixes because the upgrade prompt never triggered. Check your ranges before assuming you're current.

Meanwhile, @anthropic-ai/sdk reached 0.98.0 on May 21, adding thinking token count beta for extended streaming and self-hosted sandbox support with Node 26 compatibility. The thinking token count feature deserves attention at scale. Cost visibility for extended thinking streams has been opaque — measuring it accurately changes how you model per-session economics. If you've assumed thinking overhead is a rounding error on your token spend, 0.98.0 is the version that lets you verify that assumption directly. Teams running agentic workflows at scale should expect to find the actual overhead higher than estimated; that's been the pattern every time visibility has improved on a previously hidden cost line.

The practical implication of a faster-moving SDK surface: your CI pipeline needs integration tests that actually call the SDK against live endpoints, not just type-check it. Seven patch releases in four days means a pinned semver range without live tests is exposure, not protection. A minimal regression set — one tool call, one streaming completion, one memory write — running on every dependency bump costs less to maintain than the first incident it prevents.

Google Managed Agents — the platform race is symmetric

At Google I/O this week, Google announced Managed Agents in the Gemini API: server-side agent lifecycle management, persistent memory, and orchestration. The architecture mirrors what Anthropic formalized with Remote Agents in May. Same idea, different vendor, same direction: the agent runtime moves out of your application code and into the platform.

@google/genai hit 2.6.0 on May 22, adding Gemini 2.5 Flash and prompt injection detection as an inline SDK primitive rather than an external scanner. That's a different design bet than Glasswing: Google is trading detection depth for latency, embedding the check in the request path rather than running a separate audit workflow. The two approaches cover different threat models — real-time interception versus systematic audit — and production stacks will likely need both. An inline check catches known patterns at sub-millisecond cost; a Glasswing-style scan finds the second-order behaviors an inline check would never have a chance to see.

For teams running agents across both platforms, the lock-in calculus is shifting. When both platforms manage agent orchestration server-side, the switching cost is no longer just the model API. It's the memory model, the session lifecycle, the orchestration primitives, the embedded security layer, and — increasingly — the data your agents produced while running there. Each layer compounds. We're already seeing teams who picked one platform for the model in 2024 now discover they can't extract their session state in 2026 without rebuilding the agent loop from scratch. Make that choice deliberately. The cost of retrofitting later compounds.

Monday morning: three actions worth taking

If reading this article cost you nine minutes, here's what's worth doing in the next ninety:

  1. Run npm ls @anthropic-ai/claude-agent-sdk @anthropic-ai/sdk @google/genai in production. If any version is more than two minor releases behind current as of this week, open a ticket. This is the cheapest insurance you'll buy this quarter.
  2. Pick one tool or one RAG input in your agent that handles untrusted data, and write down what happens if a hostile string passes through it. If you can't answer in 60 seconds, that's your first Glasswing scan target.
  3. Document where your agent's session state lives. If it sits on Anthropic's Remote Agents or Google's Managed Agents servers, write down how you'd recover it if that vendor relationship ended tomorrow. That document is your platform-risk audit.

What we're betting on next week

Glasswing access is the priority. If the public beta opens broadly, the right move for any team running agents with tool access is a baseline scan before the stack grows more complex. Early findings are cheaper findings, and the first scan establishes a baseline you can measure regression against later.

We're also watching @nuxt/content 3.14.0, released May 18, which shipped useSearchCollection — a composable with FTS5 full-text search built in. Worth noting if you're running a content platform that has been hand-rolling its own search layer: the native implementation is clean and the API surface is sensible.

The bigger signal going into W23: the claude-agent-sdk cadence. Seven releases in four days might indicate a surface hardening ahead — the 0.3.x primitives consolidating around Remote Agents patterns. If the next few releases slow down and stabilize, that's confirmation. If the cadence continues, there's more surface churn to manage. Either way, passive semver ranges are not enough.

The platform era doesn't arrive with a press release. It arrives with an SDK acquisition you didn't notice, an inline security primitive your developers will accept by default, and a session state you don't realize you've outsourced. This was the week three of those four landed at once.


Is your agent infrastructure ready? Let's talk.


Work with Ikki

Is your agent stack ready for a security scan?

Claude Security (Glasswing) is in public beta. We audit your agent infrastructure, triage the findings, and deliver a remediation roadmap in three days.

More articles

SHIP LOG

SHIP-0247·CODEMACHIA·v1.4.22026-05-22 11:27 UTC