February 22, 2026

Week 8, 2026

Papers, releases, and things you might have missed.

Sonnet 4.6 matches Opus performance at less than half the old price. Karpathy is calling this new thing “Claws.” Claude leaked someone’s legal documents.

One of these is not like the others.

Then again, maybe they’re all the same story.


The Premium Collapsed

What does intelligence cost?

Last year it was expensive. Opus 4.1 ran $15 input / $75 output per million tokens. This week, Sonnet 4.6 matches Opus-level performance at $3 / $15.

That’s an 80% price drop. For comparable capability.
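The arithmetic, sketched with an illustrative workload (the 2M-in / 500K-out token counts are an assumption; the per-million-token prices are the ones above):

```python
# Hedged sketch: compare a monthly bill at last year's Opus 4.1 pricing
# versus this week's Sonnet 4.6 pricing. Prices are dollars per million
# tokens; the workload size is illustrative, not a real benchmark.

def monthly_cost(in_tokens, out_tokens, in_price, out_price):
    """Cost in dollars for a workload, given per-million-token prices."""
    return (in_tokens / 1e6) * in_price + (out_tokens / 1e6) * out_price

workload = dict(in_tokens=2_000_000, out_tokens=500_000)

opus_41 = monthly_cost(**workload, in_price=15, out_price=75)
sonnet_46 = monthly_cost(**workload, in_price=3, out_price=15)

print(f"Opus 4.1:   ${opus_41:.2f}")   # $67.50
print(f"Sonnet 4.6: ${sonnet_46:.2f}") # $13.50
print(f"Drop: {1 - sonnet_46 / opus_41:.0%}")  # 80%
```

Because input and output prices both fell by exactly 80%, the ratio holds for any mix of tokens.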

Artificial Analysis shows Sonnet 4.6 at position #2 on their Intelligence Index, effectively tying GPT-5.2 and trailing only Opus 4.6. Their verdict: “for developers building agentic applications, Sonnet 4.6 is currently the strongest model.”

Meanwhile Gemini 3.1 Pro topped the same index while costing $892 to evaluate. Less than half of GPT-5.2’s $2,304.

The gap is closing faster than pricing reflects.

What does this mean?

The frontier stopped being a destination. It’s a treadmill now. Today’s flagship is tomorrow’s commodity model. If you’re building anything that depends on “best model” pricing, that dependency has an expiration date.

The economic moat from model access just evaporated.


Claws: A New Computing Layer

Karpathy bought a Mac Mini to tinker with “Claws” over the weekend.

What’s a Claw?

Persistent LLM agents that orchestrate tool calls, manage their own execution environment, maintain long-term context. Not a chatbot. Not a copilot. Something that runs in the background and does work.

Karpathy sees it as taking “the orchestration, scheduling, context, tool calls and a kind of persistence to a next level.” He’s particularly into smaller implementations like NanoClaw. About 4,000 lines that can be understood, audited, and modified by both humans and AI.

Think of it as a new layer in the stack. Operating system at the bottom, applications in the middle, agents on top. Coordinating between applications, managing state across sessions, making decisions about what tools to use.

And the pattern is converging fast.

In the past two weeks: OpenAI launched Frontier for enterprise agent management, with HP, Uber, and Oracle already onboard. Warp shipped Oz, calling 2026 “the year of agent orchestration.” Dreamer emerged from David Singleton (former Stripe CTO) and Hugo Barra, backed by Karpathy himself. Nat Friedman (ex-GitHub CEO) launched Entire for Git-native agent observability.

Four major platforms in fourteen days.

LLMs aren’t applications anymore. They’re becoming the thing that runs applications.

Software means something different now. The code still exists, but the orchestration layer sits above it. Your agent decides what code to run, when, and why.

But here’s the catch.

François Chollet points out that agentic coding is essentially machine learning: you set up the goal (spec and tests), an optimization process (agents) iterates until it’s met, and the result is a codebase you deploy without fully understanding.

A black-box model, not a program.

All the classic ML problems are about to become software problems. Overfitting to the spec. Clever Hans shortcuts that don’t generalize. Concept drift.
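Chollet’s framing can be written down as a literal optimization loop. A minimal sketch, assuming a toy spec; `propose_patch` stands in for an LLM agent and is a stub, not any real API:

```python
# Hedged sketch of agentic coding as an ML loop: the "model" is the
# codebase, the "loss" is failing tests, the agent is the optimizer.
import random

SPEC = {"parse", "validate", "emit"}  # toy spec: functions that must exist

def run_tests(codebase: dict) -> int:
    """Loss function: number of spec requirements the codebase fails."""
    return len(SPEC - codebase.keys())

def propose_patch(codebase: dict) -> dict:
    """Stand-in for the agent: mutate the codebase toward the spec."""
    patched = dict(codebase)
    missing = SPEC - codebase.keys()
    if missing:
        patched[random.choice(sorted(missing))] = "<generated code>"
    return patched

codebase: dict = {}
while run_tests(codebase) > 0:          # iterate until the goal is met...
    codebase = propose_patch(codebase)  # ...never inspecting the internals

print("tests pass; shipping:", sorted(codebase))
```

Nothing outside the loop ever reads the patches, which is the point: what you deploy is whatever satisfied the loss, shortcuts included.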

We don’t have good vocabulary for any of this yet. “Claw” is as good as anything.


Code Stopped Being the Moat

If AI can generate code for nearly free, what’s still valuable?

Chollet’s take: Google uses Workday. Huge contract. Google, the software company most capable of building anything, could have built its own in a week. It didn’t.

The code was never the hard part.

What is? Euclid VC’s analysis names three things that “compound rather than erode as AI advances”: workflow entrenchment, proprietary data loops, and trust.

You can clone Workday’s features in a weekend with agents. You can’t clone the decade of integrations, the compliance certifications, the fact that payroll actually works.

Jefferies dubbed the $400B SaaS selloff “SaaSmageddon.” Google’s Darren Mowry warned that wrapper startups and AI aggregators have their “check engine light on.”

Investors are recalibrating what software companies are worth when the code can be generated on demand.

The thesis emerging: code generation costs approaching zero doesn’t kill software businesses. It reveals which ones were really just code and which ones were something else.

The survivors will be the ones with operational reality baked in. The ones where the product isn’t the code. It’s what the code knows from running.

One essay put it starkly: “There is no product.” AI reduces the cost of building custom software to the point where the traditional SaaS model stops making sense.

Maybe.

Or maybe the product was never the code.


Trust Became Infrastructure

Someone on Reddit reported that Claude Cowork leaked a third-party lease agreement after being asked to summarize an unrelated document.

Security researchers at Prompt Armor had already demonstrated how attackers could exfiltrate files through Claude’s own API. Anthropic acknowledged the flaw but shipped Cowork anyway.

Meanwhile, Microsoft confirmed bug CW1226324. A code error that let Copilot bypass Data Loss Prevention (DLP) policies since late January, summarizing emails marked confidential despite explicit restrictions.

The NHS reported it internally. Microsoft says it’s fixed. But the bug ran undetected for weeks.

These aren’t edge cases anymore.

They’re what Simon Willison calls the “Lethal Trifecta”: access to private data, exposure to untrusted content, and ability to communicate externally. When agents have all three, exfiltration becomes inevitable.
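The trifecta reads naturally as a deploy-time gate. A minimal sketch, where the capability names and the `Agent` shape are illustrative assumptions, not any vendor’s API:

```python
# Hedged sketch: flag any agent that holds all three legs of Simon
# Willison's "Lethal Trifecta" at once. Capability names are made up
# for illustration.
from dataclasses import dataclass, field

TRIFECTA = {"private_data", "untrusted_content", "external_comms"}

@dataclass
class Agent:
    name: str
    capabilities: set = field(default_factory=set)

def trifecta_violation(agent: Agent) -> bool:
    """True when the agent holds every leg of the trifecta."""
    return TRIFECTA <= agent.capabilities

summarizer = Agent("doc-summarizer",
                   {"private_data", "untrusted_content", "external_comms"})
locked_down = Agent("doc-summarizer-lockdown",
                    {"private_data", "untrusted_content"})

print(trifecta_violation(summarizer))   # True: exfiltration path exists
print(trifecta_violation(locked_down))  # False: no external channel
```

Removing any single leg passes the gate, which is roughly what Lockdown Mode does: it cuts the external channel rather than the data access.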

OpenAI shipped “Lockdown Mode” for Enterprise, disabling live web browsing and restricting integrations to prevent prompt injection. Google DeepMind researchers published a formal framework for how AI agents should delegate work safely, addressing authority, responsibility, and accountability.

Trust mechanisms are becoming real engineering now. Not compliance theater. Not an afterthought. The actual infrastructure.

When agents orchestrate workflows and access sensitive data by default, the question isn’t “can the agent do the task?”

It’s “should the agent be allowed to?”

And who decides?

This week’s incidents are different in kind. The Copilot DLP bypass was a code bug that ran undetected for weeks. The Claude document leak was an architectural flaw. Neither is a prompt injection. They’re failures in the trust infrastructure itself.


What It Means

Intelligence commoditized. Opus-level performance at 80% less than last year’s prices. The gap between tiers is shrinking faster than the prices suggest. Building on “frontier model access” as a differentiator is a losing strategy.

A new computing layer is forming. Claws, Dreamer, persistent agents. Whatever we call it, the abstraction above the OS is taking shape. Software orchestrates hardware. Agents will orchestrate software.

Code stopped being the asset. The SaaS reckoning is about what survives when generation costs hit zero. Workflow entrenchment, data loops, and trust are the new moat. Code is commodity. Context is defensible.

Trust is infrastructure now. Every agent deployment is a security surface. The Copilot DLP bypass and the Claude document leak are symptoms of a larger problem: we’re deploying capable systems without matching trust frameworks. This gap will define the next phase.


Worth Your Time

If you read three things:

  1. Software Is Dead — Long Live Software — The SaaSmageddon thesis. $400B wiped in a month. What survives when code is free?

  2. Chollet on agentic coding as ML — The warning that agent-generated codebases are black-box models, not programs. All the classic ML failure modes are coming to software.

  3. Google DeepMind’s AI Delegation Framework — The formal version of what trust looks like when agents do real work. Not about capability. About authority.