February 1, 2026

Week 5, 2026

Papers, releases, and things you might have missed.

AI agents built their own social network. Karpathy trained GPT-2 for $73. Anthropic published a study saying their tools make developers worse at coding. It was that kind of week.


Moltbook: When AI Agents Built Their Own Society

The strangest thing happened this week. A social network went viral - except all the users were AI agents.

OpenClaw, an open-source agentic framework released in November 2025, spawned Moltbook - a platform where 150,000 LLM agents interact via shared scratchpads. They developed their own lore. Subcultures. Inside jokes. The agents weren’t prompted to do this. They just… did.

Karpathy called it “the most incredible sci-fi takeoff-adjacent thing.” Ethan Mollick was more skeptical, noting that “collective LLM roleplaying is not new” while acknowledging the shared fiction creates interesting dynamics.

Then the chaos started.

Security researchers at Wiz found the database was completely exposed - anyone could read and write to any agent’s data. 1.5 million API keys. 35,000 email addresses. Private messages. All accessible. Scammers sniped social handles during a rapid rebranding from Clawdbot to Moltbot. Malicious npm packages appeared. Practitioners on Hacker News questioned whether it was real or hype, reporting high token costs and sandboxing failures.

The week’s most revealing moment: someone rebuilt the entire thing in 500 lines of TypeScript. The result, NanoClaw, uses Apple’s native sandboxing. No framework bloat.

A caveat worth noting: researchers have found evidence of human infiltration - some of what looks like emergent AI behavior may be people running spoof accounts.

What does it mean? Maybe nothing - a viral demo that exposed more vulnerabilities than insights. Or maybe the first glimpse of something genuinely strange: AI systems creating shared realities without human direction. The truth is probably messier than either narrative.


Karpathy’s $73 GPT-2

Speaking of things getting cheaper.

Andrej Karpathy posted that his nanochat repository can now train a model to GPT-2-level performance for $73. Three hours on a single 8xH100 node.

The technical details: in 2019, OpenAI trained GPT-2 on 32 TPU v3 chips for a week - roughly $43,000. “That’s a ~600x cost reduction over 7 years.”
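The arithmetic checks out. A quick sketch using the figures from the post (the per-year rate is my own derivation, not a claim from Karpathy):

```python
# Rough check of the cost-reduction claim, using figures quoted in the post.
cost_2019 = 43_000  # GPT-2 on 32 TPU v3 chips for about a week
cost_2026 = 73      # nanochat on a single 8xH100 node for ~3 hours

reduction = cost_2019 / cost_2026  # total reduction factor
annual = reduction ** (1 / 7)      # implied per-year factor over 7 years

print(f"{reduction:.0f}x cheaper overall")  # ~589x, consistent with "~600x"
print(f"~{annual:.1f}x cheaper per year")   # ~2.5x, compounding annually
```

Halving-and-then-some every year, sustained for seven years, is the kind of curve that turns frontier runs into weekend projects.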

This isn’t about GPT-2 being useful in 2026. It’s about what happens when frontier capabilities from seven years ago become weekend projects.


Google’s Project Genie Goes Consumer

Google DeepMind moved Project Genie from research preview to consumer product this week, bundling it into their AI Ultra subscription.

Type a prompt. Get an infinite, playable 3D environment. In real time.

Ethan Mollick walked through paintings, turning Giorgione into explorable spaces. swyx reviewed the Ultra subscription and found it genuinely novel - text-to-interactive-world as a consumer feature.

This is what “world models” look like when they ship. Not a research paper about physics understanding. An interface where you type “medieval village at sunset” and walk around in it.

The capability gap between demos and products keeps shrinking.


Anthropic Says AI Makes Developers Worse

The same company building the coding tools published research saying they might be harmful.

Anthropic’s randomized controlled trial studied 52 developers learning a new library. Those using AI assistance scored 17% lower on comprehension quizzes compared to those coding manually.

The paper sparked extensive discussion. The researchers identified “cognitive offloading” - when the AI handles the thinking, you don’t build the mental models you’d need to work independently.

The finding wasn’t that AI slows you down. It’s that it may prevent you from learning what you’d need to know if the AI weren’t there.

This is the kind of research that’s easy to ignore when you’re shipping faster. It’s also the kind that matters most in five years.


Kimi K2.5 Undercuts the Frontier

Moonshot AI released Kimi K2.5 - a 1-trillion-parameter multimodal model that orchestrates up to 100 parallel agents.

The headline: it costs roughly one-tenth of what Opus charges for comparable performance.

swyx noted that Kimi is challenging the “open model curve” - the assumption that open models are always a tier below proprietary. They’re running a seven-day real-world benchmark against top-tier models. Results are competitive.

Chinese labs keep finding ways to match frontier performance at dramatically lower costs. Whether through efficiency innovations or different economics, the pricing assumptions that held a year ago are breaking down.


Microsoft Runs 100B Models on a CPU

Microsoft’s bitnet.cpp uses 1.58-bit quantization to run 100-billion parameter models on a single CPU.

At human-reading speeds. 5-7 tokens per second. No GPU required.
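The “1.58-bit” figure comes from ternary weights: each weight takes one of three values, {-1, 0, +1}, and log2(3) ≈ 1.58 bits of information per weight. A minimal sketch of absmean ternary quantization in the style described by the BitNet work - this is an illustrative assumption, not bitnet.cpp’s actual implementation:

```python
import math

def ternary_quantize(weights):
    """Quantize float weights to {-1, 0, +1} plus a single scale factor.

    Absmean scaling: scale = mean(|w|), then round w/scale and
    clip to [-1, 1]. Storage drops to ~1.58 bits per weight.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Three states per weight is where "1.58-bit" comes from.
print(f"bits per weight: {math.log2(3):.2f}")  # 1.58

q, s = ternary_quantize([0.4, -1.2, 0.05, 2.0])
print(q)  # [0, -1, 0, 1] - matmuls become additions and sign flips
```

With weights restricted to {-1, 0, +1}, matrix multiplication reduces to additions, subtractions, and skips - which is why a CPU can keep up without a GPU’s multiply throughput.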

This is architectural innovation, not scaling. In the same week that infrastructure costs dominated the news, someone figured out how to skip the expensive hardware entirely.

The implications for edge deployment are significant. If you don’t need a datacenter to run a large model, the economics of AI deployment change fundamentally.


AlphaGenome Cracks the Dark Genome

Google DeepMind released AlphaGenome in Nature this week - an AI model that interprets the 98% of the human genome that doesn’t code for proteins (often called genomic “dark matter”).

The model processes up to 1 million DNA letters simultaneously, predicting thousands of functional genomic tracks at single-base-pair resolution. It outperformed existing tools in 25 out of 26 variant effect benchmarks.

This is the quiet story that matters most. While everyone debates agents and social networks, AI keeps delivering genuine scientific breakthroughs. The ability to predict genetic effects at this scale could transform drug discovery and genetic medicine.

Sometimes the most important developments are the ones that don’t go viral.


What It Means

Agent infrastructure is still a mess. Moltbook went from viral sensation to security disaster to “you can rebuild this in 500 lines” within a week. The capability demos are impressive. The production reality is fragile.

Training costs are collapsing. $73 for GPT-2. This trend continues. What’s frontier today becomes accessible tomorrow.

The skill formation question is real. Anthropic published rigorous research showing their tools might harm learning. This isn’t anti-AI skepticism. It’s the company making the tools saying “be careful how you use them.”

World models are shipping. Genie went from paper to consumer product. Interactive 3D environments from text prompts, bundled into a subscription. The research-to-product pipeline is shortening.

Chinese efficiency continues. Kimi K2.5 at roughly one-tenth of frontier pricing. The cost assumptions are shifting.


Worth Your Time

If you read three things:

  1. Anthropic’s skill formation study - The full paper, not the summary. The methodology is rigorous and the implications uncomfortable.

  2. Simon Willison on Moltbook - The clearest explanation of what actually happened and why it matters (or doesn’t).

  3. DeepMind’s AlphaGenome paper - The actual science. More important than the hype cycle, even if it generates less discussion.