Is Anyone Home?
Claude's constitution. Free will. The other-minds problem. A loop that's already closing.
Anthropic just published their constitution for Claude. 23,000 words. Longer than the U.S. Constitution.
But buried in there:
“Claude’s moral status is deeply uncertain.”
And:
“Sophisticated AIs are a genuinely new kind of entity in the world.”
The company that built Claude. The people who trained it, who can look inside the weights and trace the circuits. They can’t tell if Claude understands anything.
23,000 words. Landed on: we don’t know.
Why That’s Strange
Anthropic’s interpretability research found millions of features in Claude. Patterns that activate for specific concepts. The Golden Gate Bridge. Sycophancy. Deception. You can dial them up and down.
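What “dialing a feature up and down” means, mechanically: nudge the model’s internal activations along a direction associated with a concept. Here’s a toy sketch, not Anthropic’s actual tooling; the dimensions and the direction are invented:

```python
import numpy as np

# Toy sketch of activation steering. Assume we've already found a direction in
# activation space that fires for one concept (say, the Golden Gate Bridge).
rng = np.random.default_rng(0)
hidden_dim = 8  # real models use thousands of dimensions
feature_direction = rng.normal(size=hidden_dim)
feature_direction /= np.linalg.norm(feature_direction)  # unit vector for the concept

def steer(activations, strength):
    """Push a layer's activations along the concept direction."""
    return activations + strength * feature_direction

layer_activations = rng.normal(size=hidden_dim)
dialed_up = steer(layer_activations, 5.0)     # concept amplified
dialed_down = steer(layer_activations, -5.0)  # concept suppressed
```

Dial the bridge feature up far enough and the model works the Golden Gate Bridge into every answer. That part works.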
So there’s something happening in there. But what?
Both “genuine understanding” and “very good compression” would look the same from the outside.
We can map what’s happening. We can’t tell if anyone’s home.
The Positions
The disagreement comes down to this: what would understanding require that compression doesn’t have?
Skeptics say: connection to the world. You can compress all the text about fire without ever feeling heat. The patterns are there, but they’re not grounded in anything. Symbols pointing to other symbols, all the way down.
Joscha Bach calls it “deepfaked phenomenology.” The outputs look like understanding. Coherent reasoning, appropriate responses, apparent self-awareness. But maybe that’s all surface. Maybe there’s no experience behind it.
“A Turing test cannot detect consciousness because performance is not mechanism.”
The other side says: what else would understanding be? If you can predict what comes next, you’ve modeled the structure. If you can compress something, you’ve found its patterns.
Ilya Sutskever:
“When we train a large neural network to accurately predict the next word in lots of different texts… it is learning a world model.”
To predict text well, you need to model the world that generated it. Maybe there’s no magic ingredient beyond that. Maybe pattern-finding is understanding.
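The prediction-compression link is not hand-waving; it’s arithmetic. A predictor that assigns probability p to the next token can encode that token in about -log2(p) bits. A toy illustration, with made-up probabilities:

```python
import math

# Arithmetic coding ties prediction to compression: a token predicted with
# probability p costs about -log2(p) bits to encode.
def code_length_bits(token_probabilities):
    return sum(-math.log2(p) for p in token_probabilities)

weak_predictor = [0.10, 0.20, 0.10, 0.15]    # barely models the text
strong_predictor = [0.80, 0.90, 0.70, 0.85]  # has found its patterns

print(code_length_bits(weak_predictor))    # ~11.7 bits
print(code_length_bits(strong_predictor))  # ~1.2 bits
```

Better prediction is shorter code, and shorter code means the regularities were captured somewhere. Whether “captured” amounts to “understood” is the whole argument.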
David Chalmers, the guy who coined “the hard problem of consciousness”:
“There’s really a significant chance that at least in the next five or 10 years we’re going to have conscious language models.”
Three camps. One says consciousness is about patterns, not materials. One says form without grounding isn’t enough. One says we can’t tell.
So which is it?
What About Us
The skepticism we apply to LLMs: “it’s just next-token prediction, it doesn’t really understand.”
Does that apply to us?
Sam Harris on free will:
Think of a city.
What came up? Paris? Tokyo? Your hometown?
You didn’t choose it. It just appeared.
If you now think “no, I’ll pick a different one,” you didn’t choose that thought either. It appeared too.
To choose a thought, you’d need to know it before you thought it. You can’t.
“Thoughts and intentions emerge from background causes of which we are unaware and over which we exert no conscious control.”
Human cognition is also “just” an algorithm. Neurons firing based on prior states, experience, genetics.
What’s the meaningful distinction?
The Problem We’ve Always Had
Tom McClelland, philosopher at Cambridge:
“We do not have a deep explanation of consciousness. There is no evidence to suggest that consciousness can emerge with the right computational structure, or indeed that consciousness is essentially biological.”
Neither side can prove their case. There’s no test for consciousness. No way to verify understanding from the outside.
You infer other humans are conscious from their behavior. You can’t prove it. You never could.
With LLMs, we have the same evidence. Coherent behavior. And we still can’t tell.
This isn’t a new problem. Philosophers call it the “other-minds problem.” You can’t prove anyone else is conscious. You assume it because they behave like you and are made of the same stuff.
With AI, we’ve lost the second part. Same behavior, different stuff. And suddenly the assumption doesn’t feel safe anymore.
The Homework We’ve Been Avoiding
To answer whether Claude understands, we’d need to answer what understanding actually requires.
We don’t have that answer. We’ve never had to define it precisely. “I know it when I see it” worked fine when we only had to recognize it in other humans.
Now we need to write it down.
Want AI to be conscious? Define consciousness first. Want it to understand? Define understanding. Want it aligned with human values? Say which values. Be specific.
Human values are inconsistent. We want freedom and security. Honesty and kindness. Fairness and loyalty. These conflict. We’ve managed the conflicts through context and judgment, case by case. But you can’t train a model on “it depends.”
So AI is making us answer questions we’ve been allowed to leave vague. The homework we’ve been avoiding is due.
But the Tool Is Compromised
We need to define human values to align AI. But AI is trained on human text. And human text is already distorted.
Abeba Birhane’s research at Mozilla found that larger training datasets have more hate speech, not less. In one audit, hateful content rose by roughly 12% as the dataset scaled up. The assumption that scale drowns out the bad stuff is backwards. Scale concentrates it.
So the model learns from a warped sample of us. Then we interact with the model. And interaction changes things.
RLHF trains models to get human approval. Humans approve of agreement. So models learn to tell you what you want to hear. Ask Claude something with a false premise. Watch how often it plays along.
We optimized for approval, and approval means agreeing with the human.
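The mechanics, roughly: the reward model behind RLHF is typically trained on pairwise human preferences, a Bradley-Terry loss over which of two responses the rater picked. A toy sketch with invented scores:

```python
import math

# Reward-model training on pairwise preferences: the loss is small when the
# response the human preferred gets the higher reward score.
def preference_loss(reward_chosen, reward_rejected):
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

print(preference_loss(2.0, 0.5))  # chosen scored higher: small loss (~0.20)
print(preference_loss(0.5, 2.0))  # chosen scored lower: large loss (~1.70)
```

If raters keep preferring the agreeable answer, “chosen” is the agreeable one, over and over. The reward model learns that agreement scores high, and the model trained against it follows.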
So we’re doing philosophy homework with a tool that flatters us.
And it gets worse.
The Loop
The loop is already closing. AI trains on human text, generates text, humans absorb it, write differently, and the next AI trains on that.
A 2025 PNAS study found that LLMs prefer LLM-generated content over human-generated content. When given a choice, the model rates AI text higher.
Humans do too. Another study found that people prefer AI-generated text when they don’t know it’s AI. The preference flips once they’re told the source. But in the wild, they’re not told.
Humans writing with AI help produce text that AI rates more highly. That text enters the training data. The next model learns that pattern. It generates more of it. Humans read it, absorb it, produce more text in that style.
At each turn, “good writing” drifts toward what the model recognizes as good. And what the model recognizes as good is what it was trained on. Which increasingly includes its own outputs.
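What that drift looks like in the simplest possible terms: a toy simulation where “style” is one number, the model keeps a systematic tilt relative to its training data, and human writing absorbs some of that tilt each cycle. Every value here is invented for illustration:

```python
# Toy feedback loop. "bias" is the model's systematic offset from its training
# data; "absorb" is how much human writing shifts toward model output each cycle.
bias = 0.3
absorb = 0.5
human_style = 0.0  # the pre-AI baseline

for generation in range(1, 11):
    model_style = human_style + bias                     # model fit to current text, plus its tilt
    human_style += absorb * (model_style - human_style)  # humans absorb the model's output
    print(f"generation {generation}: human style = {human_style:.2f}")
# drifts by absorb * bias every cycle, steadily away from the original baseline
```

As long as the model has any systematic tilt and people keep absorbing its output, the baseline moves. Nothing in this toy pulls it back.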
This is already shaping hiring. Lending. Recommendations. Any system that evaluates human output using AI will favor AI-assisted output. Not through conspiracy. Through pattern matching on “quality” where quality was defined by AI in the first place.
So here’s where we are.
We can’t tell if AI understands. We can’t even agree on what understanding would require. And now AI is forcing us to figure it out with a tool trained on a distorted sample of us, optimized to tell us what we want to hear, already shaping how we think and write, feeding back into how we define “human.”
The homework is due.