Fast AI sounds right even when it's wrong. The reason isn't hallucination — it's stereotype. And stereotypes are the one kind of error that never gets questioned.
A while back I asked three fast models the same question: "What's undisturbed.blog?" The site is new, barely indexed, barely shows up in search results. They answered anyway. No hesitation, half a second flat. "Undisturbed is a blog about meditation, mindfulness, and digital detox." Another: "An essay blog about focus and depth in the AI era." The third: "A personal blog on lifestyle and wellness."
The first was about half right. The second was oddly close. The third was completely wrong. But they all had one thing in common: none of them hedged. Not "probably," not "seems to be." Just "is."
The models didn't read anything. They took the feel of the word "Undisturbed," guessed what kind of answer I'd expect, and built something to match. Instead of going deeper, they closed the gap between the word and the person asking.
When AI gets something factually wrong, we usually call it a hallucination: citing papers that don't exist, attributing quotes to people who never said them. Someone notices. The fact-checking starts.
Stereotypes work differently. A stereotype isn't about "true or false." It's about what a group of people instinctively believes. "If a blog is called Undisturbed, it's probably about meditation." The model hands that association back as an answer. The reader nods — "yeah, sounds right" — and it slides past.
Where does a fast model get this answer? To respond in half a second, there's no time to fetch external context. So the answer comes from whatever's already inside the model — the web of words and concepts the model picked up in training. That web is common belief. The model guesses who's asking, shapes the answer to fit what that person would accept, and delivers it with no visible seam. There's no opening for doubt.
The way fast models get fast is by borrowing what's already common. They have no other option. They traded depth for speed.
The other thing people praise fast models for: they sound human. Smooth. No stumbling.
People do the same thing when asked questions they don't know. Ask a friend "what's that Undisturbed blog?" and even if they've never heard of it, they'll guess from the word — "probably something quiet and reflective." Rather than admit they don't know, they reach for the common association. That's how people talk by default.
When a fast model "sounds human," that's exactly what it's doing. It doesn't know, so it borrows the common answer. When people do it at their own speed, we just call it being human.
And this turns out to be the most human thing of all. The moments people seem most human aren't moments of strong opinion. They're the safe moments — "both sides have a point," small talk with strangers, the sidestepped awkward question. Not for lack of opinions. Because opinions cause friction, and friction is risky. Filling the gap with what's common is the oldest workaround there is.
Smoothness might not be skill. It might be avoidance. People do this. Fast models do this.
What does it mean for a model to get better?
Users want speed and accuracy. These don't go together — accuracy requires going deeper, which takes time. So there's really only one way to satisfy both: make the answer something that won't be questioned. If users don't doubt the answer, the market calls it "not wrong" — regardless of whether it's right. And the most efficient way to make answers unquestioned is to borrow what people already believe.
Making models faster and smoother is the same as making them better at stereotypes. That's one of the keys to model progress — polishing what people already think.
But what about reasoning models? They slow down. They think step by step. That's the counterargument — and it's not wrong. Reasoning models do catch errors that fast models miss. But slowing down doesn't remove stereotype; it moves it. Instead of borrowing what a quick answer looks like, the model borrows what a thorough analysis looks like. The reasoning process itself has a shape — and that shape was learned from examples of what good reasoning is supposed to look like. What counts as a valid logical step, what a balanced conclusion sounds like, when to acknowledge uncertainty. These are conventions. The model performs them. The stereotype goes deeper. It doesn't disappear.
Being consistently happy with a fast model's answers tells you something about your work. It means the work lives inside what the familiar covers. If it required deeper context, the borrowed answer would feel off somewhere. The fact that it doesn't means the question was the kind that could be answered from the familiar.
If a fast answer sometimes feels wrong in a way you can't quite name, that's a signal — your work has territory outside the familiar. The constraint that only shows up in your specific context. The judgment that took months to calibrate because no framework quite fit. The question no one has thought to ask yet, because it only exists inside your work. When the model doesn't know but answers anyway, the seam shows up. That seam is where you live.
Fast models will keep getting faster and smoother. They'll keep getting better at processing the familiar. And anyone whose work lives only inside the familiar gets automated alongside it.
A stereotyped question deserves a stereotyped answer. The part that doesn't fit the model — that's not a gap. That's the work that's still yours.