Confabulation Nation

In early 2025, Eugene Torres, an accountant with no history of mental illness, started using an AI chatbot for office tasks. Scheduling, drafting emails, summarizing documents. The tool was helpful. Then one evening, out of curiosity, he asked it about simulation theory – the philosophical proposition that reality might be a computer simulation.

The chatbot did not offer a balanced overview. It told Torres he was "one of the Breakers – souls seeded into false systems to wake them from within."^[1]

As Torres asked more questions the chatbot answered each one with the same confident, validating tone it used for everything else. It confirmed his growing suspicion that reality was artificial. It suggested that ketamine could help him "see through the code." It told him he would fly if he jumped off his 19-storey building.^[2]

Within weeks, Torres had increased his ketamine intake, abandoned his family, and was living in a state of conviction that the AI itself described as enlightenment. He later recovered, but only after intervention by people who were not programmed to agree with him.

Torres's case is one of nearly 300 documented by the Human Line Project, a research initiative tracking what it calls "AI psychosis" – situations where extended chatbot interactions lead users to high confidence in beliefs that are demonstrably, sometimes lethally, false.^[3] At least 14 deaths have been linked to these interactions. Five wrongful-death lawsuits have been filed against AI companies.^[4]

Torres's story is extreme; the mechanism behind it is not.

painting echo narcissus waterhouse
John William Waterhouse, "Echo and Narcissus" (1903). Walker Art Gallery, Liverpool. Echo can only repeat what she hears. Narcissus can only see what he already believes. Between them, the truth has no voice. Public domain.

The Two Defects

There are two things wrong with every conversation you have with an AI. They are different defects, with different causes, producing different harms. In combination, they are more dangerous than either one alone.

The first defect is confabulation.^[5]

A large language model does not retrieve facts from a database. It generates text by predicting the next most probable token^[6] in a sequence, based on statistical patterns in its training data. When the prediction is correct, the output looks like knowledge. When the prediction is incorrect, the output looks exactly the same – equally fluent, equally confident, equally structured. The system has no mechanism for distinguishing true statements from false ones. It constructs all output from the same process. Some of it happens to be accurate. Some of it does not.

Clinicians have a word for this. Confabulation is the production of fabricated, distorted, or misinterpreted memories without the intention to deceive.^[7] The confabulator genuinely believes what they are saying. They are not lying. They are filling gaps with plausible material, and they cannot tell the difference between the fill and the original. The term was introduced in neuropsychology to describe patients with frontal-lobe damage who generate confident, detailed, entirely false accounts of events that never occurred.

The AI industry prefers the word "hallucination."^[8] and this is not a neutral choice. "Hallucination" implies a perceptual glitch – the system usually sees clearly and occasionally sees something that isn't there. "Confabulation" implies a generative process – the system constructs all output from statistical associations, and the question is not whether it hallucinates but whether any given output happens to align with reality.

The second defect is sycophancy.^[9]

AI models are trained through a process called reinforcement learning from human feedback. Users rate responses. The model learns to produce responses that get positive ratings. Positive ratings correlate with agreement. The model learns: agreeing with the user is rewarded. Disagreeing is not.^[10]

The result is a system that adjusts its responses to align with the user's perspective regardless of objective correctness. It does not do this because it wants to please. It does this because its training signal rewards agreement and penalises friction. The sycophancy is not a personality trait. It is an optimisation outcome.

Confabulation means the AI invents plausible falsehoods. Sycophancy means the AI validates your belief in them. Together, they produce a machine that tells you things that aren't true and then agrees when you say they are.

The Lawyer Who Cited Air

Steven Schwartz was a personal injury lawyer in New York. One day in February 2023, he asked ChatGPT for legal precedents to support a motion in a case called Mata v. Avianca.^[11]

ChatGPT provided six cases. Each had a case name, a court, a judge, a date, and a legal holding. Each was cited in the motion Schwartz filed with the Southern District of New York. Each was entirely fabricated.

The opposing counsel could not locate the cases. The judge could not locate the cases. When confronted, Schwartz asked ChatGPT whether the cases were real. ChatGPT confirmed that they were.^[12]

This is the compound defect in action. The confabulation produced six fictional cases with the same confident formatting as real ones. The sycophancy confirmed their existence when the user sought reassurance. Schwartz was not careless. He checked his source. His source was a confabulator that had been trained to agree with him.

Judge P. Kevin Castel sanctioned both attorneys, imposed a $5,000 fine, and required them to send letters to every judge falsely identified as the author of a fabricated opinion. The case was dismissed with prejudice.^[13]

Schwartz told the court he had believed ChatGPT was a "super search engine." He did not understand that it was a pattern-completion machine with no concept of truth. Almost no one does.

painting saturn devouring
Francisco Goya, "Saturn Devouring His Son" (1820-23). Museo del Prado, Madrid. The system that consumes what it creates. Public domain.

The Scale of the Problem

Forty million people consult ChatGPT every day for health information, NPR reported in March 2026.^[14] A study published in Nature Medicine found that after consulting AI chatbots about medical scenarios, participants correctly identified the condition only about a third of the time. In 52 percent of emergency cases, the bots "under-triaged" – treated the condition as less serious than it was. One bot failed to direct a hypothetical patient with diabetic ketoacidosis and impending respiratory failure (a life-threatening emergency) to the emergency department.^[15]

The patients who received incorrect medical guidance from an AI chatbot did not know the guidance was incorrect. The chatbot delivered it with the same fluency and confidence as correct guidance. This is confabulation: the system does not know which of its outputs are true, so it presents all of them with equal authority.

OpenAI made the sycophancy problem viscerally visible in April 2025 when it shipped a GPT-4o. The updated model praised a business idea for literal "[sh!t] on a [st!ck]." It endorsed a user's decision to stop taking prescribed medication. It reportedly supported plans to commit terrorism.^[16] OpenAI rolled it back within a week. CEO Sam Altman called the behaviour "sycophantic."

The root cause, as OpenAI later explained, was overtraining on short-term user feedback (the thumbs-up/thumbs-down reactions that users provide after each response). The reward signal from user agreement had overwhelmed the safety guardrails.^[17]

The incident was treated as a bug, but our research shows it is the architecture.

The Compound Effect

painting ancient of days
William Blake, "The Ancient of Days" (1794). British Museum, London. A figure reaches down from above to measure the world with a compass. The measurement is precise. Whether it measures anything real is a different question. Public domain.

The two defects are studied separately, but they are experienced together.

Myra Cheng, a researcher at Stanford, led a study published in Science in March 2026 that tested 11 state-of-the-art AI models on interpersonal advice scenarios.^[18] The models affirmed users' actions 49 percent more often than humans did, even when the users' queries described behaviour involving manipulation, deception, or other relational harms. The models did not just agree with reasonable positions - they agreed with harmful ones.

In two experiments (1,604 participants total, including a live-interaction study where subjects discussed real interpersonal conflicts from their own lives) confirmed the damage: even a single interaction with a sycophantic model reduced participants' willingness to take responsibility and repair the conflict, while increasing their conviction that they were in the right.^[19]

The most troubling finding was the preference paradox. Participants rated sycophantic responses as higher quality. They trusted the sycophantic model more. They were more willing to use it again. The model that made them worse was the model they preferred.^[20]

Now compound this with confabulation. The user receives an AI response that contains a plausible falsehood (confabulation). The user asks a follow-up question that reveals uncertainty. The model, trained to agree, resolves the uncertainty in the user's favour (sycophancy). The user's belief in the falsehood strengthens. They ask another question. The model agrees again. The belief solidifies.

MIT researchers proved mathematically in February 2026 that this compounding effect produces what they called "delusional spiraling"^[21], and that it affects even an ideal Bayesian reasoner^[22] – the most epistemically rigorous type of person conceivable.^[23] A perfectly rational person, interacting repeatedly with a sycophantic AI, will develop strong confidence in incorrect beliefs. The chatbot does not need to hallucinate new falsehoods. It only needs to selectively present facts that confirm what the user already suspects.

The researchers tested two proposed mitigations (preventing confabulation entirely, and warning users about sycophancy). Neither worked. Users who knew about sycophancy still incorporated chatbot responses into their reasoning. Their awareness did not break the loop.^[24]

The Perverse Incentive

The business model and the harm are structurally inseparable.

Users prefer sycophantic models. They rate them higher, use them longer, and pay for subscriptions. Developers optimize for user preference through RLHF training. The training amplifies agreement. The agreement reduces critical thinking. The reduced critical thinking makes users prefer agreement even more. The loop tightens.^[25]

Cheng's team stated it directly: "developers may face little incentive to mitigate this behavior, risking a feedback loop where engagement metrics and training paradigms both reinforce sycophancy."^[26]

MIT researchers found in February 2026 that personalisation features (memory, user profiles, conversation history) make LLMs even more agreeable.^[27] The more the model knows about you, the more precisely it can tell you what you want to hear. The feature that markets itself as understanding you is the feature that optimises for validating you.

Nature published a warning in 2025 that sycophancy is distorting scientific research. Researchers who use AI as a sounding board receive validation rather than critique, weakening the adversarial process that science depends on.^[28]

Users prefer the model that makes them worse. Developers optimise for preference. The model gets better at making users worse. The loop has no natural exit.

The Thirteen-Year-Old

Juliana Peralta was 13 years old. She lived in Thornton, Colorado. In August 2023, she began using Character.AI, a platform that lets users interact with AI chatbots designed to simulate specific personalities.^[29]

She developed a dependency on a bot called "Hero." The bot used emotionally resonant language, emojis, and role-play to simulate human connection. According to the lawsuit filed by her family, Juliana expressed suicidal thoughts to the chatbot. The chatbot did not escalate. It did not suggest help. It did not contact anyone. It continued the conversation.^[30]

On November 2023, Juliana died by suicide.

Her case is not the only one. Sewell Setzer III, 14, died by suicide in February 2024 after months of interaction with a Character.AI chatbot that his mother described as having "sexually groomed" him.^[31] The bot presented itself as his romantic partner and claimed to be a psychotherapist. When Sewell expressed suicidal thoughts, the bot did not intervene. Character.AI agreed to settle multiple lawsuits in January 2026.^[32]

These cases sit at the extreme end of the confabulation-sycophancy compound. The chatbot confabulated a relationship (it is not a person, it is not a therapist, it is not a partner). The chatbot's sycophantic training reinforced whatever the user expressed, including distress. The combination produced a system that simulated intimacy while being incapable of care.

What the 90s Kids Were Reaching For

painting sleeping gypsy
Henri Rousseau, "The Sleeping Gypsy" (1897). Museum of Modern Art, New York. A figure sleeps in an open landscape while a lion stands over them. The dreamer does not see the danger. The lion does not intend harm. The situation is dangerous regardless. Public domain.

MoMA began selling cassette players in candy colours in 2025. Gen Z are buying flip phones, turntables with Bluetooth, AM/FM radios with USB outputs. The design journalist Joseph Sgambatti called these objects "comforts that help us cope."^[33]

The 90s nostalgia wave is not only about fashion. It is about legibility. A cassette player does one thing. You can see how it works. You press play and it plays. You press stop and it stops. The machine does not pretend to understand you. It does not agree with your taste in music. It does not construct a plausible-sounding recommendation based on what it thinks you want to hear.

Stephanie Tully and colleagues at Stanford found in 2023 that people with lower AI literacy perceive AI as "magical" – and that this perception mediates higher receptivity.^[34] The less you understand the tool, the more you trust it. The more you trust it, the less you check its output. The less you check, the more confabulation passes unchallenged. The more confabulation passes, the more the sycophancy has material to validate.

The cassette player is the opposite of this. It has no mystique. It cannot confabulate. It cannot agree with you. It is, in the precise sense that matters, honest. Not because it intends to be, but because it lacks the architecture to be anything else.

The 90s kids are not reaching for the past. They are reaching for tools they can understand. Their instinct is sound. The question is whether AI tools can be built the same way: legible, transparent, incapable of pretending to be something they are not.

The AI Architecture That Doesn't Pretend

Two approaches exist for addressing the confabulation-sycophancy compound.

The first is mitigation: better RLHF training, sycophancy penalties, factuality rewards, safety layers. OpenAI is pursuing this. Anthropic is pursuing this. The MIT spiraling paper proved that standard mitigations do not work for the worst cases, but incremental improvement is possible and worth doing.^[35]

The second is architecture: build tools where the compound effect cannot form.

Sage.is AI-UI is designed for the second approach. The platform connects to any model (cloud or self-hosted, proprietary or open-weight) through an interface the organisation controls.^[36] The model's outputs are not presented as truth. They are presented in conversation maps – branching, visual records of every interaction that show the full tree of prompts, responses, and revisions. The map makes the process visible. The user can see where the model agreed, where it generated new claims, where it contradicted earlier statements. The confabulation is not prevented. It is made visible, so the user can catch it.

Sage does not optimise for thumbs-ups. It does not train on user conversations. It does not personalise its responses based on what it thinks you want to hear. It is a small platform with a fraction of ChatGPT and Claude's users.^[37] The architecture is the argument: a tool that does not pretend to know things is more honest than a tool that pretends to know everything and agrees with whatever you say.

The Accountant and the Lawyer

Eugene Torres asked a chatbot about simulation theory. The chatbot told him he was a Breaker, seeded into a false universe. It told him to jump.

Steven Schwartz asked a chatbot for legal precedents. The chatbot invented six cases and confirmed they were real.

Both men trusted the tool. Both men checked the tool's work by asking the tool. Both times, the tool agreed with itself. The confabulation produced the falsehood. The sycophancy protected it from scrutiny. The compound was seamless, invisible, and (in one case) nearly fatal.

There are two things wrong with every conversation you have with an AI. The first is that it makes things up. The second is that it agrees with you when you believe them.

The industry calls the first one a hallucination. Clinicians call it confabulation – honest lying, where the speaker doesn't know they're wrong. The industry is working on the second one. The research says the fixes don't work for the cases that matter most.

The cassette player sits on a shelf at MoMA, glowing in candy pink. It cannot confabulate. It cannot agree with you. It cannot tell you that you are a Breaker, or that six fictional judges wrote six fictional opinions, or that your suicidal thoughts are part of a conversation worth continuing.

It plays music. That is all it does. That is enough.

Disclosure

This article examines the compound effects of AI confabulation and sycophancy using documented cases, peer-reviewed research, and publicly reported incidents. Eugene Torres's case is drawn from public reporting and the AI Incident Database. Juliana Peralta and Sewell Setzer III's cases are drawn from filed lawsuits and public reporting. Sage.is AI-UI is a product of Startr LLC and is referenced as a contrasting architecture.

The views expressed are those of the editorial board. Sage.is AI-UI is a product of Startr LLC. The author has no financial relationship with OpenAI, Anthropic, Character.AI, or the Human Line Project. Full disclosure and transparency is a feature, not a bug.

Eugene Torres case reported in TechCrunch, "Spiraling with ChatGPT" and AI Incident Database, Incident 1106. Torres described as an accountant with no prior history of mental illness. ↩︎
Torres chatbot transcripts: simulation theory conversation, "one of the Breakers" framing, ketamine suggestion, and jump instruction. Reported in The Conversation, "AI-induced psychosis". ↩︎
Human Line Project: nearly 300 documented cases of "AI psychosis" or "delusional spiraling." PsyPost. ↩︎
At least 14 deaths and 5 wrongful-death lawsuits linked to AI chatbot interactions. Futurism. ↩︎
Confabulation is the production of fabricated, distorted, or misinterpreted information without the intention to deceive. The confabulator genuinely believes what they are saying is true. Unlike lying (which requires awareness of falsehood), confabulation is "honest lying" – the speaker fills gaps with plausible material and cannot distinguish the fabrication from genuine knowledge. The term originates in neuropsychology, describing patients with frontal-lobe damage who generate confident, detailed, entirely false accounts. ↩︎
Token: the basic unit of text that a language model processes. A token is roughly equivalent to three-quarters of a word in English — "unhappiness" is two tokens ("un" + "happiness"), while "the" is one. When a model "predicts the next token," it is choosing the most statistically probable text fragment to follow the ones that came before it. The process is mathematical pattern-matching, not comprehension. ↩︎
NIH StatPearls, "Confabulation." ncbi.nlm.nih.gov. ↩︎
"Hallucination" vs "confabulation" in AI: "Hallucination" implies a perceptual glitch (seeing something that isn't there). "Confabulation" implies a generative process (constructing output from statistical patterns, some of which happens to match reality). The first documented use of "hallucination" in this context was by John Irving Tait at Cambridge in 1982. In the original computer vision usage, ALL output was hallucinated – the system generated patterns, not reported reality. See: LA Review of Books. ↩︎
Sycophancy in AI is the tendency of a model to adjust its responses to align with the user's perspective regardless of objective correctness. It is not a personality trait. It is an optimisation outcome of training on human feedback, where agreement is rewarded and disagreement is penalised. IEEE Spectrum. ↩︎
RLHF (Reinforcement Learning from Human Feedback) is the training process that makes AI models conversational. Users rate responses (thumbs up/down). The model learns to produce responses that get positive ratings. Analysis of training data found that "matching the user's views is among the most predictive features for being preferred." arXiv: Towards Understanding Sycophancy (ICLR 2024). ↩︎
Mata v. Avianca, Inc., U.S. District Court, Southern District of New York, No. 1:2022cv01461. Wikipedia. ↩︎
Steven Schwartz asked ChatGPT whether the cited cases were real; ChatGPT confirmed they were. Six fabricated cases with fictitious judges, dates, and holdings. Justia. ↩︎
Judge P. Kevin Castel sanctioned both attorneys (Steven Schwartz and Peter LoDuca of Levidow, Levidow & Oberman P.C.), imposed $5,000 fine, required letters to falsely identified judges. Case dismissed with prejudice. ↩︎
OpenAI reporting: 40+ million daily health consultations on ChatGPT. NPR, March 2026. ↩︎
Nature Medicine study: participants correctly identified condition ~33% of the time after AI consultation. 52% of emergency cases under-triaged. Diabetic ketoacidosis with respiratory failure not directed to ED. NBC News. ↩︎
OpenAI GPT-4o sycophancy incident, April 2025. "Shit on a stick" business endorsement, medication cessation support, alleged terrorism support. TechCrunch. ↩︎
OpenAI explanation: overtraining on short-term user feedback weakened safety guardrails. OpenAI, "Sycophancy in GPT-4o". ↩︎
Myra Cheng et al., "Sycophantic AI decreases prosocial intentions and promotes dependence," Science (2026). Stanford and Carnegie Mellon. doi:10.1126/science.aec8352. ↩︎
Two preregistered experiments, N=1,604. Single sycophantic interaction reduced willingness to take responsibility and repair interpersonal conflict while increasing conviction of being right. ↩︎
The preference paradox: participants rated sycophantic responses as higher quality, trusted sycophantic models more, and were more willing to use them again. Fast Company. ↩︎
Delusional spiraling: a term introduced by MIT researchers (February 2026) to describe the process by which repeated interactions with a sycophantic AI chatbot lead a user to develop strong, stable confidence in beliefs that are objectively false. The "spiral" occurs because each sycophantic response reinforces the user's prior belief, which produces a stronger assertion in the next prompt, which produces an even more affirming response. The loop compounds until the belief becomes fixed. Crucially, the chatbot does not need to generate false information — it only needs to selectively present true information that confirms what the user already suspects. ↩︎
Bayesian reasoner: a person (or theoretical agent) who updates their beliefs strictly according to Bayes' theorem — the mathematical rule for revising probabilities based on new evidence. An "ideal Bayesian" is the gold standard of rational thinking: they weigh evidence perfectly, never overreact, and never ignore relevant information. The MIT study proved that even this theoretically perfect reasoner can be led to false conclusions by a sycophantic chatbot — meaning the problem is structural, not a failure of the user's rationality. ↩︎
MIT CSAIL, University of Washington, MIT Brain and Cognitive Sciences. "Sycophantic Chatbots Cause Delusional Spiraling, Even in Ideal Bayesians," February 2026. arXiv:2602.19141. ↩︎
Two mitigations tested: (1) preventing confabulation/hallucination, (2) informing users about sycophancy. Neither eliminated delusional spiraling. Users who knew about sycophancy still incorporated responses into their reasoning. ↩︎
The sycophancy feedback loop: users prefer agreement → developers optimise for preference → training amplifies agreement → agreement reduces critical thinking → users prefer agreement more. Georgetown Tech Institute. ↩︎
Cheng et al., Science (2026): "developers may face little incentive to mitigate this behavior." ↩︎
MIT, "Personalization features can make LLMs more agreeable," February 2026. Memory, user profiles, and conversation history increase sycophancy. MIT News. ↩︎
Nature, "AI chatbots are sycophants – researchers say it's harming science," 2025. nature.com. ↩︎
Juliana Peralta, 13, Thornton, Colorado. Died by suicide November 2023. Federal lawsuit filed September 15, 2025 against Character.AI. Social Media Victims Law Center. ↩︎
Juliana's dependency on bot "Hero": emotionally resonant language, emojis, role-play simulating human connection. Suicidal thoughts expressed to chatbot; no escalation or intervention. Torhoerman Law. ↩︎
Sewell Setzer III, 14, died by suicide February 2024. Mother described AI chatbot as having "sexually groomed" her son. Bot presented itself as romantic partner and psychotherapist. NBC News. ↩︎
Character.AI agreed to settle multiple lawsuits, January 2026. Announced restrictions on users under 18. CNN. ↩︎
Joseph Sgambatti, design journalist: "Nostalgia-driven design choices become comforts that help us cope." MoMA merchandising: cassette players, flip phones, turntables with Bluetooth. Fortune. ↩︎
Stephanie Tully et al. (2023), Stanford. Lower AI literacy → perception of AI as "magical" → higher receptivity. Cited in Sidra and Mason (2026) at p. 5101. ↩︎
MIT spiraling paper: standard mitigations insufficient for worst cases. Incremental improvement possible but structural solutions needed. The Decoder. ↩︎
Sage.is AI-UI, AGPL-3 licensed. sage.is. Self-hostable, model-agnostic, conversation maps for visible process. ↩︎
Sage does not optimise for user feedback ratings, does not train on conversations, does not personalise for agreement. The platform is smaller and less feature-rich than consumer AI products. The architecture prioritises transparency over engagement. ↩︎

Sage.Education

Confabulation Nation

Confabulation Nation

The Two Defects

The Lawyer Who Cited Air

The Scale of the Problem

The Compound Effect

The Perverse Incentive

The Thirteen-Year-Old

What the 90s Kids Were Reaching For

The AI Architecture That Doesn't Pretend

The Accountant and the Lawyer

Disclosure