Behind the Curtain

In the summer of 2025, researchers at MIT's Media Lab fitted 54 people with EEG caps and asked them to write SAT-style essays. The participants were divided into three groups: one used ChatGPT, one used Google, and one used nothing at all. Just their brains.[1]

The experiment ran for four months. Three sessions per person. Same essay format each time. The EEG measured real-time neural coupling[2] – how actively different regions of the brain communicated while the person wrote.
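Coupling of this kind is typically quantified as spectral coherence: the degree to which two electrodes' signals rise and fall together at a given frequency. A minimal sketch of the idea, using synthetic signals and scipy (an illustration of the general technique, not the study's actual analysis pipeline):

```python
# Illustrative only: spectral coherence between two synthetic "EEG" channels.
import numpy as np
from scipy.signal import coherence

fs = 256                      # sampling rate in Hz
t = np.arange(0, 10, 1 / fs)  # ten seconds of signal

# Two channels share a 10 Hz (alpha-band) component plus independent noise.
shared = np.sin(2 * np.pi * 10 * t)
ch1 = shared + 0.5 * np.random.randn(t.size)
ch2 = shared + 0.5 * np.random.randn(t.size)

# Coherence near 1 at a frequency means the channels are strongly coupled there.
freqs, coh = coherence(ch1, ch2, fs=fs, nperseg=512)
alpha_band = (freqs >= 8) & (freqs <= 12)
print(f"Mean alpha-band coherence: {coh[alpha_band].mean():.2f}")
```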

The results arrived in a pattern that descended like a staircase. The brain-only group showed the strongest, widest-ranging neural networks. The Google group showed intermediate engagement. The ChatGPT group showed the weakest overall coupling – the quietest brains in the room.[3]

The behavioural data matched. When asked to recall what they had written, the ChatGPT users struggled to quote their own sentences. Self-reported ownership of their essays was the lowest in the AI group and the highest among the brain-only writers. Over four months, the pattern did not improve. The AI users did not learn to use the tool more effectively. They learned to think less.[4]

Pat Pataranutaporn, one of the study's authors at the MIT Media Lab, described the phenomenon as "cognitive debt" – the accumulated cost of outsourcing thinking to a machine.[5]

The study is small – 54 participants, 18 per group – and should be read as an early signal, not a settled finding. The EEG methodology is novel for this domain and the four-month longitudinal design is unusually rigorous, but the sample size limits generalisability. What makes the study worth attention is not the certainty of its conclusions but the direction: every metric the researchers measured (neural coupling, recall accuracy, self-reported ownership) moved in the same direction, and none of them improved over time.

The wizard was impressive. The person behind the curtain was disappearing.

Hieronymus Bosch, "The Conjurer" (c. 1502). Musée Municipal, Saint-Germain-en-Laye. The crowd watches the trick. The conjurer's hands are visible to everyone except the person being deceived. Public domain.

The Forty-Eight Percent That Vanished

Six months before the MIT study, a research team ran a different experiment with nearly 1,000 high school math students. Half received access to a standard ChatGPT-style AI tutor. The AI tutor was effective. Students using it improved their problem-solving performance by 48 percent.[6]

Then the researchers did something that most AI product demos never do. They took the tool away.

On a subsequent test without AI access, the students who had used the tutor scored 17 percent lower than a comparison group that had never had AI at all. The 48 percent gain was not learning. It was scaffolding. Remove the scaffold, and the students were worse off than if they had never been helped.[7]

Chunhui Fan, a researcher whose 2025 study coined the term that would ripple through education research, called it "metacognitive laziness"[8] – a reduced tendency for learners to self-regulate, plan, monitor, and evaluate their own thinking when AI is doing the cognitive heavy lifting.[9] The assignments improved. The students did not. The output looked like competence. The cognitive residue was dependency.

The tool made the assignment better. It made the student worse. The 48 percent gain vanished. The 17 percent deficit remained.

The Mystique That Pays

If the evidence is this clear (if AI demonstrably weakens the cognitive processes it claims to support), why do students keep using it? Why do schools keep deploying it? Why does adoption accelerate while the research accumulates?

Stephanie Tully and colleagues at Stanford answered this in a 2023 study whose central result Sidra Sidra and Claire Mason, researchers at Australia's national science agency CSIRO, called "one of the most underappreciated findings in the field."[10] Tully found that individuals with lower AI literacy are more likely to perceive AI as "magical" – as performing tasks that appear to require human intelligence. This perception of magic mediates higher receptivity: people who understand AI less are more impressed by it, more willing to use it, and more likely to accept its outputs without evaluation.

Sidra and Mason, whose own study of 292 AI users was published in the International Journal of Human-Computer Interaction in 2025, noted the implication with academic restraint: "efforts to demystify AI, by enhancing transparency and educating users, may paradoxically reduce its appeal."[11]

Viewed through the lens of a business model, the restraint falls away. The companies that build AI tools have a financial incentive to keep them mysterious. Transparency reduces engagement. Understanding reduces receptivity. The mystique is not a bug in the user experience. It is a feature of the revenue model. Every student who perceives AI as magical is a student who will not check the AI's work, will not plan how to divide cognitive labour, will not notice when the scaffold is doing all the holding.

The wizard is more profitable when nobody looks behind the curtain.

Half the Room

Joseph Wright of Derby, "An Experiment on a Bird in the Air Pump" (1768). National Gallery, London. The audience watches the demonstration. Understanding the mechanism changes the relationship. Public domain.

Sidra and Mason's study contained a finding that most coverage overlooked. The researchers recruited 344 working professionals who used AI tools regularly. Before administering their scales, they asked three screening questions: does the AI tool "complement my skills"? Does it "engage in sustained interaction and two-way feedback"? Do you and the AI "build upon one another's work"?[12]

One hundred and forty-seven people (more than half of the 292 who passed the quality checks) could not agree. They were screened out. These are working professionals who use AI daily, and more than half of them do not collaborate with it. They type prompts. They receive outputs. They move on.

The remaining 145 completed two new validated scales. The first, Collaborative AI Literacy,[13] measures whether users know how to direct, contextualise, and refine AI outputs. The second, Collaborative AI Metacognition,[14] measures whether users think about their own thinking while using AI – whether they plan, monitor, and evaluate the human-AI collaboration.[15]
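Both scales' reliability (the Cronbach's α values in footnote 15) rests on a simple formula: compare the variance of the summed scale with the variance of its individual items. A minimal sketch on simulated responses (an illustration of the statistic, not the authors' analysis):

```python
# Illustrative only: Cronbach's alpha on simulated 11-item scale responses.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scale responses."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

rng = np.random.default_rng(1)
latent = rng.normal(size=(145, 1))                     # one underlying trait
responses = latent + 0.8 * rng.normal(size=(145, 11))  # 11 noisy items
print(f"alpha = {cronbach_alpha(responses):.2f}")      # high alpha: items cohere
```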

The statistical result was stark. With both predictors in the same model, metacognition held (β = 0.517, p < 0.001) while literacy dropped to non-significance (β = 0.093, p = 0.397).[16] Knowing what AI does predicted almost nothing about whether people benefited from it, once you accounted for whether they thought about how they used it.
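The analytic move is incremental predictive validity: put both predictors into one regression and see which survives. A minimal sketch of that comparison with statsmodels, on simulated data with hypothetical variable names (literacy, metacog, benefit), not the authors' dataset or code:

```python
# Illustrative only: two correlated predictors, one real driver of the outcome.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 292

# Literacy and metacognition correlate, but only metacognition drives
# the simulated benefit score, mirroring the pattern in the paper.
literacy = rng.normal(size=n)
metacog = 0.6 * literacy + 0.8 * rng.normal(size=n)
benefit = 0.5 * metacog + rng.normal(size=n)

X = sm.add_constant(np.column_stack([literacy, metacog]))
fit = sm.OLS(benefit, X).fit()
print(fit.summary(xname=["const", "literacy", "metacog"]))
# Expected pattern: metacog significant, literacy near zero once both are in.
# (The paper reports standardised betas; this raw OLS sketch shows the logic.)
```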

The MIT study showed what happens in the brain. The math study showed what happens to performance. The Sidra and Mason study showed what separates the people who benefit from the people who do not. It is not knowledge. It is metacognition – the thinking about the thinking.

The Teacher Who Stopped Grading Essays

Al Rabanera teaches math in California. In March 2026, he published a reflection on what AI had shown him about his own teaching.[17] The revelation was not about the technology. It was about what the technology exposed.

"I didn't realize how much agency I had lost as a teacher," Rabanera wrote, "until AI showed me what I'd been missing – not because AI was disruptive, but because it exposed something harder to face: how comfortable I'd become with waiting."

Rabanera had been waiting for curriculum updates, for district mandates, for professional development that told him what to do next. The students using AI in his classroom were doing the same thing – waiting for the output, waiting for the answer, waiting for the machine to tell them what they thought. The passivity was not caused by AI. It was revealed by it.

Across the country, Pam Amendola, a high school English teacher, faced a different version of the same problem. When ChatGPT arrived, her students asked a question that the AI industry would prefer not to hear: "Why should I complete a worksheet when AI can do it for me?"[18] The question was honest. The worksheet was designed to produce a finished product. The AI could produce the finished product faster and better. The student's logic was sound.

Amendola's response was to stop assigning worksheets. She redesigned her Macbeth unit so that the process (the reading, the arguing, the writing, the revising) was the assessment, not the output. The essay became secondary. The thinking became primary. She could not assess the thinking if it was invisible. She needed to see it.

The teacher's question is not "did the student use AI?" The question is: "can I see how the student thinks?" If the tool hides the process, the teacher is blind. If the tool shows the process, the teacher can teach.

The Architecture of Visible Thinking

Winslow Homer, "The Country School" (1871). Saint Louis Art Museum. The teacher watches the room. The work is visible. The process is in the open. Public domain.

The MIT brain study, the math tutoring experiment, the metacognitive laziness research, and the Sidra and Mason validation study all point to the same architectural requirement: the thinking must be visible.

Not the output. The thinking. The planning. The monitoring. The evaluation. The moments where the student pushed back on the AI, changed direction, chose to ignore a suggestion, or accepted one uncritically. The process that produces the essay is more important than the essay. The process that solves the math problem is more important than the answer.

No consumer AI tool makes this visible. ChatGPT shows a text box and a response. Claude shows a text box and a response. Gemini shows a text box and a response. The metacognition (the planning, monitoring, and evaluating that Sidra and Mason found predicts benefit) happens invisibly, if it happens at all.

Sage.Education was built on the premise that the process must be on the table for everyone to see. The Sage.is AI-UI platform provides conversation maps – branching, visual records of every interaction, showing the full tree of prompts, responses, revisions, and decisions.[19] A teacher using Sage can see not just what the student produced but how: which prompts they wrote, which responses they accepted, where they pushed back, where they revised, where they gave up. This map makes the metacognition visible – to the student, to the teacher, and to anyone evaluating whether the collaboration produced learning or just output.
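The data structure behind such a map is a tree of turns. A minimal sketch in Python, with hypothetical field names (an illustration of the concept, not Sage's actual schema):

```python
# Illustrative only: a branching conversation map as a tree of turns.
# Field names are hypothetical, not Sage.is AI-UI's actual schema.
from dataclasses import dataclass, field

@dataclass
class Turn:
    role: str                        # "student" or "ai"
    text: str
    action: str = ""                 # e.g. "accepted", "revised", "rejected"
    children: list["Turn"] = field(default_factory=list)

    def branch(self, role: str, text: str, action: str = "") -> "Turn":
        child = Turn(role, text, action)
        self.children.append(child)
        return child

def render(node: Turn, depth: int = 0) -> None:
    """Print the whole tree: every prompt, response, and decision is visible."""
    flag = f" [{node.action}]" if node.action else ""
    print("  " * depth + f"{node.role}: {node.text}{flag}")
    for child in node.children:
        render(child, depth + 1)

# A student asks, rejects the first draft, and accepts the revision.
root = Turn("student", "Summarise Act 1 of Macbeth")
draft = root.branch("ai", "Act 1 introduces the witches...")
pushback = draft.branch("student", "Too shallow. What motivates Macbeth?", action="rejected")
pushback.branch("ai", "Revised summary centred on Macbeth's ambition", action="accepted")
render(root)
```

A teacher reading this tree sees not just the final summary but the rejection and the pushback that produced it: exactly the process the classroom needs visible.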

Sage is an open source platform without the consumer scale of ChatGPT or the marketing budget of Google Classroom. The architecture is the argument: AI tools for education should show the work, not hide it behind a text box that displays only the result.

The Curtain

W.W. Denslow, "The Discovery of Oz, the Terrible" (1900). Toto has knocked over the screen. The Tin Man confronts the little old man. The machinery is visible. Public domain.

In The Wonderful Wizard of Oz, Dorothy and her companions travel to the Emerald City to meet a wizard they believe is all-powerful. When Toto knocks over the screen, they discover an ordinary man operating machinery. The wizard is not diminished by the revelation. The companions get what they came for: courage, a heart, a brain, a way home. They just get it honestly.

The AI tools in classrooms today are the curtain. They produce impressive outputs. They generate summaries, essays, and lesson plans with a fluency that feels like intelligence. The mystique is powerful. The mystique is also the mechanism by which metacognitive laziness enters. The user, awed by the output, stops thinking about the process that produced it.

Pulling the curtain back does not make the tools less useful. It makes the students more capable. Sidra and Mason's data point the same way: metacognition, not literacy, predicts benefit. The MIT study shows the cost of leaving the curtain closed: cognitive debt accumulates, neural engagement declines, and after four months the person cannot quote their own work.

For educators, the task is clear: Show students how the AI works; make the collaboration visible; assess the process, not just the product; and build tools (or choose tools) that put the thinking on the table instead of behind the curtain.

The wizard is a machine. The machine is useful. Seeing how it works does not reduce its value. It increases yours.

That is what education is for.



The views expressed are those of the editorial board. Sage.is AI-UI and Sage.Education are products of Startr LLC. The author has no financial relationship with MIT, CSIRO, or any AI company referenced. Full disclosure and transparency is a feature, not a bug.


  1. Pat Pataranutaporn et al., "Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task," MIT Media Lab, 2025. arXiv:2506.08872. Also: MIT Media Lab project page. TIME coverage. ↩︎

  2. Neural coupling refers to the synchronised activity between different brain regions during cognitive tasks. Measured via EEG (electroencephalography), stronger coupling indicates more active communication across neural networks — more regions working together to process information. Weaker coupling suggests less distributed processing, meaning fewer brain regions are engaged. In the MIT study, coupling served as a proxy for how deeply the brain was involved in the writing task. ↩︎

  3. EEG results: brain connectivity systematically scaled down with external support. Brain-only group showed strongest neural networks; Search Engine group intermediate; LLM group weakest overall coupling. Study ran four months, three sessions per participant. Limitation: 54 participants (18 per group), aged 18-39, Boston area. Small sample; findings are directional, not definitive. ↩︎

  4. Behavioural findings: LLM users showed lowest self-reported essay ownership and lowest ability to recall/quote their own writing. Neural, linguistic, and behavioural underperformance persisted across all four months. Futurism. ↩︎

  5. Pat Pataranutaporn, MIT Media Lab. "Cognitive debt" framework described in study and media coverage. The Hill. ↩︎

  6. Math tutoring study: nearly 1,000 high school students. AI tutor improved problem-solving performance by 48%. Reported in APA and Strategian. ↩︎

  7. Post-removal deficit: students who had used AI tutor scored 17% lower than comparison group with no prior AI access. The gain was scaffolding, not learning. ↩︎

  8. Metacognition is "thinking about thinking" — the awareness and regulation of one's own cognitive processes. It includes three components: planning (deciding how to approach a task before starting), monitoring (checking whether your approach is working during the task), and evaluation (reflecting on what worked and what didn't after the task). The term was introduced by developmental psychologist John Flavell in 1979. In the context of AI use, metacognition is what separates a person who evaluates the AI's output critically from a person who accepts it because it sounds confident. ↩︎

  9. Chunhui Fan et al., "Beware of metacognitive laziness: Effects of generative artificial intelligence on learning motivation, processes, and performance," British Journal of Educational Technology (2025). doi:10.1111/bjet.13544. Also: University of Auckland coverage. ↩︎

  10. Stephanie Tully et al. (2023). Lower AI literacy → perception of AI as "magical" → higher receptivity. Cited in Sidra and Mason at p. 5101. ↩︎

  11. Sidra Sidra and Claire Mason, "Generative AI in Human-AI Collaboration: Validation of the Collaborative AI Literacy and Collaborative AI Metacognition Scales for Effective Use," International Journal of Human-Computer Interaction 42, no. 7 (2026): 5084-5108. doi:10.1080/10447318.2025.2543997. Open access, CC BY-NC 4.0. Quote from p. 5101. ↩︎

  12. Sidra and Mason, screening criteria (pp. 5095-5096). 344 recruited, 56 excluded for quality, 292 analysed. 147 of 292 screened out as non-collaborative AI users. ↩︎

  13. Collaborative AI Literacy is defined by Sidra and Mason as "a set of competencies that enables individuals to critically evaluate collaborative AI technologies; communicate and coordinate with them effectively and use them as a tool." Unlike general AI literacy (knowing what AI is and how it works), collaborative AI literacy measures whether you can actively direct, contextualise, and refine AI outputs during a real-time interaction — the difference between understanding a tool and knowing how to work with it. ↩︎

  14. Collaborative AI Metacognition is defined by Sidra and Mason as "the ability to enhance awareness and control of one's thinking process when working with collaborative AI tools, through the use of planning, monitoring, and reflection." It is distinct from both general metacognition and general AI literacy. The study found it to be the strongest predictor of whether people benefit from using AI tools — stronger than knowledge of AI itself. ↩︎

  15. Collaborative AI Literacy: 14 items, Cronbach's α = 0.92. Collaborative AI Metacognition: 11 items, Cronbach's α = 0.88. Both validated via confirmatory factor analysis and structural equation modelling. ↩︎

  16. Incremental predictive validity (Table 8, p. 5099). When both measures included: Collaborative AI Metacognition β = 0.517, p < 0.001; general AI Literacy β = 0.093, p = 0.397. H6 supported; H4 not supported. ↩︎

  17. Al Rabanera, math teacher, California. "AI reminded me to lead in my classroom, not follow someone else's dictates." CalMatters, March 2026. ↩︎

  18. Pam Amendola, high school English teacher. Redesigned Macbeth unit to integrate AI with traditional learning after students asked why they should complete worksheets when AI could do it faster. EdSurge, December 2025. ↩︎

  19. Sage.is AI-UI, AGPL-3 licensed. sage.is. Conversation maps provide branching visual records of every interaction. Self-hostable, model-agnostic, exportable data. ↩︎