The Dialogue Is the Work

Last March, in a ninth-grade history class, the assignment was a five-hundred-word essay on the causes of the French Revolution. Two students submitted essays. Both essays are competent, well-structured, correctly sourced. A plagiarism detector finds no matches. An AI detector returns inconclusive results. On paper, the work is indistinguishable.

Their ChatGPT conversation logs are not.

The first student's log runs to seventeen exchanges. It begins with a question about the economic conditions of the Third Estate, receives a response, and pushes back: "That makes it sound like the monarchy had no supporters. What about the provincial nobility?" The model adjusts. The student asks it to check a date. The model gets the date wrong. The student corrects it, citing a textbook. Three exchanges later, the student asks the model to reorganize the argument, then rejects the reorganization and writes a new outline by hand, pasting it back in. The final essay uses four of the model's sentences and discards the rest.

The second student's log contains one exchange. The prompt reads: "Write me a 500-word essay on the causes of the French Revolution. Include a bibliography." The essay that arrives is the essay that is submitted.

Both students used AI. One of them learned something. A teacher reading only the essays cannot tell which. A teacher reading the conversations does not need to guess.

The Arms Race That Already Lost

For the past three years, the education system has spent its AI budget on a question that turned out to be unanswerable: did the student write this?

Turnitin, the Oakland-based plagiarism detection company used by more than 16,000 institutions in 140 countries, claims its AI writing detector achieves 98 percent accuracy with less than one percent false positives.^[1] Independent testing tells a different story. A 2026 research review found Turnitin's overall accuracy at 72 percent, meaning more than one in four judgments is wrong.^[2] On non-native English writing, heavily edited drafts, and technical prose, false positive rates climb to between five and twelve percent in peer-reviewed studies.^[3]
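The gap between those rates is easier to feel as a head count. The sketch below is back-of-the-envelope arithmetic, not a model of any real institution: the cohort of 1,000 human-written essays is an assumption chosen for illustration, and the vendor figure is treated as its stated ceiling of one percent.

```python
def expected_false_flags(honest_essays: int, false_positive_rate: float) -> float:
    """Expected number of human-written essays wrongly flagged as AI-generated."""
    return honest_essays * false_positive_rate


# Hypothetical cohort size, chosen only for illustration.
HONEST_ESSAYS = 1000

# Rates quoted in the text: the vendor's ceiling, then the range from independent studies.
RATES = {
    "vendor claim (1%)": 0.01,
    "independent studies, low end (5%)": 0.05,
    "independent studies, high end (12%)": 0.12,
}

for label, rate in RATES.items():
    flagged = expected_false_flags(HONEST_ESSAYS, rate)
    print(f"{label}: about {flagged:.0f} students wrongly flagged")
```

At the vendor's ceiling, that is about 10 wrongly flagged students per thousand. At the high end of the independent range, it is about 120.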

The damage falls unevenly. A 2023 Stanford Human-Centered AI Institute study tested seven major AI detection tools against student writing and found that 61 percent of essays written by non-native English speakers were flagged as AI-generated.^[4] Sixty-one percent. The students most likely to be falsely accused are the students least equipped to challenge the accusation: international students, ESL learners, first-generation college students whose writing was shaped by translation apps and grammar tools for years before ChatGPT existed.

The cases are already accumulating. A doctoral student had her thesis introduction flagged as 67 percent AI-generated by her university's detection system. She had written every word herself, over four months, with no AI tools.^[5] Turnitin's own documentation now states that the tool "should not be used as the sole basis for adverse action against a student."^[6] The company that sells the detector is telling institutions not to trust it alone. The institutions are trusting it alone.

Francisco Goya, "Tribunal of the Inquisition" (c. 1812-1819). Real Academia de Bellas Artes de San Fernando, Madrid. A tribunal judges the accused. The method is certain. The verdict is not. Public domain.

A 2026 paper in the Journal of Higher Education Policy and Management argued that "generative AI detection should not be used in education" at all, citing methodological imperfections, violations of procedural fairness, and the fundamental problem that AI detection relies on unverifiable probabilistic estimates rather than comparison against known sources.^[7] Plagiarism detection works because it answers a verifiable question: does this text appear in another document? AI detection attempts to answer an unverifiable one: does this text feel like it was written by a machine? The epistemology is broken. The answer cannot be checked. And every semester, students are sanctioned on the basis of a feeling.

The question was never "did the student write this?" The question is "did the student think during this?"

The Postplagiarism Turn

Dr. Sarah Elaine Eaton, a researcher at the University of Calgary's Centre for Artificial Intelligence Ethics, Literacy and Integrity (CAIELI), has a name for what comes next: postplagiarism.^[8]

The term does not mean plagiarism no longer matters. It means the framework built to detect it no longer describes the world. When a student asks ChatGPT to draft an essay, revises every paragraph, adds their own sources, restructures the argument, and submits the result, the question "did they plagiarize?" has no coherent answer. The text did not exist before. It is not copied from another student. It is not purchased from an essay mill. It was produced in a collaboration between a human and a machine, and the boundaries of that collaboration are invisible in the finished product.

Eaton's argument, developed across a series of publications and a 2025-2026 speaker series at the University of Calgary, is that integrity cannot be enforced through surveillance after the fact. It must be designed into the learning itself.^[9] The shift is from policing products to designing processes, from asking "is this original?" to asking "did this teach you anything?"

The stakes are not only pedagogical. Eaton and colleagues have documented how detection-based integrity systems disproportionately harm neurodivergent students, whose writing patterns, unconventional syntax, and non-standard cognitive approaches trigger the same statistical flags that AI detection uses to identify machine text.^[10] The surveillance apparatus does not merely fail to catch AI use. It actively punishes difference.

The Richest Record Nobody Is Reading

In January 2026, a team of researchers published a framework they called DHASRL: Dialogue-Based Human-AI Self-Regulated Learning.^[11] The paper's premise was simple and, in retrospect, obvious. Every time a student interacts with an AI chatbot, the conversation produces a transcript. That transcript contains something no exam, essay, or portfolio has ever captured: a real-time, turn-by-turn record of how the student thinks.

The transcript shows when a student pauses and reformulates a question. It shows when they accept an answer uncritically and when they push back. It shows whether they ask for definitions, whether they verify claims, whether they notice when the model contradicts itself. It shows the metacognitive architecture of the learner: how they plan, how they monitor their own understanding, how they reflect on what they have learned.

The DHASRL framework proposes embedding assessment directly within this dialogue, enabling what the researchers call "real-time monitoring and scaffolding of student regulation." The conversation is not evidence of cheating. It is evidence of cognition. And it is the most granular evidence of cognition that education has ever had access to.^[12]
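What reading a transcript for cognition might look like in practice can be sketched in a few lines. This is not the DHASRL method, which the paper describes in its own terms. It is a deliberately crude illustration: the three labels and their cue phrases are hypothetical, and a real coding scheme would be designed and validated by educators rather than matched on keywords.

```python
from collections import Counter

# Hypothetical cue phrases for three kinds of student move.
# These are illustrative assumptions, not a validated coding scheme.
CUES = {
    "pushback": ("what about", "that makes it sound", "i disagree"),
    "verification": ("check", "are you sure", "according to"),
    "delegation": ("write me", "give me the answer"),
}


def tag_turn(text: str) -> list[str]:
    """Return every label whose cue phrases appear in one student turn."""
    lowered = text.lower()
    return [label for label, cues in CUES.items() if any(cue in lowered for cue in cues)]


def summarize(student_turns: list[str]) -> Counter:
    """Count how often each kind of move appears across a student's turns."""
    counts = Counter()
    for turn in student_turns:
        counts.update(tag_turn(turn))
    return counts


# The first turn is quoted in the essay; the second is invented for illustration.
first_student = [
    "That makes it sound like the monarchy had no supporters. "
    "What about the provincial nobility?",
    "Can you check the date you gave for that event?",
]
# The second student's only turn, as quoted in the essay.
second_student = [
    "Write me a 500-word essay on the causes of the French Revolution. "
    "Include a bibliography.",
]

print(summarize(first_student))
print(summarize(second_student))
```

Even this blunt instrument separates the two logs: one shows pushback and verification, the other only delegation. A teacher reading the same turns would see far more.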

The researchers also noted a gap: "transcript-based dialogue analytics remain scarce." Despite the fact that millions of students are generating AI conversation logs every day, almost no institutions are collecting them, reading them, or building assessment frameworks around them.^[13] The richest dataset of student thinking in human history is being generated at scale, in real time, and thrown away.

Johannes Vermeer, "The Astronomer" (c. 1668). Louvre, Paris. A scholar reaches toward a globe, not for the answer but for the method of finding it. Public domain.

The Exam That Talks Back

If the transcript is the assessment, the logical extension is an exam that generates its own transcript. This idea, too, is older than it appears.

The viva voce, the oral examination in which a student defends their work in conversation with an examiner, has been a staple of doctoral assessment for centuries. Research consistently reports good reliability for the format, with Cronbach's alpha between 0.75 and 0.80.^[14] It fell out of use in undergraduate education for one reason: it does not scale. A professor cannot conduct three hundred oral exams in a week.
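For readers who have not met the statistic, Cronbach's alpha can be computed from a table of per-criterion scores. The sketch below uses the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The rubric scores are invented for illustration and do not come from any study cited here.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for scores[s][i]: the score of student s on item i."""
    n_students = len(scores)
    n_items = len(scores[0]) if scores else 0
    if n_students < 2 or n_items < 2:
        raise ValueError("need at least two students and two items")

    # Sample variance of each item (column) across students.
    item_variances = [variance([row[i] for row in scores]) for i in range(n_items)]
    # Sample variance of each student's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Invented example: four students scored on three oral-exam rubric criteria.
rubric_scores = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
    [3, 2, 3],
]
print(round(cronbach_alpha(rubric_scores), 2))
```

Alpha rises toward 1 when students who score well on one criterion tend to score well on the others, which is what internal consistency means.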

In March 2026, a team of researchers at TU Delft in the Netherlands demonstrated that AI can solve the scaling problem. Using a voice-enabled large language model as examiner, they administered conversational assessments to a course of more than six hundred students. The LLM asked contextually linked follow-up questions, probing for conceptual depth and evidence of authorship. A "Council of LLMs" scored the responses. The process transformed thirty hours of manual grading into a fifteen-dollar automated operation.^[15]
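How a council of graders might turn several scores into one verdict can be sketched briefly. The account above does not say how the Delft team combined its models' scores, so everything here is an assumption: the median as the combining rule, the disagreement threshold, and the grader names are all hypothetical.

```python
from statistics import median


def council_verdict(scores: dict[str, float], max_spread: float = 1.0) -> dict:
    """Combine several graders' scores for one response.

    Assumed rule: report the median score, and flag the response for a human
    examiner when the graders disagree by more than max_spread points.
    """
    if not scores:
        raise ValueError("at least one grader score is required")
    values = list(scores.values())
    spread = max(values) - min(values)
    return {
        "score": median(values),
        "spread": spread,
        "needs_human_review": spread > max_spread,
    }


# Hypothetical graders and scores on a 10-point scale.
agreed = council_verdict({"grader_a": 7.0, "grader_b": 8.0, "grader_c": 7.5})
split = council_verdict({"grader_a": 4.0, "grader_b": 9.0, "grader_c": 8.0})
print(agreed)
print(split)
```

The design choice worth noticing is the review flag. Automation handles the cases where the graders agree, and a human examiner is pulled in only where they do not.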

The University of Pennsylvania has begun pairing spoken exams with written papers, citing both academic integrity and the development of critical thinking.^[16] The format requires what no written submission can: real-time reasoning, application of knowledge to novel prompts, and the defense of specific decisions. A student who copied and pasted cannot defend what they did not think through. A student who wrestled with the material for seventeen exchanges can.

The Prompt Is the Assignment

The assessment revolution does not require conversational exams or transcript analysis platforms. It can begin with a single change: make the AI conversation the assignment.

Research published between 2024 and 2026 consistently shows that structured AI prompting develops metacognition, the awareness and regulation of one's own thinking processes.^[17] When students are taught to plan their prompts (set goals and strategies before engaging the model), monitor the responses (evaluate accuracy, check for contradiction, verify sources), and reflect on the exchange (assess what they learned, what they missed, what they would do differently), the prompting process itself becomes a cognitive exercise.^[18]

The prompt reveals the prompter. A student who writes "give me the answer" has demonstrated one level of understanding. A student who writes "I think the answer involves supply-side economics, but I'm not sure how to connect it to the trade deficit. Can you help me think through the causal chain?" has demonstrated another. The first prompt asks the machine to do the work. The second uses the machine to support the student's own thinking. The difference is visible in five seconds. No detection algorithm required.

Dr. Kelly Van Sande, the founder and CEO of Ignite Learning Academy, an online K-12 school serving gifted students and students with special needs, has spent two decades building assessment systems that accommodate cognitive diversity.^[19] The standard assessment paradigm (the essay, the multiple-choice exam, the timed test) has always been a poor fit for students who think differently. AI-mediated assessment offers something those students have rarely had: an environment where the path to the answer matters more than the answer itself, where unconventional reasoning is visible and legible, where the process can be evaluated on its own terms rather than reduced to a product that erases how it was made.

The Barrier

Jean-Baptiste-Simeon Chardin, "The Governess" (c. 1739). National Gallery of Canada, Ottawa. One person teaches another how to look closely. The lesson requires presence. Public domain.

The tools exist. The research exists. The framework exists. What does not exist is institutional will.

Schools are still buying Turnitin. Districts are still writing AI policies that begin with the word "prohibit." Professional development budgets, where they exist at all, fund sessions on how to catch students using AI, not on how to teach students to use it well.^[20] The Massachusetts Department of Higher Education published a GenAI Assessment Guidebook offering practical frameworks for redesigning assessment around AI.^[21] It is one of the few state-level documents of its kind. Most states have nothing.

Amir Nathoo, the CEO of Outschool, a San Francisco-based live education platform valued at three billion dollars with more than 140,000 classes offered across 180 countries, argues that AI should augment teachers, freeing them for mentorship, creativity, and emotional connection rather than replacing them.^[22] The argument is correct and insufficient. Augmentation requires infrastructure. It requires training. It requires time, the one resource teachers do not have. The future-of-assessment literature is written by researchers who do not teach five classes a day and read by administrators who do not set budgets.

Ben Dodson, the CEO of Dewey (Doowii), a Bay Area AI data platform that raised 4.1 million dollars in 2025 and serves both K-12 and higher education, built his company on the premise that school data (attendance records, grades, engagement metrics) should be accessible to non-technical staff through natural language queries.^[23] The platform demonstrates what is possible when educational data is treated as legible rather than locked. But conversation transcripts are not yet in most data platforms. The infrastructure to collect, store, and analyze student-AI dialogues at institutional scale is only beginning to emerge.

Sage.Education is building one version of that infrastructure. The platform provides educators with step-by-step conversation map overviews of student-AI interactions: visual, structured records of how a student moved through a problem, where they asked for help, where they pushed back, and where they accepted an answer without question. The maps do not replace the teacher's judgment. They give the teacher something to judge. Instead of receiving a finished essay and guessing what happened before it arrived, an educator using the platform can see the process unfold, step by step, and intervene where it matters: the moment a student stops thinking and starts copying.

The goal is not surveillance. It is visibility. When a teacher can see how a student used the tool, AI stops being a shortcut and becomes a supported learning environment. The conversation becomes legible. The dialogue becomes teachable. And the institutions that need this most, the underfunded public schools serving the students most dependent on free AI tools, are the institutions Sage.Education was designed to reach.

Two Conversations

The plagiarism debate consumed three years. It produced an arms race between generation tools and detection tools, a race the detection tools lost, at the cost of thousands of false accusations against the students least equipped to defend themselves. The question that drove it, "did the student write this?", was always the wrong question. It assumed that the product was the point. It assumed that the essay, the artifact, the thing submitted and graded and filed, was the measure of learning.

It was never the measure of learning. It was a proxy, chosen because it was easy to collect and easy to grade. The actual measure of learning, the thinking, was invisible. It happened inside the student's head, and no assessment in history could see it.

Now the thinking happens in a conversation. It is recorded, turn by turn, in plain text. Every false start, every correction, every moment a student catches the machine in an error or catches themselves in a misunderstanding: all of it is there, timestamped, searchable, analyzable.

Two students submitted the same assignment. One generated a conversation that reveals seventeen acts of cognition. The other generated an essay that reveals nothing. The teacher who reads the conversations knows more about both students in five minutes than a semester of exams could tell them.

The dialogue is the work. The question is whether anyone will read it.

Disclosure: Sage.Education uses AI tools in its editorial and product workflows. This article was researched and drafted with AI assistance.

[1] "AI Writing Detection Model," Turnitin Guides, guides.turnitin.com
[2] "How Accurate Is Turnitin's AI Detector? -- 2026 Research Review," Leap AI, tryleap.ai
[3] "How Accurate Are AI Detectors in 2026? We Tested 5 of Them," ProofreaderPro, proofreaderpro.ai
[4] Stanford HAI (Human-Centered Artificial Intelligence) is Stanford University's interdisciplinary research institute studying the impact of AI on society. The 2023 study tested seven major AI detection tools against student essays and found that non-native English writing was systematically misclassified as AI-generated due to shared statistical features: simpler syntax, more predictable word choices, and shorter sentences.
[5] "These Turnitin False Positives in 2025 and 2026 Show Why AI Detectors Can't Be Proof," Popular AI, popularai.org
[6] Turnitin Guides, ibid.
[7] "Heads We Win, Tails You Lose: AI Detectors in Education," Journal of Higher Education Policy and Management, 2026. doi.org/10.1080/1360080X.2026.2622146
[8] Postplagiarism is a term developed by Dr. Sarah Elaine Eaton at the University of Calgary to describe the current era in which hybrid human-AI writing is becoming the norm. The concept argues that academic integrity must shift from detection and punishment to assessment design that makes misuse irrelevant. Eaton outlined six tenets of postplagiarism in 2023, which have been adopted as a framework by educators and institutions internationally. drsaraheaton.com
[9] "2025-2026 Postplagiarism Speaker Series: Navigating AI in Education," University of Calgary CAIELI, postplagiarism.com
[10] "Neurodiversity and Academic Integrity: Toward Epistemic Plurality in a Postplagiarism Era," Teaching in Higher Education, 2025. doi.org/10.1080/13562517.2025.2583456
[11] DHASRL (Dialogue-Based Human-AI Self-Regulated Learning) is a framework proposed in a 2026 paper that embeds assessment of student self-regulation directly within AI conversation transcripts. Self-regulated learning refers to the ability to plan, monitor, and evaluate one's own learning process. The framework proposes treating student-AI dialogue as a primary data source for understanding how students think, rather than merely what they produce. Zhang, L., Lin, F., & Wang, W. (2026). "What Can Student-AI Dialogues Tell Us About Students' Self-Regulated Learning?" arxiv.org
[12] Zhang et al., ibid.
[13] Zhang et al., ibid.
[14] Cronbach's alpha is a statistical measure of internal consistency, commonly used to evaluate how reliably an assessment measures what it claims to measure. A score of 0.70 or above is generally considered acceptable; 0.75-0.80 is strong. The viva voce (oral examination) consistently achieves this range, indicating that trained examiners arrive at similar conclusions about student understanding through conversational assessment.
[15] "The Conversational Exam: A Scalable Assessment Design for the AI Era," 2026. arxiv.org; "Fighting Fire with Fire: Scalable Oral Exams with Voice AI," Allard de Winter, allarddewinter.net
[16] "Oral Exams: The Perfect AI-Proof Assessment Method," Edvisor AI, blog.edvisor.ai; University of Toronto, "Viva Voce Oral Exam (AI-Resistant Approach)," teaching.utoronto.ca
[17] "Students' Perceptions and Applications of Metacognitive Awareness Levels in Problem Solving with ChatGPT," Educational Process: International Journal, 2025. eric.ed.gov
[18] "Teaching Students AI Strategies to Enhance Metacognitive Processing," The Scholarly Teacher, scholarlyteacher.com
[19] Dr. Kelly Van Sande holds a Doctorate in Educational Leadership and Administration (Ed.D.), an MBA, and is a certified teacher, principal, and school superintendent. Ignite Learning Academy serves K-12 students nationally with programmes for general education, gifted learners, and students with special needs through its FIRE programme. Named a Top School by Niche.com for three consecutive years. ignitelearningacademy.com
[20] "Rethinking Higher Education Assessments in the Age of AI," UCL Centre for Education and AI Innovation, April 2025, reflect.ucl.ac.uk
[21] "GenAI in Assessment: A Practical Guidebook," Massachusetts Department of Higher Education, mass.edu
[22] "Redefining Education in the Age of AI: A Conversation with Outschool CEO Amir Nathoo and Eric Ries," Outschool Press, press.outschool.com
[23] Ben Dodson is the co-founder and CEO of Dewey (Doowii), an AI-first data platform that enables educators and administrators to query educational data using natural language. Dodson previously worked as an AI/ML engineer at Google, Snapchat, and Mux. The company raised $4.1 million and serves both K-12 and higher education. literalhumans.com