There is a recurring moment in the current debate about academic integrity in written assessments that feels like a trap. It usually happens when someone, often an advocate of traditional methods, says:
But the essay has been used for hundreds of years – surely it can survive GenAI.
That claim is only partly true, and the false part is the reason we find ourselves in this predicament.
We recently found ourselves in a heated conversation about the product versus process argument. The discussion exposed a deeper problem: in adopting the essay as the gold standard of coursework assessment, universities misunderstood what it was meant to do – make thinking visible.
The ghost of the tutorial
The Oxbridge model of the “essay”, which has indeed stood the test of time over centuries, was never merely a written document. In this model, the essay is an exploratory means of investigating a way of thinking, or of interrogating viewpoints, rather than a reporting mechanism for testing knowledge and critical skills.
The tutorial was the functional core of the learning process, enabling the student to develop a discursive framework supported by evidence. The text of the essay was simply the admission ticket to the real learning activity: the tutorial. You read the essay aloud; you were examined viva voce, either individually or in a very small group of fellow tutees.
The essay was never the summative endpoint; it was formative, a vehicle that fuelled a high-level conversation, a mechanism to clarify thinking and develop criticality. This learning activity built the knowledge and critical skills that the student would later demonstrate in a summative exam. In that model, the essay is GenAI-proof: if a student arrived at a tutorial having generated a script via GenAI, they would fall apart under questioning and discussion. The viva verified understanding. Feedback was immediate and personalised, a discussion between tutor and student, not “provided” days or weeks after submission.
This dialogue mattered because it allowed tutors to probe understanding in real time, developing reasoning through discussion rather than post hoc commentary. However, this interaction is very intimate, requires a small-scale learning environment, and is difficult (though not impossible) to scale up to larger class sizes.
The bargain basement version
What we have today in mass higher education is the “bargain basement” version of this model.
Over the last few decades, as cohorts expanded and efficiency became the buzzword, the sector made a silent calculation. We retained the artefact – the written paper – because it was easier to manage logistically. The student submitted their work, which was then marked at a later date. But we removed the essential, expensive part: the dialogue. We are left with a hollowed-out shell.
Instead, the expectation is that students already have most of the skills necessary to perform the task, or that they will develop them independently. In many cases, students are offered little to no hands-on guidance or support in developing the necessary critical skills. We are left dropping post hoc “pearls of wisdom” onto students’ work, often remotely via virtual learning environments, and acting surprised when they do not respond to or engage with this “feedback”. Or worse, students turn to GenAI platforms to generate the silent artefacts we demand, without developing the skills needed to create them. We cannot claim to be rigorously assessing critical thinking when we have removed the very mechanism that develops and verifies it.
False prophets: exams and detectors
The development of GenAI has had a seismic impact on assessment processes in higher education. The sector’s panic over GenAI has produced two knee-jerk reactions, neither of which solves the problem.
The first is a return to invigilated, closed-book exams as the only secure method of assessment. This is a retreat, not a solution. The formative element of coursework (even summative coursework) is important for skills development, and invigilated exams are a severely limited measure of authentic learning or professional readiness in the digital age. We need to remember that assessment is an integral part of the learning process.
The second reaction is a trusting reliance on AI detectors. Let us be unequivocal: they do not work. They yield false positives that penalise our most vulnerable students and false negatives that reward the savvy ones. They are a technological band-aid on a pedagogical wound.
Reclaiming the dialogue
If we agree that the “bargain basement” essay is flawed, and that a return to weekly one-on-one tutorials is unfeasible, what remains? We must find a way to scale the dialogue. One option is to make greater use of structured peer discussions, where positions are explained and arguments are justified in facilitated workshops.
A more fundamental shift is to move from assessing the product to evaluating the process. This is where the concept of the process-folio comes in. A process-folio is not merely a draft. It is a curated narrative of the intellectual journey. It requires the student to show their working, but more importantly, to engage in a dialogue with that working.
A process-folio does not replace the essay; it reframes it, shifting assessment from a static product to an examinable learning process:
The narrative: rather than submitting only a final essay, students submit a structured account of how their argument developed. This focuses on key decisions: what they included, what they rejected, and why. Assessors then have a means of interrogating the thought process behind the work.
The feedback loop: process-folios make feedback part of the assessment. Students demonstrate how they have acted on peer or tutor feedback in developing the work. The emphasis is on using comments, not receiving them.
Early formative feedback: feedback is given to students when it is most useful to them, while they are still working on the assignment. To avoid doubling workloads, this means providing less feedback at the end of the process and more where it has the most impact on their learning.
The audio element: to replicate the viva element at scale, students submit a short audio or video reflection explaining a key challenge or decision in the work. Speaking through a reflection is harder to fake and more revealing of understanding and learning gained.
Making such changes is often seen as impossible under prescriptive and limiting institutional assessment guidelines. These constraints are often more imagined than real. While some disciplines face professional regulation, most institutional frameworks leave far more room for manoeuvre than staff assume. It is within the power of most of us to make these changes.
Stop waiting for magic
We are currently expecting students to develop complex academic writing and critical skills by magic. By shifting to process-folios, we realign assessment with learning, acknowledging that the value lies not in the words on the page, but in the thinking that got the student there.
We challenge sector leaders to concede that the current “essay-only” model is a dishonest economy. You cannot have the rigour of the essay without the required dialogue. To our colleagues across disciplines, we say: do not wait for permission. Start mapping, logging, and assessing the process. If we want to verify that the student is the human-in-the-loop, we need to start talking to them again.
A really interesting article – thank you, I enjoyed reading it.
This is theoretically an excellent approach. However, it relies on the premise that students are genuinely interested in learning. In some subjects, students are primarily focused on getting to the outcome as painlessly as possible – an alien mindset to those of us who educate because we are deeply invested in our subject specialism. The suggested approach also doesn’t acknowledge that students can use AI to generate some of their process reflections (trust me, I have marked enough AI-generated ‘personal’ reflections to know!). Scalability remains a challenge too: with ever-increasing cohort sizes, all submissions at undergraduate level begin to blur, whether AI-generated or not.
However, the bigger question, which few dare to ask out loud, it seems, is: ‘Is it time to shelve knowledge-based subjects that have become meaningless to teach in the age of AI and switch our focus to professional skill development?’
I really enjoyed this piece. The historical framing is exactly the right way to understand why the mass higher-education model is so vulnerable to GenAI. The authors are right that we kept the artefact and discarded the dialogue, and that GenAI has simply exposed how hollow that bargain always was. It’s a braver argument than most of what gets written on this topic.
But I’m not convinced the process-folio solves the problem it identifies. A narrative of intellectual development? Easily prompted. A reflection demonstrating engagement with feedback? Trivial once you hand a student the feedback text. Even the audio reflection, without live questioning, is just another deliverable you can script.
Elizabeth B is already marking AI-generated personal reflections at scale, which makes the point all the more forcefully.
The authors actually identify the real answer themselves in the first half: a dialogue that exposes, in real time, whether understanding is genuinely there. That’s what the viva does, and it works precisely because it can’t be prepared for in the same way a written artefact can.
We just don’t have the economic model for that in our cash-strapped, massified system.
The scalability problem is genuine, and I don’t have a magic answer, because it’s really a funding question, not a pedagogical one. But I think the honest conclusion is that no fully asynchronous, submitted-artefact model will reliably verify understanding anymore – essay, process-folio or otherwise. The sector needs to face that, rather than reaching for variants that shift the problem without solving it.
I agree. The exposition of the problem and the theoretical concept of a solution are excellent, but the practicality of a “process-folio” is unclear and unconvincing. Hopefully the “assessment as conversation” aspects of the cognitive co-pilot being run by City St George’s, University of London as described in Tom Chatfield’s AI and the Future of Pedagogy 2025 Sage white paper will provide some more practical ways forward for affordable and scalable assessment solutions to address this very urgent problem.
I agree. Although (sometimes) time-consuming, preparing students to apply, interpret and communicate evidence, to develop soft skills (athlete and patient skills), and to think on their feet builds the professional skills that are important in PE, coaching, and sport and exercise science. This can be achieved effectively through examination, where students are asked to analyse, interpret and apply their understanding of evidence, or through competency assessments in practical laboratory-based scenarios. In Physical Education and Sport Coaching it is essential that students are assessed on their practical delivery skills and can adapt their teaching/coaching approaches. GenAI can and should be used to support and enhance the development of programmes and approaches to applied issues, but the assessment of professional skills and the application of knowledge should be the main focus. This can and should be supported by the “essay”, in which the use of GenAI should be transparent and encouraged – the caveat being that GenAI can only be used effectively if students know their topic well. I would therefore disagree about the removal of knowledge-based subjects; the knowledge base is more important than ever. It is the HE tutor’s responsibility to support the development of a sound, current knowledge base, and “structured peer discussion in facilitated workshops” in which GenAI output can be critiqued will be useful here.
I agree with Elizabeth B that personal reflections are not helpful in this context; I too have marked many personal reflections that were GenAI-created.
This year at Greenwich Business School I’ve piloted an asynchronous mini-viva at scale, using a bank of randomised questions and a video interviewing platform to allow students to talk to us after they have submitted their written work. One of the most impactful questions was to ask them what they’d have written if they had extra words; some of those responses showed a much deeper, genuine intellectual engagement than they had space to demonstrate in their written submissions. This has prompted me to amend the assessment, rather than attack the students. We have to take responsibility for the unintended messages that our assessment design and marking rubrics send.
Two years ago, a student challenged me in the middle of a lecture, saying that we ‘said’ we were interested in their critical thinking, their thoughts, but were then asking them to deliver academic work within mark schemes that only rewarded ‘our’ thinking and ‘our’ critical approach. I’ve returned to that challenge often, because a lot of the time, she’s right! We don’t tell students that they can disagree with us in written submissions; we expect them to be interested in written feedback but don’t provide incentives to access it.
This term I’ve also used ‘live marking’ on a large module, where the feedback conversation has been part of the marking rubric. Both of these approaches have been revealing, enabling us to see more of our students’ intellectual work. They have also been thought-provoking and enjoyable; it has been MUCH more rewarding to make this time to meet with the students than to spend hours writing feedback that will often never be seen. Students have really valued these conversations, often commenting that it’s the first time they’ve engaged with a feedback conversation and that they didn’t know how valuable it could be. It’s also worth noting that this approach is currently on track to deliver marks much faster than we otherwise would have done.
I’m not trying to replicate an Oxbridge tutorial, I am trying to make assessment interesting, relevant and useful. I’m trying to show our students that we value them more than we value GenAI, and we don’t want them always to defer to a language model created by a small group of people who have one view of what ‘good’ looks like.
To do this, we have to listen, discuss and accept that the challenge is ours to meet.
Thanks Katherine. Could you share a bit more about your approach to ‘live marking’ and feedback conversations? I’m curious to know more about how this works in practice.
You might be interested in the paper below, which I published some time ago; it describes an iterative process of structured feedback on essay drafts, preceded by a preliminary workshop clearly articulating the assessment goals. This, I think, mirrors some of the “process-folio” elements described so well here.
https://journals.physiology.org/doi/full/10.1152/advan.90127.2008
One can accept the diagnosis while rejecting the cure. Francis, Smith, Campbell-Karn, and Rutherford offer a sharp diagnosis of a real weakness in contemporary assessment. Their core claim is that generative AI has exposed a flaw that pre-dates ChatGPT. Universities have treated the essay as a reliable indicator of individual understanding, even though in mass higher education it often functions as a detachable product, marked after the event and only loosely tied to live reasoning. That point lands. Essay-only assessment has long depended on a proxy.
Where the piece starts to slip is in its historical story. The authors present the Oxbridge tutorial as the authentic form of the essay, with dialogue as the true pedagogic centre and the written text serving as an entry ticket. That picture has some truth, though it is still a selective history. Essays have played many roles across institutions and disciplines. They have supported learning, certainly, though they have also ranked students, trained them into disciplinary styles, and offered institutions a manageable way to certify performance.
That missing function matters. Bryan Caplan’s signalling account of higher education is useful here. In *The Case against Education*, he argues that universities do far more than cultivate knowledge and skill. They also sort, select, and certify. Degrees signal cognitive ability, persistence, and conformity to institutional expectations (Caplan, 2018). From that perspective, the essay belongs to a broader machinery of selection. Its value lies partly in how it helps institutions distinguish among candidates. GenAI therefore creates more than a pedagogic challenge. It threatens the credibility of one of higher education’s ordinary sorting devices.
Seen in that light, the article’s solution loses force. The proposed process-folio aims to relocate assessment from final product to visible intellectual journey. On paper, that sounds attractive. In practice, each element remains open to AI mediation. A student can generate a plausible narrative of development, draft reflective commentary around tutor feedback, or rehearse a short audio explanation with machine assistance. The genre shifts, yet the underlying verification problem survives. If the standard is AI-proof assessment, the process-folio fails its own test. At that point, the remedy collapses on its own terms, even though the diagnosis survives.
This is why the article’s practical confidence feels misplaced. Once process reflections, feedback commentaries, and short recordings also become scriptable, the promised route to AI-resilient assessment starts to look like one more layer of performative authenticity. Some students will expose thin understanding under sustained questioning. Others will manage the performance well enough, especially when prompts, feedback, exemplars, and institutional templates provide rich scaffolding. The issue, then, is larger than a single assignment format. It concerns the limits of scalable trust.
The deeper institutional problem is a trilemma. Universities want three things at once. They want large cohorts, manageable workloads, and strong assurance that the submitted work reflects the student’s own understanding. In most cases, assessment design secures two of these and sacrifices the third. Rich dialogue, supervised writing, oral defence, live problem-solving, and repeated low-stakes interaction can raise confidence in authorship and comprehension. Each option, however, costs staff time, training, and money. Francis and colleagues recognise the value of dialogue, yet their proposed substitute still tries to preserve scale and affordability while recovering epistemic security. That is the pressure point in the whole piece.
The rhetoric of the article adds to this difficulty. It is vivid and readable, though highly binary in its framing. Product versus process, dialogue versus hollow shell, retreat versus solution. Those contrasts sharpen the prose, yet they also flatten the terrain. Contemporary assessment systems mix pedagogy, administration, credentialling, and professional regulation. A richer argument would engage with that complexity and offer evidence for the claims made about detectors, exams, and workload.
There is also an irony hovering over the piece itself. The prose has the over-reliance on “not” framing (e.g. “This is a retreat, not a solution”), smooth cadence, generalising confidence, and polished antithesis that make AI co-drafting feel plausible. That remains an inference rather than a verdict, and multi-author commentary often acquires a flattened house style even without machine input. Still, the possibility matters. An article calling for visible intellectual process may itself arrive as a refined product whose developmental history stays hidden. In a broader sense, that irony extends beyond the authors. This critique shares the same condition. It is hybrid too, shaped through dialogue with AI and steered by human judgement. That self-reference does not dissolve the argument. It clarifies the world in which the argument now operates. Mixed authorship has become ordinary.
For that reason, the strongest takeaway from the article lies in its diagnosis rather than its remedy. It helps reveal how fragile essay-only assessment had already become. It says less about what institutions are willing to fund once cheap proxies lose credibility. If higher education serves selection as well as education, as Caplan argues, then the arrival of GenAI marks a legitimacy crisis for one of its long-standing selection tools. The question is therefore larger than academic integrity. It concerns what universities are for, how they certify persons, and how much real dialogue they are prepared to pay for.