It smells like AI, but how do I prove it?

The Royal Statistical Society (RSS), Institute of Mathematics and its Applications (IMA), and London Mathematical Society (LMS) have published a punchy joint statement expressing concerns about academic integrity in assessment.

It’s a tricky business, especially given that generative AI tools can produce “mathematically coherent answers to structured questions in seconds, often indistinguishable from student-generated work.”

The statement reminds us of how quaint earlier concerns about contract cheating now feel. Unlike traditional cheating services that produced reused answers, generative AI creates unique responses each time, making detection significantly more difficult, if not impossible, through conventional methods. The technology’s widespread accessibility and free availability have also significantly lowered the barriers to unauthorised use.

It comes on the same day that the OIA publishes new guidance and a set of case summaries on AI-related academic misconduct cases – where it turns out that when universities properly implement students’ procedural rights (like actually sharing evidence in advance, allowing meaningful responses, or providing detailed reasoning) it becomes even harder to prove and punish AI misconduct.

The casework exposes the weaknesses in AI detection – software bias against certain writing styles, false positives for non-native speakers, and the inability to distinguish between legitimate writing tools and generative AI. When students can properly challenge the evidence and universities have to justify their conclusions, plenty of cases fall apart. Who knew?

We all did, of course. It has been very clear for a couple of years now that depending on the asynchronous assessment of a(ny) digital asset – produced without supervision – as a way of assessing a student’s learning no longer works. There’s no way to prove they made it, and even if they did, it’s increasingly clear that making it doesn’t necessarily signal that they’ve learned anything along the way. Some have wasted a couple of years (and some a lot of money) attempting a daft cat and mouse game in pursuit of proof, only to find themselves at the vanishing point that was obvious all along.

What’s been less clear is what should be done instead.

The societies’ carefully worded statement strongly suggests they’re scooping up and responding to reports from academics who are being prevented from using traditional exams. The emphasis that invigilated examinations “should be retained as options” and their call for universities to support departments in “maintaining access to the full range of assessment methodologies” implies these options are being removed or restricted by institutional policies.

The language suggests departments are losing autonomy over assessment design, likely driven by institutional policies favouring online or take-home assessments even as AI makes these increasingly problematic. What the statement doesn’t do is reflect on why that’s been happening.

It’s partly been aimed at narrowing awarding gaps – research tends to show some groups perform worse in high-pressure, timed exam settings. Open-book and take-home formats can offer a more supportive environment, reducing stress and allowing students to better demonstrate understanding and critical thinking. The move also reflects efforts to address mental health concerns and manage institutional resource pressures through scalable and continuous assessment methods.

There was, in other words, a set of good reasons for moving away from exams. And while academic integrity issues might pose problems for the alternatives, it doesn’t seem especially helpful for the societies to ignore those reasons while calling for a return to the exam hall.

Notwithstanding the importance of actually deciding, subject by subject, what it is that we think it’s important for humans to be able to do on their own in the future, it is pretty clear that assessment integrity from here is likely to involve some element of watching people doing things. That brings with it all sorts of problems that need anticipating from the outset – both in terms of the high-stakes drama involved in “performance” and the potential discrimination and prejudices on the part of the people doing the watching and/or listening – but it does feel inevitable.

But perhaps the other thing to think about is why exams feel so “high stakes”. I’d be tempted to review the research on why students who hate exams hate them – and if I did so, I’d find the systems in use across Europe that give students multiple goes at end-of-module exams, without caps. That approach accounts for the sorts of extenuating circumstances that can arise when there’s a problem on the day, without a bunch of paperwork, and says “when you hit the standard, we’ll record that” instead of “we’re judging your ability to master the material at exactly the same pace as every other student”.

In other words, in the age of AI, academic integrity concerns trigger a debate about assessment reliability. But what they ought to trigger is a debate about what we want students to learn – and whether, when they get that final grade, it’s supposed to signify reaching a standard in general, or doing so at a specific predetermined time.

In reality, we know that adults learn at different paces. And while the deadline or the exam date might be clung to as a signal of pace and cohort comparison, in truth they’re there for tuition fees and marking convenience.

They used to give our daft and dated degree classification system a veneer of credibility insofar as they sorted students, but in a mass system it’s knowing what a student can do that is much more important than knowing that they’re better than others.

5 responses to “It smells like AI, but how do I prove it?”

  1. The pace at which generative AI is developing is such that if this is a problem now, it will be catastrophic within a handful of years.

    I don’t think anyone has grasped how serious this is yet for higher education contexts, and the sector will need to be quick and bold to prevent the hollowing out of our society’s intellectual capabilities through the mass outsourcing of language creation to a model. Stochastic parrots indeed.

    The answer can only be a swing back towards observed assessment of some kind becoming the norm, so as you say every effort needs to go into reimagining how that can work whilst minimising unnecessary mental health impacts and ensuring students are supported to show their best. And also, dare I say it, supporting students’ resilience to cope with periods of performance and stress: the rest of their life will involve moments like that – being able to cope with them is an important skill, not something to be designed out entirely.

  2. I recognise the concerns re AI, but let’s not pretend that in-person examinations don’t have their own problems with regards to academic integrity. It’s a myth that they are ‘AI or cheat proof’.

    1. Of course they have problems. I can’t think of a single assessment method that doesn’t.

      But nothing like the risks posed by synthetic text extruding machines.

      Resolving this is urgent and serious. A regulator truly acting in the ultimate interests of students would have this issue as one of the most important issues to resolve, with the sector.

  3. It’s been a common refrain across the last decade, but more than ever we need to start thinking more ‘programmatically’ about assessment design. Ensuring we have appropriate observed methods to verify the knowledge, skills and capabilities that students must have on leaving a programme, and a more open approach for everything else. Yes, students still need to be able to think for themselves, but they also need to be supported to understand how to use AI tools sensitively, ethically and professionally too – they’re not going away.

    I really like the University of Sydney model that supports this with ‘two lanes’: https://educational-innovation.sydney.edu.au/teaching@sydney/program-level-assessment-two-lane/

    There’s lots more information/presentations available too beyond the link above.

    1. They may not be going away, but in their current guise GenAI LLMs are frequently wrong, environmentally catastrophic, and developed on the backs of vulnerable human labour undertaking some of the most psychologically damaging work for little pay or security. They threaten one of the very core cognitive development processes – original thinking and original writing. Yes, there are lots of transformative benefits possible – particularly from more limited, focussed algorithms designed for particular purposes – but that’s a very different matter to the wholesale outsourcing of basic learning-related tasks to a neural network, such that individual learning is massively undermined.
