There’s no point comforting ourselves over AI and cheating when we don’t know what cheating is

I spent some time yesterday thinking about the polling that came out on students’ use of AI in higher education.


The headline was that a new HEPI/Kortext Policy Note found that “more than half of students” have used generative AI for help on assessments – but “only 5 per cent” were likely to be using AI to cheat.

The 51 per cent using it as a private tutor is fascinating – it’s always available for a start!

But I’m not convinced that framing the five per cent who just copy and paste Large Language Model (LLM) output as cheating, but the 13 per cent who get the LLM output and rewrite it to get it past detection as not cheating, is especially helpful.

What do rules and policies and survey questions even mean by work “produced” by AI?

Here’s the thing. If I ask an LLM to explain something to me, that’s probably not cheating.

If I give it a chunk of text and ask it to correct grammar and spelling is that cheating? If I then ask it to rewrite that chunk of text in line with its feedback, is that cheating?

If I ask it to look at a chunk of text and give me feedback on it in terms of clarity or strength of argument, is that cheating? If I then ask it to rewrite that chunk of text in line with that feedback, is that cheating?

If I ask it for feedback on whether my argument is compelling or original, is that cheating?

If I ask it to suggest other arguments I could make that are related or other angles to explore, is that cheating?

If I ask it to find related research topics or debates or other authors on an issue, and then rinse repeat the above, is that cheating?

If I send it the marking rubric and ask it to grade my draft against said rubric, ask for suggestions to improve the mark and then ask it to go ahead and redraft to improve the mark, which of those are cheating?

I’m astonished that 65 per cent of students think their university could spot work produced by AI. Unless we’re talking about raw output from GPT 3.5 with no additional style prompts, that’s likely an unsubstantiated boast by staff intended to deter misuse – and one that will potentially (and harmfully) deter some students from using AI at all.

And when we say “produced” by AI, either in surveys or academic integrity policies, I assume everyone thinks of copying and pasting raw LLM output rather than all the other things I’ve described above.

Even where students are told to “declare” AI usage in their work, do they think they have to declare its use in all of the cognitive processes involved in putting an assignment together (see above), or just in the “production” of a final “essay”?

And would they disclose it – and would students share an understanding of what to disclose?

If we think about the process of tackling a written assignment, I think it’s clear that we are a million miles away from a shared understanding of what we expect students to be able to do without AI.

As long as that’s the case, those determined to cheat will, and those determined not to will rob themselves of vital tools – all while they see others using them and either “not getting caught” or “using AI tools effectively” and potentially getting better marks.

And as long as staff insist that they can “spot it”, a proper dialogue between staff and students about how they’re using it will be prevented.

I figure the three questions for students are:

  1. Is doing X within or against the rules?
  2. Could I ever be caught if not?
  3. Does not doing it disadvantage me?

If 1 isn’t clear, and 2 can’t be, the weigh-up in 3 means deep unfairness. And where some staff or some universities regard chatting with AI tools as cheating, it’s completely undetectable anyway – so test 2 doesn’t apply.

One test that we might usefully use is to ask if doing the same thing with another human would be cheating:

If I ask my module leader whether my argument is compelling, would that be cheating?

In theory, for all my examples above, we might say no.

But what has changed is the ease of access when that feedback isn’t coming from a human. With humans, I can only chat when I can – and most of the time I only really get feedback sporadically.

These days you can rinse-repeat the feedback process instantly, rapidly, and in the night – with so much ease that it has the potential to render grading meaningless.

And once I get feedback from a person, I can rarely say to them “can you redo it for me, implementing that feedback” – whereas LLMs will.

What is clear is that using AI in this way is very different to using an essay mill. And you can argue that knowing how to use an LLM to produce text in a way that leaves you with a coherent artefact takes a lot of skill – one that will be useful to employers!

But the better the tech gets, the less work, understanding and knowledge it takes. Even the understanding that underpins a coherent artefact can be done for you. As such, it’s still not clear what we want humans to do on their own.

And while you currently have to reach for some pretty dated assessment formats to get to a point where it could produce a complete (and passable) output for someone who didn’t actually know the subject, the tools are developing fast.

It quickly gets to a point where the only bit that’s “me” is the final rewrite to get past the standard rule that I didn’t use it to “create” an essay.

The HEPI recommendations also assert:

False positives – students being falsely accused of using generative AI against the rules – have been raised as a concern, but the number of cases where this has occurred appears vanishingly small.

It references this LSE policy blog on the invisible cost of resisting AI in higher education to back up the claim – but in fact the blog includes references to research demonstrating quite a significant problem with false positives. If there’s anything in the blog to back up the “vanishingly small” claim, I can’t find it.

And the recommendations in the LSE blog on assessment reform, inquiry-based objectives, problems and projects and performance-based assessments feel like a much better bet to focus on than the fairly obvious recommendations in the HEPI piece.

If we do stick with essays, playing with some of the tools in this thread, or tools like these or these, suggests that the main way students are using AI these days isn’t to produce work – but as (as Microsoft brands it) a “co-pilot”.

But outside of exams, collusion is like a speed camera with no film in it – a rule that everyone knows is flouted, meaningless, unenforceable and confusingly simultaneously promoted as good for team/transferable skills. And it’s differentially available based on your belonging and access to others.

(And when done with an AI tool, unlike chatting to your mate, it’s confidential and so risk free)

Yet it seems to me that what we can still call collusion (co-pilot etc) is exactly what students are using generative AI for – rather than plagiarism.

Making our minds up about whether we regard that as collusion rather than teamworking and collaboration is hard – and relates back to the debate about what we want humans to be able to do on their own.

But if nothing else, when HEPI says “some were predicting the end of assessment as we know it” but in reality “higher education institutions have upheld standards of rigour”, I detect high doses of comforting copium usage rather than reality.

As long as the artefact is “an essay”, there are endless reasons to question whether the assessment is fit for purpose.

And while I, like many, would like to see a move away from a focus on assessment and cheating and towards a focus on innovative teaching and helping students to use it all, that’s easier said than done when I hear of the spikes in formal allegations of AI usage being levelled at students.

Where there are policies right now that say AI can’t have been used in a way that “undermines learning outcomes”, that “your work must be your own”, or that advise “asking AI to improve your grammar and writing is OK, but don’t submit that as your work, use it as a way to learn how to improve your own work”, I can see the logic and what the authors are trying to do – but ultimately it all becomes meaningless.

The idea in too much of what I’ve seen seems to be that what is submitted must have been “produced” by the student on their own. But that concept looks increasingly meaningless once feedback availability is (almost) synchronous during “production”. Policies and guidance of that sort need revisiting – fast.

2 responses to “There’s no point comforting ourselves over AI and cheating when we don’t know what cheating is”

  1. Absolutely Jim. And the other thing that needs rewriting / rethinking fast is the whole concept of assessment, the purpose of assessment, and assessment design in higher education if we actually want to prepare our graduates for their working lives (and lives generally) in the 21st century… so many in HE are still complacent or think on campus exams is the way to go! *sigh*
