The long-awaited REF-AI report prompts the sector to imagine an increasingly automated REF, but also leaves several important questions unanswered about what such a future might mean for the people and practices that underpin research assessment. Before we embed AI more deeply into REF2029, we need to pause and reflect on three issues that deserve much greater attention, starting with the long-term risks to disciplinary expertise.
Long-term impacts: Efficiency gains and the risk of skills erosion
Recommendation 15 of the report proposes that “REF assessments should include a human verification step… confirming that final judgements rest on human academic expertise.”
This feels sensible on the surface. But the longer-term implications warrant more attention. Across many sectors, evidence shows that when automation takes on tasks requiring expert judgement, human expertise can slowly erode as roles shift from analysis to oversight. The report itself recognises this trend when discussing labour substitution and task reallocation.
REF processes already rely heavily on signals, heuristics and proxies, particularly under time pressure. Introducing AI may further reduce opportunities for deep disciplinary reading in panel work. If this happens, then by the 2030s or 2040s, the experts needed to meaningfully verify AI-generated assessments may become harder to sustain.
This is not an argument against using AI, but rather a suggestion that we need to consider the long-term stewardship of disciplinary expertise, and ensure that any AI integration strengthens, rather than displaces, human judgement. We do not yet have established expertise in collaborating effectively with AI systems or in interpreting their outputs. Developing that expertise needs to be a conscious endeavour if AI is to support research assessment responsibly.
Learning from Responsible Research Assessment (RRA)
Over more than a decade, frameworks such as DORA, CoARA, the Hong Kong Principles and the Leiden Manifesto have laid out clear principles for the responsible use of quantitative indicators, transparency, equity, and disciplinary diversity. The REF-AI report notes that in the interviews conducted: “Seldom was mention made of responsible research assessment initiatives such as DORA and CoARA… There is no clear view that the deployment of GenAI tools in the REF is antithetical to the ambitions of such initiatives.” But the absence of discussion in the focus groups does not necessarily indicate positive alignment; it may simply mean that RRA principles were not a prominent reference point in the design or facilitation of the project.
A fuller analysis could explore how AI intersects with core RRA questions, including: i) How do we assess what we value, not just what is machine-readable? ii) How do we prevent AI from amplifying systemic inequities? iii) How do we ensure transparency in systems underpinned by proprietary models? and iv) How do we avoid metrics-by-stealth re-entering the REF through automated tools? These considerations are essential, not peripheral, to thinking about AI in research assessment.
Representation: A report on bias that overlooks some of its own challenges
Finally, representation. As the authors themselves have acknowledged, it is hard to ignore that the authorship team comprises four men, three of whom are senior and white. This matters, not as a criticism of the individuals involved, but because who examines AI uptake shapes how issues of bias, fairness and inclusion are framed. Generative AI systems are widely acknowledged to be trained on text containing gendered, racialised and geographical biases; the report itself notes that: “Concerns of bias and inaccuracy related to GenAI tools are widely acknowledged…” What is less evident, however, is a deeper engagement with how these biases might play out within a national research assessment exercise that already shows uneven outcomes for different groups.
A similar issue arises in the dataset. Half of the interviewees were from Russell Group institutions, despite the Russell Group representing around 15 per cent of REF-submitting HEIs. The report itself notes that experimentation with AI is concentrated in well-resourced institutions: “Variation in experimentation with GenAI tools is mainly influenced… by institutional resource capacity.”
Given this, the weighting of the sample skews the perspectives represented. This does not necessarily invalidate the findings, but it does raise the question of whether further, broader consultation would strengthen confidence in the conclusions drawn.
Doing it better?
The report does an excellent job of surfacing current institutional anxieties. Larger, well-resourced universities appear more open to integrating AI into REF processes; others are more cautious. Survey findings suggest notable scepticism among academics, particularly in Arts, Humanities and Social Sciences. Despite this, the report signals a direction of travel in which REF “inevitably” becomes AI-enabled and eventually “fully automated.” Whether this future is desirable, or indeed equitable, remains an open question.
The REF-AI report is therefore best read as an important starting point. For the next phase, it will be vital that Research England broadens the conversation to include a wider diversity of voices, including experts in equality and inclusion, disciplinary communities concerned about long-term skills, those with deep experience in RRA, smaller institutions, and early career researchers who will inherit whatever system emerges.
This more diverse team must be given licence to make bold decisions about not just what is inevitable but what is desirable for the research ecosystem the REF ultimately seeks to monitor and shape. We cannot pay lip service to principles of responsible research assessment, equity, diversity and inclusion while ignoring the outcomes of the decision-making processes those principles are meant to shape.
AI will undoubtedly shape aspects of future research governance and assessments. The challenge, now, is to ensure that its integration reflects sector values, not just technological possibility.