No-one ever won a prize for the poetry, eloquence or rhetorical force of a consultation response. Lacking the glamour of a think-tank pamphlet, or the authority of peer review, responses occupy a minor and ephemeral place in the wonk oeuvre.
Yet for anyone working in or around public policy, the relentless appetite of Whitehall, select committees, regulators, funders, commissions and inquiries to consult – or to appear to consult – means that you’ve usually got at least one or two responses on the go. I’ve certainly spent an unhealthy portion of my adult life drafting the wretched things.
You respond to a consultation in the hope of influencing its outcome. But there’s always a nagging doubt as to whether anyone will read what you’ve written, or a fear that the audience will be restricted to a pimple-faced intern or low-cost consultant, who compresses your carefully crafted insights into a single line on an Excel spreadsheet. As a result, savvier operators now tend to publicise their responses online or through the media, as a way of getting their message out.
For the higher education sector, the past six months have seen policy proposals emerge so thick and fast that a degree of consultation fatigue is setting in. The most recent process, which closed in late March, was for Lord Stern’s review of the Research Excellence Framework (REF). And with a white paper just around the corner, the machinery of consultation will soon be cranking up again.
As a service to the Wonkhe community (and to reassure the response-drafters among you that you have at least a few devoted readers), I’ve spent the past couple of weeks digesting all the submissions to the Stern Review I could lay my hands on. I’ve read thirty-six in total – most of them publicly available, and a few private – from a mix of HEIs, mission groups, learned societies and lobby groups.
Taken together, what do these responses tell us about the likely direction of the Stern Review? And what hints do they offer about the design of the next REF? Nothing is certain, of course, as Lord Stern and his panel could head off in their own direction. But based on the responses I’ve read, let me highlight four points on which there is widespread agreement, and four where positions diverge. I’ve anonymised quotes from those responses which haven’t been made public.
Four areas of consensus
The centrality of peer review
There is overwhelming support for maintaining peer review by expert panels as the core method of REF assessment. The Royal Society acknowledges “its administrative burden” but insists that “peer review should remain at the heart of the REF.” The Russell Group notes that a key factor in the REF’s success is “the rigour of the assessment process, making it essential that expert peer review is retained at the core of the exercise.” Universities UK calls for an “unequivocal commitment to peer review as underpinning quality judgements in the next iterations of the REF” and University Alliance says that peer review is “globally recognized as a gold standard of research assessment.”
A discordant note is struck by the Council for the Defence of British Universities (CDBU), which notes that its membership “finds the previous methods used in the REF inadequate,” because of concerns about the volume of outputs assessed and the inadequate expertise of panel members. But it acknowledges that no easy alternative lies between “the devil of simple metrics…and the deep blue sea of ‘peer review’”.
Others point to sampling as one option that would enable peer review to be retained, even if the volume of outputs to be assessed increases (which would be an automatic consequence of reducing selectivity and submitting all staff). The British Academy points out that sampling “is a prime research methodology in many social science subjects and there is much academic expertise on sampling methods in the UK.” It encourages BIS to commission research on how sampling could be more widely used in the next REF. The Academy of Social Sciences agrees, noting that sampling “might be applied differentially to high volume UoAs, or might be stratified in a number of ways (by self-nominated excellence score, categories of research, interdisciplinary research etc.).” The Russell Group is more cautious on this issue, but suggests that a pilot should be carried out to determine whether “an algorithm could be designed to implement sampling in a fair way.”
There’s only limited scope for metrics
The corollary of this staunch defence of peer review is a uniform resistance to a metrics-based approach. I was encouraged to see the vast majority of responses (including the Russell Group, Royal Society, British Academy, UUK, GuildHE, and several HEIs) citing The Metric Tide and endorsing our conclusion that “no metric can currently provide a like-for-like replacement for REF peer review.” GuildHE expresses particular concern over the application of metrics to arts and humanities research “where there are few standard practices for citations and a greater diversity of research methodologies and modes of outputs”, while the British Academy argues that “using quantitative measures to assess research quality is fraught with danger.” One Russell Group member warns that a move to metrics “carries risks for the reputation of REF and the quality assessments it makes.”
So it’s fair to say that a metrics-based REF is roundly rejected. Only one HEI endorses metrics as a primary mode of assessment, and even then, only for Panels A and B. HEFCE itself notes in its response: “we have concluded that metrics should not replace peer review as the primary approach to the assessment in the next REF.” There is also widespread recognition that robust metrics for impact don’t yet exist, such that narrative case studies, assessed by peer review, remain the best option. Faced with such unanimity from the sector, and the weight of evidence in The Metric Tide, I would be very surprised if Lord Stern came out in favour of an all-metric REF.
This isn’t to say that quantitative data can’t play a greater role in aspects of the next exercise. The Royal Society supports “judicious use of metrics” to inform and supplement peer review for certain panels. And there is widespread support for the use of metrics in the environment section of the REF (also recommended in The Metric Tide), particularly if this draws on data that is already being collected by HEIs, or through HESA. One HEI offers a detailed list of indicators that could be used, including demographics (age and career stage); the proportion of academic staff with a PhD, or with a background in professional practice; the diversity of a unit, drawing on Athena SWAN; research income; PhD awards; and outputs with national and international collaborators. Another notes that “by ensuring information on total eligible staff are included…game playing on income per FTE etc. will be minimized.”
If metrics are to be more widely used in the environment section, several HEIs draw attention to the need for greater consistency of data definitions and interoperability between data collection processes – particularly between the Research Councils and the REF. As one Russell Group member puts it: “integration of data across all research agencies is required to…ensure efficiency across the system. Greater interoperability across existing Government, funder and institutional systems (e.g. Researchfish) is the logical place to start.” HEFCE agrees and says that it hopes to work through a new Forum for Responsible Metrics (proposed in The Metric Tide) to agree which indicators should be used, and how interoperability could be advanced. The new High-Level Landscape Mapping Group, chaired by Jane Elliott, chief executive of the ESRC, also has an important role to play here.
While there’s consensus over the limited use of metrics in the next REF, several responses suggest that the option should remain open beyond that. Universities UK says that “the potential and efficacy of metrics continues to evolve and their use as a research assessment tool should be periodically reviewed.” One London-based university encourages further experimentation with metrics, combined with “vigilance as to unintended consequences”. In a “rapidly developing field”, it concludes that “regular reviews of the use of metrics…would be sensible.”
Impact is now a valued part of the exercise
Despite the trials and tribulations that accompanied the introduction of impact in the last REF, the sector has grown firmly attached to it. The University of Sussex speaks for several HEIs when it argues that “The inclusion of impact in 2014 has demonstrated the breadth and depth of socio-economic benefits and value across all subject areas…has also spurred engagement…[and] is now becoming embedded in research practice and institutionalised in policies and processes, such as recognition and promotion criteria.”
Most responses favour maintaining the impact weighting at twenty per cent of the exercise, although Million+ suggests it “should be increased” and the Russell Group says that twenty-five per cent “could be considered.” More attention is paid to how the definition of impact could be broadened. The British Academy points out that impact “is often achieved through a ‘web of influence’ rather than a linear progression.” It criticises the way that REF2014 “implied a ‘but for’ model of causation which is not applicable to the whole range of wider benefits of research, particularly in HSS.” And it argues that the rigid link to underpinning research needs to be relaxed, so that impacts can be based “on a body of research, knowledge and expertise, rather than discrete outputs.” More controversially, the Royal Society wants to extend the scope to include impacts on other academic research. The CDBU agrees, calling it “perverse to exclude scientific or academic impact.”
For the mechanics of impact assessment, there’s strong support for absorbing the impact template (which units had to submit alongside the case studies) into the environment part of the REF, so that it is considered as one aspect of the wider research environment. A few responses suggest reducing the number of case studies required to one per 25 FTE staff. Others emphasise the need to allow the resubmission of impact case studies in the next cycle, provided there is continued evidence of impact since REF2014.
The balance of units of assessment is about right
The number of units of assessment was reduced from 67 in RAE2008 to 36 in REF2014. There is little appetite for further change. Universities UK suggests that the REF “has achieved a reasonable balance between size of a UoA, rigour of peer review and manageability of the panel’s workloads.”
Displaying an unusually relaxed attitude to numerical precision, the Royal Society says that the number of UoAs is “about right.” The Russell Group agrees, and worries that further reductions in the number of UoAs “may make it hard to identify excellence within individual fields and could undermine the confidence of some parts of the academic community…that their research will be assessed by sub-panels without appropriate expertise.” University Alliance argues that the present “granularity of the REF is its greatest strength and must be maintained.”
While the main message here is one of continuity, a handful of units are earmarked for modest tweaks. Four responses mention the need to fine-tune the scope of the four engineering units (UoAs 12-15), and two say the same about UoA 3 (Allied Health Professions, Dentistry, Nursing, and Pharmacy).
Four areas of disagreement
The purposes of the REF
As I discussed recently on Wonkhe, the approach that you adopt to reform of the REF is shaped by how narrowly or broadly you define its purposes. Responses to Stern reflect a spread of opinion on this issue.
A vocal minority want to scale the exercise back to a single core purpose. The Royal Society is a proponent of this view, arguing that the REF “should be designed around its primary purpose – the allocation of QR funding – with consideration for the culture it can create.” The CDBU goes further, suggesting that “problems of the REF arise because of the mission creep that has occurred over the years” and calling for “a return to the pre-REF/RAE days of the 1980s when the funding allocation was institution-based and…not bundled in with other functions.”
Most responses insist on a more plural interpretation of purposes. There is pushback against the way the Stern consultation frames this issue, by implying that REF “is mainly a tool to allocate QR at institutional level.” University Alliance “rejects the premise of this question” and refers instead to the three stated purposes of the exercise. One leading HEI wants to “strongly challenge the view that allocation of QR is the primary purpose.” The Russell Group acknowledges “several important purposes” but says the REF should not be used as a regulatory or compliance tool.
HEFCE endorses my suggestion that the REF now serves five distinct purposes. The University of Sussex agrees, summarising these as “to inform allocation; to provide accountability; to provide benchmarking information; in influencing research cultures and behaviours; and in supporting institutional management.”
Several HEIs describe how they make use of REF information in management processes. A leading Scottish university says: “We use the REF results at an institutional level to focus our research. As the world’s largest single peer-review exercise, it is highly rigorous and gives us data that inform our research strategies and indicates growth areas, areas for retraction, where more focus is needed to increase quality and where we should invest selectively to our and the UK’s benefit.” Another London HEI describes how the REF process has become “an integral part of our strategic planning processes…It is used to shape our research outcome, and our significant improvement in performance, compared with others, was a result of using the REF data as a management tool.”
The appropriate level of assessment
The sharpest disagreements concern the level at which assessment should occur, and the respective roles of individuals, units and institutions. From the Russell Group, there’s a strong push for greater emphasis on “higher concentrations” and “critical mass” at the institutional level. The Royal Society agrees, and proposes that outputs should be determined “at the institutional rather than individual level”, and then assessed by discipline-specific panels. It says that this would allow the overall number of outputs being assessed to be reduced, with volume determined “through an algorithm applied to HESA data.”
Others fear that moves in this direction would result in an elite tier of universities scooping up an even larger share of QR funding. University Alliance warns that “aggregation of assessment at institutional level would destroy dynamism” and Million+ adds that it would “mask the performances of universities with discrete areas of research in only some areas.” The University of Sussex says that “shifting…to a simple, institutional funding mechanism would remove the ability to use [the REF] to incentivize behaviours.”
For outputs, a compromise position which enjoys wider support would involve decoupling outputs from individuals, and determining them at the UoA (rather than institutional) level, using an appropriate ratio of outputs-per-FTE (just as the impact case studies were selected for REF2014). Following a recommendation in the REF Manager’s Report, HEFCE planned to float this as a serious option in its unpublished autumn 2015 consultation.
In its response to Stern, HEFCE notes that decoupling would “remove the need for arrangements to account for individual staff circumstances” – a significant slice of the administrative burden of REF2014. And it suggests that this could also lead to an “increased focus on the submitting unit as a whole, removing the current consequences for morale of non-submission.” Several universities signal support for this change, as does GuildHE, which describes it as “an attractive proposition…[which] would allow a more coherent assessment of quality across research areas.” Another HEI suggests that decoupling should be accompanied by a rule change to make all outputs which carry the address of the submitting institution eligible, irrespective of whether the author has moved on at the time of the REF. This, it suggests, would “reduce the game-playing process of head-hunting stars prior to REF deadlines.”
The case against decoupling is made most forcefully by the British Academy, which says that it “would demand complex rules to ensure a spread of researchers were producing outputs for assessment; there was no appetite for this in our community.”
For impact, most responses support continued assessment of case studies at the unit – rather than institutional – level, although a handful suggest that they could be aggregated and assessed by the main panels, instead of the sub-panels.
For the environment element, there is more enthusiasm for institutional-level reporting, particularly on issues like PhD training and equality and diversity, where HEIs tend to have uniform policies and frameworks. But others are concerned that the granularity of the exercise would be reduced. As one Scottish university argues: “The assessment of an institutional-level environment, although undoubtedly more time-efficient to collate than UoA-level data, would not have the advantage of guiding HEIs to address local weaknesses…or indeed to identify unappreciated areas of excellence. For this reason, institutional-level reporting would not be recommended.”
Selection or inclusion of all staff
A related issue which divides respondents is whether HEIs should retain the right to select which staff are submitted, or whether all eligible staff should automatically be included. This provoked fierce debate after REF2014, in light of the hyper-selective approach taken by some HEIs to drive up their average scores.
Several responses favour a shift to a fully inclusive model. As one Russell Group member puts it: “an approach based on the submission of all eligible staff would significantly increase the transparency of the exercise, reduce the potential for gaming and spin, end uncertainty for staff over their inclusion, and provide a robust measure of volume for funding purposes.” The British Academy agrees, noting that the REF selection process “is divisive within an institution, can be damaging to individual research careers, and is potentially discriminatory.”
A counter argument is made by University Alliance, which says “aggregation through a whole-staff approach would be expensive and would undermine research culture, career progression and research-informed teaching.” A leading Welsh university, which took a selective approach to REF2014, notes that “the rules for REF 2014 encouraged institutions to make informed strategic choices about the shapes of their submissions. This is distinct from ‘game-playing’ and such an approach will continue to be a feature of preparations for any future exercises.”
A particular concern about a fully inclusive model is that it might incentivise HEIs to move more staff onto teaching-only contracts, and so weaken the links between research and teaching across the system. Universities UK weighs up the pros and cons in its response and concludes that: “On balance, we are in favour of retaining the selective approach…While requiring submission of 100% of research staff (or a minimum proportion) has many merits…this could undermine the ability of some institutions to develop capability and capacity, and nurture pockets of excellence that are important to the overall health, diversity and vitality of the research base.”
The scale of reform required
A final point on which there’s limited consensus is the overall scale of reform required. Taken as a package, the proposals being made by organisations like the Royal Society and the CDBU would amount to a radical overhaul of the REF. Others caution against anything more than a set of incremental tweaks.
According to Universities UK, any fundamental change “to the form, structure and focus of the REF risks incurring significant transaction costs, and loss of the confidence of the research community.” GuildHE cautions against “unnecessary tweaking of the parameters used in assessment or destabilising of the system”, which could prove expensive as HEIs “have already begun preparations.”
Several responses encourage the government to clarify its plans as soon as possible so that preparations for the next REF can proceed with certainty. The Russell Group notes that “the cumulative impact of the recent HE Green Paper, Nurse Review and Stern Review will be a disruptive one and time will be needed for the outcome of these consultations to be implemented and embedded into a new overarching architecture.”
This prompts the Russell Group – and several of its members in their own responses – to call for a further relaxation of the timetable for the next REF, with 2022 the preferred date, allowing enough time for a post-general election spending review in late 2020 or early 2021. There is scant enthusiasm for an interim assessment at the mid-point of the cycle, because of concerns over the robustness of any methodology (although one HEI does outline a framework of “multi-stage assessment” that it thinks could be made to work).
Where next?
These eight points give a good flavour of the responses that I’ve read. And they highlight the task that now confronts Lord Stern, as he weighs up all the evidence and tries to design a framework that is sensitive to diverse views while meeting the demands for simplification and burden reduction that prompted his review in the first place.
There are some other interesting ideas contained in the mix that I hope Lord Stern will consider. To limit REF-related spin, one university calls for “the endorsement by the Government of a single metric (e.g. research intensity) that could be used for comparison of outcomes by the HE sector and for public information, at the point of publication of the REF results.” University Alliance makes an eye-catching proposal to transfer the expertise and analytical capacity that HEFCE has built up around the REF into “the nucleus of a new national research analysis unit”, based in Research UK. And the CDBU suggests including a broader set of indicators, including staff satisfaction, within the exercise (although this sits somewhat oddly with their opposition to student satisfaction measures, including the NSS, within the TEF).
So where are we likely to end up? Prediction is a mug’s game, but if I were trying to craft an exercise that reduces burden, keeps most of the sector onside, and blends the best of these proposals, it might look something like this:
- Peer-review based, with a similar mix of UoAs to REF2014;
- Fully inclusive of all research-active staff, with outputs decoupled from individuals to minimise negative consequences for staff;
- Modest use of metrics to supplement peer review, accompanied by a wider drive to improve data coverage and interoperability;
- The introduction of sampling to handle an increased volume of outputs in a more manageable way;
- Impact to remain case-study based, with a weighting of twenty per cent, a broader set of definitions and looser links to underpinning research;
- Environment moving to the institutional level, becoming more reliant on metrics and absorbing the impact template;
- A more relaxed timetable, with the next assessment in 2022, allowing for pilots of the sampling method (and for any delay to the Stern Review itself, which is now rumoured to be slipping to autumn 2016).
We’ll know soon enough.