Is it time for a 5* category in the REF?

Overall, most of the Stern Review recommendations are sensible. Perhaps the only major concern is that those starting out on the academic career ladder may no longer be able to tout their REF-able outputs as confidently when looking for a new job, although this may of course still be finessed during the funding councils’ forthcoming consultation.

There is, however, a different issue. Policy makers may need to give some thought to differentiation at the top, particularly when it comes to outputs. These are still expected to be worth just shy of two-thirds of the overall profile, and presumably of future QR funding.

So what is the problem? Well, with around half as many research outputs per person now needed (two or fewer instead of an average of nearly four in the last REF), and more flexibility in how they are selected (up to a maximum of six per person), will those institutions that already submit a high proportion of staff be overloaded with 4* submissions?

As an extreme case, consider the 23 unit-of-assessment submissions to REF 2014 that scored 4* on 50% or more of their outputs. Halving the number of outputs that need to be submitted could easily result in a 100% 4* profile, particularly when we consider the added flexibility in selecting submitted outputs.
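The arithmetic behind that extreme case can be sketched in a few lines of Python. This is only an illustration under strong assumptions: that a unit knows in advance how each output would be rated, that it is free to drop its weakest outputs, and that ratings would not change between exercises. The function name and the example profile are hypothetical, not taken from REF data.

```python
def best_subset_profile(profile, keep_fraction):
    """Profile that results from keeping only the top-rated share of outputs.

    profile: dict mapping star rating to its share of outputs (shares sum to 1).
    keep_fraction: share of the original outputs that still has to be submitted.
    Assumes (hypothetically) perfect foresight of ratings and free selection.
    """
    remaining = keep_fraction
    kept = {}
    # Fill the reduced quota from the highest rating downwards.
    for stars in sorted(profile, reverse=True):
        take = min(profile[stars], remaining)
        kept[stars] = take
        remaining -= take
    # Re-express the kept outputs as shares of the smaller submission.
    return {stars: share / keep_fraction for stars, share in kept.items()}


# Hypothetical 2014-style profile: half of the outputs rated 4*.
example_2014 = {4: 0.50, 3: 0.40, 2: 0.10, 1: 0.0, 0: 0.0}
halved = best_subset_profile(example_2014, keep_fraction=0.5)
print(halved)
```

Under these assumptions a unit with half its outputs at 4* ends up with nothing but 4* once the requirement is halved, while a unit at 40% 4* would reach roughly 80% before any of the extra selection flexibility is taken into account.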

This might be counteracted by the new requirement to submit all research active staff. However, even if we assume that those additional staff are not somehow made ‘ineligible’ due to sleight of hand around contractual status, there may not be that many extra staff available anyway. The aforementioned 23 submissions are primarily from institutions that were fairly intensive (non-selective), including Oxford, Cambridge, LSE, Southampton, Birmingham and Bristol. Some of the additional 75 unit-of-assessment submissions that scored 4* on over 40% of their outputs might also be able to approach a 100% 4* submission. There is now little incentive to keep improving for the next REF; the incentive instead is simply to grow bigger.

So, you might counter: that is nearly 100 of the 1,911 unit-of-assessment submissions in REF 2014, or 5%. That’s an edge effect, right?

Well, if we are talking about 100% 4* submissions, then yes. But at present REF scores are funded using weightings of 4:1:0:0:0, so there are three variables in play: 4*, 3* and below 3*. If we extend the above example to those submissions with 50% or more at 3* and above in the last REF, and so infer they might attain 100% 3* submissions in the next REF, what do we have?

Well, 385 of the 1,911 UOA submissions… failed to score at least 50% 3*. Even if we round up a bit to allow for all the assumptions and unknowns, and set our threshold at 60% 3* or more to have a reasonable chance of getting close to 100% 3* or more in REF 2021, we only lose another 312 units of assessment. Or to put it another way, 1,198 of the 1,911 unit-of-assessment submissions (62%) to REF 2014 scored 60% or over at 3* and 4*. We have not even considered any real-terms improvement in UK research quality, and yet there is a very real chance that well over half the unit-of-assessment submissions will be able to attain 100% 3* and 4* output sub-profiles.

This means that the funding formula for the majority of submissions will be restricted to using two variables: the number of FTEs and the balance of 3* and 4*. This would be unprecedented and does not allow much nuance. As the proportion of 4* approaches 100%, the formula may as well be based on FTE alone.
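To see how little room that leaves, here is a rough sketch of a quality-weighted volume calculation. It uses only the 4:1:0:0:0 weightings mentioned above and ignores any other factors the funding councils apply, so it is a simplification rather than the actual QR formula. The function name and the FTE figure are hypothetical.

```python
# Weightings per star rating, as quoted in the article (4:1:0:0:0).
WEIGHTS = {4: 4, 3: 1, 2: 0, 1: 0, 0: 0}


def funded_volume(fte, profile):
    """Quality-weighted volume: FTE times the weighted sum of the profile shares.

    A simplified sketch; other factors in the real allocation are left out.
    """
    quality = sum(WEIGHTS[stars] * share for stars, share in profile.items())
    return fte * quality


# Hypothetical unit of 10 FTE whose outputs are all 3* or 4*.
for share_4 in (0.5, 0.75, 0.9, 1.0):
    profile = {4: share_4, 3: 1 - share_4}
    print(f"{share_4:.0%} at 4*: {funded_volume(10, profile):.1f}")
```

Once everything is 3* or 4*, the quality term reduces to one plus three times the 4* share, and at 100% 4* it is a constant 4, so the only way left to increase the result is to add FTE.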

How do we avoid such a situation? We may need to look at issues not covered by Stern, such as increasing differentiation in the quality profiles: how about adding a 5* rating? Comparisons with previous exercises would be more problematic, but not impossible, and the “all staff” change has already created a much bigger backwards-compatibility issue.

The biggest problem might be what to call 5*. 4* is “world-leading”, so would 5* be “world-beating”, “solar system-leading” or “galactic defender”? In any case, the issue of non-differentiation at the top seems to be a real risk.