Dear REF, please may we have a SEP?

What should replace the REF? Elizabeth Gadd is looking to the Netherlands

Among all the recent research-related news, we now know that UK universities will be making their submissions to the Research Excellence Framework on 31 March 2021.

And a series of proposals are in place to mitigate the worst effects of COVID-19 on research productivity. This has led to lots of huffing and puffing from research administrators about the additional burden, and another round of ‘What’s the point?’ tweets from exasperated academics. And it has led me to reflect dreamily again on alternatives to the REF and whether there could be a better way – something that UKRI are already starting to think about.

Going Dutch

One of the research evaluation approaches I’ve often admired is that of the Dutch Standard Evaluation Protocol (SEP). So when I saw that the Dutch had published the next iteration of their national research evaluation guidance, I was eager to take a look. Are there lessons here for the UK research community?

I think so.

The first thing to say, of course, is that unlike REF, the Dutch system is not linked to funding. This makes a huge difference. And the resulting freedom from feeling like one false move could plunge your institution into financial and reputational ruin is devoutly to be wished. There have been many claims – particularly at the advent of COVID-19 – that the REF should be abandoned and some kind of FTE-based or citation-based alternative used to distribute funds. Of course the argument was quickly made that REF is not just about gold, it’s about glory, and many other things besides. Now I’m no expert on research funding, and this piece is not primarily about that. But I can’t help thinking, what if REF WAS just about gold? What if it was just a functional mechanism for distributing research funds and the other purposes of REF (of which there are five) were dealt with in another way? It seems to me that this might be to everybody’s advantage.

And the immediate way the advantage would be felt, perhaps, would be through a reduction in the volume and weight of guidance. The SEP is only 46 pages long (including appendices) and, perhaps with a nod to their general levity about the whole thing, is decorated with flowers and watering cans. The REF guidance, on the other hand, runs to 260 pages (124 pages for the Guidance on Submissions, plus a further 108 pages for the Panel Criteria and Working Methods and 28 pages for the Code of Practice – much of which cross-refers and overlaps).

And if that’s not enough to send research administrators into raptures, the SEP was published one year prior to the start of the assessment period. Compare this to the REF, where the first iteration of the Guidance on Submissions was published five years into the assessment period, where fortnightly guidance in the form of FAQs continues to be published, and where, just months before the deadline, we are still waiting for some of it.

Of course, I understand why the production of REF guidance is such an industry: it’s because they are enormously consultative, and they are enormously consultative because they want to get it right, and they want to get it right because there is a cash prize. And that, I guess, is my point.

But it’s not just the length of course, it’s the content. If you want to read more about the SEP, you can check out their guidance here. It won’t take you long – did I say it’s only 46 pages? But in a nutshell: SEP runs on a six-yearly cycle and seeks to evaluate research units in light of their own aims to show they are worthy of public funding and to help them do research better. It asks them to complete a self-evaluation that reflects on past performance as well as future strategy, supported by evidence of their choosing. An independent assessment committee then performs a site visit and has a conversation with the unit about their performance and plans, and provides recommendations. That’s it.

Measure by mission

The thing I love most about the new SEP is that whilst the ‘S’ used to stand for ‘Standard’, it now stands for ‘Strategy’. So unlike REF where everyone is held to the same standard (we are all expected to care 60% about our outputs, 15% about our research environment and 25% about real-world impact), the SEP seeks to assess units in accordance with their own research priorities and goals. It recognises that universities are unique and accepts that whilst we all love to benchmark, no two HEIs are truly comparable. All good research evaluation guidance begs evaluators to start with the mission and values of the entity under assessment. The SEP makes good on this.

And of course the benefit of mission-led evaluation is that it takes all the competition out of it. There are no university-level SEP League tables, for example, because they seem to have grasped that you can’t rank apples and pears. If we really prize a diverse ecosystem of higher education institutions, why on earth are we measuring them all with the same template?

Realistic units of assessment

In fact, I’m using the term ‘institutions’ but unlike the REF, the SEP at no time seeks to assess at institutional level. They seek only to assess research at the level that it is performed: the research unit. And the SEP rules are very clear that “the research unit should be known as an entity in its own right both within and outside of the institution, with its own clearly defined aims and strategy.”

So no more shoe-horning folks from across the university into units with other folks they’ve probably never even met, and attempting to create a good narrative about their joined-up contribution, simply because you want to avoid tipping an existing unit into the next Impact Case Study threshold. (You know what I’m talking about). These are meaningful units of assessment and the outcomes can be usefully applied to, and owned by, those units.

Evaluate with the evaluated

And ownership is so important when it comes to assessment. One of the big issues with the REF is that academics feel like the evaluation is done to them, rather than with them. They feel like the rules are made up a long way from their door, and then taken and wielded sledge-hammer-like by “the University”, AKA the poor sods in some professional service whose job it is to make the submission in order to keep the research lights on for the unsurprisingly ungrateful academic cohort. It doesn’t make for an easy relationship between research administrators and research practitioners.

Imagine then if we could say to academic staff: we’re not going to evaluate you any more, you’re going to evaluate yourselves. Here’s the guidance (only 46 pages – did I say?); off you go. Imagine the ownership you’d engender. Imagine the deep wells of intrinsic motivation you’d be drawing on. Indeed, motivational theory tells us that intrinsic motivation eats extrinsic motivation for breakfast. And that humans are only ever really motivated by three things: autonomy, belonging and competence. To my mind, the SEP taps into them all:

  • Autonomy: you set your own goals, you choose your own indicators, and you self-assess. Yes, there’s some guidance, but it’s a framework, not a straitjacket, and if you want to go off-piste, go right ahead. Yes, you’ll need to answer for your choices, but they are still your choices.
  • Belonging: the research unit being assessed is the one to which you truly belong. You want it to do well because you are a part of this group. Its success and its future are your success and your future.
  • Competence: You are the expert on you and we trust that you’re competent enough to assess your own performance, to choose your own reviewers, and to act on the outcomes.

The truth will set you free

One of the great benefits of being able to discuss your progress and plans in private, face-to-face, with a group of independent experts that you have a hand in choosing, is that you can be honest. Indeed, Sigridur Beck from Sweden’s Gothenburg University confirmed this when talking about their institution-led research assessment at a recent E-ARMA webinar. She accepted that getting buy-in from academics was a challenge when there was nothing to win, but that they were far more likely to be honest about their weaknesses when there was nothing to lose. And of course, with the SEP you have to come literally face-to-face with your assessors (and they can choose to interview whomever they like), so there really is nowhere to hide.

The problem with REF is that so much is at stake it forces institutions to put their best face on, to create environment and impact narratives that may or may not reflect reality. It doesn’t engender cold, hard, critical self-assessment which is the basis for all growth. With REF you have to spin it to win it. And it’s not just institutions that feel this way. I’ve lost count of the number of times I’ve heard it said that REF UoA panels are unlikely to score too harshly as it will ultimately reflect badly on the state of their discipline. This concerns me. Papering over the cracks is surely never a good building technique?

Formative not summative

Of course the biggest win from a SEP-style process rather than a REF-style one is that you end up with a forward-looking report and not a backward-looking score. It’s often struck me as ironic that the REF prides itself on being “a process of expert review” but actually leaves institutions with nothing more than a spreadsheet full of numbers and about three lines of written commentary. Peer review in, scores out. And whilst scores might motivate improvement, they give the assessed absolutely zero guidance as to how to make that improvement. It’s summative, not formative.

The SEP feels truer to itself: expert peer review in, expert peer review out. And not only that but “The result of the assessment must be a text that outlines in clear language and in a robust manner the reflections of the committee both on positive issues and – very distinctly, yet constructively – on weaknesses” with “sharp, discerning texts and clear arguments”. Bliss.

Proof of the pudding

I could go on about the way the SEP insists on having ECRs and PhD students on the assessment committee; and about the way units have to state how they’re addressing important policy areas like academic culture and open research; and the fact that viability is one of the three main pillars of their approach. But you’ll just have to read the 46-page guidance.

The proof of the pudding, of course, is in the eating. So how is this loosey-goosey, touchy-feely approach to research evaluation actually serving our laid-back low-country neighbours?

Pretty well actually.

The efficiency of research funding in the Netherlands is top drawer. And whichever way you cut the citation data, the Netherlands significantly outperforms the UK. According to SciVal, research authored by those in the Netherlands (2017-2019) achieved a Field Weighted Citation Impact of 1.76 (where 1 is world average). The UK comes in at 1.55. And as far as I can see, the only countries that can hold a candle to them are Denmark, Sweden and Switzerland – none of which have a national research assessment system.
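For readers unfamiliar with the metric: a Field-Weighted Citation Impact is, roughly, the ratio of the citations a body of work actually received to the citations expected for similar work (same field, year and document type), averaged across papers. The sketch below is purely illustrative – the function names and every number in it are invented, and it simplifies how SciVal computes the real figure – but it shows why 1.0 means “world average” and why 1.76 means 76% above it.

```python
# Illustrative sketch of how an FWCI-style figure is assembled.
# All numbers are invented; this is not SciVal's exact method.

def paper_fwci(citations: int, field_baseline: float) -> float:
    """Ratio of a paper's citations to the expected (baseline)
    citations for similar papers (same field, year, doc type)."""
    return citations / field_baseline

def aggregate_fwci(papers: list[tuple[int, float]]) -> float:
    """Mean of per-paper FWCIs; 1.0 corresponds to world average."""
    return sum(paper_fwci(c, b) for c, b in papers) / len(papers)

# Three hypothetical papers as (citations, field baseline) pairs:
papers = [(10, 5.0), (3, 6.0), (12, 8.0)]
print(round(aggregate_fwci(papers), 2))  # → 1.33
```

A country-level figure like 1.76 is simply this kind of average taken over every paper with an author based in that country.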

It seems to me that we have so much to gain from adopting a SEP-style approach to research evaluation. In a post-COVID-19 world there is going to be little point looking back at this time in our research lives and expecting it to compare in any way with what’s gone before. It’s time to pay a lot less attention to judging our historical performance, and start thinking creatively about how we position ourselves for future performance.

We need to stop locking our experts up in dimly lit rooms scoring documentation. We need to get them out into our universities to meet with our people, to engage with our challenges, to breathe our research air, and to collectively help us all to be the best that we can be – whatever ‘best’ may look like for us. I believe that this sort of approach would not only dramatically reduce the burden (I’m not sure if I said, but the SEP is only 46 pages long), but it would significantly increase buy-in and result in properly context-sensitive evaluations and clear road-maps for ever-stronger research-led institutions in the future.

Frankly, I don’t want to come out of REF 2027 with another bloody spreadsheet, I want us to come out energised having engaged with the best in our fields, and positioned for the next six years of world-changing research activity.

7 responses to “Dear REF, please may we have a SEP?”

  1. Yes! Totally agree – this is the way forward. All through the R&D roadmap there were references to ‘reducing bureaucracy’ without identifying what ‘bureaucracy’ was targeted – but those of us currently deep in REF spreadsheets have an idea. 46 pages of guidance and a peer review meeting to discuss a self-assessment document – that pretty much does the trick and leaves us all in a better place.

  2. Interesting article, and indeed we have a lot to learn from other nations.

    One reason why REF is so focused on “selectivity” (aside from elitist policy assumptions) is that our universities are so diverse. Many countries have kept something like our pre-92 binary system. It seems to me that doing so allows a more uniform approach to funding research universities — simply because they are fewer and more homogeneous. And that might be what is going on in the Netherlands, since I *believe* (could be wrong) the Netherlands has a more stratified system than the UK does. That’s not necessarily a killer argument for stratification… but I was a bit disappointed the article didn’t explore this.

    Also it makes me anxious whenever a report suggests replacing the REF funding formula with headcount or citation metrics. These are terrible ideas and would kill off any pretence at a “dual support” system. So yes, REF is expensive and flawed. But please let’s not blunder into something even worse. Dual support is already in need of revival, because 80% FEC on grants leads to much or all QR being soaked up by those underfunded grants, rather than providing the intended “seedcorn” capacity. The soak rate has become critical in recent decades as government funding has massively boosted project grants but not QR. This has also screwed up early career pathways… the money funds only short-term postdocs on projects, not a body of established staff at institutions. Ditching dual support would likely make these problems worse, by locking in a stronger “rich get richer” effect.

  3. Far too sensible. After all, how could one trust guidance that runs to only … how many pages was it again?

  4. I suspect that the reason the REF involves such complex documented policy, rules and processes is that very little published research delivers measurable outcomes in the form of products and services. Academics talk about impact but are currently locked into a fractured front-end innovation process that ignores consumers, customers and end-users, design thinking and agile project management. The context of value is not understood, therefore it does not exist.

  5. “that very little published research delivers measurable outcomes in the form of products and services”

    Go and read up on what academic research (that gets published in academic journals) is vs commercial research and applied research (transfer and commercialisation) and come back with a more nuanced comment.

  6. Thank you for this. I did the HEFCE-funded evaluation of RAE impact of the 1992 exercise. It was ignored in the main, but its findings have been confirmed by others in the 23 years since it was published (after the 1996 exercise had been completed).

    I have long advocated the recognition of a variety of excellences, which means assessing work within a ‘fitness for purpose’ approach, rather than the one-size-fits-everything isomorphic pressure with excellence defined by staff from a subgroup of institutions – there were no modern university reps on the Stern Committee – who then dominate panels whose judgements are apparently influenced, like the Eurovision Song Contest, by where the research is done as well as how good it is: the environment category formalises this, but it was evident when I was a sub-panel member, and the pressures within panels to conform to a dominant norm have recently been recorded by Neyland et al.

    The development funding of the 1992 exercise quickly disappeared as elite institutions lost money, so there is no investment in improvement. Jon Adams and colleagues have demonstrated that concentration of funding has gone so far it is inefficient, so the value-for-money mantra that drove the first exercises no longer operates. I hope things may change if levelling up, local impact of smaller projects done in Mode 2 partnerships and a link to disadvantaged communities and regions become an element in research policy in operation.

  7. This contribution indeed identifies some of the crucial choices and characteristics of the Dutch protocol. I am a member of the SEP standing working group. In reaction to some remarks and passages:

    The new protocol is the next in a series of protocols that were dedicated to formative evaluation, that emphasised the context of the unit, etc. In practice, the protocol was sometimes applied, let’s say, somewhat differently than intended. Researchers, whether they are being evaluated or are evaluating, seem to have a reflex to think in rather simplified and quantitative measures of research quality. Also, societal relevance is a criterion that not all researchers feel comfortable with. And it seems some boards are only interested in knowing whether a research unit is excellent, or not. Whatever that means.

    The proof of the pudding is in the eating. This time around we are organising training sessions (and publishing a video). In the training we explain the intention of the protocol. But we have also scheduled ample time for the participants to discuss how to ensure that evaluation will proceed as intended. After all, some elements are not standard practice and for some aspects, including Open Science, there are no commonly agreed and consolidated definitions or measures yet. Those involved in an evaluation, in whatever role, might stumble upon uncharted territory, for which there are no instructions in the protocol. Not even if the protocol were more extensive, say 53 or 86 pages.

    I know researchers in the Netherlands and the UK. Those in the Netherlands seem to look at a SEP evaluation with somewhat less fear and aversion, as compared to their UK colleagues and the REF. But a perception of unwanted management interference is often present. I think that will always be the case. Also, researchers find the process time-consuming. Writing a concise self-evaluation report, in a narrative style, that includes strategy and context is not an easy task. Was it Mark Twain who apologised for writing a long letter because he didn’t have time to write a short one? On the other hand, and with hindsight, researchers tend to value the discussions that took place throughout the process, both internally as well as with the assessment committee.

    In the Netherlands there is an ongoing debate regarding the funding of universities, including in Parliament. This concerns total volume (of course), the balance between block grant and project funding, and how the basic funding / block grant is distributed between the universities. The distribution between universities is based on the number of students and graduates, and further on historic calculations. Fluctuations are small, which has its pros and cons. Research output and citation metrics don’t play a role; neither do SEP evaluations. Nevertheless, I have encountered quite a few researchers who think the government makes funding decisions based on SEP results.

    Boards of universities and research organisations can (not necessarily should) redistribute funding within their organisation, based on the results of an evaluation. Two real examples that might be counterintuitive: an excellent unit (according to the evaluation report) was closed down by the board, since the research didn’t fit the future strategy of the university. Another unit, which was assessed as not particularly good, received extra funding from the board; the unit was seen as important, its teaching was appreciated very much, and the board wanted to ensure that the unit improved its research.

    Finally, yes, we still have a binary system in the Netherlands. The volume and orientation of research at the universities of applied sciences (comparable to the former British polytechnics and the German Fachhochschulen) differs from research at the universities. The universities of applied sciences have their own protocol. The two protocols are different, but not in the basic design: importance of the specific context of the unit, self-evaluation report, site visit by a committee, a report that includes recommendations for the future. A formative approach, not summative.

    Prior to the publication of the current SEP, we published a “facts and figures” report on the evaluation system in the Netherlands: https://www.rathenau.nl/en/vitale-kennisecosystemen/twenty-years-research-evaluation
