by Simon Marginson

21/06/17

Performance management is here to stay, but TEF needs a rethink

TEF's proxy metrics are not sufficiently reliable for evaluating university performance, and may lead to perverse incentives, argues Simon Marginson.

Simon Marginson is Professor of Higher Education at the University of Oxford.

Creating the virtuous circle

The key to achieving the best possible and least damaging performance management system is to create a virtuous circle between real outcomes, performance measures and the resulting competitive position. Any ranking worth its salt must be grounded in real outcomes. Improvements in real performance, in a comparative sense, must lead to new behaviours that generate improvements in the relative position of the institution or discipline. If they do not, the virtuous circle has not been established, and there is no incentive to improve.

For example, in the case of measures of university reputation based on surveys of academic staff, as used in some ranking systems, the virtuous circle is absent. These rankings, which reward established university names regardless of actual effort, are not worth the paper on which they are printed. In a ranking system like this, universities have incentives to boost their marketing, not their research or their teaching and learning.

Yet measures of research performance based on the volume of highly cited papers produced by a university do tell us about real outcomes, and, arguably, that kind of ranking has driven growth in the quantity and quality of research. Likewise, the quality assessment in the REF fosters improvements in research, all else being equal, and this is a primary reason why UK universities achieve stellar research outcomes from a modest national outlay on research and development. Perhaps the principal problem with the REF impact measure is that it is unclear whether the evidence base captures real impacts rather than simulations of impact, and thus it is unclear whether the virtuous circle has been created.

The limitations of TEF’s proxies

In TEF 2 there is no virtuous circle. The bottom line is that there is no guarantee that positive effects on outcomes and behaviour will be generated, leading to better teaching and learning in UK higher education. This is because, as designed, the indicators used in TEF 2 do not measure teaching and learning. They measure proxies, and there is no necessary relation between the proxy outcomes and the actual teaching and learning outcomes. TEF 2 is a competition game that, despite the best intentions, is decoupled from real classrooms and real learning achievement.

Graduate employment rates and salaries tell us about many things at once. They are governed by a mix of elements: the social backgrounds of students and their continuing impact on careers through social networking, the formal educational experience, the extra-curricular experience, the macro-economy and the local micro-economy, and so on. Any statistical attempt to separate these causal elements, which continually affect each other, and to isolate the singular ‘effects’ of teaching and learning is ultimately artificial and assumption-driven.

This means that not only do outcomes related to graduate employment fail to identify educational effects accurately, but comparisons of performance over time, or between different institutions or disciplines, also have no intrinsic merit. This data has secured a hold on the public imagination through the mantra that ‘education causes earnings’, but educationists know that this mantra is seriously misleading. Graduate outcomes are very important, and providers do affect those outcomes, but graduate returns data should not be hijacked to create false measures of what has been learned.

The use of student satisfaction data is also problematic. There is a long history of studies in social choice theory about the conditions under which inter-personal comparisons of welfare are valid, and the NSS does not meet those conditions. Student satisfaction surveys are important in themselves, telling us something we need to know; it’s just that we cannot validly use them to measure something other than satisfaction. A key problem here is that by importing student satisfaction measures into the TEF problematic, with its financial drivers, we create an incentive for institutions, and their individual disciplines, to dumb down the academic demands of programmes so as to boost student survey outcomes. The American study by Richard Arum and Josipa Roksa, published as ‘Academically Adrift’, found some evidence that this had happened in the United States, where student satisfaction surveys and student ratings of academic faculty have been in use for longer.

In short, two of the principal measures to be used in TEF 2 will not lead to a necessary improvement in teaching and learning, and they are likely to have perverse effects. These are very serious weaknesses.

Making TEF better

What, then, should TEF 3 do? Inescapably, TEF 3 needs to be based on solid, well-worked-through measures of actual student learning outcomes, including value-added measures. Proxies won’t work, and there are no shortcuts. Generic measures of academic learning can tell us relatively little, because in higher education learning occurs through immersion in complex domains of knowledge.

Measuring learning outcomes within one discipline in one institution is straightforward. The challenge is to devise a system that enables comparability across disciplines and between different kinds of institution operating in varying contexts. This is difficult but not impossible. In Europe, the collaborative Tuning project on learning outcomes is currently testing measures of learning across a range of discipline clusters, in different kinds of institutions, in several countries. It is not beyond the wit of UK higher education to do something similar. What is needed is hard, patient graft over time to devise measures of teaching and learning that will work, with the minimum negative effects on either student performance or inter-institutional fairness, and that are well piloted before they are implemented.

4 responses to “Performance management is here to stay, but TEF needs a rethink”

Andrew Fisher says:

Jun 21 2017 at 10:20 am

This is interesting, but could you provide a link to the Tuning work on learning outcomes? Google got me to http://www.unideusto.org/tuningeu/teaching-learning-a-assessment.html, but there didn’t seem to be anything there that could be operationalised in TEF 3.

Stephen Longstaffe (@SteveLongstaffe) says:

Jun 21 2017 at 11:33 am

I don’t understand the comments on the NSS. Firstly, the US evidence is a red herring: it’s old, and comes from a very different environment (and in the above, student satisfaction is bundled in with Rate My Professor et al). The NSS is a national and research-informed exercise. Secondly, though there is a headline element of ‘satisfaction’ in the NSS, it captures a lot more than that.

I’d be interested to know which of these questions (from NSS 2017) the ‘educationists’ referenced think aren’t usable proxies for good teaching, and why a good performance on these questions indicates dumbing down:

Are staff good at explaining things? Have they made the subject interesting?
Is the course intellectually stimulating, and has it challenged the student to achieve her best work?
Have the criteria used in marking been clear in advance?
Are marking and assessment fair?
Has feedback been timely, and have comments been helpful?
Has the student been able to contact staff when she needed to?
Has she received sufficient advice and guidance on the course, and good advice when making study choices?

Simon Marginson says:

Jun 21 2017 at 7:01 pm

In response to Stephen: none of these questions elicits data on what students have actually learned during their programmes. What they say they have learned, and whether they consider the learning experience ‘good’, ‘interesting’, ‘stimulating’, ‘challenging’, ‘clear’, ‘fair’, ‘helpful’, ‘sufficient’, etc. (you can pick any other slippery, ambiguous, value-laden signifier you like), remain trapped in the realm of subjective opinion. There is no necessary consistency between variations in subjective self-reporting and variations in objective learning.

You could ask a group of people to estimate the distance between the earth and the moon and take the average of the guesses as the truth. But it wouldn’t be very good astronomy. At least, though, that question would be about the material world. If you asked them how far it was on a Likert scale of 1–5, from ‘a very long way’ to ‘not so far at all’, that would tell you even less. Why do we have to run educational assessment using business research methods designed for customer marketing? Why can’t we design measures of actual learning outcomes for comparative purposes? The OECD can do it for 15-year-olds in three domains in the PISA programmes. It’s harder at the higher education stage because of the specialised character of knowledge, but it is not impossible.
