Those of us who work in strategy and planning are used to colleagues glazing over when we start getting technical with the numbers. It’s a perfectly normal response for the less-than-quantitatively minded.
It can be easy to defer to the data people and assume that the product of any number crunching is as pure as divine revelation. I exaggerate, but the point I’d like to make is that the numbers are always contested, and with so much riding on the TEF judgements, we must also ask questions of the metrics.
Criticism of the TEF's benchmarking process can be read elsewhere, and if you need a beginner's guide you can enjoy Wonkhe's guide to the metrics and flagging process. Looking behind the outcomes, I've got these questions for the TEF architects:
- The TEF metrics are benchmarked using HESA's UK Performance Indicators (UKPI, not to be confused under any circumstances with Farage and co). On the HESA website, we read that a difference should only be considered sizeable where a provider is both three standard deviations and three percentage points from the mean. Yet TEF will 'flag' institutions which are two standard deviations and two percentage points from the mean (a toy illustration of the two thresholds follows this list). Why will TEF use a different threshold, especially given that the UKPI has had much more time to develop?
- The TEF metrics are all benchmarked in very different ways. In one case, employability, there is no regional benchmarking. This is curious given the recent DfE report, which found that "region of domicile, social disadvantage (as measured by POLAR), disability, and type of degree obtained were statistically significant factors." This will clearly disadvantage institutions whose graduates choose to stay in regions with lower-than-average economic growth. Why not take region into account in the employment benchmarks?
- The TEF relies on averaging over three years' data, but that means the experience of students who started their studies six years ago counts as much as that of the most recent cohort for which data are available (the second sketch below shows the difference a weighting would make). Why has TEF chosen to average over three years rather than emphasise the most recent performance?
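To make the first question concrete, here is a minimal sketch of the two flagging rules as described above: flag only when the gap between a provider's indicator and its comparator clears both the standard-deviation bar and the percentage-point bar. The numbers and the `is_flagged` helper are hypothetical, and the real UKPI and TEF calculations of the standard deviation are more involved than this simple reading.

```python
# Toy illustration of the two thresholds described above.
# Assumes the simple reading in the text: flag when the gap is at least
# N standard deviations AND at least N percentage points from the
# benchmarked comparator. Figures are invented for illustration only.

def is_flagged(indicator: float, comparator: float, std_dev: float,
               sd_threshold: float, pp_threshold: float) -> bool:
    """Flag only when the gap clears both the z-score and the pp bar."""
    gap = indicator - comparator      # gap in percentage points
    z = gap / std_dev                 # gap in standard deviations
    return abs(z) >= sd_threshold and abs(gap) >= pp_threshold

# Hypothetical provider: indicator 88%, comparator 91%, std dev 1.2pp.
indicator, comparator, sd = 88.0, 91.0, 1.2

print(is_flagged(indicator, comparator, sd, 3, 3))  # UKPI rule: False
print(is_flagged(indicator, comparator, sd, 2, 2))  # TEF rule:  True
```

On these invented figures the provider escapes a UKPI-style flag (the gap is only 2.5 standard deviations) but picks up a TEF-style flag, which is exactly why the choice of threshold matters.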
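And on the third question, a quick sketch of what is at stake in the averaging choice: a simple three-year mean treats the oldest cohort the same as the newest, while a recency-weighted mean lets the latest performance show through. The cohort figures and the 1:2:3 weights are purely illustrative, not anything TEF has proposed.

```python
# Simple three-year mean vs a recency-weighted mean.
# Cohort values and weights are invented for illustration.

def simple_mean(values):
    return sum(values) / len(values)

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical indicator for three successive cohorts, oldest first (%):
cohorts = [78.0, 82.0, 90.0]

print(simple_mean(cohorts))               # 83.33 - recent improvement diluted
print(weighted_mean(cohorts, [1, 2, 3]))  # 85.33 - recent gain shows through
```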
This is all just for the current version of TEF, so-called 'year two'. When we get to the proposal for a future iteration, where providers are judged at subject level, it is going to get even more complicated. This comes just as the sector introduces HECoS, a replacement for the JACS subject coding, in time for 2019. Understanding the data will only get more complicated, not less.
For there to be any confidence in TEF, there must be acceptance that the metrics, their benchmarks, and the weight attached to them give valid results. We have heard that the TEF assessment panel will operate with all the necessary contextual information, which gives some comfort. But what is in the metrics, what is benchmarked, and what is flagged are crucial.
TEF year two is still a trial; let's hope these questions get an airing before future iterations of the exercise.
It may be a trial, but its outcomes will still be of major significance for institutional reputations, and therefore for recruitment and the like.
Watching this with interest from a purely PG institution, and wondering how metrics derived from UG data will be applied to PGT in Year 4 as trailed. The issues Jackie highlights are valid criticisms, but they apply to data sources which are considerably more developed than those available for PG.