The TEF is a statistical wonderland

It is now accepted sector wisdom that the Teaching Excellence Framework is neither a measure of teaching nor a measure of excellence. The designers know that and don’t want to keep hearing it said.

Now that universities have the positive and negative indicators and flags, the TEF evidently does not pass the face validity test either. Excellence must be centred on absolute measures of success and not simply on being above par.

In Alice’s Adventures in Wonderland we read that Alice, faced with the Caucus-race, “thought the whole thing very absurd, but they all looked so grave that she did not dare to laugh.”

The problem is serious because potential students – and the wider public – are about to be misinformed. We, across universities, have collectively allowed ourselves to reach this point. We know the TEF is not a measure of teaching: it measures student satisfaction, student employment outcomes and non-continuation rates. None of these relates to the quality of what students hear and learn in lectures or seminars.

Nor is the TEF, as currently designed, a measure of excellence because, counter-intuitively, the designers have produced a system which will reward institutions with lower absolute performance. One provider will get a positive flag despite a higher student drop-out rate, while another, with better absolute performance, will go without. To understand how this statistical wonderland is possible you have to step – with Alice – through the TEF Looking Glass.

The perils of benchmarks

The metrics underpinning the TEF decision-making focus on significant differences between the actual values and the expected benchmark values for institutions based on their student profiles. High entry tariff institutions with low non-continuation rates have been benchmarked against other high tariff institutions and so do not reach Gold, despite having lower drop-out rates than the lower tariff institution down the road which does. The statisticians are positively and negatively flagging institutions which are performing better (or worse) than they might be expected to perform based on the quality of their student intakes.
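The effect described above can be sketched in a few lines of Python. This is an illustration only: the two providers, their rates, the cohort size and the thresholds (`min_gap`, `z_crit`) are all hypothetical assumptions, and the simple one-sample test stands in for, and is not, the official TEF flagging method.

```python
from math import sqrt

def flag(rate, benchmark, n, min_gap=0.02, z_crit=1.96):
    """Return '+', '-' or '' for a non-continuation rate (lower is better).

    min_gap and z_crit are illustrative assumptions, not the official TEF rules.
    """
    # Standard error of the rate, treating the benchmark as a fixed expected value.
    se = sqrt(benchmark * (1 - benchmark) / n)
    gap = benchmark - rate  # positive gap = fewer drop-outs than expected
    z = gap / se
    if gap >= min_gap and z >= z_crit:
        return "+"
    if gap <= -min_gap and z <= -z_crit:
        return "-"
    return ""

# Hypothetical providers: the high tariff one has the lower drop-out rate,
# but only the low tariff one beats its benchmark by enough to be flagged.
high_tariff = flag(rate=0.04, benchmark=0.045, n=3000)
low_tariff = flag(rate=0.09, benchmark=0.12, n=3000)
print(repr(high_tariff), repr(low_tariff))
```

Under these assumed numbers the provider losing 9% of its students is flagged positively, while the provider losing 4% is not flagged at all.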

We can also see that the larger the university, the harder it is to create a significant difference from a benchmark to which it is also contributing. Larger institutions are chasing their own tails, and the data show that they are less likely to get flags. Not everything is benchmarked, though. It is well known that NSS scores are typically lower in London, meaning London-based HEIs will find it harder to achieve positive significance flags. Numbers are not neutral, and this way of doing the numbers handicaps larger, high tariff universities, and particularly those in London, before the Caucus-race starts.
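The tail-chasing can be shown with simple arithmetic. The sketch below assumes a deliberately simplified benchmark: a single weighted average of the provider's own rate and the rate of everyone else in its comparison group, with `share` (a hypothetical parameter) being the provider's share of the students behind that benchmark. Real benchmarks are built across many student characteristics, so this is only a one-group approximation.

```python
def gap_from_benchmark(own_rate, others_rate, share):
    """Gap between a provider and a benchmark that includes the provider itself.

    share is the (hypothetical) fraction of the benchmark population
    that belongs to the provider being assessed.
    """
    benchmark = share * own_rate + (1 - share) * others_rate
    return own_rate - benchmark  # equals (1 - share) * (own_rate - others_rate)

# Same performance (90% vs 85% elsewhere), different sizes.
small = gap_from_benchmark(0.90, 0.85, share=0.02)
large = gap_from_benchmark(0.90, 0.85, share=0.30)
print(round(small, 3), round(large, 3))
```

With identical performance, the provider making up 30% of its own benchmark shows a gap of about 3.5 percentage points, against about 4.9 for the provider making up 2%.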

As Mark Twain said, ‘figures often beguile me.’

We might reassure ourselves that this is a pilot, and such design mistakes will be made. But the whistle has blown and the race is on; all will get their prizes. The TEF results will be public and the brand logos are designed, ready to be put on prospectuses. The public will not understand that the outcomes are benchmarked on anything other than absolute performance. Gold surely means Gold. The subtlety that the awards are not for doing better, but for doing better than expected, all other things being equal, will be missed in the public reception of the outcomes.

The result when the race is finished? Several smaller, lower tariff institutions will win the best prizes. Some larger, global institutions with high reputations and better absolute scores on the underlying measures will come out relatively badly. This will allow some to argue that the TEF is evidence that excellence and reputation are not aligned. This matters not because of sector special pleading, but because applicants will be misled, and the TEF will produce results which will not only damage the credibility of the exercise itself, but the credibility of UK higher education as a whole.

A plea to the panels

This situation is avoidable, because the deliberations of Professor Chris Husbands’ TEF panel still lie ahead. The applications are not yet in, and the problems with the statistics being controlled for tariff can be resolved. We have the absolute performance on each TEF measure, and these scores can be ranked. The TEF assessors and panel can be asked to place more weight on them. Removing the beguiling complexity is short work. It is important that the TEF, even in its pilot year, should avoid distorting the market and misleading students.

If the TEF is here to stay as some proxy for educational quality, and with a link to fees, then universities will probably have to participate, and future students will be guided by it. The process needs to have face validity. Gold needs to mean ‘best’, and not just ‘better than expected’. The over-control by benchmarks needs simplifying in this pilot year. This means those with vested interests in the current design need to listen to well-founded criticism. Like Alice, they need to wake up, accept that the numbers are adrift from reality, and allow more than just tweaking.