Here to stay: the politics behind TEF wonkery

It was always going to be this way. As soon as Jo Johnson confirmed his wish to press ahead with a Teaching Excellence Framework, the precise definition, measurement, and evaluation of each of those terms – ‘teaching’, ‘excellence’, and even ‘framework’ – were always going to be picked apart again and again.

The undue pressure put on BIS, DfE and officials to put the whole thing together in barely more than a year only intensified that likelihood. And it has just so happened that while we await the outcomes of TEF year two (in reality, the first ‘proper’ iteration of the exercise), the House of Lords has been taking care to have its own say on the government’s work, as the Higher Education and Research Bill makes its way through the legislative process.

What has that debate looked like? Well, it’s been an odd cocktail; a complex tale of wonky statistical debate, a lack of clarity on the government’s goals and objectives, a healthy dose of misunderstanding and fear-mongering, and at times some arguably deliberate obfuscation and outright defence of vested interests.

Jo Johnson and the TEF assessment panel’s chair Chris Husbands have gone to great lengths to offer reassurances and rebuttals to TEF’s critics, particularly regarding the uses and abuses of certain metrics. How they have done so is significant, as the government has promised a ‘lessons learned’ exercise after the results are released in May but before applications open for TEF year three at the end of the calendar year. There is much in the detail of the TEF that is still up for grabs, and it is therefore important to find some clarity over the most contentious matters.

The politics of student feedback

The National Student Survey has particularly taken a battering in many of the recent debates on TEF. It has been referenced on multiple occasions in the Lords, been called “garbage” by one noble peer, and has been subject to a boycott by a small number of students’ unions. Husbands himself has verged on disowning the survey’s inclusion in the exercise at all, calling it the “least important” of the three main metric sources.

For all the recent criticism, one could be forgiven for thinking that the NSS was a lazily conducted, ‘post-facts’ exercise, with little statistical input and dodgy oversight that does not measure anything relevant to a quality learning or student experience. Lord Lipsey has even called for a public inquiry into the use of the survey in one of the Lords’ debates.

Yet HEFCE is nothing if not rigorous in how it deploys the survey. Indeed, it has only just completed a full review of the survey that took over two years. This included multiple rounds of cognitive testing of old and new questions, the incorporation of new questions from the widely respected North American National Survey of Student Engagement (NSSE), and an independent academic literature review running to over seventy pages.

Dealing with each and every critique levelled at the NSS is a whole other article in and of itself; suffice it to say that there are some limitations to the survey. Like all other forms of data, it is also abused, misrepresented, and sometimes even manipulated. But the underlying methodological grounds have been shown by the above literature review to be sound, particularly with regard to the questions from the survey that will actually be used in the TEF. These are currently questions 1-12, concerning teaching, assessment and feedback, and academic support. This was the review’s overall conclusion:

The NSS has shown itself to be a resilient instrument that has had a positive impact on enhancing students’ experiences of HE and has proved useful in quality assurance and for assisting in student choice. It derives from a tradition of research that has provided consistent evidence of relations between students’ reported experiences and the quality of their learning.

Lipsey and other noble Lords have frequently cited reports by the Office for National Statistics and the Royal Statistical Society that ask questions about the survey’s statistical robustness; that is, the appropriateness of comparing the range of results. However, this may not be as relevant as first thought. Ed Humpherson of the UK Statistics Authority has considered the matter in a recent letter to DfE:

“We also recognise that a number of the points raised by the ONS and the RSS refer to the overall satisfaction question (Q22) of the NSS and that, in recognition of this, the TEF is making no use of this question, instead using questions which relate more directly to teaching and learning.”

The UKSA’s argument also serves as a rebuke to criticisms that TEF measures ‘student satisfaction’. This is further underlined by the HEFCE review:

The NSS is not fundamentally a satisfaction survey. In fact, the single question in the original NSS that mentions satisfaction (Q22) was only included initially to validate the instrument as a measure of the quality of the student experience.

Interestingly, some of TEF’s critics have been all too happy to cite the national overall satisfaction score from NSS (87%) as evidence that nothing is wrong with universities’ teaching and they should simply be left alone. It will be far more interesting to see how well the sector does on the new questions measuring student engagement, which are a much more widely recognised measure of a quality learning environment and could be a significant upgrade on the uses of the survey in future iterations of TEF.

Benchmarking: undermining the hierarchy

After NSS, the TEF’s use of statistical benchmarking was also roundly derided by some members of the House of Lords. Times Higher Education’s ‘mock TEF’ exercise released last summer showed why benchmarking could be a game-changer in creating a ‘new hierarchy’ that overturns the conventional wisdom of the usual league tables.

The critique of benchmarking might prove more valid than those of the NSS. Lord Lipsey pointed out that the government has promised an independent review of HESA’s benchmarking methodology, but that this has not yet been commissioned. Yet HESA’s benchmarking of institutional data for informal performance indicators is nothing new and has been accepted as a helpful guide to universities’ relative performance for several years. Nonetheless, given that the stakes in TEF are much higher, an independent review does not seem like an unreasonable request. As Jackie Njoroge wrote, there are some legitimate methodological questions to be asked about the finer points of TEF’s use of benchmarks.

The real critique of benchmarking is more political than statistical. As the Duke of Wellington complained in the Lords debate, the current benchmarking method will mean that some Russell Group universities will be “graded as second or third-class universities… despite their well-deserved reputation in this country and abroad”. Lord Lipsey praised the government’s “so-called LSE amendment”, made after it was pointed out that LSE and a number of other prestigious institutions were rated low by students in the NSS, and it seems clear that there is significant political pressure from some parts of the sector to ensure the ‘right’ outcomes. I wonder what they will look like…

Fair outcomes?

It is particularly interesting that the NSS’s inclusion in TEF has got more of a kicking in the Lords and elsewhere than the inclusion of employment outcomes (and the likely inclusion of LEO salary outcomes in future iterations). For all the fears about proxy measures and real indicators of quality teaching, employment outcomes seem a far more objectionable inclusion in the exercise than student feedback.

The government has made clear it intends to include LEO salary data in future iterations of TEF. This should be the next big battle over TEF metrics. As the IFS’s Jack Britton has said: “For the purposes of really understanding the value added by subjects and institutions the raw data is not very helpful since it does not allow us to compare like with like.” LEO has thus far shown us that prior attainment and socio-economic background are far bigger predictors of employment outcomes than any factor under universities’ control.

Why have student feedback metrics come in for more criticism than employment metrics? A cynic might say that this is because the latter are much more likely to be linked to research prestige and performance on existing university rankings. There are several Russell Group institutions that perform far better on employment metrics than they do on student feedback. Again, there is a vested interest from the established high fliers in keeping one set of these metrics in the TEF rather than the other, and this needs to be separated out from an honest debate about the relative merits of either metric.

What does TEF want to achieve?

You might think that this is all rather cynical. The debate on TEF is clearly being influenced by various parties’ expectations of what the results will look like. Yet behind the manoeuvrings, there are three more fundamental and interrelated questions about what TEF aspires to achieve, and thus how its metrics should be used:

Should TEF measure institutions’ value added or absolute outcomes?
Should TEF be primarily a tool for student choice or an effort to improve performance?
Should TEF measure universities’ performance against each other or against an agreed external benchmark?

Benchmarking, in particular, has been incorporated into TEF in an attempt to ensure it measures ‘value added’ performance, balancing the results for institutions’ student entry profile. If the aim of the exercise is to improve and compare universities’ performance, then this seems entirely reasonable, particularly when it comes to employment outcome metrics, where we know that prior attainment and background are the prime determinants. But as Tony Strike has argued, this might create confusion for applicants.

Let’s take a hypothetical example.

A post-92 medium tariff university in the Midlands has been very successful in recent years at achieving top level NSS scores, reducing its dropout rates, and in securing relatively good employment outcomes for its (relatively less privileged) students compared to its main medium tariff competitors. It is set to get a strong Gold rating in this year’s TEF due to the effect of benchmarking, and arguably, quite right too. The university has clearly invested a great deal in its student engagement and facilities, and the TEF has supposedly been designed to recognise this.

By contrast, a very high tariff and very research-intensive university in London has recently struggled to improve its NSS scores and is starting to lag behind. It still achieves very good employment outcomes for its students and has a very low dropout rate because of its relatively privileged student population, though it performs less well in ensuring equal outcomes for the small number of its students from less privileged backgrounds. When the ‘split’ metrics and benchmarking effect are all taken into account, this university will score Bronze or a very low Silver in TEF. Again, this matches the government’s supposed objectives. This university has almost entirely devoted its strategic attention to maximising research outcomes in recent years, particularly in the REF, in which it is a very big player, but perhaps at the expense of a focus on student experience, hence the poor NSS results.

In theory, then, TEF will have achieved its goal here, which is to applaud the first of our examples and to encourage the second to up its game. But how does that look from the perspective of the applicant? A student with good enough grades to get into either of these universities is left with a difficult choice. They will probably receive better teaching and support, have better-planned courses, and enjoy an all-round better student experience at our Gold university in the Midlands. But they are still more likely to have better employment outcomes, and so, in pure human capital terms, to make the ‘best’ choice, by going to the Bronze university in London.

Quite how much TEF will disrupt the market in this way is unclear. I find myself agreeing with Paul Blackmore, who argues that long-held prestige, primarily based on research outcomes, will probably still trump TEF outcomes in the short term. Jonathan Simons, former Head of Education at Policy Exchange, has pointed out that similar attempts to influence parental choice through metrics in the school sector have failed to break the supremacy of 5 A*-C performance and Ofsted ratings. It’s hard to fall once you’ve been at the top for a long time.

Beyond that, if our second university is incentivised by TEF to up its game and reinvest in student experience and teaching, what happens if it squeezes our first university out of the top 20%? Who, under the current system, will then be allocated Gold? Is that fair, if our first university is still performing to a very high standard? Michael Barber’s recent appearance before the Education Select Committee touched on this point. The incoming OfS chair implied (perhaps unintentionally) that he might prefer a system where TEF outcomes are evaluated against an agreed stand-alone benchmark, so ensuring that a rather higher percentage of institutions can achieve the top grades, rather than the current relative benchmark where only a select percentage can achieve each outcome under the TEF’s flagging system.

Please sir, I want some more

The TEF’s ‘incredible machine’ and the public debate around it could warrant a full book’s worth of analysis. There are further questions to be considered about the dangers of perverse incentives, Goodhart’s Law, the possible future development of learning gain measures, and the complexities of the ‘split’ metrics system. Watch this space.

At our upcoming event, The Incredible Machine, we will be running an interactive workshop where we ask attendees to ‘Accentuate the PosiTEF’ and think about how TEF can better measure, communicate and ensure excellent teaching. Acknowledging the current exercise’s methodological flaws and contradictions will have to be balanced with the need to recognise competing and contradictory interests within the sector in securing particular outcomes. Part of the challenge is that TEF will come under greater scrutiny the more its outcomes challenge entrenched and anachronistic ideas about what a ‘good’ university or ‘good’ teaching looks like, but there is a strong argument to be made that this is precisely what it needs to do.

That will not be easy, but the long-term shape of the TEF is far from certain, and that’s before we even get to the matters of a subject-level or postgraduate TEF. TEF wonkery is fast growing into a new subsection of the sector’s policy thinking. At the present rate of development, it will be with us for some time. As we’ve said before, now is not a time to be afraid of the detail.

One response to “Here to stay: the politics behind TEF wonkery”

AndyR says:

Mar 28 2017 at 10:48 pm

If the TEF *does* disrupt the market it will also create changes in student profiles at the kind of Gold institution in the hypothetical example which could, over the 3 year duration of a TEF award, seriously alter an institution’s benchmarks.

Which underscores some of the other challenges with student choice: will a Gold institution be the best choice for a student, or the best choice for a student who is representative of the type of student who currently attends the Gold institution? Providers develop courses and teach for their student body. If that body changes, they will need to adapt.

David Morris