Will there be a safe landing for the subject-level TEF pilot?

As the first year of the TEF subject-level pilot draws to a close over the next few weeks, and the technical consultation reaches its deadline, it’s an opportune time to reflect on what has been learned over the past months from the work of the pilot institutions, the large team of panellists and Office for Students staff.

The July 2017 pilot specification introduced the sector to two prospective ways of delivering subject TEF ratings alongside an institutional rating:

Model A, which at institution level closely follows the TEF2 (2017) and TEF3 (2018) approach, and then focuses subject assessment on provision where the metrics’ performance differs from that of the institution, operating “by exception”;
Model B, a potentially more holistic approach, looking across all subjects (combined into seven groups for the written submission), which then inform the institution-level rating, known as the “bottom up” approach.

Given the large-scale demand to take part in the pilot exercise, we decided to increase the number of participating institutions from 40 to 50: 19 with Model A, 19 with Model B, and those 12 generous institutions who wanted to participate in both! We’ve been fortunate to work with colleagues from across a wide range of provision types: pilot institutions comprise 34 universities, 10 colleges and 6 alternative providers, across varying sizes, subject mixes and student demographics. One benefit of running a pilot is that we have been able to work closely with these institutions to get their feedback, scrutinising issues at first hand and co-designing where possible.

Seven panels made up of subject experts carry out the assessment of subjects, overseen by a main panel. The subject panels cover medicine and health, engineering and technology, natural sciences, business and law, social sciences, humanities, and arts. Each panel has a chair and a deputy chair, and the deputy is, or was recently, a student. They are all also members of the main panel, along with a number of others, including employer and professional, statutory or regulatory body (PSRB) representatives, and widening participation and employment experts. Some panellists also participated in TEF2, ensuring continuity of expertise. Student representatives are a fundamental part of the assessment process.

No small undertaking

In total, we have trained 141 panellists, representing the full breadth of the higher education sector, although not as many in this round are from the devolved nations or from colleges as we would have liked. This gives you some idea of the scale of the exercise needed to pilot subject-level models and test between them. Training – as in institution-level TEF – is intense and thorough, working through principles and practice of analysis, using judiciously invented examples and calibrating on real metrics data and written submissions.

At each stage of training, calibration and the subject-level exercise, we use feedback and analysis to understand what works, what doesn’t, what we need to improve and what might need to be changed or dropped. We have treated the pilot as the “real thing”, and, having sat through many sessions of subject panel and main panel discussion, I thank everyone for the quality of debate, seriousness of intent and the sheer hard work that has been put in.

We have now completed all of the subject assessments, and today we start our final two-day meeting as a main panel to decide institutional ratings, review subject ratings and debate the pros and cons of each model. So we don’t have any definitive conclusions yet.

What are we learning?

The technical consultation is unlikely to raise questions of which we’re not already aware. But the responses will generate further insight and evidence that we will attend to closely. It will be no surprise that there are issues with both models that we are thinking hard about and that will need to be addressed, one way or another. Here are three we’re well aware of. First, when breaking down the metrics into 35 subjects, cohort sizes can be small. This raises challenges for assessment judgments, especially in extreme cases where the numbers are very low. We are exploring the options for addressing this challenge.

Second, while Model B focuses on rating all subjects in an institution’s portfolio, it is clear that the current format of the seven subject groupings poses challenges. For example, while it may reduce the writing load by asking institutions to describe their subjects in an aggregated way, it has sometimes limited what subjects can say about themselves, making it difficult to identify what happens in individual subjects. And we have heard that the format can increase writing effort, even if volume is reduced.

In this model, metric data are provided at the level of 35 individual subjects. We know from TEF2 that written submissions are well able to moderate metrics-based hypotheses. It’s critical during this exercise that the written submissions can continue to do this, and that holistic judgments are not captured by metrics. There is therefore a question of whether metric and written submission data can be better balanced in Model B.

Third, we are currently exploring how subject ratings under Model A are panning out. There has been a well-rehearsed view that if a panel moderates an institution’s rating away from the initial hypothesis – say from Silver to Gold – then subjects for which the initial hypothesis was the same as the provider’s, in this case Silver, would not be assessed but would all automatically receive a Gold rating. Conversely, subjects that were identified as exceptions – including some Gold – could well be moved to Silver. This is one of the credibility factors with Model A that we are examining by assessing a number of extra “non-exception” subjects: we will be able to compare how panels rate such subjects with the rating they would have received under the current Model A rules.
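The concern can be made concrete with a small sketch. It assumes only what the paragraph above describes: a subject whose initial hypothesis matches the provider's initial hypothesis is not assessed and inherits the provider's final rating, while an exception subject receives whatever rating a panel awards it. The function name, subject names and ratings are invented for illustration; they are not pilot data and this is not the official specification.

```python
def model_a_subject_ratings(initial_provider, final_provider,
                            subject_hypotheses, panel_ratings):
    """Illustrative 'by exception' rule, as described in the text.

    subject_hypotheses: subject -> metrics-based initial hypothesis
    panel_ratings: subject -> rating awarded by a panel (exceptions only)
    """
    ratings = {}
    for subject, hypothesis in subject_hypotheses.items():
        if hypothesis == initial_provider:
            # Non-exception: not assessed, inherits the provider's final rating.
            ratings[subject] = final_provider
        else:
            # Exception: assessed individually, so the panel's rating applies.
            ratings[subject] = panel_ratings[subject]
    return ratings


# Invented example: the provider is moderated from Silver to Gold.
hypotheses = {"History": "Silver", "Physics": "Silver", "Law": "Gold"}
panel = {"Law": "Silver"}  # the one exception subject, assessed by a panel
result = model_a_subject_ratings("Silver", "Gold", hypotheses, panel)
print(result)
```

In this made-up case the two unassessed Silver-hypothesis subjects end up Gold, while the only subject whose metrics suggested Gold ends up Silver, which is the outcome the extra “non-exception” assessments are designed to test.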

There are other matters on our agenda in evaluating subject-level methodologies, such as teaching intensity metrics, how to ensure that conclusions are methodologically and statistically sound, how to handle interdisciplinary and multi-programme courses, and the role and relative weight of PSRB accreditation. And, of course, there is one of the main purposes of TEF: how best to communicate to students what the ratings of individual subjects actually mean, ensuring that they are at a sufficiently granular level to be helpful, how they might be used to benefit and inform choice, and where they need to be treated with caution.

Over the horizon

At this point, as we draw to the end of the first year of the subject-level pilot, the results of the technical consultation will soon be known. The recommendations of the independent review of TEF as a whole will be made in due course.

At its best, through peer review, TEF will drive enhancement and innovation, setting student engagement at its heart, recognising that education is a joint process of discovery, in which students are supported to become effective, motivated and resilient learners. The second year of the pilot will be used to refine how the TEF could operate at subject level, working with the HE community further to recognise and enhance excellence across UK higher education.