David Kernohan is Deputy Editor of Wonkhe

The Teaching Excellence and Student Outcomes Framework (TEF) is dead – long live the Teaching Excellence Framework (TEF).

Today, fifteen proposals, some of which – as we will see – stick more closely to the recommendations of the independent Pearce Review than others, outline in broad strokes the shape of the exercise that will return to the original TEF nomenclature (bizarrely considered to be “now well-known, including internationally”).

The story begins

Originally proposed in the 2015 Conservative manifesto, the first version of the Teaching Excellence Framework addressed provider performance against benchmarked indicators drawing on student experience (NSS) and outcomes (DLHE, continuation measures). The Bronze, Silver, and Gold awards were designed as a mechanism to inform applicant choice. There is no evidence to suggest that this has worked, but a statutory review of the OfS process chaired by former vice chancellor Shirley Pearce – hence the Pearce Review – identified a benefit to teaching quality enhancement for providers.

TEF has not run since 2019, and providers are not currently permitted to publicise previous awards – some of the underlying data now relates to graduating cohorts more than ten years old. For further information on the previous structure of the TEF as it was last run – there have been numerous changes since it was initially spelled out in the original white paper – see our guide to “the incredible machine”.

The shape of the TEF

We’re here looking at proposals for a quadrennial exercise, based on an expert panel assessment (with Sheffield Hallam vice chancellor Chris Husbands returning as chair) of student experience and student outcomes at provider level, each weighted equally – and only half of the evidence is derived from the numeric indicators (to be based on the NSS and B3 output measures, with benchmarks). The next TEF will be carried out during 2022-23 with results announced in April and May 2023, ready for use in 2024-25 recruitment. Bronze, silver, and gold awards remain in the OfS proposals, with the addition of a “requires improvement” award, using the word “award” in the loosest sense.

Those numeric indicators would constitute four years of data on five scales (“agree” or “mostly agree” responses to the teaching, assessment, support, learning resources, and student voice questions, for as long as the current survey exists) from NSS, and three measures (continuation, completion, and progression) from the B3 monitoring process. There will be disaggregation by mode (full-time, part-time, apprenticeship), with second-level splits covering level of study, subject of study (CAH level 2), student characteristics, year of entry or qualification, specific course types (with foundation year, higher technical qualification), and partnership arrangements.

OfS argues that, for TEF purposes, benchmarking should remain broadly as currently constituted (there’s a separate consultation on the use of data), though Associations Between Characteristics of Students (ABCS) will be used rather than age, disability, ethnicity, and sex for the B3 measures. Wonkhe readers will be pleased to learn that B3 (outcomes) benchmarks will also include compensation for geographies of employment, that error bars will be used in the presentation of statistical uncertainty, and that the original “flags” – signifying notable performance above or below benchmark and calculated using standard deviations – will be replaced with an assessment of “materiality”, equivalent to 2.5 percentage points above or below the benchmark.
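For the curious, here’s a rough sketch in Python of how that materiality test might work – the 2.5 percentage point threshold is the only number taken from the consultation, and the function itself (name and all) is mine, not OfS’s:

    def materiality(indicator: float, benchmark: float, threshold: float = 2.5) -> str:
        """Classify a TEF indicator against its benchmark.

        Values are percentages (e.g. 87.3). The 2.5 percentage point
        threshold comes from the consultation; everything else here
        is illustrative.
        """
        difference = indicator - benchmark
        if difference >= threshold:
            return "materially above benchmark"
        if difference <= -threshold:
            return "materially below benchmark"
        return "broadly in line with benchmark"

    print(materiality(91.0, 87.2))  # materially above benchmark
    print(materiality(84.0, 86.0))  # broadly in line with benchmark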

The requirement to participate for registered providers in England with more than 500 FTE registered (not taught, as previously) undergraduate higher education students would remain, as would voluntary participation for others – though once a provider volunteers it is not permitted to withdraw from the exercise. Scottish, Welsh, and Northern Irish universities are free to participate on a voluntary basis, should they wish to.

A win by submission

The provider submission, previously just a chance to caveat unpromising flag scores, would be greatly expanded and submitted alongside an optional student submission. Expect a template from OfS – there would be a 20-page limit, and the hints in Annex C suggest a range of evidence without prescribing any particular indicators. In a new twist, OfS would be able to carry out validation checks on a random sample of submissions, with providers expected to supply and signpost the referenced content, data, and statements. Educational gain makes a reappearance here, too – though OfS is not keen on developing a national measure, it encourages providers to submit their own definitions and accompanying evidence.

The independent student submission is also a new development (though prefigured to an extent in the subject-level pilot) – previously, student interests were deemed to be covered by the provider statement and the use of NSS-derived indicators. In the proposed ten-page statement, evidence can be drawn from existing student representation processes, and OfS is particularly looking for evidence gathered directly from students, rather than commentary on the indicators. OfS is keen to take into account the “capacity and resources” available to student representatives, so there will be a lot of guidance and support – and the verification process will be less onerous.

Panel beating

Under these proposals the TEF panel would be configured as a subcommittee of the OfS Board. The regulator intends to appoint, via open recruitment, a number of academic and student panel members to conduct assessment. OfS would offer training and run a calibration exercise – and each submission would be marked by “a small number” of panel members, who would form a recommendation to be ratified by a larger group. This replaces the panel and assessor split familiar from previous TEF iterations; a larger panel would – apparently – reduce workload.

We also bid a fond farewell to the initial hypothesis, as formulaic judgements are no more (also making it impossible to pre-judge the TEF based on an early release of data, or to compare panel decisions on awards with what the metrics suggest). There’s an emphasis on a balanced consideration of all sources – this is very much principles-based rather than rules-based in regulatory terms; the only stipulation is that outcomes and experience should be weighted equally in determining a final award. On that, there is a peculiar interaction between the two aspects that serves to ensure that a provider with (for example) “bronze” in student experience should not receive higher than a “silver” overall, even if its student outcomes rating is “gold”.

There’s even one of those grids that the Pearce Review didn’t like (from Annex G of the publication):

[Grid from Annex G showing how aspect-level awards are used to construct an overall TEF award]
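As a hedged Python sketch of the constraint described above – generalising from the one example given, it assumes the overall award can sit at most one step above the lower of the two aspect ratings; the real mapping is the full Annex G grid, applied with panel judgement:

    RANK = {"Requires improvement": 0, "Bronze": 1, "Silver": 2, "Gold": 3}
    NAME = {value: name for name, value in RANK.items()}

    def highest_overall(experience: str, outcomes: str) -> str:
        """Highest overall award available given two aspect ratings.

        Implements only the capping rule described in the text: the
        overall award cannot be more than one step above the lower
        aspect rating. The real Annex G grid governs actual outcomes.
        """
        lower = min(RANK[experience], RANK[outcomes])
        higher = max(RANK[experience], RANK[outcomes])
        return NAME[min(higher, lower + 1)]

    # Gold outcomes cannot lift bronze experience past silver overall
    print(highest_overall("Bronze", "Gold"))  # Silver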

Ratings would be conditional on continued adherence to relevant quality and standards baseline requirements, with OfS reserving the ability to suspend ratings if these requirements aren’t met. And the “provisional award” is gone – even where cohorts are small and the potential for statistical uncertainty is vast, it is reckoned that this will not affect providers that meet the requirements for entry (although entry is optional for newly registered providers that do not have at least one valid TEF indicator in the data).

To aid assessment, panel members would be provided with data about the size and shape of provision alongside the indicators – showing student numbers, types of course and subject mix, and the characteristics of students at the provider (personal characteristics, socio-economic status, entry qualifications) based on the most recent cohort.

Language games

The ratings themselves would be backed by a standardised language of “outstanding quality” and “very high quality” – so:

  • Gold is “typically outstanding”
  • Silver is “typically very high quality, with some outstanding features”
  • Bronze is “typically high quality, with some very high quality features”
  • Requires Improvement is where there are “no, or minimal, very high quality features”

Confusingly, these are configured as both relative and absolute measures – with “outstanding quality”, for example, representing both the very best in the sector and performance materially above (by 2.5 percentage points) the benchmark for an indicator, or an indicator where the benchmark itself is above 95 per cent.

Statistical certainty has another set of interlocking descriptors, used when “considering how far a shaded bar is… in line with… a benchmark” (a rough sketch in code follows the list):

  • Around 99 per cent statistical confidence would provide compelling evidence
  • Around 95 per cent statistical confidence would provide very strong evidence
  • Around 90 per cent statistical confidence would provide strong evidence
  • Around 80 per cent statistical confidence would provide probable evidence
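Rendered as a lookup (again, a sketch – the confidence levels and descriptors are from the consultation, the code and its boundary assumptions are mine):

    # Descriptor bands from the consultation, treated here as lower bounds.
    EVIDENCE_BANDS = [
        (0.99, "compelling evidence"),
        (0.95, "very strong evidence"),
        (0.90, "strong evidence"),
        (0.80, "probable evidence"),
    ]

    def evidence_descriptor(confidence: float) -> str:
        """Translate a statistical confidence level into an OfS descriptor.

        The consultation says "around" each level, so the boundaries
        are indicative rather than exact.
        """
        for threshold, descriptor in EVIDENCE_BANDS:
            if confidence >= threshold:
                return descriptor
        return "insufficient evidence"  # my label, not one used by OfS

    print(evidence_descriptor(0.96))  # very strong evidence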

There’s an opportunity to appeal once initial decisions on awards are shared. However, providers would be publicly rated on each aspect (experience and outcomes) as well as receiving an overall award. The awards would also end up on Discover Uni, and potentially UCAS, linking back to an OfS publication that includes the written panel statement, the provider and (where available) student submissions, and the TEF indicators.

Alas, we wave a fond farewell to “TEF day” – awards will be published as they become available so it will be very apparent where providers have made representations to support an appeal (either by a “pending” mark or, less helpfully, a failure to publish).

What we could have run

Frankly, it’s better than it has been in the past, but still probably not as good as it could have been. It’s certainly better than the B3 proposals. Yet only the measured critique of the Pearce Review and the need to “incentivise excellence above the quality baseline” saved us from another run of the current model. We can again thank Shirley Pearce for the line drawn through the idea of a data-only process, with a quotable dismissal from OfS:

We recognise that delivering an excellent student experience and student outcomes may be evidenced in ways that are not captured by the set of indicators we have proposed, and we are not aware of other indicators that could fully evidence our proposed list of features. For example, the proposed features include ‘educational gains’ which are not captured by any nationally comparable data.

Fascinatingly, an exercise based on inspections was on the table before being discarded because of an increase in burden on both providers and the OfS. We came closer, I think, than anyone realised, to Michelle Donelan’s dream of actually knowing something about the quality of teaching as it is currently being delivered – or mine of bringing back Subject Review. There was also brief consideration given to not running a HERA s25 scheme at all.

Perhaps more plausibly, the regulator did look at an absolute values version of TEF, which would fit more closely with other language about regulation. Three carefully worded paragraphs note the value of looking at absolute performance, before discounting the idea because it would:

not be in line with the purpose of the TEF to incentivise a provider to improve and to deliver excellence above our minimum baseline quality requirements, for its mix of students and courses.

The detail of these proposals is such that it is possible to forget that they are out for consultation. Everything is very much up for debate, and responses are due by 17 March 2022.
