This article is more than 7 years old

TEF results – What the panel statements say, and don’t say

Unpicking the opaque process of the TEF panel, Ant Bagshaw turns to the data to find out why some institutions, particularly in the Russell Group, fared better than others.

Ant Bagshaw is a Senior Advisor in L.E.K. Consulting’s Global Education Practice and co-editor, with Debbie McVitty, of Influencing Higher Education Policy

The TEF Panel, the final point of authority on the judgements made about universities’ and colleges’ ‘teaching excellence’, signed off short statements which give some explanation for its conclusion about each provider’s medal rating. These statements form only one part of the story, and there is a significant risk that they’ll be all anyone reads about the outcomes of TEF.

For starters, the TEF panel’s outputs should – if you’re looking at this from the wonk’s perspective – be read alongside the ‘inputs’ to the panel judgements: that is, the data sheets showing the positive and negative flags, and the provider’s own submission. With those two inputs, it’s possible to see two things: first, what judgement the provider ‘should’ have got, given the purported role of the flagging system in making judgements; second, what the provider said about the data to make its case for a higher rating.

As a refresher, this is the basis on which the ‘core metrics’ flags were supposed to guide the TEF assessors in making their initial hypotheses:

When looking at the delivery mode in which providers teach the most students [i.e. full-time OR part-time]:

  • A provider with three or more positive flags (either + or ++) and no negative flags (either – or – – ) should be considered initially as Gold.
  • A provider with two or more negative flags should be considered initially as Bronze, regardless of the number of positive flags. Given the focus of the TEF on excellence above the baseline, it would not be reasonable to assign an initial rating above Bronze to a provider that is below benchmark in two or more areas.
  • All other providers, including those with no flags at all, should be considered initially as Silver.

If you’re keen for more than the outline, take a look at p49 of the TEF guidance document.
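The three rules above amount to a simple decision procedure. As a rough sketch of how the initial hypothesis falls out of the flags (the function name and flag encoding here are my own, not taken from the TEF guidance):

```python
def initial_rating(flags):
    """Return the initial TEF hypothesis ('Gold', 'Silver' or 'Bronze')
    from a list of core-metric flags.

    Flags are strings: '+' or '++' (positive), '-' or '--' (negative).
    Encoding and function name are illustrative, not from the guidance.
    """
    positives = sum(1 for f in flags if f in ('+', '++'))
    negatives = sum(1 for f in flags if f in ('-', '--'))

    if negatives >= 2:
        # Two or more negative flags: initially Bronze, regardless of positives
        return 'Bronze'
    if positives >= 3 and negatives == 0:
        # Three or more positives and no negatives: initially Gold
        return 'Gold'
    # Everything else, including no flags at all: initially Silver
    return 'Silver'
```

On this reading, a provider with one negative flag could start no higher than Silver, and one with two negatives would start at Bronze – which is what makes the final ratings in the cases below worth scrutinising.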

While these instructions only apply to the ‘initial’ judgement, they stand as a pretty clear guide to how the TEF exercise was designed to use the benchmarked data to arrive at its judgements. Other than a couple of standard elements, the short judgement statements draw attention to whether the provider was notably above or below its benchmarks on any of the metrics, basically providing a commentary on those flags. Each document then had a short bulleted list of points which appear to be drawn from the provider submission; these relate to features of the provider’s operations which aren’t captured in the data sheets, so must have been submitted separately.

A tale of three cities

We took three examples, all from research-intensive universities, one from each of the categories of Gold, Silver and Bronze, and compared them to the underlying data. Here are the relevant parts of the panel’s statements, which show how similar they are in content, and how little of substance they actually say.

University of Nottingham – Gold

The provider metrics supplemented by the submission indicate that students from all backgrounds achieve consistently outstanding outcomes. The metrics indicate that very high proportions of students continue with their studies, and then progress to highly skilled employment or further study. The metrics indicate high levels of student satisfaction with teaching and academic support. The metrics indicate students are satisfied with assessment and feedback, at levels below benchmark. The Panel deemed this to be addressed by the provider submission.

The Panel considered the submission in relation to the TEF criteria and its judgement reflects, in particular, additional evidence of:

  • high levels of contact time which are prescribed and monitored
  • students strategically engaged as change agents, and feedback on other aspects of teaching and learning are gathered and evaluated systematically
  • an embedded culture of personalised learning ensuring that all students are significantly challenged to achieve their full potential
  • exceptionally high student engagement with advanced technology-enhanced learning
  • all students receiving focused and discipline-specific careers support throughout their studies, confirmed by external indicators of employer esteem
  • an embedded institutional culture that encourages and rewards excellent teaching, by incorporating peer observation, extensive staff professional training and accreditation and promotion opportunities for outstanding teachers.

Indeed, student satisfaction was well below the benchmark, at 5.9 standard deviations below expectation. This was a negative flag, and the university received a positive one for highly-skilled employment. In this case, the presumption that a negative flag would rule out Gold has been overturned by the panel, perhaps because the TEF guidance also steered the panel away from over-reliance on NSS scores. Nottingham’s written submission clearly provided the panel with enough to convince it that the area of poor performance was being addressed.

University of Bristol – Silver

The provider metrics, supplemented by the submission, indicate that most students achieve excellent outcomes. Retention of students is extremely high, and employment or highly skilled employment or further study are in line with what might be expected given the student profile and subject mix. There is a high level of student satisfaction with teaching, although satisfaction with, [sic] assessment and feedback and academic support, are notably below benchmark.

The panel considered the University submission in relation to the TEF criteria and its judgment reflects, in particular, evidence of:

  • world-leading research translated into education in which independent learning is encouraged. Students rate the intellectual stimulation of their courses very highly and evidence points to the academically rigorous and research-rich environment in which students study
  • inclusion in all programmes, a challenging final-year project that explicitly enables students’ independent learning through development of research skills and critical thinking. This can result in student publications, conference presentations and prizes and participation in visiting guest lectures and seminars
  • a well-embedded culture of valuing, recognising and rewarding academic staff involved in teaching and learning. Academic promotion structures include a teaching and scholarship pathway and staff are trained in the principles of curriculum design
  • an institutional internship scheme that promotes new interdisciplinary research and offers undergraduate students the opportunity to plan and undertake a summer research project
  • strategic investment in infrastructure that enables students to further their learning through high quality learning spaces and equipment, and through innovative learning technologies.

In Bristol’s case, it appears that it has overcome two horribly bad negative flags – ‘notably below benchmark’ – to pull itself from an initial Bronze to Silver. These were in the NSS-derived ‘assessment and feedback’ and ‘academic support’ categories. It’s also below benchmark in the employment measures, but not by enough to warrant a flag. The narrative on the positive features of the provider shares much with Nottingham’s, including the recognition of staff who teach.

University of Liverpool – Bronze

The provider metrics, supplemented by the submission, indicate that most students achieve good outcomes. Very high proportions of students continue with their studies and progress to employment or further study. Progression to highly skilled employment or further study is below benchmark for most student groups. Student satisfaction with teaching, academic support, and assessment and feedback is below the provider’s benchmark for some student groups. The Panel deemed that below benchmark performance was partially addressed by the provider submission.

The Panel considered the University submission in relation to the TEF criteria and its judgement reflects, in particular, evidence of:

  • course design and assessment practices, which provide sufficient stretch to ensure most students make progress with their studies
  • appropriate provision of opportunities for students to enhance their employability and develop skills and knowledge which are valued by employers
  • good investment in the physical estate and digital resources
  • a developing approach to research-engaged teaching, leading to opportunities for most students to participate in research activities within the institution’s research-intensive environment
  • the implementation of an institutional culture that facilitates, recognises and rewards excellent teaching.

In contrast to Nottingham and Bristol, Liverpool didn’t manage to overcome its two negative flags, perhaps because one was not in an NSS-derived category. Reading this, it seems perverse that an institution which was ‘notably’ below benchmark – in Bristol’s case – should receive a higher outcome than Liverpool, for which the statement is softer. The narrative also recognises the institutional teaching culture.

Taking just these three examples, we can see how poor performance against the stated data measures can still result in an outcome better than Bronze. But for institutions with a similar data pattern to Bristol’s, such as Southampton (which had two negative flags in the same categories, but wasn’t upgraded to Silver), there could be some well-deserved anger. And Durham, with its one positive flag and no negatives, has only a Silver result compared to Nottingham’s Gold. This could make for some uncomfortable talk around the Russell Group VCs’ table.

When you say nothing at all

The judgement statements leave out more than they say. By avoiding any detailed commentary on the data, it’s not possible to get a view on the extent of the deviation – positive or negative – of the provider from its benchmarked expectations. And what about the statements on performance for ‘some groups’? It’s important to know whether this is by gender, ethnicity or for widening participation groups. Who, exactly, is it that the institution is failing? We’ll see this in the data sheets, but it shows how the judgement statements cannot be seen as the final word.

When one of the often repeated aims of TEF is to inform students’ choices about where to study, the quality of the published information is of enormous importance. The headline categories are useless as they’re derived from bogus proxies for ‘teaching’. Then, alongside the headline judgements, there are only anodyne statements which don’t provide a level of detail that could actually be useful. Picking out a few bullet points – unverified beyond what the provider has written about itself – also achieves nothing when it comes to trying to inform students. Contrast this approach with QAA reports, which make similar statements of identified ‘good practice’, but which do so by interrogating the underlying evidence. And, let’s not forget, QAA reports also show the areas where institutions need to improve, so aren’t just an uncritical list of positives.

Once again, TEF has failed by trying to compress the full and varied nuance of a whole institution into a single-word category and a few bullet points of unvalidated text. This can be remedied by reading the statements alongside the data on performance. But that will take some work (if it’s possible at all) to make understandable to a less wonkishly-inclined audience. While the TEF process has aimed for transparency, it has missed the mark. In doing so, it may have made things much more opaque than they needed to be.

For the most comprehensive coverage of TEF and the results follow Wonkhe’s #TEF tag.

9 responses to “TEF results – What the panel statements say, and don’t say”

  1. Very interesting, as a Liverpool staff member, to see our statement compared with Bristol and Notts. In isolation, “most students achieve good outcomes” seems like a positive thing, but compared with the “excellent” and “outstanding” from the other two it seems less of an endorsement. I’d be interested to see what the panel class as “good”, “excellent” and “outstanding” outcomes – simply degree above/below benchmark, or more nuanced?

    Really helpful article Ant, thank you.

  2. Is Nottingham’s -5.9SD a typo?
    If not can we have some explanation on how they’re arriving at these figures?
    In a normal distribution roughly 68% lie between +/-1 and 95% between +/-2, so -5.9 would be extreme to say the least!
    Looks like the numbers are going to be pretty opaque too.

  3. Please pretty please can someone do a table of the initial judgement (from the metrics) compared to the final result? I’m really interested to know which universities changed rating (although my suspicion is that few if any will have gone down)

  4. Re: High/Low standard deviation numbers. HEFCE would explain that this therefore relates to a non-random effect ie there is something that the institution is doing/not doing that hasn’t been controlled for in the benchmarks.

  5. I’d like to know this too – Bristol’s Assessment and Feedback z-score is a remarkably hefty -14.8. I’ve got no idea how they’ve managed to come to that.

  6. They did not normalize the data before running Z-scores (because there would not have been enough differentiation between institutions given the bunching in the NSS data)

  7. I don’t think that makes sense. The methodology summaries aren’t clear, but my guess is that the SDs in question are what statisticians usually call standard errors (the SD of parameter estimates) and the z scores are thus test statistics rather than what are conventionally thought of as z scores (dividing raw data by SD after subtracting the mean). I then suspect they sum z scores across categories to create an overall index.

    If I’m right then trying to interpret the scores as conventional z scores is a lost cause. It is just an index of deviation from benchmarks that tries to take into account sample size.
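[Editor’s note: if this commenter’s guess is right, the reported ‘z-score’ is just the gap between a provider’s metric and its benchmark divided by the standard error of that gap, so very large magnitudes are possible when cohorts are big. A minimal sketch of that calculation – all figures illustrative, not actual TEF data, and the function is my own:]

```python
import math

def benchmark_z(observed, benchmark, n):
    """Test-statistic-style z-score: deviation of an observed proportion
    from its benchmark, divided by the standard error of a proportion.

    Illustrative only; the actual TEF methodology is not published
    in this level of detail.
    """
    se = math.sqrt(benchmark * (1 - benchmark) / n)
    return (observed - benchmark) / se

# With a large cohort, even a modest gap produces a large |z|:
# e.g. 3 percentage points below an 85% benchmark on 5,000 responses
# gives a z-score of roughly -5.9.
z = benchmark_z(0.82, 0.85, 5000)
```

This would explain how figures like the -5.9 and -14.8 discussed above can arise without the underlying performance gap being astronomically large: a big sample shrinks the standard error, inflating the test statistic.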
