It’s easy to throw stones. Last week I wrote an article on the TEF consultation which was largely critical of the government’s proposals. That’s what my reading of the policy told me: medal-style ratings will do no good. Labels need to be meaningful, and if you can’t find meaningful labels then maybe that should tell you that your exercise is flawed.
But is there a ‘glass half full’ reading that I missed at first look? I agree with the TEF’s overall aims: we should have excellent teaching in UK higher education. You can’t disagree with that. And I’m also of the opinion that the work of the Higher Education Academy to raise the status of teaching hasn’t spread far enough. Nor the HEFCE-funded Centres for Excellence in Teaching and Learning. Nor the National Student Survey or reviews from the Quality Assurance Agency.
The government is right to look at the Research Excellence Framework and say ‘that seems to have worked: we need to put some cash incentives behind teaching, or otherwise universities won’t get their act together.’ So far, so logical. Encouraging teaching prizes, promoting teacher training, researching innovative teaching – while good in themselves – have not transformed the teaching experience in UK higher education. And where improvements have been made, universities have not always been great at conveying that message.
The NSS does drive institutional behaviour, but very variably across the sector. Not everyone cares whether they’re up or down, and too many still argue ‘the survey’s meaningless’ in the face of evidence for its robustness. That said, we should also take seriously the complaints that measuring student satisfaction is a biased exercise, and that satisfaction at course level (though perhaps not at institution level) can be affected by the gender and ethnicity of the teacher.
Constructive critique
How could the TEF work better? First, it needs to define teaching. Its designers should either find a tool for measuring the quality of teaching, or redefine the exercise as a ‘student experience excellence framework’. It may be that an effective ‘learning gain’ tool could quantify what students get out of their time studying, but doing so at a national level is going to be very tricky indeed. The other option is to have standardised curricula and examinations, which would be impossible to implement.
A SEEF could reasonably include the metrics proposed for the TEF, particularly the ‘split metrics’ where performance is considered separately for different student characteristics. One way a SEEF could improve on the TEF is by presenting the full scorecard of metrics rather than boiling them down into the three unwieldy categories of Gold, Silver and Bronze. This would better reflect the diversity of the sector, and create a narrative of ‘the provider does really well for these groups, in these measures, but needs a kick up the backside over here’.
Let’s imagine a table of pluses and minuses, a heatmap of the good and the bad; it sounds very like the data any school governor would be familiar with. The Ofsted rating masks good and bad elements across a school. Most universities are far bigger and more complex to manage than schools, and boiling their overall teaching down into a single category is utterly meaningless. Fundamentally, the three categories will not provide good quality information to prospective students.
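To make the scorecard idea concrete, here is a minimal sketch in Python (pandas), using entirely hypothetical providers, metrics and figures. The point is only that a matrix of flags against a benchmark is easy to produce and to read, without collapsing everything into a single medal.

```python
# A minimal sketch of a SEEF-style scorecard. All provider names, metric
# names and numbers are invented purely for illustration.
import numpy as np
import pandas as pd

# Hypothetical scores, expressed as percentage points above/below a benchmark.
scores = pd.DataFrame(
    {
        "NSS: teaching":       [ 2.1, -1.4,  0.3],
        "NSS: assessment":     [-0.8,  3.2, -2.5],
        "Continuation":        [ 1.5,  0.2, -0.9],
        "Employment (mature)": [-2.0,  1.1,  2.4],
        "Employment (BME)":    [ 0.6, -0.3,  1.8],
    },
    index=["Provider A", "Provider B", "Provider C"],
)

# Flag each cell as above (+) or below (-) benchmark, rather than rolling
# the whole table up into one Gold/Silver/Bronze label.
flags = pd.DataFrame(
    np.where(scores > 0, "+", "-"),
    index=scores.index,
    columns=scores.columns,
)
print(flags)
```

Each provider ends up with a mixed picture, which is exactly the narrative a single medal hides.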
When it comes to differential fee caps and the incentive to improve teaching, one option is to run the exercise periodically (every year, or maybe every three years) for every provider. Make recommendations each time: tell the provider in which areas the data must be improved. The following time, allow them the fee uplift if they’ve made improvements in the stated areas and not lost ground elsewhere. The incentive then is to keep improving from whatever baseline they’re at right now. That would keep them on their toes.
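That rule is simple enough to write down. Here is a minimal sketch (my illustration, nothing from the consultation itself), with hypothetical metric names and figures: the uplift is earned only if every flagged area improves and nothing else slips.

```python
# A minimal sketch of the uplift rule described above. Metric names and
# figures are hypothetical.

def earns_uplift(previous: dict, current: dict, flagged: set) -> bool:
    """Uplift only if every flagged metric improved and no other metric fell."""
    improved_where_asked = all(current[m] > previous[m] for m in flagged)
    held_ground_elsewhere = all(
        current[m] >= previous[m] for m in previous if m not in flagged
    )
    return improved_where_asked and held_ground_elsewhere

previous = {"continuation": 91.0, "satisfaction": 82.0, "employment": 74.0}
current  = {"continuation": 92.5, "satisfaction": 82.0, "employment": 75.1}
flagged  = {"continuation"}  # the areas the provider was told to improve

print(earns_uplift(previous, current, flagged))  # True for these numbers
```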
There should be excellent teaching. The TEF would be a better exercise reimagined as a SEEF. Dropping the categories of judgement would, in one fell swoop, take a huge amount of heat out of the debate and allow the real nuances of the sector to come through. Without Ofsted-style levers such as ‘special measures’ you need to keep the cash incentive, but there are ways of doing that without the categories. Complicated it would be, but the current reductive approach risks failing in one of its most basic aims: the meaningless categories may make prospective students’ decisions more arbitrary rather than better informed.
Fascinating that you interpret the Cheng & Marsh paper as evidence for the “robustness” of the NSS. They find that only “about 2.5%” of the variance in NSS scores can be attributed to universities. They conclude:
“In summary, we recommend that NSS ratings should only be used with appropriate caution for comparing universities. This caution applies to comparisons of ratings averaged across different universities and especially to comparisons of different courses—either different courses within the same university or the same course across universities. Any such comparisons should be qualified in relation to interpretations of probable error based on appropriate multilevel models. These necessary cautions in the interpretation of NSS ratings also call into question their usefulness for their intended purposes. Whilst it is premature to argue that the ratings are not useful in terms of providing appropriate feedback to students, employers, universities and the general public, the onus is on NSS advocates to demonstrate their construct validity in relation to ways in which they are actually used as well as ways they are intended to be used.”
It seems strange to summarise this as representing evidence that the NSS is robust.
Cheng and Marsh aren’t uncritical of the NSS, but they do find that it is robust. That’s a very distinct issue from whether it is helpful for comparing universities.
No, they find that it’s reliable. The article cites Cheng & Marsh as evidence that the NSS is not meaningless. This is a question about validity, not reliability or robustness. C&M find that ~90% of the variance in scores is student-level noise, and that only 2.5% is attributable to the university level. I think most people would interpret this as showing it’s not a very meaningful survey.
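To put that in concrete terms, here is a back-of-the-envelope illustration (mine, not C&M’s analysis) of what those headline figures imply. The course-level share below is simply the residual left over from the quoted 2.5% and ~90%.

```python
# Rough illustration of the variance decomposition implied by the figures
# quoted above; the course-level share is the residual, not a quoted number.
var_university = 0.025  # share of variance between universities (~2.5%)
var_course     = 0.075  # residual share between courses within universities
var_student    = 0.900  # share at the individual-student level (~90%)

total = var_university + var_course + var_student

# Intraclass correlation at university level: how much two ratings agree
# simply because they come from the same university.
icc_university = var_university / total
print(f"University-level ICC: {icc_university:.3f}")  # ~0.025
```

An intraclass correlation that small is why C&M say any comparison between institutions needs to be qualified by its probable error.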
What you describe sounds a lot like the REF (especially a revised REF with some of Stern). It allows excellence to be celebrated wherever it is found and does not force a ranking. We argued for this in our green paper response but the idea fell on stony ground. So to mix metaphors properly, good idea, but the horse seems to have bolted already.
I’ve been saying for ages (as have far more illustrious commentators) that the TEF stumbled at the first hurdle by narrowing the lens to “teaching”, rather than looking more broadly at learning and the factors other than teaching that have a crucial impact on learning.
I find the idea of a SEEF intriguing: it could go beyond even learning to look at other important aspects of university life that would help students to make more informed decisions on what matters to them – but we would be starting on the back foot due to a lack of data. I think more generally a “matrix-style” TEF allowing different providers to highlight their distinctiveness would be much more useful and accurate than the blanket Gold, Silver, Bronze proposals.
Nice piece Ant, got me thinking.
Thanks Kate, glad you like the piece!