This article is more than 8 years old

by John MacInnes

9/07/18

Subject TEF and the question of numbers

Chartered statistician John MacInnes introduces us to some of the major statistical problems presented by the design of subject TEF.


John MacInnes is Professor of Sociology at the University of Edinburgh and a Chartered Statistician.

A short but necessary lesson in statistical logic

Random variation plagues the measurement of small groups, and is compounded when we want to make comparisons between these groups. This is because our real interest isn’t the groups themselves, but what they might tell us about future similar groups. Thus, if I evaluate a hospital unit by counting patient outcomes, my interest is in predicting the prospects of future patients. The larger the number of past patients whose experience of facing similar conditions I can measure, the more likely it is that they will be representative of future patients. Indeed, this number is the only guide I have, since I cannot ever know just what ‘representativeness’ might comprise.

This is where the paradox of randomness strikes. If I can measure 1,000 people, I can make estimates with a margin of error of only a few percentage points. The ‘noise’ of variation is small compared to the strength of any ‘signal’ about their experience. Indeed, almost everything we know about the modern economy and society comes from this logic, perfected in Britain a century ago.

Conversely, measurements of small numbers of people, no matter how carefully made, and no matter how ‘typical’ we might hope these people to be, tell us disappointingly little, because random variation swamps any signal. With 100 people my margin of error rises to around +/- 10 percentage points. With 15 people the noise becomes as strong as the signal. With fewer, noise is about all we get.
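The figures above can be checked with a few lines of arithmetic. What follows is a minimal sketch, assuming simple random sampling, a 95% confidence level, the worst-case proportion of 0.5 and the usual normal approximation (which is itself shaky for very small groups). The function name `margin_of_error` is illustrative only.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error, in percentage points, for a proportion.

    Assumes simple random sampling and the normal approximation;
    p=0.5 is the worst case (widest margin).
    """
    return 100 * z * math.sqrt(p * (1 - p) / n)

# Group sizes taken from the text above
for n in (1000, 100, 15):
    print(f"n = {n:>4}: +/- {margin_of_error(n):.1f} percentage points")
```

Because the margin shrinks only with the square root of the group size, a group has to be about four times as large to halve the noise.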

Why this matters for the subject level TEF

UK universities currently offer about 37,000 undergraduate degree programmes. Half a million students enter annually, giving a mean course cohort of 15 students.

The subject level TEF therefore proposes to measure ‘subject instances’ defined by 35 CAH2 subjects rather than degree courses. This raises the average size of units to about 115 students, and sidesteps the problem that universities divide disciplinary boundaries in diverse ways.

Four key problems

Unfortunately, this will not fix the problem, for four reasons. First, we know that for many of the NSS metrics on which TEF relies, the variance of student opinion and behaviour within the same department and provider is large compared to that between departments and providers. Analysing early NSS satisfaction data, Marsh and Cheung (2008) found that most of the variance in the results occurs at the individual level, with only a little over 10% being attributable to some combination of provider and department. The smaller the share of variance that lies between units, the larger each unit has to be before its underlying performance can be distinguished from the noise.
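One way to see how the between-unit share of variance drives the unit size needed is the standard Spearman-Brown formula for the reliability of a group mean. This is a sketch under stated assumptions, not a method taken from the TEF specification or from Marsh and Cheung: the 0.8 reliability target and the 5% and 2% shares are hypothetical, and only the 10% figure echoes the text above.

```python
def reliability_of_unit_mean(n, icc):
    """Spearman-Brown reliability of a unit mean based on n students.

    icc is the share of total variance lying between units
    (the intraclass correlation).
    """
    return n * icc / (1 + (n - 1) * icc)

def students_needed(icc, target=0.8):
    """Unit size at which the unit mean reaches the target reliability.

    The 0.8 target is an illustrative assumption, not an official threshold.
    """
    return target * (1 - icc) / (icc * (1 - target))

# 0.10 echoes the "little over 10%" above; 0.05 and 0.02 are hypothetical
for icc in (0.10, 0.05, 0.02):
    print(f"between-unit share {icc:.0%}: "
          f"about {students_needed(icc):.0f} students per unit")
```

Under these assumptions the required unit size rises from roughly 36 students to roughly 196 as the between-unit share falls from 10% to 2%, which is the sense in which low between-unit variance demands larger units.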

Second, ‘subject instances’ will vary in size. Many will still be smaller than any threshold needed to find any signal amongst the noise. But if the aim of the TEF is to inform stakeholders, there will be a perverse incentive to report some signal, lest providers disappear from league tables or other commentaries. Thus a highly likely unintended consequence of the subject level TEF will be the rolling up of specialist departments or niche degrees into larger ‘one size fits all’ programmes.

Third, random variation also hobbles the business of making comparisons between units. The larger the number of potential comparisons, the greater the risk that any individual difference is merely the result of random variation. With a couple of hundred universities this can be managed, although even here it is hard to do more than identify a handful of providers that do better or worse than the average. However, this has not stopped the widespread abuse of such metrics to construct league tables or ‘identify’ failing units by assuming the numbers to be far more reliable than they in fact are.
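A small simulation illustrates how a league table can emerge from nothing but chance. It is a sketch only: it imagines 200 providers that are all exactly equally good, each measured on a cohort of 115 students, with a true satisfaction rate of 0.8. The rate, the seed and the function names are hypothetical choices made for illustration, not TEF data.

```python
import math
import random

def simulate_identical_providers(n_providers=200, cohort=115,
                                 true_rate=0.8, seed=1):
    """Observed satisfaction rates for providers with identical true quality."""
    rng = random.Random(seed)  # fixed seed so the illustration is repeatable
    rates = []
    for _ in range(n_providers):
        # Each student is independently satisfied with probability true_rate
        satisfied = sum(rng.random() < true_rate for _ in range(cohort))
        rates.append(satisfied / cohort)
    return rates

def count_flagged(rates, cohort, true_rate, z=1.96):
    """Count providers falling outside a naive 95% band around the true rate."""
    se = math.sqrt(true_rate * (1 - true_rate) / cohort)
    return sum(abs(r - true_rate) > z * se for r in rates)

rates = simulate_identical_providers()
print(f"lowest observed rate:  {min(rates):.2f}")
print(f"highest observed rate: {max(rates):.2f}")
print(f"flagged as 'different': {count_flagged(rates, 115, 0.8)} of {len(rates)}")
```

Every gap between the top and bottom of this imaginary table is noise, yet a naive reading would still single out a number of providers as better or worse than the rest.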

But what comparisons might students make when choosing between several thousand degree courses? We do not know. Although statistical techniques exist to mitigate this problem, it is difficult to envisage any system that could both avoid conflating random variation with real differences and do more than distinguish the brilliant from the abysmal.
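The scale of the multiple comparisons problem, and the cost of one common mitigation, can be sketched with simple arithmetic. The example assumes independent tests at the conventional 5% significance level and uses the Bonferroni correction as one illustrative technique; nothing here implies it is the approach DfE or OfS would adopt, and the 5,000 figure merely stands in for ‘several thousand’.

```python
def expected_false_positives(m, alpha=0.05):
    """Expected number of spurious 'significant' results among m null tests."""
    return m * alpha

def familywise_error(m, alpha=0.05):
    """Chance of at least one spurious result among m independent null tests."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(m, alpha=0.05):
    """Per-test threshold that keeps the familywise error rate at or below alpha."""
    return alpha / m

# 200 echoes "a couple of hundred universities"; 5,000 is a hypothetical
# stand-in for "several thousand" units
for m in (200, 5000):
    print(f"{m} comparisons: "
          f"expect {expected_false_positives(m):.0f} false alarms, "
          f"P(at least one) = {familywise_error(m):.4f}, "
          f"Bonferroni threshold = {bonferroni_alpha(m):.6f}")
```

The corrected threshold becomes so strict as the number of units grows that only extreme differences survive it, which is why such a system struggles to do more than separate the brilliant from the abysmal.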

Finally, subject instances are a queer fish. The DfE consultation document asserts that not only the 35 CAH2 subjects but also the seven ‘broad’ subject groupings into which they will be sorted ‘are likely to have similar teaching practices, teaching quality and student outcomes.’ No evidence is offered to support this questionable claim. Is teaching in Computer Science and Civil Engineering similar, or Maths and Agriculture, or Architecture and Politics, or Archaeology and French language? The unit that is of interest to most students is the degree course, or the subject area or department teaching it. This is also typically the lowest unit through which university governance and compliance processes operate, and for good reason: the demands of teaching organisation, delivery and assessment are typically subject specific.

What next?

To judge the potential viability of the subject level TEF, DfE ought therefore to supply some basic information, including:

the distribution of subject instance sizes
NSS and DLHE metric variance between and within subject instances
the associated standard errors and their means of calculation
the mitigation strategy for dealing with multiple comparisons.

The Office for National Statistics asked for an independent review of benchmarking. None has yet taken place. We also need an account of benchmarking that can be understood by stakeholders. Without it, results based upon it are likely to be abused by appraisers in the same way as previous performance indicators.

The Scylla and Charybdis of the subject level TEF are these: units small enough to be meaningful to students are too small to measure reliably, while aggregating students into groups large enough to make meaningful statistical analysis possible debases the validity of that analysis. It treats disparate groups of students with a variety of educational experiences, studying different subjects and located in disparate units of university governance, as if they were in fact homogeneous. Randomness is not something the Office for Students or the Department for Education can change. Without a robust account of how they intend to deal with it, the prospects for a viable subject level TEF look poor.


3 Comments


Andrew Fisher

8 years ago

‘With a couple of hundred universities this can be managed, although even here it is hard to do more than identify a handful of providers that do better or worse than the average.’

This is the nub, isn’t it. For years the published data have shown that a small group of institutions consistently do worse than the average; but rather than acting on those data we have chosen to prioritise finding a way to rank the great majority which are about average.

Bradbury Smith

8 years ago

Very informative post, thanks John.

lizmorrish

8 years ago

Excellent assessment. DofE and OfS now need to respond to this. My guess is they won’t, because they can’t without losing face.
