HESA, HESES and the missing students

There’s a lot of data about students out today, but none of it answers the key question: how many students are enrolled (as of early December 2020) at universities in the UK, and how many who might have been expected to enrol are not?

As a special treat for you all I’ve mashed together UCAS, HESA, and OfS (HESES) data to try to start answering that question. I’m (alas) England only, and am pushing what the data can sensibly tell us to the absolute limit.

Every year most providers recruit the majority of their undergraduate full time students via UCAS – with acceptances starting on Higher or A level results day and moving right through clearing. But not all of these students end up enrolling, and not all of these enrolled students commence their studies, and even those who do might not hang about even for two weeks – none of these groups get returned on the HESA Student Record. Second and third year (and beyond) undergraduate students may not get there either.

And as I’ve been over before, not all of the remaining students stick around until the HESES deadline – which is all a bit of a mess. We may, therefore, never know how many students did not continue with their studies in 2020-21.

So why not just ask providers?

Column three of the HESES return invites providers to estimate student attrition against numerous categories for the full year – and a big chunk of Annex D of the HESES guidance explains how this works. Non-completion here is about more than just remaining at a provider – a student has to pass each module within 13 months of the start date to be described as “completing” the year.

Because HESES data is supplied in or around early December, providers need to make an estimate of non-completion for the full year – based primarily on historic data (used at a fair level of detail to take account of individual courses, and mode and year of study). This time round the recommendation was not to make use of 2019-20 data in making predictions for 2020-21 because the pandemic was “an exceptional circumstance” – who would have predicted the same pandemic would continue into 2020-21?


Anyway, though this is fascinating in and of itself, it only applies to estimated attrition after the census date – so we’ll need to try something a little more ill-advised.

Wait

Three recent data releases may help us see the unseeable. We got the UCAS end of cycle data for 2020 last week, and now we have the first iteration of HESES data for 2020-21, and the remainder of the HESA student data for 2019-20.

The HESA data allows us to split by first year and other years of study. Taking all undergraduates in their first year for each of the last two years (2018-19 and 2019-20), plus the undergraduate UCAS intake (all acceptances) for the 2020 cycle, gives us a starting point for approximating the undergraduate full time population of each provider.

However, there are some groups of students who will cause us problems:

Part time students – UCAS doesn’t differentiate between full and part time acceptances, and part time students are less likely to apply via UCAS.
Students on courses longer than three years – as far as I know there is no data on this available anywhere.
Students recruited outside of UCAS – for instance some international students, or students recruited directly (RPAs in UCAS language, which are sometimes but not reliably included in provider acceptance data).
Students who leave university later in the year, with numbers subtracted from the “fixed” student record later on.
Students who accept a place but don’t stay two weeks after commencement (or don’t enrol) – this is the group we are interested in.

So just adding the UCAS data is going to cause us problems. I’ve no way of controlling for each of these groups individually, so I’ve taken an average of the percentage of UCAS students who show up as full time undergraduates in 2018-19 and 2019-20, and multiplied the 2020 UCAS cycle figure by this average to get a comparable figure for 2020-21.
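To make that adjustment concrete, here is a minimal sketch for a single provider. The figures are invented for illustration, and the function and variable names are mine rather than anything from the underlying datasets. It also assumes that "the percentage of UCAS students who show up" means first-year full time undergraduates on the HESA record divided by UCAS acceptances for the matching cycle, which is one reading of the method rather than a confirmed one.

```python
from statistics import mean

# Hypothetical figures for one provider (not real UCAS or HESA data).
ucas_acceptances = {"2018": 4000, "2019": 4200, "2020": 4000}
hesa_first_year_ft_ug = {"2018-19": 3800, "2019-20": 3780}


def conversion_rate(hesa_count, ucas_count):
    """Share of UCAS acceptances that appear as first-year full time undergraduates."""
    return hesa_count / ucas_count


def estimate_intake(ucas, hesa):
    """Scale the 2020 cycle acceptances by the average of the two earlier rates."""
    rates = [
        conversion_rate(hesa["2018-19"], ucas["2018"]),
        conversion_rate(hesa["2019-20"], ucas["2019"]),
    ]
    return ucas["2020"] * mean(rates)


estimated_2020_21 = estimate_intake(ucas_acceptances, hesa_first_year_ft_ug)
print(round(estimated_2020_21))
```

With these made-up numbers the two earlier rates are 95 per cent and 90 per cent, so the 2020 acceptances are scaled by 92.5 per cent. The approach bakes in the assumption that 2020 behaves like the two years before it, which is precisely what is in doubt.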

What that looks like

I’ve plotted my 2020-21 approximation for each provider, so you can see I’m generating data that at least looks plausible.


Because I’ve not introduced the HESES data to the model, this is available for most UK providers (where I have two years of HESA data and three years of UCAS data available).

If you’re interested in the average change between UCAS and HESA Student for the last two years of available data, I’ve plotted that here – bear in mind the caveats above. Again, this is UK wide, but I’ve removed some smaller providers with huge changes just to make the graph tidier.


Enter HESES

But why would I do all that, when we have HESES? For those of you out of that particular loop, HESES data is returned around 50 days after the point of commencement by all English providers registered with OfS. One of the returns gives us data on all students commencing a year of study between August and the census day – the undergraduate split of which should be broadly comparable with the UCAS numbers (again, bearing in mind the caveat above).

Three of these exception groups show up in the comparison as follows:

Students on courses longer than three years – the HESES figure will be higher for providers with large numbers of students on courses longer than 3 years.
Students recruited outside of UCAS – the HESES figure will be higher.
Students who accept a place but don’t stay two weeks after commencement (or don’t enrol) – this is the group we are interested in.

Here’s the plot:


This is still frustratingly untidy – but we can see some fascinating artefacts. For instance, every single Russell Group provider has more HESES students than HESA/estimated students – a combination of higher than usual recruitment and more courses lasting more than three years. The trend is broadly similar for other pre-1992 providers.

For post-92 providers the picture shifts – we tend to see more HESA/estimated students than HESES students. Some of this may be lower than usual recruitment, some may be resits, but we may also be seeing early student attrition.
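The comparison behind that split can be sketched as a simple gap calculation. Everything here is illustrative: the providers and counts are invented, the function names are mine, and summing two earlier first-year cohorts with an adjusted 2020 intake is an assumed reading of how the HESA/estimated figure is built.

```python
def estimated_population(first_years_2018_19, first_years_2019_20, adjusted_ucas_2020):
    """Assumed construction: two earlier first-year cohorts plus the adjusted 2020 intake."""
    return first_years_2018_19 + first_years_2019_20 + adjusted_ucas_2020


def compare(heses_count, estimate):
    """Return the HESES-minus-estimate gap as a count and as a percentage of the estimate."""
    gap = heses_count - estimate
    return gap, 100 * gap / estimate


# Hypothetical providers (not real HESES or HESA data).
providers = {
    "Provider A": {"heses": 12500, "cohorts": (3800, 3900, 4000)},
    "Provider B": {"heses": 9000, "cohorts": (3200, 3300, 3500)},
}

results = {}
for name, row in providers.items():
    estimate = estimated_population(*row["cohorts"])
    results[name] = compare(row["heses"], estimate)
    gap, pct = results[name]
    print(f"{name}: HESES minus estimate = {gap:+d} ({pct:+.1f}%)")
```

A positive gap is what longer courses or recruitment outside UCAS would produce. A negative gap is where early attrition could be hiding, but on its own the number cannot separate that from lower recruitment or resits.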

So what’s the point of all this?

If you’re thinking that I’ve spent too much time torturing data for little actual insight – you may well have a point. But, fundamentally, it should be easier than this to answer questions about whether students are so frustrated with this academic year that they are giving up their studies en masse. Reliable totals could help convince government of the need to step in and actually support the whole student body (tokenistic additions to hardship funds designed for individual emergency situations notwithstanding). Knowing details by provider could help target support where it was needed – or identify good practice that could be copied elsewhere.

Our data system isn’t set up to provide this kind of near-real-time data in normal times – but these are not normal times. Data Futures would have addressed this – but the future of Data Futures is far from certain, if the past is any guide.

Bonus data

HESES also includes data on student numbers by price group, so given their impact on the OfS ~~Teaching Grant~~ Strategic Priorities Grant I thought I should plot them for 2020-21. As a visual aid, green parts of the bar get OfS additional funding, red parts do not.
