Supporting student choices with subject-level data

Back in March 2018 Sam Gyimah reannounced the subject-level TEF. Alongside this, he launched an Open Data competition – allowing developers to use selected government data on universities to create apps to help prospective students decide where to apply.

“This competition will build on the government’s recently published Longitudinal Education Outcomes (LEO) dataset, which gives information on employment and salaries after graduation. By democratising access to information about courses and their outcomes, it will help all applicants, regardless of their background, make better decisions and get better value for money.”

We reported this as “Sam calls app Britain” in that week’s Monday Morning Briefing – a nod to a pivotal scene in Series 4 of The Thick of It. The competition itself appears to have gone quiet – there have been no details of specific data releases, competition terms or – indeed – prizes.

But let’s not dismiss the idea out of hand – is there value in letting students get a better sense of what data exists?

What students actually want

Since the “launch” of the competition we’ve seen a substantial new report from the OfS, with some bearing on the issue of information for prospective students. Writing for Wonkhe, authors Mary Leishman and Joe Cox take time to highlight the “hidden” costs of study: from textbooks to travel, there is little reliable information about costs that have a real impact on the experiences of students.

What does exist can be found in general terms within the Student Income and Expenditure Survey (SIES) – with the 2014/15 iteration released this year following a sustained campaign by wonks across the sector. A survey sample methodology isn’t going to get us far in looking at institution-specific costs – but tucked within the voluminous (and praiseworthy) tables we can see costs crosstabulated with subject area.

To the spreadsheets!

Alongside the SIES data, we have a lot of other data mapped to a similar set of high-level subject areas, covering full-time UK-domiciled undergraduates where such segmentation is possible. I used:

  • Applications and acceptances from the 2017 UCAS cycle, expressed as a proportion of all applications and all acceptances.
  • Top scores for overall satisfaction (Q27) from NSS 2017.
  • LEO data for the most recent available tax year, covering salaries 1, 3, 5 and 10 years after graduation.
  • And data from Hotcourses, demonstrating interest from UK prospective students over an 18 month period expressed as a proportion of all interest.
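Expressing each subject as a proportion of the total, as with the applications, acceptances and interest figures above, is a simple normalisation. A minimal sketch in Python follows; the subject names and counts are made up for illustration and are not real UCAS or Hotcourses figures, and `as_proportions` is a hypothetical helper rather than anything used for the original analysis.

```python
def as_proportions(counts):
    """Convert raw counts per subject into shares of the overall total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute proportions of an empty total")
    return {subject: n / total for subject, n in counts.items()}


# Hypothetical counts, for illustration only.
applications = {"Subject A": 600, "Subject B": 300, "Subject C": 100}
shares = as_proportions(applications)
```

Putting every source on the same "share of the whole" basis is what makes sources of very different sizes comparable side by side.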

From SIES 2014/15 I’ve taken information:

  • On annual median student support income (loans and grants) – note that income via NHS bursaries is separated out as “other support income”.
  • On annual median work income – for (full-time) students working alongside their studies.
  • On annual median total income – which includes the above plus parental contributions and other income.
  • On annual median living costs – things like food, clothes, travel, and entertainment.
  • On annual median non-fee participation costs – things like books, computers, equipment, and field trips.

Points of interest

What if the real benefit is the things we learned about sector data on the way?

Certainly, one big standout for me was the variation in subject data practice. Though all of these sources are nominally mapped to JACS codes, in practice different aggregations are used for a variety of reasons. Sometimes, a particular sub-cohort skewed a wider sample (e.g. Economics as against wider Social Sciences for salary data) so needed to be presented separately.

Sometimes there was not enough data from the overall population to present all groups safely (Technology is generally lumped in with Engineering, and Languages tend to be combined into a single group). SIES, in particular, uses survey responses from a representative sample, so needs to combine multiple subject groups.
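One way to reconcile sources that aggregate subjects differently is a lookup table from each source's own labels to a common set of groups. The sketch below assumes such a table; the labels, the `LOOKUP` mapping and the `harmonise_counts` helper are all hypothetical and are not the mapping used for this article.

```python
from collections import defaultdict

# Hypothetical lookup: (source, source label) -> common subject group.
LOOKUP = {
    ("ucas", "Engineering"): "Engineering and technology",
    ("ucas", "Technologies"): "Engineering and technology",
    ("leo", "Engineering and technology"): "Engineering and technology",
}


def harmonise_counts(source, counts):
    """Sum counts from one source into the common subject groups."""
    combined = defaultdict(int)
    for label, n in counts.items():
        key = (source, label)
        if key not in LOOKUP:
            # Fail loudly rather than silently dropping a subject.
            raise KeyError(f"no mapping for {key}")
        combined[LOOKUP[key]] += n
    return dict(combined)


# Illustrative counts only.
merged = harmonise_counts("ucas", {"Engineering": 10, "Technologies": 5})
```

Summing only makes sense for counts. Medians and percentages cannot simply be added when groups are merged, which is one reason this kind of harmonisation stays indicative rather than exact.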

In all honesty, the data I have used isn’t brilliant – and the various back-end kludges mean that it should only ever be seen as indicative. It relates to different cohorts and different aggregations – but it remains the best data available.

One thing I was pleased to see was how closely Hotcourses (interest) data is tracked by UCAS applications and acceptances – it’s all the more exciting for being imperfect, and there’s a tab on the visualisation letting you see this in more detail. There’s an interesting (probably qualitative) piece of work to be done on the relationship between initial prospective student subject interest and eventual application.

Where’s my digital dividend?

Fundamentally, I don’t feel like the app competition is a goer. Data is – necessarily – caveated on release to account for areas where student numbers are not sufficient to give a statistically valid figure. Any responsible app would need to have pages with 80% caveat text in order to be meaningful, or risk being misleading or in breach of data protection law.
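The caveating described above usually takes the form of suppression: a figure is withheld when the group behind it is too small. A minimal sketch, assuming a made-up threshold of 10 students; real publishers each apply their own rounding and suppression rules, and `suppress_small` is a hypothetical helper.

```python
MIN_COHORT = 10  # hypothetical threshold, not any publisher's actual rule


def suppress_small(rows, min_cohort=MIN_COHORT):
    """Blank out the value wherever the cohort is too small to publish."""
    return [
        {**row, "value": row["value"] if row["n"] >= min_cohort else None}
        for row in rows
    ]


# Illustrative rows only: n is the cohort size behind each value.
rows = [
    {"subject": "Subject A", "n": 250, "value": 21000},
    {"subject": "Subject B", "n": 4, "value": 35000},
]
published = suppress_small(rows)
```

An app built on data like this has to decide what to show in place of every blank, which is where the caveat text starts to pile up.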

There is a case for including some SIES data by subject and other course characteristics in Unistats (probably more useful than TEF scores to prospective students, frankly) but we’d need to seriously amp up SIES in volume and frequency to do so.

Finally – would anyone really choose a subject based on this data? I chose Pharmacy because it looked interesting – it turned out not to be (for me) so I then switched to English Literature and Music because they actually were. None of the data I present here would have been of any help to me at all.

Subject choice is far more complicated than simply finding the numbers that match your needs. We can all agree that the more (reliable, quality) information that is available, the better – but expecting prospective students to consider their future on a purely economic basis is frankly laughable.

Mind you, I’d still love to hear about that competition prize…

4 responses to “Supporting student choices with subject-level data”

    1. Hi – no, it’s not in the public domain. Hotcourses share certain insights (at a very general level) with us to support our analysis.

  1. Hi,

    Something isn’t quite right in the subject lookup table – the LEO and SIES values for engineering, historical studies and languages are double what they should be.

    Also, as a heads-up, the Tableau tables don’t appear in this article on Firefox or Chrome – I get only a white box. On Internet Explorer, I still get a “content cannot be displayed in a frame” error message – but there’s a link to the Tableau Public site that works.

  2. Thanks – fixed the former issue. The latter seems odd, I’m just using a standard iframe as the Tableau native JS was playing up as I set the article up. I’ll look into it some more.
