LEO’s star ascendant in 2018

The latest collection of Longitudinal Educational Outcomes (LEO) statistics has arrived, and to the uninitiated it could look like a confusing mass of data.

In a nutshell, LEO links individual learners to their tax records, so that their subject and institution of study can be seen alongside their earnings at set numbers of years after graduation. The published figures are the median and the upper and lower quartiles of earnings. These can be split by sex, and you can also see prior attainment bands, as well as the percentage of matched graduates with POLAR3 data who come from POLAR3 quintile 1 (areas with the lowest rates of participation in HE).

People get interested in LEO because it invites the inference of a causal link between subject and institution of study, and future earning power. It is, of course, a bit more complicated than that, as there are numerous caveats.

Four paragraphs of caveats

First of all, let’s be really general. There’s no real reason to think that the past earnings of graduates are a reliable indicator of future graduate earnings. The graduate job market, and indeed the economy as a whole, is changing rapidly for a range of technological and political reasons. LEO is an interesting historical data set: it can tell us with reasonable accuracy how much 2009/10 graduates earned in the 2015/16 tax year, but it can say very little about how 2019/20 graduates will fare in the 2025/26 tax year.

Looking at data quality, the data focuses primarily on pay-as-you-earn (PAYE) earnings: those from which tax is automatically deducted at source, typically for somebody with a permanent job. This year’s release adds other earnings as a separate field alongside PAYE, but these figures are taken before deductions are applied, so the two are not really comparable. LEO also excludes the earnings of graduates who are not paying UK tax, including those working overseas.

There are, of course, many other factors that influence salary. Gender, prior attainment and family background are included, and can be viewed in isolation. Ethnicity and disability are not. Neither are caring responsibilities. Age, which startles me most of all, is not included either. And although the data includes the region of the institution an individual graduated from, it does not include the region in which that graduate currently works.

Finally, LEO gives us no context about the state of the job market in wider society. Though you could compare these salaries with those of similar non-graduates, in practice this would be very complex to achieve. You may argue, as many do, that those in law and finance are paid more whilst contributing less to society. You may suggest that professional musicians earn little (often working multiple jobs) but enjoy greater job satisfaction. You may even argue that the purpose of higher-level learning is not simply to maximise individual earning power.

The visualisations: subject graphs

First of all, I’ve looked at average median PAYE earnings by subject and institution. Select the subject of interest, tax year, and academic year of graduation at the top to view the data, which can then be filtered by sex, institutional region, current Teaching Excellence and Student Outcomes Framework (TEF) award, and institutional group using the controls on the right. The colour of each data point shows the prior attainment band: darker colours (band 1) represent cohorts with higher academic attainment before entering higher education, and lighter colours represent those with lower prior attainment.

Marks are ordered by median earnings, lowest on the left and highest on the right. For popular subjects you may need to use the scroll bar at the bottom of the graph to move along it.

Hovering over a mark brings up a tooltip showing the institution’s name and attributes, the PAYE median, and the number of graduates and of matched graduates to put this into context. Low values in those last two fields mean the median rests on a small cohort, so the data point may be an outlier and should be treated with caution.

The sheer size of these visualisations means that embedding them would slow the page to a crawl, without giving readers the space to interact with them properly. So, grab the biggest screen you can, maximise your browser window and:

View this visualisation here.

The visualisations: institutional graphs

For those in institutions looking to dive more deeply into their data, in particular to see the way that earning power changes after graduation, there are two further visualisations just for you.

Choose an institution, sex, subject and tax year to view median and quartile salaries. There’s one graph for median PAYE earnings and another for median earnings from all sources, though, as noted above, the way the source data combines these does not appear to be particularly robust.

You can view these two visualisations here.

So what does it all mean?

The text of the data release explains that salary varies substantially by subject, and by institution to a greater or lesser extent, with the size of that institutional effect itself varying by subject. This suggests that the employment sector that tends to attract graduates of a particular subject was the bigger determinant of salary over the years covered. Variation by subject area also correlates with prior attainment and sex, which tends to support this hypothesis.

Fundamentally, we are looking at a complex multivariable system that tends towards the chaotic. There’s no adequate way of establishing causation, though interesting patterns of correlation do occasionally emerge from the data maelstrom.

It’s complex data – fascinating to researchers, but near useless for the mainstream press or prospective students. Even, Mr Gyimah, if we put it on Snapchat.