Why doesn’t SAGE have the good data?

There's something odd that has struck me about the new SAGE task and finish group paper.

I’ll assume by now you have read Jim Dickinson’s stellar analysis on the site, and if you have been reading Wonkhe over the summer you’ll see a lot of ideas you might recognise from earlier articles.

What I was looking forward to was diving into their data and modelling. The modelling has not been published – at all. Which – given the potentially life-saving use this would be to universities and colleges – is a worry. Could I ask SAGE to consider signing the ALT Open Covid pledge?

There are a couple of papers cited in there that I’m looking forward to reading. But let’s be frank here, I’d assumed they’d have better data than me. Either esoteric administrative data that I’d never get to see but could dream of a summary perhaps, or – preferably – openly available data that I could use in my own analysis.

SAGE has worse data than me. This should not be the case.

Annex A is a very high-level digest of HESA data with a bit of basic analysis, plus a cheeky injection of four-year-old Sutton Trust data on commuter students. This is concerning because there is better data out there in the public domain. I’ve published analyses of lots of it, and there are more that I’m still working on. It is taking me time because marrying data from different sources is often a pain: each source comes at a different level of granularity and a different level of sensitivity.

For instance:

  • Even if we just stick with HESA data there is a better look at where students are travelling from to study in a particular region…
  • But the ONS has internal migration estimates split by age, showing you where students are moving to and from at a local authority level of resolution. (I plotted this here)
  • There’s another look at where students may be living in halls or houses by local authority area if you use the annual Council Tax Exemption dataset (you’ll want categories M and N, which I know because I plotted it here).
  • ONS’ NOMIS service has students in accommodation at ward level. For 2011, admittedly, but it gives you a great idea where students might be, as I found.

But that’s just the stuff I can get to. Mortal men like me don’t get to play with the really good stuff – the closest I’ve come so far is the custom HESA split by postcode sector I got from Jisc and showed off here. Here’s what else I know is out there:

  • The Student Loans Company has full home addresses of students, and knows which provider they are heading to.
  • As does UCAS – who also hold a lot of demographic and course level data – including students that may be studying outside the main campus. UCAS also already knows how many students are on which courses at each campus at each provider for next year.
  • HESA has full term time addresses (not brilliantly accurate by its very nature, and a year or so out of date, but better than any other source) mapped to provider and home address (handy for the oft forgotten y2, y3, placement, and PG students), and data on the split between HMO, provider halls, and private halls by provider.

If you wanted to track what the return to campus actually entails, those datasets would let you spot the micro area a student comes from (and thus the likelihood that a student was already infected with Covid) and where they were likely to be going to (and thus the likely impact of this). We can mash together administrative data like this to tell us useless things like what students who graduated 7 years ago earned 3 years ago, so why can’t we use these powers for good?
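To make the idea concrete, here is a minimal sketch of that kind of join, and nothing more. Everything in it is an assumption for illustration: the area names, the student counts, the case rates, and the `expected_arrivals` helper are all invented, and none of it comes from HESA, SLC, UCAS, or SAGE. It also makes the crude assumption that a student is as likely to be infected as anyone else in their home area, which real modelling would not settle for.

```python
from collections import defaultdict

# Hypothetical records: (home_area, term_time_area, student_count).
# These numbers are made up; real flows would need linked administrative data.
flows = [
    ("Area A", "City X", 120),
    ("Area B", "City X", 80),
    ("Area A", "City Y", 50),
]

# Hypothetical cases per 100,000 residents in each home area (invented values).
home_rate_per_100k = {"Area A": 50.0, "Area B": 200.0}


def expected_arrivals(flows, rates):
    """For each destination, sum students weighted by their home-area case rate."""
    totals = defaultdict(float)
    for home, destination, students in flows:
        totals[destination] += students * rates[home] / 100_000
    return dict(totals)


result = expected_arrivals(flows, home_rate_per_100k)
for destination, expected in sorted(result.items()):
    print(f"{destination}: {expected:.3f} expected infected arrivals")
```

The point is not the arithmetic, which is trivial, but that it only works if someone holds home area, destination, and headcount in one table.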

Surely SAGE has the resources of government available to it – there is literally nothing more urgent or important right now than preventing a second wave of Covid. We’ve known this was coming all summer – we should know a lot more about what is going to happen than we do.

The data exists, but I can’t plot it. SAGE, or other parts of the government, can. Why haven’t they?
