Are test results sent to student home addresses distorting national Covid-19 data?

Poor data associated with Covid-19 test results in England (imagine!) may be distorting what we know about where the virus is currently spreading. Let's see.

The story going round is that students are getting a test on or near campus, but the test result is linked in national data to their home address (or, more accurately, the place they are registered with a GP) rather than the campus itself. Here’s the version in the Evening Standard, which is as good a place to start as any.

The suggestion is that this both underplays the extent of infection on campus, and overstates the prevalence of Covid-19 in the kind of places that students tend to come from.

But we can plot data on the places that send lots of young people to university (I’m using the OfS’ TUNDRA dataset) against recent Covid-19 cases from Public Health England. It is surprising that nobody has done this yet, because both are public data sets and are handily at the same resolution (both use MSOAs, census-derived areas of around 7,000 people in England).

But is it, though?


The left axis is the young participation rate for each area: the proportion of 16 year old state-funded mainstream school pupils who would be expected to participate in higher education at age 18 or 19. The right axis is the last seven days of PHE data on lab-confirmed Covid-19 cases (as downloaded on 9 October). You can use the filters on the right to look at a region (set to “London” at the moment) or an individual local authority area (for example, a London Borough), and to highlight an individual area based on the handy list of “friendly” names prepared by the House of Commons library. On the latter, if you are looking for a particular small area, it will not be shown if it is marked “-99” (fewer than 2 cases in the time period described) in the official data.
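For anyone who wants to try the pairing themselves, here is a minimal sketch of the idea, not the code behind the chart. It assumes you have already loaded each data set into a dictionary keyed by MSOA code; the function name, the area keys and every number below are made up for illustration. The only detail taken from the data itself is the “-99” marker for suppressed counts.

```python
# Marker used in the official case data for suppressed small counts.
SUPPRESSED = -99


def join_by_msoa(participation, cases):
    """Pair participation rate and case count for each area found in both
    data sets, skipping areas whose case count is suppressed."""
    paired = {}
    for code, rate in participation.items():
        count = cases.get(code)
        # Skip areas missing from the case data or marked as suppressed.
        if count is None or count == SUPPRESSED:
            continue
        paired[code] = (rate, count)
    return paired


# Invented example values, keyed by placeholder area codes.
participation = {"area_a": 0.62, "area_b": 0.31, "area_c": 0.48}
cases = {"area_a": 5, "area_b": -99, "area_d": 12}

print(join_by_msoa(participation, cases))
```

Only areas present in both data sets with a published case count survive the join, which is why suppressed areas never appear as points on a plot like this.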

The line in the middle shows the correlation. If the hypothesis were true, that is, if students on campus were the source of rising case numbers in their home areas, we would see a significant slant towards the top right, and the “R squared” number (here a statistical measure of the degree of correlation between our two variables) would be greater than about 0.6.
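If you want to check an “R squared” figure by hand, the sketch below shows one way to do it, assuming a straight-line fit between two variables (in which case R squared is simply the square of the Pearson correlation coefficient). The function name `r_squared` and the sample numbers are mine, purely for illustration, and this is not necessarily how the charting tool calculates its own figure.

```python
from math import fsum


def r_squared(xs, ys):
    """Return R squared for a straight-line fit of ys against xs
    (the squared Pearson correlation coefficient)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = fsum(xs) / n
    mean_y = fsum(ys) / n
    # Sums of products of deviations from the means.
    sxy = fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = fsum((x - mean_x) ** 2 for x in xs)
    syy = fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("R squared is undefined when a variable never changes")
    return (sxy * sxy) / (sxx * syy)


# Invented numbers: one perfectly linear pair, one with no linear relationship.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
print(r_squared([1, 2, 3, 4], [1, -1, -1, 1]))
```

A value near 1 means the points hug the line; a value near 0 means the line explains almost none of the variation, which is the situation described here.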

To me, there’s no evidence of correlation. Though it is entirely possible that some student cases of Covid-19 are being misattributed to a home address (frankly nothing in our test, track, and trace system would surprise me at this stage), this is not happening in sufficient numbers to show up in the data.

Would it matter if it was?

Nationally, we do tend to get hung up on precise numbers of cases in a given area. This is arguably less important than simply knowing whether there are lots or just a few. We know that there are clusters around where students live, especially in the North, but as students tend to live in high density areas with significant local poverty we can’t blame students for this. High density and low general health (and being in the North) are better predictors of Covid infection rates. There’s not much else data can tell us here.

So there is little danger of a leafy area of London or Surrey facing local lockdown just because a lot of young people have gone off to university, but we do still need to be concerned about the safety and welfare of these students throughout the rest of this year.

