Is sector data still good enough?

From her laptop, Bridget Phillipson can track pupil absences in a single class at a single school.

In most cases the data is collected and returned automatically from the school’s management information system for each session (morning or afternoon), every single day. On that same day.

The data collection largely happens via a service provided by a commercial partner called Wonde – in returning data to DfE, each school or multi-academy trust grants both Wonde and DfE the ability to access and process its data.

The Secretary of State (or, more likely, her staff) can cross-reference this data with information about these pupils’ attainment, socio-economic background, and the type of school they are attending. This analysis feeds directly into policy making and regulatory intervention – a spate of unauthorised absences in that class can be identified, and appropriate interventions made. Perhaps that same day.

Meanwhile in higher education

Data is the tool used to determine and drive regulatory interventions in every educational sector. For this to work properly, and allow for meaningful action to be taken to support students at the moment they need that support, the data needs to be timely.

Currently, the Secretary of State can get a sense of where students are not participating in their studies via a termly requirement for providers to submit enrolment information to the Student Loans Company. This is purely administrative data, used to control the payment of fees to universities and maintenance loans to students. Though a reason for withdrawal can be submitted for every student, this information is not used in regulation.

The gold standard for regulatory “non-continuation” data in higher education is the HESA Student data collection. This currently happens once a year (the plan is to add an in-year collection from 2028–29): each provider returns information about its currently enrolled students, and this is used to track individual students’ progress by comparison with the previous year’s data. Data is submitted in November for the preceding academic year, and it is available for use by January.

Continuation data is just one example of this kind of lag. Financial data is provided to the regulator just over five months after the end of the financial year to which it relates – so late, in fact, that OfS has introduced an additional in-year submission to keep track of the stability of the sector. In moments of crisis, the reportable events regime is meant to ensure that providers report problems to the regulator directly.

We should also note that it takes until January to get data on which applicants have been offered places, and by whom, on courses that started in September or October. It is almost impossible to get data on much beyond standard full-time, first-degree, autumn-start courses – with postgraduate recruitment now running above undergraduate numbers, and non-traditional start dates becoming more common, it is readily apparent that we know very little about what, where, and how people want to study.

And data on the Lifelong Learning Entitlement? Everything we’ve had in the trial stages has come from parliamentary questions and freedom of information requests – and there’s been no sign of any plans to collect or publish it in future.

Lagging indicators and missing data

Universities generally try to be responsive to the needs and concerns of their students, but in regulation our only window on this is the National Student Survey, which covers only final-year undergraduates and only becomes visible after that cohort has graduated. We get some information about what happens to graduates afterwards via a survey with a response rate of around 30–40 per cent, taken just 15 months on (barely the beginning of a graduate career).

For longer-term information we turn to the magic of administrative data linking and enter the world of Longitudinal Education Outcomes. This collection is known almost universally as the source of graduate salary data, but the limitations of tax data mean that it cannot distinguish between full-time and part-time work, and cannot provide information on overseas employment.

You’d think that the government (concerned as it is with industrial skills planning) would be interested in whether universities are able to employ (and train!) the staff they need – but the staff data collected currently includes only those on academic contracts. As most of the claims about universities as anchor institutions rely on the employment of local people in technical, professional, and support roles, this is a concerning gap. Elsewhere, a disturbing proportion of “unknowns” blights otherwise useful data about academic career paths and teaching qualifications.

You’d likewise expect a government containing Ed Miliband to take an interest in the environmental impact of the higher education sector. The HESA Estates collection is as close as we get – but it is optional, funded by the sector, and the data collected doesn’t necessarily meet modern environmental monitoring needs. And there appears to be no appetite to fund a much-needed update.

Collection and interpretation

Higher education is not the school sector. Because providers are diverse, complex, and autonomous, it isn’t as straightforward as a single point of ingestion. HESA – the Higher Education Statistics Agency – was established to be the sector’s data layer and single source of truth, but that dream remains an ambition: there are many data gathering operations (and data publications) that do not flow through HESA.

For the various regulators and professional bodies, the attractions of running their own collections will be familiar: ease of use and timeliness. Unfortunately, every one of these collections adds to the burden on providers – most of whom are not exactly swimming in cash at the moment.

Providers also collect a lot of their own data, most of which never leaves the institution. Learner analytics data (covering students’ learning experiences, often in microscopic detail) is often used to identify where people are having a problem and to intervene before it becomes a reason to leave higher education.

There’s also a range of internal performance data that offers more contextual information on the performance and needs of departments and teams. Often to be found linked to strategic indicators or on dashboards, this in-house collection is sourced largely from administrative systems (linking together, say, student information, the virtual learning environment, and attendance records). It can be used for ongoing monitoring and to inform very serious decisions – information on recruitment trends, for instance, often plays a role in determining the size and shape of the course portfolio.

Statutory collections are far from perfect, but they are dependable, reliable, and (largely) consistent across the UK. In recent years the Office for Students has moved away from common approaches to performance indicators (on access and participation, continuation, and progression to graduate employment) towards rolling its own from the same data. For the first time, your academic registry needs to keep a handle on subtle changes in methodology and data definitions.

It also makes for a difficult environment for any applicants looking to make decisions based on data. In England the TEF condenses a very diverse set of indicators into a gold, silver, or bronze rating – there’s very little evidence that anyone finds this high level of abstraction useful. The various commercial league tables are perhaps a little clearer but still bake in compiler assumptions (do prospective undergraduates really care about research performance that much?). And then we have the venerable but almost unknown Discover Uni, which offers a dizzying range of data with varying levels of correspondence to any undergraduate course on offer.

Last chance to see?

The schools comparison made at the top is perhaps a little unfair. Something as diverse, and frankly as awkward, as England’s higher education system could not easily be captured in the kinds of automated returns that schools make.

But within DfE there is a level of frustration with higher education data – both within regulation and as a tool to think about the value that the nation gets from its investment in the sector. In a world where schools can return data twice a day, why do universities struggle to do so once a year? Why are there so many lags and gaps within what we are able to see? Why are we still unable to provide usable data to applicants about the quality of the courses they are applying to, and the outcomes they might expect?

These are not straightforward questions, but the danger for the sector is that they sound like straightforward questions. And if ministers become engaged, we may not like the answers they offer.

The in-year collections for student data – due in 2028–29 unless we get yet another delay to a Data Futures process already ten years behind schedule – will be a key means of showing that the sector is serious about addressing these problems.

But this is not a substitute for a proper debate about what data we need to collect and what might be the easiest and cheapest way of collecting it. If the sector can’t lead and shape this debate, the decisions may be made for us.
