A history of Data Futures

David Kernohan

Deputy Editor

by David Kernohan

staff

21/10/14

David Kernohan is Deputy Editor of Wonkhe

A history of the future of higher education data

That recommendation, tucked away in chapter 6 (para 6.22, no less), led initially to the bodies in question commissioning Deloitte to produce an initial proposal for collecting and sharing sector data. This led to an independent exploratory programme, running from 2013 to 2016 – the Higher Education Data and Information Improvement Programme (or HEDIIP).

HEDIIP is an acronym that makes a generation of higher education data professionals get all misty around the eyes. Almost universally loved, it carefully and consensually developed with the sector and key stakeholders a range of work packages around key data themes:

Data capability – helping organisations get better at managing their data
HE data language – a student data model, with associated definitions
Subject coding – the genesis of HECoS and CAH
Student identifiers – work on the adoption of the ULN in higher education
Inventory – a surprisingly long list of all the existing higher education data collections
New landscape – all-encompassing work on the wider higher education data landscape, commissioned from KPMG.

At the time, the key recommendations came from the “new landscape” work. Though the end of programme report rightly highlights successes in all of the project areas, only the subject coding work and the HESA data capability toolkit have led to lasting change.

The other parts of HEDIIP work were swallowed into Data Futures. Most notably, this included the idea of HESA as a data warehouse, which would allow other data-collecting organisations to draw from a common dataset rather than perpetuating the “shocking” amount of data duplication (97 different collectors, 520 different returns).

The “official” programme kicked off in 2016 with grants from the four UK higher education funding bodies totalling £7.4m. It initially contracted with Civica in 2017 as a part of the detailed design phase – with the advice given at the bidding stage for that contract emphasising the role of HESA as a central collector and distributor of sector data, involving new, speeded up, collections alongside greater efficiencies. The bit that made the sector sit up and pay attention was the idea of multiple, in-year, submission points for data (up to 7 in one indicative diagram) – replacing the deadlines towards the end of the calendar year that had existed, at that point, for more than two decades. But of equal importance (and the subject of attention within data specialist circles) was a redesign of the data model – which would determine what data is collected and how it is structured.

The initial plan was for a full pilot in 2018-19, freezing the specification by August 2017 so providers could prepare and tweak internal systems as required.

What happened next?

Well, history relates that we finally achieved (for various values of the word achieved) a single deadline collection using the new Data Futures data model for the 2022-23 academic year. Though it worked – in the sense that some data was collected – the experience was a painful one. Although any change in a data collection will yield a momentary drop in data quality, numerous issues with the data collection platform (in particular the application of data rules) have led to problems. As things stand, providers had the chance to update their submissions through to April 2024, and following a final round of checks and consultation from the Office for Students, we expect to see data land in August 2024.

Not to discount the many hours of hard work by those involved, this is a little bit of a departure from the schedule.

Initially things seemed to be working out OK – version 3 of the collection design dropped in February 2017, and the programme moved into the detailed design phases. In August HESA’s Rob Philpotts was in an upbeat mood as the design phase came to an end, promising the publication of the 2019-20 student record specification in the coming weeks. By 2018 people who provide technology to higher education institutions were getting excited about implementing it. The Alpha phase launched in January 2018, and by March some 14 providers were involved. A Beta phase covered 2018-19.

But it turned out not to be that easy.

Data despair

The redoubtable Student Record Officer Conference (SROC) held a one-off day in November 2018. Those who were there will recall “The Wall”, 249 post-it notes detailing the concerns of sector professionals around the design and delivery of Data Futures. March 2019 brought confirmation that Data Futures would not be going live in 2019-20, so that year was collected using what I guess we should call HESA Student “classic”. Even a cursory glance at September 2019’s OfS Board papers suggested that toys had very much been thrown out of the pram:

HESA Data Futures continues to face significant delivery issues which impact onward delivery to OfS. It has now been confirmed that HESA will not be able to deliver in-year data until 2021-22 at the earliest and there are still considerable questions about its ability to deliver for 2022-23. We have ceased grant funding for Data Futures until a clear way forward has been identified.

November saw a decision (with “complex” ramifications) to move to discrete collections, with three reference periods. January 2020 saw the Board return to the topic, in a paper so secret it remains redacted more than four years on. It is not hard to guess that it discussed the way Jisc replaced Civica as the technical delivery partner for Data Futures. This announcement was made quietly by HESA on 18 February 2020:

HESA has decided to deliver the next phase of the programme using an alternative delivery approach.

Another review

As OfS board papers relate, by the Covid-scarred September of 2020 HESA and Jisc (as data collection mechanism partner) were moving things forward – this kicked off with £1m for Jisc and £1.3m for HESA, then a further £2.6m was chucked into the pot to cover things until March 2021, with a further £7.2m (including a £1.1m contingency) for work between April 2021 and March 2024.

In retrospect this is a fascinating paper. There was a KPMG review of governance (annex C) that suggested it was not clear what the “nature and extent” of the OfS role in the programme was. The initial specification gave our beloved regulator involvement in “key design decisions” and demanded that progress should be “tracked against OfS requirements” – however it did not clarify what a key decision was or what OfS requirements were. An explicit recommendation to sort this by properly documenting things was merrily rejected.

It established a quarterly review group that would report to the OfS board (with funding tied to satisfactory reports), but failed to give the appropriate bits of OfS capacity to properly monitor progress – just two people had oversight of the programme, on top of substantial other roles.

The paper also states bluntly that OfS “requires” three data collections a year, something that becomes funnier in retrospect.

God bless us, every one

The new Alpha phase of the programme, involving piloting systems with a limited number of providers, ran between May 2021 and January 2022. In December 2021 a Christmas Carol themed blog announced a consultation on yet another change to plans from the Office for Students – with the idea of three individualised collections dropped. The options on the table were two collections, a cumulative three part collection, or a single collection with a shift in timing away from the sacred norms.

Responses to this consultation turned up in two chunks: part 1 (which confirmed a move to two collections by 2024-25, and suggested a potential removal of the HESES collection altogether!) in May 2022, and parts 2 and 3 (which deferred decisions on the removal of data items and the use of linked and third party data to “the short to medium term”) in November 2022.

The Beta phase of the programme ran between February 2022 and November 2022, helped by the release of the online validation toolkit in June. By August, everyone was collecting using the new model ahead of the 2022-23 collection. And in October 2022, Jisc and HESA completed their long rumoured merger – which, little noticed at the time, brought an end to OfS funding for Data Futures.

Extended play

With the agreement of statutory customers, the sign off and final submission deadlines were extended in August 2023 (by two weeks), again by a further week in mid-October, with a later decision to collect information about provider concerns about submissions. This latter communication came on the same day as a decision by OfS not to attempt in-year data collection for 2024-25, which also communicated knock-on delays to HESES and Graduate Outcomes data collection plans.

It is from here that we got the idea of an independent review of Data Futures – something that more than 6 months on does not have a chair, terms of reference, or expected start date. Jisc has already run and drawn conclusions from an internal review that will feed into the 2023-24 collection already in active development. Providers have been asked not to expect all of their concerns to have been addressed. And the 2022-23 Student open data release (as confirmed by HESA in April 2024) is also 6 months late – arriving, we hope, in August 2024.

March 2024 brought another note from the Office for Students, bumping in-year data submissions to 2026-27 at the earliest, while promising to share details of the review in “the coming weeks”. It has not yet happened, and all I have from OfS is a personal assurance that a potential supplier has been selected (nothing on a chair or terms of reference) with due diligence underway.

Now what?

Twenty-twelve is a long way behind us, but we do not feel significantly nearer to the dream of a free flow of real-time student data. It will be 2026-27 before we even get more than one submission a year. And the current plan of two doesn’t exactly make for a finger on the pulse of the sector. The enormous delays to the programme have led to an unfortunate clash with the birth of the lifelong learning entitlement, at which point regulatory plans and thus data needs will change again.

The 2022-23 collection – a single deadline – looks, from the outside, very much like what has been standard HESA practice over the last few years. But even the avowed endpoint (an individualised in-year collection with multiple submission points) simply duplicates the HESA process between 1994 and 2001. Higher education providers already submit monthly data as part of the arrangements for apprenticeship funding. Data Futures as things stand currently is very much behind the curve. And it keeps changing, without offering many of the benefits that change could bring. A lot of very experienced data professionals have left the sector in frustration, and they will be very hard to replace.

We’re edging towards the end of the decade before we are likely to see two sets of data collected under the same (or broadly similar) specifications – with a consequent detriment both to data quality and the value of time series within the HESA Student data. This is not to downplay the hard work that staff at Jisc and providers put in last year and this year, but it is a shame and it does the sector no favours in government. The issue of data burden continues to be raised – with a focus on the needs of one statutory customer meaning that the idea of wider applicability, and a single data collection driving multiple analyses for multiple customers, appears to have been largely abandoned.

A proper independent report will help in establishing just what can be learned from this entire debacle. All we need to do now is launch it.

3 Comments

David Ealey

1 year ago

One of the unfortunate outcomes of the whole 22/23 Data Futures fiasco is that a lot of people have ended up holding the DF data in tables on their system that are separate from their main data. Whilst this has always been the way that many systems work, what almost certainly happened in 22/23 was that more data was manually updated in these HESA tables to get returns through. There is then a greater difference between the HESA data and the data that the institution is using to manage its students, which raises issues with internal reporting and comparisons to…

1 year ago

Reply to David Ealey

I suspect issues may arise because providers have become increasingly dependent on HESA to generate what they consider ‘management information’, but relatively little is based on internally generated data (e.g. processing). It’s only return compilers who have to cobble that together into a ‘valid’ return, for HESA to enrich and provide to data customers, to populate their data warehouses and dashboards. I suggested HESA share their derived data after introduction of the old Student Return in 2007/8, and ‘HESA Core Data’ has been provided since 2008/9. That made the Student Return an annual source of validated and enriched data for…

Johnny Rotten Return

1 year ago

Don’t be told what you want
Don’t be told what you need
There’s no (data) future…
