We need to talk about Data Futures

As recently as 11 February, a HESA blog assured us that Data Futures was on track – indeed, “very much data present”, was the title. Today, with the postponement of the programme, it seems like another change of tense is required – HESA Future imperfect, if you will.

Student data for 2019-20 will now be collected using the existing HESA Data Collection system on a retrospective basis. Modified versions of the 2018/19 Student and AP Student specifications will be made available in due course – as we are already fast approaching the 2019-20 academic year we hope this will be sooner rather than later.

The future sound of HESA

Data Futures tries to solve two main concerns about current data – combining three similar but distinct returns (the main student return, the alternative provider student return, and the initial teacher training (ITT) return) into one, and allowing for data to be submitted much closer to the date it was collected rather than the current year-in-retrospect model. This latter attribute has been the focus of much excitement at the Office for Students as it would allow for the dashboard-style “real-time” regulatory data that underpins the regulatory framework.

Well, it’s not quite dashboard-like – we’re still talking about submissions in each of three reference periods through an academic year. Though data submissions could occur as close as needed to the business events that generate the data, there would be three sign-off points where the institution would certify that the data they held and the data HESA held were in concordance. (The tricky first year, 2019-20, would not have seen final sign-off until the summer to allow for further testing to take place.)

This short description does not in any way get across the huge changes to working and liaison practices that such a change needs. The effect on student records teams is clear, but there is also an impact on the data infrastructure required to support in-year returns. Institutional IT departments and software vendors have been scrambling to update platforms, and will be grateful for this breathing space.

In essence, this is an old problem. Any change to data collection practices or systems will necessarily include a temporary reduction in data quality – HESA had been assuming that this temporary reduction would be addressed during the alpha (last year) and beta (this year) test phases. Clearly this has not happened – the key question now is whether an extra year of development will be able to bring quality back up, or whether there is a more fundamental issue.

Back to the futures

The Office for Students’ Richard Puttock is sanguine about the delay:

The Data Futures programme board, which includes a broad range of sector representatives, unanimously recommended that the HESA board delay full implementation of this project. This is a complex project, and it became clear that it was not practical to proceed on the original timeline. While we recognise this decision will cause some concern in the sector, it is crucial that our regulatory processes are underpinned by high quality data. We can’t rush this project, or compromise on the quality of data which we use. We want to understand how best to simplify sector-wide data collection in a way which ensures we have access to high quality data and minimises burden on individual providers.

He’s right that good data, like good food, can’t be rushed – but the decision by the programme board (on which he sits) meshes poorly with the way OfS has been talking about their use of data over the last year.

The Board paper setting out the decision to approve HESA as the Designated Data Body (DDB) notes:

Our judgement is that Data Futures is a suitable operational design for data collection to meet the OfS’s requirements in terms of technology, capacity and capability.

It does not make a similar claim for HESA’s current processes.

The OfS’ approach to institutional monitoring relies on “lead indicators”. These, in the words of the framework (s128), can be:

Indicators constructed from data and information flows, in as near real time as possible, that will assist the OfS to identify trends and anticipate future events.

Stating the obvious, the current HESA system does not offer data in anything like real time. So these components will not be available for monitoring, meaning that alternative approaches will need to be identified. In fairness, Data Futures provided for submissions during three “reference periods”, so this was hardly “real-time” either. But annual data will not offer OfS the ability to respond as rapidly, meaning that expectations at OfS (and DfE, for that matter) need to be managed.

Data Futures data was also intended for use in funding and performance indicator development by OfS, directly replacing current student datasets.

OfS involvement in the decision, and the absence of a suitable alternative DDB (nobody else even applied to be considered for the role), suggest that HESA will not suffer as a result of this decision. I should be clear that it was very obviously made for sound data quality reasons, and at the recommendation of an independent board.

Days of futures past

So, what will happen next? Clearly HESA (and Civica Digital, as Data Futures delivery partner) will be working closely with institutional contacts and the programme board to ensure that the data quality issues raised can be addressed in the extra year they have been granted.

In practice this will mean re-opening some of the discussions about data definitions that had been considered mostly complete following the release of version 2.0.0 of the guidance. We now sit at version 2.2.0 (as of 7 February) – the middle “2” in the version number refers to two fairly major non-backwards-compatible changes since the launch of 2.0.0 late in 2018.

With all such large data projects, you can only really identify the problems with the system when actual data starts being fed in. Providers participating in the Alpha and Beta test phases have been doing just that, starting in Alpha with data relating to mainstream students and moving to the edge cases as HESA moved towards beta.

Beta only kicked off – very much delayed – earlier this month. I’d be fascinated to know why the programme board signed off the transition to beta (a stop-go point, as I’d understood it) before pulling the plug.

It’s been a busy time for everyone involved, and emotions will be mixed as the pressure of readying processes and systems for September has been replaced by another year of development and testing. It’s a different kind of pressure. Institutional IT teams will not exactly be delighted at having to patch up often creaking student records systems for an extra year of the old model, having already got some way down the procurement path for a new Futures-compliant system (that, admittedly, would not have been entirely ready in time anyway).

A culture often develops around such mammoth data-related change processes – there is a matched set of goodwill and well-meaning sarcasm among the HESA people (one of the most consistently delightful subcultures in UK HE) for Data Futures. The work to ensure the data is accurate extends far beyond Cheltenham, and we should not forget that the future improvements to data quality will come from the efforts of those in institutions working on the pre-release versions – repeatedly adding data, trying to get it to validate, and liaising with HESA.

One complaint that has shown up on occasion is that Data Futures was being bent to the will of OfS – that the needs of the regulator had begun to overshadow the wider needs of the sector. As the sector is collecting the data and the sector is using the data, the sector needs to continue to shape the process, not the regulator.

4 responses to “We need to talk about Data Futures”

  1. I’m not sure HESA’s expectations were that the reduction in data quality would be resolved in the Alpha and Beta. In all the discussions I’ve been involved in it was very much acknowledged that there would be a reduction in data quality in the first year but that was a necessary sacrifice for the greater good. OfS, however, are not willing to accept this, nor willing to work with the sector to identify how best to mitigate this.

  2. Spot on Mick. I am extremely dubious that institutions can drive up data quality to the levels of the current return. The maturity/processes/culture/focus required to clean at the point of capture, and maintain DQ through the many paths it takes through a complex university, are not there yet. Another year will help – if resources are not shifted to other projects – but it’s hard to see how quality will be comparable when we move from a six month QA tail to a six week one.

    Anyway, that’s an excellent article David. I share the concern that the HEDIIP vision has been – at best – massively diluted. It feels as if the needs of one customer are more important than the wider sector. Maybe that was inevitable. I hope this extra time allows bodies such as the Data Landscape Steering Group (DLSG) to maintain at least some part of that vision.

  3. I think that, as we have been discussing on Twitter, Alex’s view that the HEDIIP vision has been overtaken by events – primarily the creation of the OFS – is spot on. It is clear that the original vision of Data Futures is not one that appears to be shared by the OFS. There is certainly the desire for more real time data but they seem to be finding it difficult to move away from the old HEFCE approach, based around funding (why are we still returning FUNDCOMP?).

    If this is to progress then I think there needs to be a clear brief from the OFS as to what they want and why they want it. Even with the current specification it is hard to see the purpose of some of what is being collected, or why it is being collected in the way it is; fee information is a case in point. It would be far better if we were clear what they wanted to use it for, as the sector could actually advise the best ways of collecting it.

    Data Futures was always going to involve institutions in significant changes to business processes. For these to be put in place, even for 2020/1, there needs to be a clear and stable specification by June 2019 if it is to be delivered.

  4. A good article, thanks David. One of the things perhaps understated is that the move to Data Futures was not simply about the timeliness of data. The data model is enormously more complicated than under the current HESA Student return, both in terms of the logical model and the amount of data collected. Institutions have been forced to change business processes in order to deliver data in the way HESA required, and the extent of that work has been a significant factor in the struggle to be ready for 2019/20.
