What would it take to shift the dial on data burden?

Everyone loves to commit to reducing data burden in higher education. Andy Youell asks what it would take to actually do it.

Andy Youell is Executive Director: Regulation at UCEM

In its recent report on the OfS, the Lords Industry and Regulators Committee called for the Department for Education Data Reduction Taskforce to be reconvened.

This recommendation – which is addressed directly to the DfE rather than the OfS – echoes recommendations made by Universities UK in its recent study on regulatory burden and by GuildHE in its recent Regulation Briefing Series.

The issue of data burden seems to have been on the agenda for ever, and yet it feels no closer to resolution now than it did thirty years ago. How did we get here, and what would it take to make real, lasting progress on this perpetual problem?

A brief history of data burden

Funders and regulators have always used data to inform funding and policy; universities have always complained about the burden of supplying this data. In the early 1990s a senior vice chancellor told me that setting up HESA as a sector-owned agency would enable the sector to push back on “all these wretched demands for data”.

In the 2000s the debate around data burden was wrapped up in the broader issue of regulatory burden as HEFCE ran a series of three Accountability Reviews. These were followed by the Better Regulation Task Force and then the HE Data & Information Improvement Programme (HEDIIP), which launched ten years ago.

Despite all these concordats, programmes, frameworks and agreements, the expectations of the value that can be derived from data have increased relentlessly, and so has the demand for granular, timely, high-quality data.

What drives data burden?

Although many initiatives have sought to address the problem of data burden, there remains very little consensus about the specific drivers of burden and how it varies. Earlier this year I launched the Data Burden Project to analyse the lifecycle of activities that institutions go through when making data submissions to funders and regulators. The main areas of activity are:

  • Understanding the reporting requirements and preparing systems and processes
  • Data capture and processing
  • Making the data submissions to data collectors
  • Reconciliation of data submissions with other data sources
  • Engaging with funding and regulatory metrics

Within each of these broad areas the specific tasks involved were analysed and the nature of the burden was assessed.

The study found that overall data burden has increased significantly in recent years. This has been driven by an increase in the use of funding and regulatory metrics and by a general increase in complexity across all of the data interactions. This in turn increases demands on data systems and on the data professionals working in the sector.

The analysis of tasks found that data burden is not significantly driven by the size of the institution. For each task, burden is either a fixed amount or an amount that varies according to the complexity and fit of the institution’s data and processes with the external data model and algorithms.
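As a crude illustration (a sketch of my own: the shape of the model and the numbers are invented, not taken from the project’s analysis), per-task burden behaves something like this:

    # Illustrative sketch only: parameters are invented, not findings
    # from the Data Burden Project.

    def task_burden(fixed_hours: float, complexity: float, fit: float) -> float:
        """Estimated burden of one reporting task, in staff-hours.

        fixed_hours: the cost every institution pays regardless of scale
        complexity:  how intricate the external data model and algorithms are
        fit:         mismatch between internal and external data models
                     (0 = perfect fit, 1 = no fit at all)
        """
        return fixed_hours + complexity * fit * 10

    # Student numbers appear nowhere in the function: a 500-student provider
    # and a 25,000-student provider pay broadly the same bill for the task.
    print(task_burden(fixed_hours=20, complexity=8, fit=0.7))  # 76.0 hours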

It follows that burden reduction initiatives by individual funders and regulators can only have a marginal impact on the burden experienced by each institution, and that this impact will often be difficult to predict and randomly distributed. Because burden does not correlate with institution size, the same broadly fixed costs fall on fewer students and staff, so smaller institutions are disproportionately burdened.

The final conclusion of the project is that the duplication of data collections across HE is a far more significant problem than the burden associated with any individual data collector.

What needs to happen?

Despite so many initiatives to address the problem, the Gordian Knot of data burden remains. I think there are three key elements that need to be put in place to make a real and lasting reduction in data burden.

There is a need to standardise the data definitions used by funders and regulators across the sector. Across the range of returns that we have to make, we often find that the same concepts have frustratingly different definitions. Jim Dickinson recently explored the myriad different definitions of full-time that we have to contend with; the OfS alone has two – one for B3 metrics and one for funding calculations. The requirement for institutions to simultaneously map their internal data to all these different definitions adds layers of complexity and cost to the data submission processes. It also creates a whole new category of burden when institutions are asked to explain why their submissions to different bodies don’t appear to match.
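To make this concrete, here is a minimal sketch of the problem; the two definitions below are invented stand-ins, not the actual OfS rules:

    # Hypothetical definitions for illustration - not the actual OfS rules.
    # One internal student record, two external definitions of "full-time".

    record = {"student_id": "X123", "credits_per_year": 90, "weeks_of_study": 28}

    def full_time_for_b3_metrics(r):
        # Stand-in definition A: based on annual credit load
        return r["credits_per_year"] >= 90

    def full_time_for_funding(r):
        # Stand-in definition B: based on weeks of study in the year
        return r["weeks_of_study"] >= 30

    print(full_time_for_b3_metrics(record))  # True
    print(full_time_for_funding(record))     # False

    # The same student is full-time in one return and not in the other -
    # and the institution is then asked to explain why its submissions differ.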

And there is scope to rationalise the number of data collections that run. Funders and regulators often repeat the mantra “collect once, use many times” but the extent to which data is actually shared and reused remains frustratingly low. Institutions have to engage separately with each collection: separate portals, user accounts, submission processes, validation checks and sign-off processes, each running to a slightly different timetable.
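A rough sketch of what engaging separately looks like in practice (the collections, portals and deadlines below are illustrative, not a real specification):

    # Illustrative only: portals, credentials and deadlines are invented.
    # Each collector demands its own full submission cycle, even where the
    # underlying student data overlaps heavily.

    collections = [
        {"name": "Student return",         "portal": "portal-a", "deadline": "autumn"},
        {"name": "Apprenticeship return",  "portal": "portal-b", "deadline": "monthly"},
        {"name": "Student finance return", "portal": "portal-c", "deadline": "termly"},
    ]

    for c in collections:
        # Repeated per collector: map internal data to that collector's model,
        # run that collector's validation, submit via that collector's portal,
        # reconcile, then sign off - each to a slightly different timetable.
        print(f"{c['name']}: submit via {c['portal']} by {c['deadline']}")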

These first two elements are essentially technical: standardising and rationalising the machinery of data collections. The third necessary element plays to a different set of rules.

HE providers in England have to submit data to many silos of funding and regulation: the OfS, the ESFA for apprenticeships, the SLC for student finance (similar collections of acronyms exist in other parts of the UK). In each silo the data machinery serves a funding/regulatory framework operated by an arm’s-length body, which in turn operates within a set of legislative rules and policy objectives. Each level in each silo is optimised to serve the requirements of the level above it, and the extent to which data collections are duplicated and unstandardised is perhaps an inevitable consequence of the lack of standardisation and coherence across the funding and regulatory landscape more generally.

If we are to progress the standardisation and rationalisation of the data machinery, there needs to be a way to do so without having to rebuild the entire funding and regulation system. The funding and regulatory bodies will not do this voluntarily, because they would have to cede some control of their own data machinery; previous attempts at leadership, oversight and governance in both the HE and FE sectors have failed for want of a hard mandate in this respect.

Therefore, there needs to be some kind of independent, authoritative entity that can steer and mandate change across this data landscape. This would require political will: a commitment from Mr Halfon – or maybe Mr Western – to support a real and lasting solution to the age-old problem of data burden.

Without this commitment, we could be having this debate for the next thirty years.

6 responses to “What would it take to shift the dial on data burden?”

  1. One bit of data burden coming towards the English sector is the move to credit-based funding in 2027 (under the name of LLE – but it’s much wider). Data for students to access tuition fee loans to pay for modular courses requires a rethink. It’s only four years away – it’s a real opportunity to get this right.

    1. You’re right, Mike. The data machinery to support LLE is likely to be very burdensome; high complexity and a poor [data] fit for many institutions.

      Four years is not a long time in the world of sector-level data architecture. Things could get a lot worse before they get better…

      1. And what’s worse, the sparse information so far suggests that the LLE system will be run through a new portal with new eligibility rules, yet it will have to run in parallel for many years with the old portal as continuing English students will remain in there until completion. Even worse, only England is in LLE (correct me if I’m wrong here) so SFW, SAAS and SFNI will all remain on the old portal indefinitely, with their different eligibility rules, online guides and contact methods.

        Despite extensive SLC process automation on our part, far beyond what our software supplier provides, we have seen our SLC burden increase many times over in the last few years. LLE is only going to make it worse, even if well-implemented, because it adds complexity, but doesn’t remove any.

  2. And, of course, this also affects FE providers with small amounts of HE provision. Doing any HE returns when you’re used to FE data is truly mind-bending (what *is* a completer????).

    Also, communication between FE and HE data collectors is almost non-existent (beyond taking data items out of the ILR). About a year ago there were hints that FE was considering an “always on” ILR and when I raised the issues Data Futures had faced, I was met with blank looks…

    1. There’s such a gulf between FE and HE: universities have the same issues when they’re faced with the language in the ILR, and those with an understanding of both are few and far between. Teacher training apprentices are probably the worst because of having to understand them in terms of ESFA, OfS, JISC/HESA and DfE (all with language differences).

      FE/HE differences aside, it would have been so good if the standardised language talked about as part of HEDIIP had been given the recognition it so definitely needs. It wouldn’t fix the FE/HE gulf but it would have been a start.

  3. Standardisation among data collectors – yes; if someone else is collecting the same thing don’t make a big fuss about needing a “special” version for yourself – negotiate with the other users or make do with what others use.
    Collect once – yes; if HESA collect it already, get it from them and not from HE providers – don’t save yourself trouble at someone else’s expense.

    But beyond these, let’s have a proper effort at comparing the cost of collection with the value obtained from what’s collected. Even showing what use is made of what’s collected would be a start. I don’t mean making a claim about its use, I mean show us what you’ve done with it. Here are two easy, evergreen examples.

    Staff record, Current Academic Discipline. HESA says “The current academic disciplines of staff are required by statutory customers for the monitoring and progress of strategically important and vulnerable subjects (SIVS). Statutory customers base their decisions to support subjects at risk on evidence which is gathered and developed regularly.” I’ve never seen anything published that indicates this monitoring actually takes place, still less any policy decisions or statutory activity resulting from what the monitoring uncovers. And whenever I ask I get silence.

    Provider Profile, everything to do with the departments, organisational structure and the distribution of cost centres. “Reason required – To provide information on academic HESA cost centre allocations across the sector.” Who has ever used this information, and for what? More silence.

    This is important. Everything collected requires some level of managerial and administrative attention, which is drawn away from other things. Lots of little unimportant things create a silt that clogs up parts of the organisation.

    Over the years silt accumulates. We need to move towards a place where you’re not allowed to collect information unless you can show that it’s worth more than it costs. Perhaps data demanders could pay HESA the costs of collection rather than HE providers being charged. That might focus minds.

    It’s not just costs, though. If you don’t show how you use data, you’ll not get what you think you’re getting. This is why collecting stuff that “we might need” is futile. If the details that are collected have no effect once they’ve jumped the technical hurdle of quality rules, then you’ve no reason for believing that what you’ve collected means what you think it does. It doesn’t matter how much you care, or how long you spend honing definitions. Decisions will be made based on what is seen to happen, not on what you say.
