We need to talk about data burden

Higher education providers have to collect, understand, analyse and submit huge swathes of data – but why? Andy Youell feels burdened

Andy Youell is Executive Director: Regulation at UCEM

The higher education sector has changed enormously since I started at the Polytechnics and Colleges Funding Council (PCFC) in the early 90s.

Governments have come and gone; policies, funding reviews and regulators have all had their moment.

But some things have remained absolutely consistent over the decades: the brilliance of our institutions, the impenetrable sea of acronyms in HE, and the ever-present calls for a reduction in data burden.

I’ve been thinking a lot about data burden recently. At the University College of Estate Management (UCEM), my Reporting and Analysis team are responsible for a range of data returns, and the challenge of managing resources to support this work is immense.

There have been many burden reviews, taskforces and initiatives over the past three decades, and many hours have been spent discussing the causes, the measurement and the impact of data burden. But nothing has got better; in reality, I think it is getting worse.

To solve a problem you need to understand it, and the litany of failed attempts to solve this one suggests that we don’t understand data burden as well as we need to. We need to talk.

A model of data burden

To prompt some constructive conversation in this space, I have developed a simple model that sets out the lifecycle of activities necessary to deliver data to funders and regulators. The model identifies five key stages of activity:

  • Understanding the requirements and undertaking preparation activities
  • Data capture and assurance
  • The data submission process
  • Reconciliations between different returns
  • Engaging with the regulatory metrics

For each of these five stages, the model sets out the key activities, considers what sort of burden is experienced, and asks what drives its level. If we can develop a shared understanding of these issues, we will have a better chance of identifying ways to reduce burden without compromising the utility of the data for the bodies that collect it.
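For readers who find a concrete artefact helpful, here is a minimal sketch of the model’s shape in Python. The stage names come from the list above; the fixed-versus-variable flags and the burden drivers attached to each stage are my own illustrative assumptions, not part of the published model.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str         # one of the five lifecycle stages
    fixed_cost: bool  # largely unaffected by institution size or shape?
    driver: str       # what drives the level of burden at this stage

# Stage names come from the model; the fixed_cost flags and drivers
# are illustrative assumptions, not part of the published model.
LIFECYCLE = [
    Stage("Understanding requirements and preparation", True,
          "volume and volatility of the specifications"),
    Stage("Data capture and assurance", False,
          "fit between the collector's data model and institutional reality"),
    Stage("Data submission", True,
          "tooling and validation cycles"),
    Stage("Reconciliations between returns", False,
          "duplication and divergence across collections"),
    Stage("Engaging with regulatory metrics", False,
          "complexity of the algorithms and how well they fit"),
]

for stage in LIFECYCLE:
    kind = "fixed" if stage.fixed_cost else "variable"
    print(f"{stage.name}: {kind} burden, driven by {stage.driver}")
```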

What does the model tell us?

The first draft of the model is now complete, and I’ve been reflecting on the analysis and the lessons we can take from it. Based on this early work, I have reached some interesting conclusions.

I define data burden as the activities beyond our business-as-usual (BAU) that are necessary to meet the data requirements of funders and regulators. My first conclusion is that each institution’s approach to, and aspirations for, data define that BAU level, and therefore the point beyond which burden begins.

If you’re doing a lot with data for yourself – if you have in-house reporting and analysis capabilities – then you will be building on better foundations and the external demands will feel less painful.

The analysis of activities across the lifecycle raises some interesting issues about the nature of burden. Many of the activities are a fixed cost – unaffected by the size or shape of the institution in any way.

Where the burden associated with an activity does vary, the most significant driver is the fit between the data models and algorithms used by data collectors and the reality of the institution’s activities and structures. This concept of fit is a significant issue both in making the data returns and in the subsequent engagement with reconciliations and regulatory metrics.

These observations suggest a third conclusion: for any given data collection, the experience of data burden can vary significantly between institutions, and there appears to be a substantial random element to it. There is very little to link the size of an institution to the amount of burden it experiences, so, relative to their resources, smaller institutions in general carry a disproportionate share.

My next observation is that the increasing emphasis on regulatory metrics has expanded the scope of data burden: it is no longer confined to making data returns, but now includes understanding and interpreting the complex algorithms used in funding and regulation. Institutions are expected to digest hundreds of pages of technical specifications in order to work out how these metrics relate to their reality.

At UCEM we have a group of students who are simultaneously classified as part-time students in the OfS funding algorithms and full-time in the OfS B3 metrics. The inevitable joke about Schrödinger’s students is wearing pretty thin as the team wade through this algorithmic treacle.
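To make the point concrete, consider a deliberately simplified sketch. The thresholds and rules below are invented for illustration – they are not the actual OfS definitions, which run to hundreds of pages – but they show how two algorithms applied to the same student can give different answers.

```python
# Hypothetical illustration only: the thresholds and rules below are
# invented, not the actual OfS funding or B3 definitions.

def funding_mode(credits_per_year: int) -> str:
    # Assumed rule: the funding algorithm classifies by annual credit load
    return "full-time" if credits_per_year >= 120 else "part-time"

def b3_mode(weeks_of_study: int) -> str:
    # Assumed rule: the B3 metric classifies by weeks of engagement
    return "full-time" if weeks_of_study >= 24 else "part-time"

# One student, two algorithms, two answers
student = {"credits_per_year": 90, "weeks_of_study": 30}
print(funding_mode(student["credits_per_year"]))  # -> part-time
print(b3_mode(student["weeks_of_study"]))         # -> full-time
```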

The big burden of duplication

I’m struck by the extent to which thinking around burden focuses on individual data collectors. Indeed, this model sets out an approach for assessing burden in terms of the relationship with a single funder or regulator.

Thinking at this level can deliver marginal reductions across data collections, but to achieve more significant benefits we need to focus on eliminating the duplication that exists between collections, rather than just the burden within them.

So far the model represents a distillation of my own thoughts and experiences, and I’m sure it can be improved. So my request to you is this: look at the model and think about data burden – talk about it with colleagues at your institution and across the sector – and share your knowledge to help us build a better picture of what drives data burden and what we can do about it.

In a sector that is sometimes overwhelmed with consultations, I would like to think of this more as a conversation.

I’ve published the data burden model on my website. Please help us improve it.
