In HE data, everything is an edge case

All kinds of data can have value, but how valuable is the messiness of internal data compared to the apparent simplicity of HESA data? And which data should you use for which task? Alex Leigh explains.


by Alex Leigh


26/09/19

Alex Leigh

Consultant


Alex Leigh has worked with over thirty UK universities, most of the sector agencies including UCAS, HESA and the QAA and a host of practitioners in the HE sector. Alex designed and developed the HEDIIP data capability framework, led the team to create the HESA in-year collection model, designed the sector level governance for reporting …

What is tactical data?

That case is further compromised by the blurred lines between tactical, strategic and regulatory data. Ask three people which of these is most important and expect at least six different answers! One maxim does hold true, though: “the higher up the management chain you are, the more limited your exposure to tactical data”. This is a problem we’ll return to another day, but first let’s discuss a dataset almost everyone at least knows exists.

That is the HESA student dataset. A finely crafted data lens reflecting the performance of the university to external scrutiny. It’s held up as the gold standard for data about students, the courses they are on and their journey through the educational process. And that’s hard to argue with. What it isn’t is a true reflection of the student data any university acquires, transforms, compares and – occasionally – deletes.

Logically, there should be a simple distinction between internal and external data. In practice, the HE sector is far too complicated for that. The approach to populating the Data Futures model was at least partly predicated on business events triggering data events. So a student registration would create data that would be “good enough” for any onward use. This is the utility paradigm.

It pains me to say – as one of the architects of that supposition – that it is at least partly a fallacy. While other factors have driven the model further away from its original goals (pause now for a moment to respect their untimely demise), the idea that comparability between internal and external data was merely a data management step forward has proven unfounded.

What data should I be using?

The world is too messy, and universities vary too much in their data capability and culture. It is a case of one size fitting none. Hence the conflict between an expensively assembled, trusted dataset that represents a very skewed model of what a university does, and a confusing plethora of internal data sources whose quality is, at best, anecdotal.

This drives the recurring question: “which is the right data for us to be using?” The answer is far more nuanced than that. It depends on what you’re trying to do. Fitness for purpose is a proxy for data quality. 100% completeness in a field is meaningless if you’re bending that data so far out of shape that any quality tolerances are breached. We all know that if you torture data long enough, it’ll tell you anything.

So, the only way to determine where to spend your “data budget” is to classify data for its importance inside and outside of the university. A simple approach is to establish clarity on what data needs to be able to do and why it can’t do that now. This is the value question.

What data should I be spending money to improve?

Deriving any kind of value for certain data naturally creates a priority to treat that data differently. To take greater care of it. That might be to create soaring analytical insights from multiple data streams laundered through complex algorithms. Or it might be saving money. Or staff time. Or improving the student experience. Or reducing risk.

Your value judgement – however derived – forces you to choose one of those. Okay, maybe two. But not all of them. Now you can put in place a programme that gives special treatment only to the data that relates firmly to these priorities. That’s where the business case for professionalising data lies. The alternative is to treat everything as an edge case: you can’t use HESA for this because we don’t think of the data in that way, or you can’t use planning data because it doesn’t have some attribute.

Start with “what’s good on the spacecraft”. Work out what you need first. Go after that. Don’t believe that one dataset can rule them all, or that some datasets are inherently better than others. You need to put the data to work, not wait for a shiny canned version to somehow answer all the questions you have now or might have in the future.

That’s not a realistic view of data. What is realistic is how to bring this together. It absolutely is about making institutional value judgements on the materiality of data. Then following those through with culture, governance and technology. In that order.

Some might call that a data strategy. But that’s a whole other article.