
Data delays shouldn’t mean data disorganisation

Alex Leigh hopes that delays to Data Futures don't set back the cause of better data collection.


Earlier this week we learned of another delay to HESA’s platform for in year reporting.

The headline is that Data Futures will not be live until at least 2021/22. The subtext is that the HEDIIP vision for a high utility collection landscape is pretty much dead. That's a whole other discussion, and one which lacks the urgency of "where do we reallocate those precious staff assigned to Data Futures projects?"

But that’s the wrong question.

Most of the activities to support in-year collection are focussed on initially driving up the quality of – primarily – student data, and maintaining well understood hygiene factors to negate the need for a long Quality Assurance tail. That is both necessary to meet the collection timelines, and somewhat peripherally within touching distance of best practice data management.

That’s important because right now we’re nowhere near best practice.

Getting there

Many – if not most – universities trigger their HESA student return processes in the Spring. While there are many differences in how operational data is cleaned, massaged, algorithmically swept and beautified to create a "data double" of the institution, certain conventions hold.

The source data is not fit for purpose. The quality attributes – completeness, accuracy, validity, and so on – all need work and lots of it. While certain key attributes tend to be prioritised, this is just the tip of a spear requiring months and months of work from skilled colleagues to create a data set considered good enough for external validation.
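To make those quality attributes concrete, here is a minimal sketch of the sort of automated checks a records team might run on student data. The field names and code list are hypothetical, not taken from the HESA specification – this is an illustration of the idea, not of any real return.

```python
# Minimal sketch of checks across three common quality attributes.
# Field names (student_id, date_of_birth, mode_of_study) are illustrative only.

from datetime import date

VALID_MODES = {"FT", "PT", "SW"}  # hypothetical code list


def check_record(record: dict) -> list[str]:
    """Return a list of quality issues found in one student record."""
    issues = []

    # Completeness: mandatory fields must be present and non-empty.
    for field in ("student_id", "date_of_birth", "mode_of_study"):
        if not record.get(field):
            issues.append(f"missing {field}")

    # Validity: values must come from the agreed code list.
    if record.get("mode_of_study") not in VALID_MODES:
        issues.append("mode_of_study not in code list")

    # Accuracy (a simple plausibility proxy): no birth dates in the future.
    dob = record.get("date_of_birth")
    if isinstance(dob, date) and dob > date.today():
        issues.append("date_of_birth is in the future")

    return issues


if __name__ == "__main__":
    example = {"student_id": "S123", "mode_of_study": "XX"}
    print(check_record(example))
    # ['missing date_of_birth', 'mode_of_study not in code list']
```

Checks like these are cheap to run continuously; the expensive part is the months of manual remediation when they only happen once a year.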

What differentiates institutions is the value of that dataset post submission to HESA. Some institutions treat it as a trusted (if somewhat tarnished by entropy) source of student data, others submit it and forget it. What’s missing are all the other uses for that data that have nothing to do with HESA at all, but would benefit significantly from the same quality efforts.

That's where we came in. The mantra of cleaning data at the point of entry is well understood; the reason it isn't done, less so. A representative example may help here. Let's consider entry qualifications. Clearly these are incredibly important to admissions staff to ensure offers are made to the right prospective students in line with academic and recruitment targets. That same data is used to derive tariff, to measure course performance, and to inform tactical and strategic planning, along with a host of other scenarios.

Data reuse

And yet our primary use fixes the data quality threshold. If it's good enough for admissions, it's good enough. Other use cases are someone else's problem. It's not like people don't care – I don't believe colleagues get out of bed aiming to make a fellow staff member's life a misery as they comb individual entry qualification records for weeks on end – but they are incentivised on what their own team needs to do.

This is nuts. It really is. Collecting non-admissions qualifications that are tariff-affecting is not a huge undertaking. Not compared to contacting individual students six months later, which is both extremely costly in time and not great in terms of the student experience. This, though, is often standard practice.
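As a rough illustration of cleaning at the point of entry rather than six months later, the sketch below rejects an incomplete entry-qualification record at capture time. The qualification codes and function name are invented for the example and don't reflect any sector code list.

```python
# Illustrative point-of-entry check for an entry qualification.
# TARIFF_BEARING is a made-up stand-in for the real qualification code list.

TARIFF_BEARING = {"A_LEVEL", "BTEC_EXT_DIP", "IB_DIPLOMA", "SCOT_HIGHER"}


def capture_entry_qualification(qual_type: str, grade: str) -> dict:
    """Validate an entry qualification before it reaches the student record system."""
    if qual_type not in TARIFF_BEARING:
        # Flag it now, while the applicant is still in the admissions pipeline,
        # rather than during the student return next spring.
        raise ValueError(f"unrecognised qualification type: {qual_type!r}")
    if not grade:
        raise ValueError("grade is required to derive tariff")
    return {"qual_type": qual_type, "grade": grade}


# Example: a record with a missing grade is stopped at capture time.
try:
    capture_entry_qualification("A_LEVEL", "")
except ValueError as err:
    print(err)  # grade is required to derive tariff
```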

Data Futures was changing that. It introduced the concept of Data Governance. Appropriate governance around data is predicated on understanding the entirety of need across the university. It is based on strong principles of considering data as an institution wide asset, managing it actively in support of short and medium term goals, of sharing it where appropriate, and comparing it with other datasets.

Further, it crucially defines everyone who touches that data as a steward of it. The minimum compliance with that role is "don't make it worse for anyone else". The outcome of this is data utility – making less data work harder. It reduces rekeying, it increases data integration, it starts to break data out of its silos.
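One way to read "don't make it worse" in systems terms – purely an illustration on my part, not a description of Data Futures itself – is a shared write-time guard: an update may not strip out information the current record already holds. The field names below are hypothetical.

```python
# Hypothetical write-time guard: an update may not blank out fields
# that downstream users already rely on.

MANDATORY_FIELDS = ("student_id", "entry_qualification", "mode_of_study")


def steward_check(existing: dict, updated: dict) -> None:
    """Reject updates that remove information the current record already has."""
    for field in MANDATORY_FIELDS:
        if existing.get(field) and not updated.get(field):
            raise ValueError(f"update would remove {field}; fix it at source instead")
```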

Sure, it doesn't eradicate the spreadsheet culture, but it starts to make the case for how that might eventually happen. It stops conversations about whose data is right, and starts discussions about what the data means. It provides accountability, so staff can actually find data rather than creating yet another copy.

Doing it right or doing it twice

Universities are already managing their data. But primarily in silos. The wasted effort and poor outcomes of this are evidenced in the lives of staff and students every day. When I hear concerns that "yes, but good data is really expensive", I respond with "do you know the cost of bad data?"

If the delay to Data Futures retrenches the silo mentality, what we are essentially saying is “having higher quality data that the whole university can trust and use isn’t important to us”. This is really a proxy for cultural blindness around data.

Too many universities' approach to the management of their data asset appears to be "we don't have time to do it right, but we do have time to do it twice". Yet, whether it's process improvement, better decision making, improved insights, richer scenario planning or learning analytics – understanding and managing our data as an asset in the same class as finance, estates and staff is foundational.

This is not easy. If it was, we’d already have done it. Data Futures gave us that reason to start professionalising our approach to data. The delay does not negate the central assertion that quality data is woven into the success of the institution.

The question is not "What should we do instead?" but "How can we do more?"

2 responses to "Data delays shouldn't mean data disorganisation"

  1. You ask the rhetorical question about the cost of bad data, and I'm tempted to reply that there is no cost. Obviously that's wrong, but if I rephrase the question as "Is there a saving to be gained through good data?" then I suggest we get closer to the challenge.

    Cleaning poor data, rekeying, validating and manipulating data is a high cost for universities (at least every uni that I've seen), but it is also embedded as small parts of many jobs, which makes saving FTEs very difficult; it's effectively a hidden cost that's difficult to recover (I'm ignoring efficiency benefits here). Whereas getting great data means (re)joining systems together, potentially investing in data warehouses, retraining staff and so on, which are very visible actual costs, without even considering all the change management.

    Do we need to do it? Yes! Is the business case easy to make? Not so far… (feel free to post a link to a great business case, I'm happy to learn.)

  2. I know what you mean – leaving data as it is is allegedly free. The business case only makes sense if you value the benefits – for example, to decision-making – that good data brings. If you can’t present these benefits you don’t have a business case. All data can be assigned a value based on its importance to the organisation. There’s no sense in improving the quality of data with zero value, because that would be wasted money.

    I think we can learn from lean manufacturing – ultimately, quality reduces cost. And cleaning the ‘inventory’ of poor data, for which you outline the costs, is a separate exercise from fixing processes so that poor-quality data doesn’t enter the system. I’m not certain the solutions are systems or warehouses – data quality is a people and process issue.
