Bigger and faster data does not always make for better decisions

Big data is a useful way of gaining new insights through automation but – when dealing with matters that have real-world consequences – it is no substitute for analysis and assessment by experienced professionals.

In the social sciences, the truth is that the data, like the real world it represents, is messy and fallible. Unlike in the physical and experimental sciences, I would argue it is much harder, and therefore more expensive, to quality assure such data.

In social science, there are fewer opportunities to engineer data accuracy because you can’t engineer people’s behaviour in the same way that you can control every aspect of a scientific experiment or, for example, monitor in extraordinary detail the performance of a Formula 1 racing car.

Conservation of complexity

Like conservation of energy in physics, I think there is a general law of conservation of complexity and, by proxy, regulation – I have repeatedly seen simplification or deregulation in one part of the system result only in increased complexity or regulation in another. In all areas of Government one can find examples where the promise of reduced complexity and regulation was not fulfilled – for example, does anyone believe that Universal Credit is any less complex or burdensome than the six social security benefits it replaced? The argument that six became one is at best superficial and at worst disingenuous.

During the Bill stages of the Higher Education and Research Act, Jo Johnson and others promised that regulatory and data burden would be reduced – a promise reinforced in the legislation, which places specific obligations on the OfS in relation to data burden. It is certainly true that some regulations, mostly concerned with removing barriers to sector entry for alternative providers, were relaxed. But even this spawned a panoply of new regulatory processes designed to deal with the unintended and undesirable consequences.

The narrative surrounding burden is not straightforward. It is one of those words that is open to interpretation, and when used improperly it creates unrealistic expectations and ultimately disappointment. Such misunderstandings, whether deliberate or not, create an unhelpful climate of dissatisfaction and mistrust. I prefer to distinguish between regulatory overhead (which is necessary) and unnecessary regulatory burden (which can of course be dispensed with).

Minimising the overhead

One can never be complacent when it comes to minimising legitimate regulatory overhead, and thus avoiding the creation of unnecessary regulatory burden.

The problem I often encountered here was a well-meaning but often misplaced belief that increasing the volume and frequency with which data is collected is relatively low cost and will somehow lead to more robust conclusions and better decisions and outcomes. I speak as someone who has been closely involved in such matters ever since I joined the UFC in 1991.

It is a fact that during my time at HEFCE I was permanently surrounded by an insatiable appetite for ever more data…the only thing that changed over time was the level of ambition. The emergence of social media and ‘Big Data’ boosted those aspirations beyond my imagination. In 1992 there were about 15 staff in Analytical Services at HEFCE. By 2019 staffing had grown five-fold to over 70.

I think deregulation is for the most part a mirage, because what often happens in practice is that one form of regulation is replaced by another.

More often, less used

The frequency of data collection also has a long history. The idea that universities’ student record systems should all be linked, and that a central agency should then be able to observe in real time how the universities were performing, was first mooted in the 1980s with the MAC initiative. MAC sought to integrate staff, student and finance record management systems across what came to be known as the old universities.

The MAC initiative was abandoned around about the time I joined the UFC, but not before many millions had been spent lining the pockets of the big consulting firm that was engaged (I forget which). Fast forward another eight years to a new Labour Government, and the same idea resurfaced, officially communicated to HEFCE by the DfE in the form of a letter to the Chief Executive.

He responded that HEFCE had neither the requirement nor the capacity to process real-time data and that, in any case, you do not conduct or manage education just-in-time as though you were running Tesco.

Soon afterwards we carried out a review of the HESA data collections, established that nobody was actually using the in-year (partial) December student return, and concluded that it was an unnecessary burden. It was scrapped.

Fast forward another 17 years, and the idea that the availability of real-time data would somehow result in better and more effective regulation was again mooted as a prescription for how the OfS might remotely exercise its regulatory functions and authority. It is true that the current arrangements, whereby HESA student data only becomes available some 15 months after the start of the academic year, are untenable.

Looking to the futures

In 2016, I made HEFCE’s position clear – there was no requirement for student data beyond termly submissions timed to coincide with the instalment payments made by the Student Loans Company. To the best of my knowledge, this remains the official position of the OfS.

HESA Data Futures was supposed to come on-stream in 2018 but failed to do so…the plan now, I believe, is 2022. My diagnosis of why this, like so many other IT projects, has failed to deliver on time and on budget is that those in charge of the project failed to fully appreciate the additional complexity created when developing a generalised solution to a much simpler, specific problem.

In this instance I believe it was a decision that the system should enable high-frequency updating of items of data in close to real-time. I can’t say with any certainty why HESA felt it necessary to go down this path, but I’m guessing that a desire to address this hunger for ever more data coupled with a desire to future-proof the system probably played a part. In other words, the ambitions for the project went far beyond what was actually required and sought to address a much bigger problem that had not yet materialised.

I would maintain that the volume and frequency of data collected must be commensurate with the purposes for which it is being collected.

Higher frequency or more granular data, beyond that which is required to adequately operate the system, does not necessarily result in better decision making or improved performance. Marginal improvements must be weighed against the marginal costs, including opportunity costs. The key word here is adequately – perceived adequacy is in the eye of the beholder and subject to change over time. A settled view amongst stakeholders of what adequacy represents is what I believe we should strive to achieve before imposing new data requirements.

This article is adapted from an address given at the University of Huddersfield Festival of HE Data.

5 responses to “Bigger and faster data does not always make for better decisions”

  1. I like this article; however, I think it misses a key concept: you actually have to go through this experience in order to come to that conclusion. The problem we are faced with is that institutions aren’t even doing the minimum of what is required in this article. If we all did, some would come to this conclusion while others would not. The most important outcome from this is to really determine whether they are collecting the right data to do the right thing for the institution and students.

  2. Agreed this is an excellent article. Working at HESA during the design of Data Futures, my read was that there was some future-proofing for sure, but this was a once-in-a-lifetime opportunity to ‘redesign the data landscape’ as laid out in the 2011 ‘Students at the heart of the system’ report. I genuinely believed we could reduce overall burden by driving up the utility of the data we collected.

    It feels that has been lost and now I’m not sure what goal is being chased.

  3. Excellent post Mario.

    The saga of the HESA December student return was not just about an internal review; it was primarily about government politics and approaches towards data.

    The short-lived HESA ‘new’ Management Committee circa 1995 at virtually its first meeting wanted to get rid of the return as burdensome and largely inaccurate, especially for institutions of ‘lifelong learning’ that had substantial ‘rolling’ enrolments (largely ‘part time’ and sub-degree in those days – we should remember that the largest number of students in terms of data records at that time came from the former PCFC sector, not the former UFC that Mario represented – monitoring of PCFC numbers was around May time and designed to coincide with an April-starting financial year).

    There were 8 institutional reps, representing 4 ‘interest’ areas (Planning, Student Records, Finance, Staffing – of which I was one of the reps for Planning from the former PCFC sector) and 7 representing ‘customers’ (the three funding councils and government education departments for E, S & W and the then one organisation for NI). There was unanimous support for getting rid of the December return including HEFCE … except for the then Education department in England which said that it ‘needed’ it to get early data on the API (Age Participation Index) to ‘inform ministers’ of progress towards API targets’ – despite the protestations that a preliminary figure could also be estimated from December’s HESES and UCAS aggregate data.

    The statutory requirement on HEIs to provide data to the Secretary of State when requested first appeared in the 1944 Education Act and was repeated in the 1992 Act, and it was this that was invoked to ensure that what was supposed to be an independent agency representing institutions became instead an agent of government policy; actually an agency of English Education department government policy rather than the UK’s. The December return dragged on being submitted until eventually the review Mario mentions finally concluded what we all knew all along: that the data was so rubbish as to be unusable.

    Governments of various political persuasions, however, failed to learn the lesson – that it needed to work with the sector rather than simply ‘demand’ more and more meaningless data.

    But the other lesson we can learn from this sorry saga is the well known social science concept of the ‘Hawthorne Effect’ – that the observation of data by someone in power produces changes in the way it is presented. There is no such thing as neutral data observation. (Philosophically speaking this also applies to the physical sciences in the form of the Uncertainty Principle – observation changes the object being observed – but is largely only of minor practical importance due to the effect being at the sub-nuclear/quantum level).

    It was certainly the case that when I was a collator of data for HESA, HEFCE, the NHS, the FE funding bodies, government education departments, and Uncle Tom Cobley and all, one adapted behaviour in collecting and checking data based on acute awareness of how it would be used, often second-guessing the motives of the requests and extrapolating the potential uses.

  4. Excellent article and also agree with most of what you say Alex. The only thing I would say is that the DF model never seemed quite right, and then when changed it got worse. That may partly be the fault of the practitioners in the sector for not engaging sufficiently with it early enough. I remember Dan Cook getting quite frustrated at a HESA session over the queries around how courses were being set up, as no one had raised it as an issue before.

    The original HEDIIP vision is one that is still highly relevant, but it is very hard to see how that gets anywhere with the present OfS set up. The current obsession with setting targets based on fairly dubious categorisations of students is taking us ever further away from having a really sensible conversation about what data should be available, and how frequently.

    And as for reducing the burden …

  5. Mike – fascinating history. Only having spent about 10 years in HE, I’m constantly learning how we got here. It does all sound rather familiar though in terms of the same behaviours holding.

    David – I’d agree it wasn’t quite right. Some of that is the whole ‘one size fits none’ but I do believe it was a credible effort to deliver on the HEDIIP vision. That vision does not seem to be held by anyone anymore.
