Big data is a useful way of gaining new insights through automation, but – when dealing with matters that have real-world consequences – it is no substitute for analysis and assessment by experienced professionals.
When dealing with the social sciences, the truth is that the data, like the real world it represents, is messy and fallible. Unlike in the physical and experimental sciences, I would argue it is much harder, and therefore more expensive, to quality-assure such data.
In social science there are fewer opportunities to engineer data accuracy, because you can’t engineer people’s behaviour in the same way that you can control every aspect of a scientific experiment or, for example, monitor in extraordinary detail the performance of a Formula 1 racing car.
Conservation of complexity
Like conservation of energy in physics, I think there is a general law of conservation of complexity and, by proxy, of regulation – I have repeatedly seen simplification or deregulation in one part of a system result in increased complexity or regulation in another. In all areas of Government one can find examples where the promise of reduced complexity and regulation was not fulfilled. For example, does anyone believe that Universal Credit is any less complex or burdensome than the six social security benefits it replaced? The argument that six became one is at best superficial and at worst disingenuous.
During the Bill stages of the Higher Education and Research Act, Jo Johnson and others promised that regulatory and data burden would be reduced – a promise reinforced in the legislation, which places specific obligations on the OfS in relation to data burden. It is certainly true that some regulations, mostly concerned with removing barriers to sector entry for alternative providers, were relaxed. But even this spawned a panoply of new regulatory processes designed to deal with the unintended and undesirable consequences.
The narrative surrounding burden is not straightforward. It is one of those words that is open to interpretation, and when used improperly it creates unrealistic expectations and, ultimately, disappointment. Such misunderstandings, whether deliberate or not, create an unhelpful climate of dissatisfaction and mistrust. I prefer to distinguish between regulatory overhead, which is necessary, and unnecessary regulatory burden, which can of course be dispensed with.
Minimising the overhead
One can never be complacent when it comes to minimising legitimate regulatory overhead, and thus avoiding the creation of unnecessary regulatory burden.
The problem I often encountered here was a well-meaning but misplaced belief that increasing the volume and frequency with which data is collected is relatively low-cost and will somehow lead to more robust conclusions and better decisions and outcomes. I speak as someone who has been closely involved in such matters ever since I joined the UFC in 1991.
It is a fact that during my time at HEFCE I was permanently surrounded by an insatiable appetite for ever more data…the only thing that changed over time was the level of ambition. The emergence of social media and ‘Big Data’ boosted those aspirations beyond my imagination. In 1992 there were about 15 staff in Analytical Services at HEFCE. By 2019 staffing had grown five-fold to over 70.
I think deregulation is for the most part a mirage, because what often happens in practice is that one form of regulation is simply replaced by another.
More often, less used
The frequency of data collection also has a long history. The idea that universities’ student record systems should all be linked in real time, so that a central agency could observe how the universities were performing in real time, was first mooted in the 1980s with the MAC initiative. MAC sought to integrate staff, student and finance record management systems across what came to be known as the old universities.
The MAC initiative was abandoned around the time I joined the UFC, but not before many millions had been spent lining the pockets of the big consulting firm that was engaged (I forget which). Fast forward another eight years, to a new Labour Government, and the same idea resurfaced; it was officially communicated to HEFCE by the DfE in the form of a letter to the Chief Executive.
He responded that HEFCE had no requirement, nor indeed the capacity, to process real-time data and that, in any case, you don’t conduct or manage education just-in-time as though you were running Tesco.
Soon afterwards we carried out a review of the HESA data collections and established that nobody was actually using the in-year (partial) December student return. We concluded that it was therefore an unnecessary burden, and it was scrapped.
Fast forward another 17 years, and the idea that the availability of real-time data would somehow result in better and more effective regulation was again mooted, this time as a prescription for how the OfS might remotely exercise its regulatory functions and authority. It is true that the current arrangements, whereby HESA student data only becomes available some 15 months after the start of the academic year, are untenable.
Looking to the futures
In 2016 I made HEFCE’s position clear – there was no requirement for student data beyond termly submissions timed to coincide with the instalment payments made by the Student Loans Company. To the best of my knowledge, this remains the official position of the OfS.
HESA Data Futures was supposed to come on-stream in 2018 but failed to do so…the plan now, I believe, is 2022. My diagnosis of why this, like so many other IT projects, has failed to deliver on time and on budget is that those in charge failed to fully appreciate the additional complexity created when developing a generalised solution to a much simpler, specific problem.
In this instance I believe it was the decision that the system should enable high-frequency updating of individual items of data in close to real time. I can’t say with any certainty why HESA felt it necessary to go down this path, but I’m guessing that a desire to satisfy this hunger for ever more data, coupled with a desire to future-proof the system, probably played a part. In other words, the ambitions for the project went far beyond what was actually required and sought to address a much bigger problem that had not yet materialised.
I would maintain that the volume and frequency of data collected must be commensurate with the purposes for which it is collected.
Higher-frequency or more granular data, beyond that which is required to operate the system adequately, does not necessarily result in better decision-making or improved performance. Marginal improvements must be weighed against the marginal costs, including opportunity costs. The key word here is adequately – perceived adequacy is in the eye of the beholder and subject to change over time. A settled view amongst stakeholders of what adequacy represents is what I believe we should strive to achieve before imposing new data requirements.
This article is adapted from an address given at the University of Huddersfield Festival of HE Data.