The imp in the machine

“As a venerable professor of chemistry once said to me … ‘I do not know why you want a department of English Language; I know English, but I also know some chemistry.’” – JRR Tolkien

Data is the new oil, or so we’re told, the new sexy, and the new black. Yes, I scoff at headlines like this too, and until two years ago, you would have been forgiven for rolling your eyes every time someone referred to data as an asset – a very expensive one at that.

Yes, data is big, but hasn’t it always been so? In fact, didn’t it used to be bigger, like, room-loads big? Yes, the data has been there: unstructured, untidy, and unusable, gathering dust in the basements of governments and universities alike. And it’s true, Big Brother has always been watching; the difference is that, as the Cambridge Analytica files would show, Big Brother now has the ability to understand.

We heart data

A few months ago, my sister asked what it is exactly that I do. Half an hour of misery and sweat later, as I tried to explain even a small facet of my job while desperately avoiding the words “information” and “technology”, she was none the wiser (she proceeded to finish me off with “so, who owns the university?” but let’s not get into that). This was by no means the first time that I’d almost had that conversation.

Even those who consume HE data don’t always know what my job entails. I still twitch at the memory of a conversation with a then head of admissions (when I was a data and process quality officer myself), who told me how he planned to advertise for a data analyst, feeling the need to add “but not like you – the other kind”. You know, the real kind.

Perhaps the most painful example comes from the JACS description of data management, migrated unceremoniously word for word into HECoS: “The study of the management of computer systems which capture, process and transmit data”. Et tu, HESA?

The bottom line is that everyone who thinks of data at all sees it either as an IT discipline or as something to do with informative, colourful dashboards. Everything in between is to be avoided in polite conversation at all costs. There’s an actual term for this unspeakable middle: it’s called data wrangling. I agree, it doesn’t sound very elegant. In fact, if you google “data wrangling” you’ll come up with two types of results: IT solutions that spare you from doing it, and defeated statements by those who acknowledge it as a necessary evil.

What’s in a name?

But the term is not entirely accurate in the context of statutory HE reporting. It does nothing to acknowledge the breadth of knowledge and skill required to deliver a dataset that fuels league tables, finances, metrics, surveys, forecasts, KPIs, and national statistics. So why do we overlook this role so much? The short answer is that we don’t understand it. Even the people who do the role have difficulty defining it, something reflected in the numerous different job titles that often contain the word data in a somewhat apologetic tone. After all, it’s not like we’re data scientists – is it?

Part of this oversight seems to be the collective expectation that these roles, like those of oil diggers perhaps, will eventually become automated and redundant. Where this is not already the case, ROI wisdom goes, it’s because we haven’t developed our systems and processes enough. After all, this isn’t a real job, just a product of bad record management. HESA returns shouldn’t be that time consuming, surely? I mean, can’t we just press a button?

“As the fumes cleared, he began to speculate seriously as to how the iconograph worked. Even a failed wizard knew that some substances were sensitive to light. Perhaps the glass plates were treated by some arcane process that froze the light that passed through them?

A hitherto unnoticed door opened in front of his eyes. A small, green and hideously warty humanoid figure leaned out, pointed a colour encrusted palette in one clawed hand, and screamed at him.

‘No pink! See?’” – Terry Pratchett, The Colour of Magic

It feels well embedded now, but it was only in 2007-08 that HESA’s last overhaul of the student return introduced the need for a new type of role in central admin. The term “HESA compiler” is still popular, but it’s out of date; the “HESA person” (for lack of a better term) does not just compile – they beat the student return into submission.

But who exactly are these people? The skill set involved was often found in existing student systems teams, but rarely did the HESA person have an IT background; for some inexplicable reason, a lot were language or literature graduates, or at least had a strong love for the arts. They shared a flair for problem solving, took great pride in their jobs, and experienced an immense sense of satisfaction at producing an error-free, robust data file. What the data ended up saying about the university was almost immaterial – the important thing was that the data was right.

If you are lucky (or unlucky, depending on your take on value), you’ll have a small team of those people, happily data-validating away, looking extremely stressed around September. They will generally be professionally liked by nobody because they always ask for things at the wrong time of the year, don’t seem to understand that not everything fits into a drop-down menu, and appear to produce no visible results for students or management.

The hidden factory

Fast forward to 2018-19. Let me make a bold generalisation here and say that if you’re not scared by Data Futures, you probably don’t understand your own institution. Can you take a stab at how many hidden data factories you have? Do you know what a hidden data factory is? If you answered yes to the first one, please call me, I need your insight. If you answered no to either, then allow me.

The excellent term was coined by Thomas C. Redman and it’s best explained in the context of HE as follows: if you need to transfer data onto a spreadsheet to perform a task, adding columns, changing values, making notes, highlighting rows, then congratulations, you have a hidden data factory. The data won’t be fit for purpose until it undergoes your transformation.

Now consider how many processes you run off spreadsheets that you modify, how much data you send externally that needs translating into a given format, how many feeds between systems don’t quite work until you do something to link them, how many free-text forms get typed into systems, and how much data gets transferred from one virtual place to another in a way that needs some kind of human intervention. Can you take a stab at a number now?

Data Futures demands that we shed light on all these processes if we are ever going to make the new data landscape work. We all understand what needs to happen – in theory. The reality is that unless we acknowledge our hidden data factories, their reasons for existing, and our reliance on the people who keep them, and by extension the university, working, we’re soon going to run out of paint.

WWHD?

Let’s talk about those HESA people again – the data quality, statutory reporting, or external returns officers. They may not have a good name for themselves, but they do know that the word data belongs to them, even if everyone else tells them otherwise. They are data wranglers, but also business analysts, software developers, policy and process experts and, most importantly, extremely good at translating the soft world into hard data. But they have something else really important going for them: the good fortune of learning from an incredibly efficient and customer-centric data collection agency all these years.

Records officers at an institution may not know it, but the word data belongs to them too. In the many talks about data assurance with our wonderful records administrators over at Teesside, the questions addressed to me were starting to sound very familiar: what’s an acceptable tolerance, what’s the deadline, what if something is correct but comes up as an error, will your staff be able to manage if we send them all the queries near the deadline, can we have some clearer guidance? And then it hit me: am I HESA in this conversation? How did this happen and what does it mean?

We need to acknowledge the data expertise we have in our universities, give it a name, make it a strength, and use it to propel ourselves forward. We shouldn’t deny the talents of our own staff because we don’t think they fit the perfect-process narrative. Data management should be in every administrator’s job description, not because we want to force it on them, but because it’s already what they do. They just don’t know what to call it. And as HESA people, we need to let them know. It’s a skill set, not a burden.

If you want to know more about what student records and data officers in universities do, follow SROC’s Twitter takeover series “A week in the life of…”.

11 responses to “The imp in the machine”

  1. Funny and excellent. Like one of those 10pm professional services pub conversations, but with better grammar.

  2. I wouldn’t survive a day without our data imp – and she is highly valued and much loved across the whole University.

  3. Absolutely fantastic, hits the nail on the head and made me smile. One of the biggest data factories is the HESES return and will remain so even when data futures kicks in.
