Standard Occupational Classification (SOC) coding of graduate outcomes data to group analogous employment destinations can prove problematic, particularly when the position of granular unit groups within a wider framework is used to differentiate professional or “graduate level” roles in an attempt to measure quality.
Providers who were detached from this process as part of the centralised coding adopted in Graduate Outcomes have a pertinent perspective on these issues and, I would argue, a renewed role in helping to address them.
While HEIs were excited to get their hands on second-year survey data sets, the absence of yet-to-be-determined weighting and of sector statistics to provide context to results currently limits the scope for analysis – and it was the Higher Education Statistics Agency’s (HESA) other Graduate Outcomes publications (released at the end of April) which really got my attention.
HESA produced two reports on SOC coding – a summary of results for this year’s quality assurance (QA) activities, and an independent analysis of coding involving experts at the Office for National Statistics (ONS). HESA’s headlines were positive, but the details behind them won’t sit well with some readers.
In a laudable part of the Graduate Outcomes survey’s QA processes, providers get to review provisional coding and highlight where they feel there are errors for HESA and its contracted coding agency, Oblong, to review and adjust before final data delivery.
But what happens when a provider spots something they think is amiss? Well, likely not a lot actually. We learn from the first report that of more than 3,000 claimed errors, only about 5 per cent were referred on to Oblong, with the rest dismissed as “non-systemic” or “non-actionable” by HESA.
A pair of problems
Providers have two main grievances here. One is that HESA won’t consider correcting coding where there are a relatively small number of “non-systemic” errors reported for any single occupational code. This is infuriating, and not just because there are some howlers in the coding crying out for correction.
It seems absurd to take the view that errors in the data set are OK if there aren’t enough similar ones, particularly when relatively few providers contributed feedback. These errors count towards results, and they mount up.
The other issue is the subjective nature of coding and of “non-actionable” errors. I won’t get stuck into the “what constitutes a graduate role?” debate here (although within the context of Graduate Outcomes data, it’s one that’s still much needed), but I will raise the question of who gets to decide what counts as an error.
Allowing universities carte blanche here could invite the old problems of creative coding to maximise metrics seen under DLHE (the Destinations of Leavers from Higher Education, Graduate Outcomes’ predecessor survey) – there will be instances of this in the feedback, and HESA rightly guards against them.
But should HESA or a coding contractor have the exclusive say? Arguably not, because neither can bring to disputed cases the level of diligence that providers can.
Part of this is down to a disparity in expertise that informs assessment of appropriate coding and, let’s be honest, institutions can draw upon a depth of knowledge of graduate career pathways within specific industries that HESA and coding agency staff cannot.
It is also about interpretation of survey data which, despite the wealth of relevant information collected, in HESA or Oblong’s case relies too strongly on the rigid application of a coding frame to job titles to determine which SOC to use.
Those attuned to these difficulties will have taken great interest in HESA’s second report, which claims “strong agreement” in coding done by Oblong and the ONS and “substantial confidence in the reliability of codes delivered to the higher education sector”.
Providers are unlikely to be reassured, though. A sample of just 2,400 records doesn’t seem particularly robust for testing the myriad job roles reported in the UK’s largest social survey, and neither do the “strong agreement” statistics look good enough when so much rests on accuracy here.
While no one can question the ONS team’s knowledge of the coding framework and its application, like HESA and Oblong they’re likely to be hampered by limited knowledge of graduate career pathways and of the higher education system itself.
ONS’ rules for coding job titles give useful direction for processing vast volumes of data in a seemingly objective manner but, followed doggedly, can get in the way of good decision-making when things get complicated.
Consider the role of word order highlighted in HESA’s first report – generally, prefixes to a job title keyword are ignored, but subsequent words are not. So what significant evidence and experience may suggest is an entry-level graduate role will likely be coded within the non-graduate SOC groups, not just because it has the word “assistant” in the title but because it’s in the wrong place.
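To make that rule concrete, here is a toy sketch of the word-order effect. This is entirely my own illustration – the mini keyword index, the function and the group labels are invented for the example, and the real ONS coding index and Oblong’s process are far more sophisticated – but it shows how treating the final indexed word as the keyword, while ignoring prefixes, codes the same word very differently depending on where it sits:

```python
# Toy illustration (NOT the real ONS/Oblong coder) of the word-order rule:
# the final word of a job title acts as the coding keyword, and words
# before it are largely ignored.

# Hypothetical mini-index mapping keywords to illustrative SOC major groups.
# Real SOC coding uses the full ONS coding index, not a lookup like this.
KEYWORD_INDEX = {
    "psychologist": "2 - Professional occupations",
    "engineer": "2 - Professional occupations",
    "assistant": "9 - Elementary occupations",
}

def toy_soc_code(job_title: str) -> str:
    """Code a job title by its last indexed word, ignoring any prefixes."""
    words = job_title.lower().split()
    keyword = words[-1]  # prefixes dropped; the final word decides
    return KEYWORD_INDEX.get(keyword, "uncoded")

# "Assistant" as a prefix is ignored, so this title codes as professional...
print(toy_soc_code("Assistant Psychologist"))  # 2 - Professional occupations

# ...but the same word in final position pulls the title into a
# non-graduate group, regardless of what the duties actually involve.
print(toy_soc_code("Research Assistant"))      # 9 - Elementary occupations
```

The point isn’t the code itself but the failure mode it exposes: a mechanical last-keyword rule cannot see that many “assistant”-suffixed titles describe entry-level graduate roles.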
We have seen some welcome flexibility from HESA in refining the process following provider feedback so that, for example, “Teaching Assistant” can be coded as a professional-level occupation where more advanced activities are reported in job duties. But progress has been painfully limited, and unless the process admits both a nuanced understanding of roles and room to apply it more widely to coding and error checking, it will inevitably churn out spurious results.
Let’s not forget how important this is. Destinations coded as “graduate-level” are judged as positive outcomes for students and, crucially in the context of this discussion, for providers too. These help a subject area stay on the right side of OfS quality baselines as well as generate statistics and league table rankings that support recruitment – plus there’s the ever-present prospect of them being linked to higher funding allocations and fee levels.
Non-graduate roles don’t, and if vice chancellors or the OfS start asking awkward questions about performance, it’s scant consolation to know that erroneous employment outcomes in the data haven’t been widely reported, or are correct according to a flawed assessment.
Graduate Outcomes understandably sought objectivity by distancing providers from data coding, but the current centralised process creates tensions, not harmony, between objective and accurate data in some areas. I’d like to press the case for easing them by taking advantage of providers’ ability to make highly informed judgements about destinations, and involving them more, not less, in review and QA activities.
There are obvious steps for HESA to consider here, such as enlisting provider representatives to review anonymised samples of claimed errors, and repeating the ONS comparison exercise with a bigger sample and provider participation. These moves could bring additional resources to exploring the non-systemic errors, and evidence of increased collaboration may encourage more feedback to capture systemic issues – something which the sharp decline in provider engagement (down from 90 providers to 55 this year) indicates is much needed.
Views about coding will inevitably differ but let’s bring these into an honest debate about where, why and how to move forward in adjusting data, Oblong’s processes, and interpretation of ONS guidance. This isn’t about finding fault with any party but collaborating to ensure that results are as accurate as possible. SOC coding generates some thorny problems but they’re too important not to try hard to grasp.