This article is more than 4 years old

Access and participation data unpacked

More than a gigabyte of comma-separated values wasn't enough to stop David Kernohan visualising OfS access and participation data.

David Kernohan is Deputy Editor of Wonkhe

It shouldn’t, on reflection, be this difficult to understand how the experience of students with particular characteristics differs from the average throughout the student lifecycle.

The screams of agony from sector data wonks had alerted me to the chaos around the Office for Students Access and Participation dataset – the reality of digging into more than a gigabyte of data made the problem more apparent.

The issue was not with the content of the data itself but with the way it was organised. Presentationally, the aim had been to let a user view either sector-wide statistics or data relating to a particular institution. These are reasonable choices, but this decision made it harder to compare one institution with another.

There’s been enough ink spilled around the curse of the league table to make me see this as refreshing and positive in certain moods. But I need to be clear that there are sound reasons for wanting to make comparisons. Many institutional planners have their own “benchmarking groups” – choosing institutions with a certain set of similarities with their own – that are used for understanding where strengths and weaknesses lie, and to identify possible interventions that have worked for others in similar situations and may work for you.

Often you need a table or other visualisation to help you find these comparators. So that’s what I set out to build. It took me a while. I strongly advise you to use the full screen link here, and Tableau users can download the workbook. Do drop me an email with any questions, or if you’ve fixed it to make it work better!

[full screen]

Getting your head around the data

The first thing to bear in mind is the sheer variety of what is available. There is enough data to answer all kinds of questions about institutional differences, but the decision to release it as a single lump makes it harder to understand and harder to work with.

To start with the basics – there are four “life cycle stages”:

  • Access – students entering the institution
  • Continuation – students passing from the first year to the second year of study at an institution
  • Attainment – students graduating with a first class or 2:1 degree
  • Progression – students in employment or further study 6 months after graduation

For each of these you can look at full time (and apprenticeship) students or part-time students, and a choice of possible course types (all undergraduate, first degree, other undergraduate, undergraduate with postgraduate).

If we start with the (single) options on my Tableau, you can choose a split type (broad category of characteristics) and split (precise characteristic) using the menus at the top.

The characteristics range from the expected – sex, ethnicity, the newly-controversial POLAR, disability marker – through to some lesser-seen options – disability type, index of multiple deprivation, free school meals. And then there are some intersectional variables – IMD or POLAR against ethnicity or sex. Not the ethnicity and sex one that we’ve all been asking for, alas.

You then need to check my least favourite part of this data – the measure detail filter. Depending on the life cycle stage you’ll need to select a different measure –

  • For Access – choose “proportion”
  • For Attainment – choose “attainment rate”
  • For Continuation – choose “continuation rate”
  • For Progression – choose “progression rate”
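
The pairing in the list above can be written down as a small lookup. What follows is only a minimal sketch: the measure labels come from the list, but the row layout and the key names (`lifecycle_stage`, `measure_detail`, `value`) are hypothetical and will not match the OfS column headings exactly, and the sample figures are made up.

```python
# Measure detail to select for each life cycle stage (labels from the list above).
HEADLINE_MEASURE = {
    "Access": "proportion",
    "Attainment": "attainment rate",
    "Continuation": "continuation rate",
    "Progression": "progression rate",
}


def headline_rows(rows, stage):
    """Keep only the rows for `stage` that carry its headline measure.

    `rows` is an iterable of dicts; the keys used here are assumed names,
    not the real OfS field names.
    """
    wanted = HEADLINE_MEASURE[stage]
    return [
        row for row in rows
        if row["lifecycle_stage"] == stage
        and row["measure_detail"].strip().lower() == wanted
    ]


# Made-up example rows, purely for illustration.
sample = [
    {"lifecycle_stage": "Access", "measure_detail": "Proportion", "value": 12.5},
    {"lifecycle_stage": "Access", "measure_detail": "Numerator", "value": 250},
    {"lifecycle_stage": "Attainment", "measure_detail": "Attainment rate", "value": 78.0},
]

access = headline_rows(sample, "Access")
```

Hard-coding the pairing like this would remove one step for the user, at the cost of hiding the other underlying measures.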

This is daft to me because each of these is effectively the same calculation – the numerator (the number with the stated characteristic) divided by the denominator (the number of students in the whole population under consideration). I could have hard-linked these, but I wanted to leave the option to look at other underlying data for the super-keen. Really, from a data design perspective, I’d have liked to have seen these as separate columns so I could show the other data within the tooltip for the main value.
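
To make the shared calculation and the "separate columns" idea concrete, here is a rough sketch. It assumes a long layout with one row per measure; the key names (`provider`, `split`, `year`, `measure_detail`, `value`), the provider name and the figures are all invented for illustration, and expressing the rate as a percentage is an assumption too.

```python
from collections import defaultdict


def rate(numerator, denominator):
    """Numerator divided by denominator, as a percentage (an assumption)."""
    if not denominator:
        return None  # no population to divide by
    return 100.0 * numerator / denominator


def to_wide(long_rows):
    """Turn one-row-per-measure data into one record per group,
    with each measure as its own column (dict key)."""
    wide = defaultdict(dict)
    for row in long_rows:
        key = (row["provider"], row["split"], row["year"])
        wide[key][row["measure_detail"]] = row["value"]
    return dict(wide)


# Invented figures in the long, one-measure-per-row shape.
long_rows = [
    {"provider": "Example University", "split": "Mature", "year": 5,
     "measure_detail": "numerator", "value": 300},
    {"provider": "Example University", "split": "Mature", "year": 5,
     "measure_detail": "denominator", "value": 1200},
]

wide = to_wide(long_rows)
record = wide[("Example University", "Mature", 5)]
record["rate"] = rate(record["numerator"], record["denominator"])
```

With the measures side by side in one record, the numerator and denominator are available to sit alongside the headline rate, for example in a tooltip.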

For each of the dot graphs the plotted value is this rate for year 5, with the colour showing the difference between the value for that year and either year 4 or year 1. Darker dots mean a larger change – blue is an increase, orange is a decrease. The sector average for Year 5 is marked with a star. Of course – the year numbers mean different things for different measures:

  • Access indicators: Year 1 corresponds to 2013-14, and Year 5 to 2017-18.
  • Continuation indicators:
    • for full-time and all apprenticeship students, Year 1 corresponds to 2012-13, and Year 5 to 2016-17.
    • for part-time students, Year 1 corresponds to 2011-12, and Year 5 to 2015-16.
  • Attainment indicators: Year 1 corresponds to 2013-14, and Year 5 to 2017-18.
  • Progression indicators: Year 1 corresponds to 2012-13, and Year 5 to 2016-17.
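
If you are working with the raw files, the year mapping in the list above is easy to encode. This is a small sketch rather than anything from the OfS documentation: the function name `academic_year` and the mode labels are made up for illustration, and only the starting years come from the list.

```python
# Calendar year in which Year 1 starts, taken from the list above.
# Keys are (life cycle stage, mode); "any" means the mode makes no difference.
YEAR_ONE_START = {
    ("Access", "any"): 2013,
    ("Continuation", "full-time or apprenticeship"): 2012,
    ("Continuation", "part-time"): 2011,
    ("Attainment", "any"): 2013,
    ("Progression", "any"): 2012,
}


def academic_year(stage, year_number, mode="any"):
    """Translate a Year 1 to Year 5 label into an academic year such as '2017-18'."""
    if not 1 <= year_number <= 5:
        raise ValueError("year_number must be between 1 and 5")
    key = (stage, mode) if (stage, mode) in YEAR_ONE_START else (stage, "any")
    if key not in YEAR_ONE_START:
        raise KeyError(f"no year mapping for {stage!r} with mode {mode!r}")
    first = YEAR_ONE_START[key] + year_number - 1
    return f"{first}-{(first + 1) % 100:02d}"
```

For example, `academic_year("Continuation", 5, "part-time")` gives `"2015-16"`, while the same call for Access gives `"2017-18"`.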

The all years graph is a bit of a mess if you look at all institutions, so use the filters on the left to choose a few of interest. This is handy for looking at trends in the ratios. I’ve left in a sector average value for you to compare against.

Hard mode

If you want to look at the difference between the two groups within a given set of characteristics you’ll be wanting the “comparison” graphs. Though they look very similar to the “single” graphs, they tell quite different stories.

We’re looking at the percentage point gap between the ratios (as on the “single” graphs) for the characteristics set by split1 and split2. Not all combinations of splits work, and in the absence of much help in the official documentation I would suggest trial and error with the following caveats:

  • Generally, the larger group needs to be in split 1 and the smaller group in split 2. The exception is (with depressing inevitability) that “male” goes in split 1 and “female” in split 2.
  • The only splits available for “access” compare the population of 18 year olds with a given characteristic with 18 year olds at the institution with the same characteristic.
  • If you select N/A it’s as if you are looking at the “single” graphs, and you will need to change the measure detail as appropriate.

What this tells you is the size of the gap between split 1 and split 2 – a positive value means that the ratio for split 1 is higher than split 2, a negative value means the opposite. Again I’ve used colour to show the differences across years for the dot graph, and because we get information on statistical significance for these comparisons I’ve used shapes to denote where a difference is significant.
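
The sign convention is easier to see with numbers. This is a minimal sketch with invented rates; the function names are hypothetical and the calculation is simply the subtraction described above, not code taken from the OfS dataset.

```python
def gap(rate_split1, rate_split2):
    """Percentage point gap: the split 1 rate minus the split 2 rate."""
    return rate_split1 - rate_split2


def describe(gap_value):
    """Say which split has the higher rate, based on the sign of the gap."""
    if gap_value > 0:
        return "split 1 higher"
    if gap_value < 0:
        return "split 2 higher"
    return "no gap"


# Invented rates, in percent.
example_gap = gap(82.0, 74.5)
```

Here `example_gap` is 7.5 percentage points, so `describe(example_gap)` reports that split 1 is higher; swapping the two rates flips the sign.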

This flag also means I can pull together the (surprisingly few) instances across the sector where there is a significant difference between the experiences of students with differing characteristics at a given institution.
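
Pulling those instances together amounts to a filter on the significance flag. The sketch below assumes each row carries a boolean `significant` field and a numeric `gap`; both key names, the provider names and the figures are invented, and the real flag may be encoded differently.

```python
def significant_gaps(rows):
    """Return the rows flagged as statistically significant,
    ordered by the absolute size of the gap, largest first."""
    flagged = [row for row in rows if row["significant"]]
    return sorted(flagged, key=lambda row: abs(row["gap"]), reverse=True)


# Invented rows: gap in percentage points, plus an assumed boolean flag.
rows = [
    {"provider": "Provider A", "gap": 3.0, "significant": False},
    {"provider": "Provider B", "gap": -9.5, "significant": True},
    {"provider": "Provider C", "gap": 6.0, "significant": True},
]

shortlist = significant_gaps(rows)
```

Sorting on the absolute value keeps large negative gaps near the top rather than burying them at the bottom.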

My definition is this

OfS provides their own guidance for using the data and the dashboards they have created. Alongside a detailed set of field definitions, we also enjoy delights like an explanation of the correct use of filters in Microsoft Excel – “we recommend that users highlight the header row and choose filter from the data menu in MS Excel to add a filter to every column. Each column can then be filtered as required.”

The idea that someone new to Excel might be expected to work with this data (even the set for a single institution is around 5 megabytes) is something that concerns me greatly.

We also get a walk-through of the OfS Tableau dashboards. As above, these are set up to let you see either the sector in aggregate or an individual institution’s data, which makes it harder to draw comparisons.

6 responses to “Access and participation data unpacked”

  1. Great work David.
    FYI though, POLAR data has always been controversial because the aggregation of postcodes at ward level renders the data unsuitable for individual level differentiation. And to be fair to OfS they may have been trying to avoid benchmarking for good policy reasons because benchmark targets can lead to ‘coasting’ by institutions that compare themselves only to similar institutions’ performance data (norm referencing) rather than against an absolute (criterion referencing). This can lead to institutions raising entry requirements in order to move up the league tables in the safe knowledge any consequent reduction in e.g. lower income household or BAME student numbers won’t hurt their reputation so long as they stay ‘within benchmark’. This actually happens at institutional planning level.
    See
    Taylor, C. and McCaig, C. (2014) Evaluating the impact of number controls, choice and competition: an analysis of the student profile and the student learning environment in the new higher education landscape, Higher Education Academy, York, August 2014. https://www.heacademy.ac.uk/sites/default/files/resources/impact_of_number_controls_choice_and_competition.pdf
    and
    McCaig, C & Taylor, CA (2017) The strange death of Number Controls in England: paradoxical adventures in higher education market making. Studies in Higher Education, Volume 42, 2017 – Issue 9, pp. 1641-1654. Published online 7th December 2015 http://dx.doi.org/10.1080/03075079.2015.1113952

  2. Hi David
    If you go into Attainment, Splittype Disability, with Split1 options of “Disability”, “NoKnownDisability”, and “All”…
    …then the “All” option adds together the percentages for each subset, giving overall attainment rates on a scale up to 200%, instead of showing the rate for the population as a whole.
    Not sure if that’s something you can fix in Tableau.

  3. A great piece David. Thank you. It just feels that data is all too often produced for measuring, weighing and judging rather than to show patterns in the data that can be used to genuinely highlight issues and improve the experience of all stakeholders.

  4. Hi David. I love pivot tables as much as the next data nerd, so when I first saw the dataset I was a little aggrieved that the data were in a format which required playing with the filters in the data sheet itself rather than a separate pivot table. However, having had a play with the filters and read the instructions, I think the way the data are organised is actually easier for a non data specialist to manipulate. For example, you can easily amalgamate a number of your ‘competitor’ institutions into one spreadsheet and compare your own institution with the sector average and individual competitors. Once you get the hang of what the ‘indicators’ you’re looking for are (the 3 worked examples in the user guide are helpful in this regard) it’s quite intuitive in my view. More importantly, pivot tables and/or advanced knowledge of data analysis are not required.
