Jim is an Associate Editor at Wonkhe

On the same day that the Office for Students (OfS) published eye-catching information on unconditional offers, a sheaf of information about the National Collaborative Outreach Programme (NCOP) and a long document surveying the provider registration process, it quietly slipped out a supplementary document pertaining to that latter review – “Condition B3: baselines for student outcomes indicators”.

There’s an old Transport for London “Awareness Test” that comes to mind here – you’re busy counting a group of youths making rapid basketball passes, until the voiceover asks if you spotted the moonwalking bear that was moving across the screen all along. Once you see it, it’s so preposterous that you wonder how you ever missed it. Sorry to spoil it (it’s kind of our job at Wonkhe) but we think we might have a B3 bear right here.

A star is born

Condition B3 in OfS’ regulatory framework has so far proved controversial. It’s one of the quality ones, where the provider must “deliver successful outcomes for all of its students, which are recognised and valued by employers and/or enable further study”. In the initial registration phase it managed to generate 147 “interventions” – 50 formal letters, 77 lots of “enhanced monitoring”, and 20 specific (and public) conditions of registration – as well as playing a starring role in five of the six outright registration refusals that have so far been made public.

It has also popped up as the principal player in the ongoing court cases surrounding (the refused) Bloomsbury Institute and Barking and Dagenham College, and has been causing consternation across the sector because we’ve not known what minimum level of outcomes a provider is allowed to “generate”. Until now.


In the “registration process and outcomes: key themes and analysis” document, there’s a longish description of the process that the registration team went through on this particular test. For those regulated previously, OfS used HESA data (or for colleges, Individualised Learner Record data from the Education and Skills Funding Agency) to look specifically at performance in relation to three key indicators, each broken down to show outcomes at different modes and levels of study, and for students with different characteristics:

  1. Student continuation and completion indicators.
  2. Degree and other higher education outcomes, including differential outcomes for students with different characteristics.
  3. Graduate employment and, in particular, progression to professional jobs and postgraduate study.

It considered performance “in aggregate” over a “time series” (up to five years, depending on how many years of indicators could be derived from available data), as well as across splits of each indicator that show performance for students from different demographic groups – broken down by mode (full or part-time) and level of study (for example “other undergraduate” or first degree), as well as by age, participation of local areas (POLAR), English indices of multiple deprivation (IMD), ethnicity, disability, sex and domicile.

The approach, says the document, was to establish a baseline for each indicator (in each mode and level of study that the provider delivers) as a guide to whether performance in relation to a particular indicator raised concerns, and that baseline varied according to the mode and level of the course. And the result was then broken into three types – “no concern”, “concern” (you get a letter or enhanced monitoring) or “significant concern”.

Then it considered the proportion of the provider’s current students who were at risk of experiencing the outcome(s) identified as being in that “significant concern” category, and the extent to which different demographic groups experienced outcomes of “significant concern”. To do that, OfS looked at the share of the most recent student population (for which data was available) that fell into a demographic split whose indicator may be “of significant concern”.

That generated the kicker. If 75 per cent or more of a provider’s student population fell into a demographic group which was identified as experiencing an outcome that may be of significant concern, the provider was not likely to satisfy the B3 condition. OfS did then consider contextual information – data relating to the type of provision the provider offers, the characteristics of its students and the size of student cohorts – although performance wasn’t “benchmarked” as such, given OfS looks at performance in absolute terms because it “expects providers to deliver successful outcomes for all students”.

Blame it on the baseline

If you’ve managed to follow all that (and TL;DR if OfS found a group of students with weak outcomes making up the majority in a provider, it intervened) the major question on your lips will be “yeah, but what counts as weak”. It’s that baseline (or as it turns out, baselines) that has separately emerged in a bear costume now – in the excitingly titled “Condition B3: baselines for student outcomes indicators” and the thrilling “Technical algorithms for institutional performance measures: OfS registration condition B3 indicators, methodology and rebuild instructions”.

That latter doc includes a definitional description of each of the indicators and a chart that deserves a look, revealing that the definitions of things like “continuation” in the TEF, OfS’ access and participation resources and this exercise can all actually be different. For example – unlike in the TEF and A&P work – international students are counted in continuation data here. We also learn how the “completion” metric will move from “experimental” to “downright lethal” in the future, which given the international news will cause many a provider to become hot under the collar.

But it’s the former where the excitement lies. First you’ll see that the baselines for each indicator vary according to the mode and level of study – so (for example) the minimum continuation rate for full-time PhD students is different to, say, part-time HND students. This is justified on the basis that OfS considered “a wide variety of data on this issue” including sector level trends published by … OfS and analysis contained in the equality impact assessment for the regulatory framework – although without any text actually explaining why OfS thinks it’s OK for continuation rates to be different in this way, it all looks a bit arbitrary.

Here’s an example of the baselines this all generated – this one for the continuation metric.

| Mode of study | Level of study | Not of concern if the proportion of the provider’s students who continue their studies is more than... | May be of concern if the proportion is between... | May be of significant concern if the proportion is less than... |
| --- | --- | --- | --- | --- |
| Full-time | PhD | 90% | 80% and 90% | 80% |
| Full-time | Taught masters | 90% | 80% and 90% | 80% |
| Full-time | PGCE | 90% | 80% and 90% | 80% |
| Full-time | Other postgraduate | 90% | 80% and 90% | 80% |
| Full-time | Undergraduate course with postgraduate elements | 85% | 75% and 85% | 75% |
| Full-time | First degree | 85% | 75% and 85% | 75% |
| Full-time | Other undergraduate | 80% | 70% and 80% | 70% |
| Part-time | PhD | 75% | 60% and 75% | 60% |
| Part-time | Taught masters | 75% | 60% and 75% | 60% |
| Part-time | PGCE | 75% | 60% and 75% | 60% |
| Part-time | Other postgraduate | 75% | 60% and 75% | 60% |
| Part-time | Undergraduate course with postgraduate elements | 70% | 60% and 70% | 60% |
| Part-time | First degree | 70% | 60% and 70% | 60% |
| Part-time | Other undergraduate | 70% | 60% and 70% | 60% |
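For anyone who wants to check their own dashboard against that table, here’s a minimal Python sketch of how the three bands read. The names (`CONTINUATION_BASELINES`, `classify_continuation`) are ours, not anything from the OfS rebuild instructions, and we’ve assumed that a rate sitting exactly on a boundary counts as “concern”, which the table doesn’t spell out.

```python
# Continuation baselines from the table above, stored as
# (not-of-concern threshold, significant-concern threshold) in per cent.
# All names are hypothetical; OfS does not publish a code interface.
_FULL_TIME = {
    "PhD": (90, 80),
    "Taught masters": (90, 80),
    "PGCE": (90, 80),
    "Other postgraduate": (90, 80),
    "Undergraduate course with postgraduate elements": (85, 75),
    "First degree": (85, 75),
    "Other undergraduate": (80, 70),
}
_PART_TIME = {
    "PhD": (75, 60),
    "Taught masters": (75, 60),
    "PGCE": (75, 60),
    "Other postgraduate": (75, 60),
    "Undergraduate course with postgraduate elements": (70, 60),
    "First degree": (70, 60),
    "Other undergraduate": (70, 60),
}
CONTINUATION_BASELINES = {"full-time": _FULL_TIME, "part-time": _PART_TIME}


def classify_continuation(mode, level, rate):
    """Return the band for a continuation rate given in per cent."""
    upper, lower = CONTINUATION_BASELINES[mode][level]
    if rate > upper:
        return "no concern"
    if rate < lower:
        return "significant concern"
    # Assumption: rates exactly on either boundary land in the middle band.
    return "concern"
```

So a full-time first degree continuation rate of 78 per cent would come out as “concern”, while the same rate for part-time first degree students would be “no concern”.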

Then there’s a justification for this split-metric, 75 per cent thing. “We decided that it was important to consider not only the performance against the significant concern baseline, but also how widespread any instances of performance that were ‘of significant concern’ were within the provider’s student population”, opines the narrative. “We decided that if 75 per cent or more of a provider’s students fell into demographic groups with at least one outcome of significant concern, then Condition B3 would be unlikely to be satisfied”, although it adds that it would then consider other relevant factors, including the context in which a provider operates.
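To make the 75 per cent test concrete, here’s a rough Python sketch. It assumes (our assumption, not OfS’ published method) that each student is counted once however many flagged groups they fall into. The function names and the toy data are invented for illustration, and the sketch stops before the “other relevant factors” step.

```python
def share_in_flagged_groups(students, flagged_groups):
    """Share of students who belong to at least one flagged demographic group.

    students: list of sets of group labels, one set per student.
    flagged_groups: set of labels with an outcome of significant concern.
    """
    if not students:
        raise ValueError("need at least one student")
    hit = sum(1 for groups in students if groups & flagged_groups)
    return hit / len(students)


def b3_unlikely_to_be_satisfied(students, flagged_groups, threshold=0.75):
    # "75 per cent or more" in the OfS wording, hence >= rather than >.
    return share_in_flagged_groups(students, flagged_groups) >= threshold


# Toy provider with four students (entirely made up).
toy_students = [
    {"mature", "IMD quintile 1"},
    {"young"},
    {"mature"},
    {"young", "IMD quintile 1"},
]
```

With `{"mature", "IMD quintile 1"}` flagged, three of the four toy students are covered, which is exactly 75 per cent and so trips the test. With only `{"mature"}` flagged, it’s half, and it doesn’t.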

Here’s the thing

If you’re still with us at the back trying to spot the bloke in the bear costume, this next bit really matters. Some will get this far and question why on earth we have a system measuring “outcomes” to judge provider performance like this, given how arguably unfair that is once context is taken into account. We could have a fun philosophical debate about that (and regularly do on the site), but here we’re operating through the looking glass of the internal logic of Barberology, so shush.

Once you’re in there, you might ask why, if a provider is supposed to “deliver successful outcomes for all of its students”, OfS isn’t saying that no students should fall into a demographic group experiencing outcomes of significant concern. “We did not consider it proportionate to set the threshold at 0 per cent as this was likely to have resulted in a very large proportion of the sector not satisfying the baselines”, it says. Fair enough.

But there’s the pivot – the magic moment where the moonwalking bear takes off his pretend head and reveals that it’s just a man in a costume. Until now, the impression we are given is that the exercise on outcomes is a criterion-referencing one – designed objectively, with complexity on baselines for different modes and levels riffing off internal expertise.

But the document goes on to say that OfS “analysed the likely impact of the proposed significant concern and 75 per cent baselines” in order to ensure that their introduction would be “proportionate”. It produced charts showing the performance of every provider individually (and, they are keen to stress, “anonymously”) against the different indicators, and it considered “how many providers would fall below any potential significant concern baseline”.

That’s right folks. There you were, thinking this was a criterion-referenced exercise, when it was norm-referenced all along. Or in other words – at least in part – OfS has looked at the span of performance, decided in advance it wants most providers to make it onto the register, and then designed a set of thresholds that allows it to say “look, most of you passed” but also “look, we have loads of people getting letters telling them to improve”, with some getting enhanced monitoring, some getting public conditions, and some getting a fail.
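If you want a feel for what “how many providers would fall below any potential significant concern baseline” looks like as an exercise, here’s a small Python sketch. The rates are entirely made up (OfS’ charts are anonymous and we haven’t seen the underlying numbers), and `share_below` is our own name for the idea rather than anything in the technical document.

```python
def share_below(rates, baseline):
    """Fraction of providers whose indicator falls below a candidate baseline."""
    return sum(1 for r in rates if r < baseline) / len(rates)


# Ten invented provider continuation rates, in per cent.
made_up_rates = [92, 88, 85, 83, 79, 76, 74, 71, 68, 62]

# Try a few candidate baselines and see what share of providers each would catch.
impact = {candidate: share_below(made_up_rates, candidate) for candidate in (70, 75, 80)}
```

On these invented figures a baseline of 70 catches two providers in ten, 75 catches four and 80 catches six. Pick the number of casualties you can live with, and you have your “criterion”.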

It’s all coming up

If you’re staring at your dashboard now, breathing a sigh of relief that you’ve escaped the more serious interventions, beware. Baselines were set at a level that was “more generous than the policy intention” of ensuring a high quality bar and successful outcomes “for all students regardless of their background” might have allowed for. This is the first year of OfS’ regulation, and while the baselines “make generous allowance” for differences between demographic groups and between providers, OfS will review whether the baselines are set at the appropriate level to protect the interests of students, and plans to issue a consultation in 2020. Plot spoiler – they won’t get more generous – and some in the sector will complain about another autonomy encroachment.

Then there’s the impact of new data. March is the magic month when some could fall under these thresholds again, and of course there’s some on the register that are so new that the emergence of usable data could see them chucked back off straight away. Completion is coming as a proper metric. And remember – if right now, planning is telling you that you might miss some of the targets by March, that’s a reportable event. Time to crank up OfS’ confessions portal and fess up to the governing body.

The other thing that’s painfully obvious in all this is how arguably unfair it is on smaller providers. As we’ve explained before, these assessments are largely discipline and size agnostic – and the bigger you are, the more likely it is that you can balance your averages across subjects to hit the thresholds. Fibchester College, doing mainly business courses for HE WP students, might have its whole cohort under the thresholds and be refused registration, but Fibchester Uni could easily bury the same 500 students’ outcomes in its 20,000 students and no-one would notice. That big provider might even be sporting a TEF Silver or Gold. That’s surely not sustainable in the long term – and who knows if the courts will eventually agree.
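The Fibchester arithmetic is easy enough to sketch in Python. The 500 and 20,000 headcounts come from the example above, but the continuation rates (65 per cent for the business cohort, 90 per cent for everyone else) are numbers we’ve invented purely to show the effect, and the sketch looks only at the aggregate rate, ignoring the demographic splits.

```python
def blended_rate(cohorts):
    """Headcount-weighted continuation rate.

    cohorts: list of (headcount, continuation_rate) pairs, rates as fractions.
    """
    total = sum(n for n, _ in cohorts)
    return sum(n * rate for n, rate in cohorts) / total


# Invented rates: the same 500 students, with and without 19,500 others around them.
fibchester_college = [(500, 0.65)]
fibchester_uni = [(500, 0.65), (19_500, 0.90)]

college_rate = blended_rate(fibchester_college)
uni_rate = blended_rate(fibchester_uni)
```

On those made-up numbers the college sits at 65 per cent, well under the 75 per cent “significant concern” line for full-time first degrees, while the university’s aggregate comes out at about 89 per cent, comfortably over the 85 per cent “not of concern” line. Same students, same outcomes.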

It also surely starts to sound the death knell for validation. If you can avoid the bright sunlight of direct registration by burying your outcomes in a big university that can cope with the averages, you’d switch to franchising. We’ve already spotted some that have. But how fair is that on applicants, who will now struggle to see the performance of their provider of choice because its stats have been subsumed somewhere that’s often thousands of miles away?

Game off

In an ideal world what would happen next is that providers would look at provision with weak outcomes for students and fix it. If only. What’s clearly coming down the track instead is a whole new level of gaming – the ultimate Achilles heel of Barberology. The fix, says Barber, is professionals with codes of ethics that focus on performance improvement rather than quietly dropping or contracting provision that doesn’t perform, but there’s loads of research that suggests that in schools and hospitals, it’s the latter that happens – and ministers get to say “look, it’s all better”.

That might be fine where there’s a big hospital that has to improve because it still has to put on cataract ops, or a local area with lots of schools to choose from – but higher education isn’t like that. Just look at what’s been happening in FE – where it’s now nigh on impossible to do A Levels in a college because of the pressure on “success” rates, even though that might be the right environment for you.

Fuse all this with our kind of “market” – where the protections for current students are weak, and the protections for potential students are non-existent, and we know what’s next. Universities drop courses and colleges don’t bother with HE at all. Less choice for fewer students, especially those who can’t travel. It’s all coming up.

2 responses to “The B3 moonwalking bear: OfS on minimum outcomes for students”

  1. “It also surely starts to sound the death knell for validation. If you can avoid the bright sunlight of direct registration by burying your outcomes in a big university that can cope with the averages, you’d switch to franchising. We’ve already spotted some that have…” But is it for that reason? Surely it’s more about cost efficiencies – why pay OfS a fee (and QAA and everyone else) if you can avoid it by working through a university?

  2. I know I am late to this party (sorry, just been a crazy old time). But this is the most depressing read, by one of my favourite writers, that I’ve had in a long while. Normally I’d think that the assumptions were overly pessimistic, but the behaviour of OfS suggests not. There would be an interesting conversation to be had about what the “right” thresholds are (and if the threshold is going to rise, then HEIs are going to take on fewer higher-risk students, resulting in less choice for fewer students). Oh, QED
