As my role title strongly suggests, I have a particular perspective on the question of the utility of a “what works” approach in widening participation. Since taking this role, I have had the chance to collaborate with colleagues across higher education, but I have also encountered concerns about “what works” in widening participation from some practitioners and researchers – see, for example, this Wonkhe post by Julian Crockford and this article by Neil Harrison and Colin McCaig.
I do not recognise the image of a researcher that emerges from these critiques: a person running around doing half-baked experiments, indifferent to everything except whether an asterisk appears beside the ‘treatment’ bar on the bar chart. That’s not what I do, nor what my team does, nor, as far as I know, is it how any of the What Works Centres approach the challenges they’re tasked with investigating.
To care about “what works” is also to care about how, why and for whom
One worry many researchers express about randomised controlled trials (RCTs) – the research method most strongly associated with the “what works” approach – is that they are a “black box” into which inputs disappear and out of which outcomes emerge with no explanation. This might be true of a very basic, poorly-designed RCT, but it is not an immutable feature of the method.
A good RCT is led by a hypothesis, informed by theory and by primary or secondary – often qualitative – research. With a sufficient sample size, a more sophisticated design, or sequential trials, RCTs can tell you a great deal about what, why and how something works, and provide a rich set of opportunities for qualitative research. Parallel and follow-up qualitative research can give us deep insight into how students are experiencing our interventions.
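To make “a sufficient sample size” concrete, here is a minimal, purely illustrative sketch of my own (not drawn from any study mentioned in this post) of the kind of power calculation that sits behind a well-designed trial; the effect size, significance level and target power are assumptions chosen for illustration.

```python
# Illustrative only: a back-of-the-envelope power calculation for a two-arm RCT.
# The effect size, alpha and power below are assumptions, not figures from any
# study discussed in this post.
from statsmodels.stats.power import TTestIndPower

n_per_arm = TTestIndPower().solve_power(
    effect_size=0.2,  # assumed standardised effect (Cohen's d): a "small" effect
    alpha=0.05,       # conventional significance threshold
    power=0.8,        # conventional target: an 80% chance of detecting the effect
)
print(f"Students needed per arm: {n_per_arm:.0f}")  # roughly 390 per arm
```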
For instance, Ben Castleman, at the University of Virginia, and Lindsay Page, at the University of Pittsburgh, published the results of an RCT conducted in 2012 which examined the effectiveness of text messaging and peer-to-peer support in preventing “summer melt”, where students who accept a place at university fail to take it up. They found that both interventions increased matriculation rates, but they also explored how and for whom they worked, including the impact of access to other college support and of engagement from parents, and the appeal of interacting with advisors compared to peers.
In a similar vein, authors from J-PAL and IPA, which between them conduct most of the world’s anti-poverty RCTs, have recently written about the importance of understanding an intervention’s mechanism of action when thinking about whether an RCT’s findings can be generalised. This requires a theory-led approach and careful empirical design.
Knowing what works feels pretty hollow if you don’t know why, how and for whom. Most “what works” proponents are sophisticated enough researchers to have noticed this.
In practice “what works” is about the best research method for the question
I’ve watched the work of the Education Endowment Foundation (EEF) for several years, and I think the path the foundation has charted for “what works” in education – a pragmatic, mixed-methods approach – far better encapsulates the idea than the image of a black box.
Every EEF project has a theory of change, which is worked through with the project and evaluation teams. Investigators also need some proof of concept indicating that the intervention might work before the EEF will fund it. Lastly, the EEF directs considerable investment towards process evaluations, which seek to provide the why and how, including the fundamental question: did teachers think this was something they would want to do in their classroom?
Similarly, when working with other parts of King’s, we see “what works” as a process, not an outcome, and certainly not a dogma. Our division includes widening participation, welcome, graduations, addressing attainment gaps, academic support and wellbeing services. For some of these services it will never be appropriate or sensible to conduct an RCT, and we were clear from the beginning that we didn’t expect this. We expect everyone in the division to think about why and how they believe their service improves things for students, and we work with them to design an evaluation strategy to test their assumptions.
My team has been working with our colleagues to develop theories of change that set out the pathways of influence for everything we do. We run RCTs, yes, but also interviews and focus groups, surveys and data analysis, selecting the right method for the question. Our goal is to produce excellent evaluation, whatever the method.
Let’s meet in the middle
I think that we agree on more than we disagree on. We are all here because we want to ensure that students have an excellent education and excellent opportunities, and we want to know whether we are helping them achieve those outcomes. In the face of sometimes substantial barriers, we’ve built an evaluation community, sought out knowledge and networks, and honed our research skills.
I believe that RCTs and other quantitative methods bring huge value to the evaluation of widening participation. Without them, as Vignoles and Murray, and Younger and colleagues, have argued, our understanding of the effectiveness of widening participation interventions is incomplete. A range of projects, including those undertaken by the government’s Behavioural Insights Team, the University of Surrey, the Department for Education and King’s College London, have shown that it is possible, ethical and useful to run RCTs in widening participation in the UK.
However, Neil Harrison speaks compellingly about how statistical inference can be misinterpreted in the hands of the inexpert. One only has to look at the replication crisis – the phenomenon in which published findings fail to reproduce in subsequent studies – to see clear evidence of this in the social sciences, while Andrew Gelman at Columbia maintains an entertaining and enlightening blog about the statistical foibles that make their way into published research.
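To illustrate the point with a toy example of my own (not taken from Gelman or Harrison): when trials are underpowered, the estimates that happen to clear the significance threshold systematically exaggerate the true effect – one of the mechanisms behind the replication crisis.

```python
# Toy simulation (illustrative assumptions throughout): many underpowered two-arm
# trials of a genuinely small effect. The results that reach p < 0.05 overstate it.
import numpy as np

rng = np.random.default_rng(0)
true_effect = 0.1       # assumed true standardised effect
n_per_arm = 100         # underpowered for an effect this small
n_studies = 5000

significant = []
for _ in range(n_studies):
    control = rng.normal(0.0, 1.0, n_per_arm)
    treated = rng.normal(true_effect, 1.0, n_per_arm)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n_per_arm + treated.var(ddof=1) / n_per_arm)
    if abs(diff / se) > 1.96:   # crosses the conventional significance threshold
        significant.append(diff)

print(f"True effect: {true_effect}")
print(f"Mean estimate among 'significant' results: {np.mean(significant):.2f}")
# The 'significant' subset overstates the true effect roughly threefold.
```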
There is a clear risk here that the sector needs to consider as it pivots towards a more balanced combination of evaluation methods. For both qualitative and quantitative evaluations, the insights that can be drawn are only as good as the research design and implementation, and the limitations of quantitative methods are often less obvious to non‑experts.
Likewise, I agree with Julian Crockford about the complexity of identifying meaningful outcomes, and the danger of focusing on outcomes to the exclusion of all else. The phrase “hitting the target but missing the point” entered the lexicon for a reason.
So let’s resist the temptation to retreat deeper into our methodological silos and talk past each other about our epistemological, ethical and validity concerns with others’ approaches. Instead, let’s meet in the middle, let’s talk to each other, and let’s keep doing our best for students.
This is a pretty weak ‘straw man’ mischaracterisation of my views on the topic. For example, I draw extensively on the RCTs undertaken by Daphna Oyserman in my recent paper on ‘possible selves’ in WP programmes (https://www.mdpi.com/2076-0760/7/10/209) and fully support the use of RCTs where they are undertaken with appropriate epistemological rigour to answer appropriate research questions. However, those that I have seen with respect to WP in the UK to date have generally been weak – not randomised, not controlled or not even a trial!
The problem becomes pathological when RCTs are fetishised to the exclusion of other ways of knowing in complex social spaces. This has led to a clamour in some quarters to run RCTs at all costs. My position is that quasi-experimental designs are generally a better approach to WP research: they still provide strong forms of inference, but they engage with real-world longitudinal data rather than the artificiality required to adequately control an RCT. The other fear I have about RCTs is that a weak epistemology can lead practitioners to discount effective programmes because they get a ‘negative’ result – I’ve heard of at least one example of this.
This is increasingly being understood in the US – the home of the ‘what works’ approach. Paul Lingenfelter’s book (https://www.amazon.co.uk/Proof-Policy-Practice-Understanding-Improving/dp/1579227511) charts a move away from RCTs as they fail to adequately capture complexity and often fall back on pretty crude determinism. Public health has led this shift, with education following. This does not mean throwing the baby out with the bathwater: RCTs have a clear role to play in answering relatively simple research questions that are time-limited, readily controlled and where large effect sizes are expected.
Hi Neil,
Thanks for your comment. I didn’t mean to mischaracterise your argument – rather to aggregate a collection of critiques of ‘what works’ and/or RCTs and point to a couple of considered examples if people wished to read them. The purpose of this blog post wasn’t to respond in detail to those critiques but to set out what I, as a practitioner, see as the ‘what works’ approach on a day-to-day basis, and how I think it could be a positive opportunity for the WP space.
Side note: I’m not convinced that QEDs are better than RCTs in WP. As you say, we’re dealing with a complex space, and although an RCT might struggle to exert enough control to get a ‘clean’ causal estimate (to the extent that that is the goal of the research, which of course it often is not), most QEDs will find that complexity impossible to contend with. For instance, I think your article linked in the post provides a compelling case where a QED that was not well-suited to the question has led to some pretty questionable inferences. However, I do think that methodological selection is almost always better when it has been subjected to discussion and scrutiny by others, so I would love to continue this conversation sometime.
Best wishes,
Susannah
One of the things that puzzles me about RCTs in WP is why few (if any) ever go through a peer review process. Of the four that you cite, only one is in a peer-reviewed journal, and then in a relatively obscure economics journal, not a mainstream education one. Maybe I’m just missing them, but there appears to be an almost wilful desire to avoid independent scrutiny of the work – or maybe they aren’t getting through the rigorous standards set by peer review.
I believe the DfE letters trial has been accepted for publication. The King’s trials I’d need to write up and I haven’t got around to it. But yes, education RCTs do tend to get published in economics or organisational behaviour journals.
Not really – there are plenty of RCTs in education journals and I often cite them. I am specifically talking about RCTs on WP issues, and there are barely any that I can find in economics journals either. I started work on an article called “Gold standard or fools’ gold” about RCTs in WP about a year ago, but I couldn’t find enough published examples to review. They simply are not going through peer review processes, nor do they often appear at education conferences. You talk about silos, but the ‘what works’ movement is one that only really speaks to itself. If the findings are useful to the sector, why aren’t they discussed more widely with educationalists, either in person or in print?
Having searched the articles cited: while they may critique RCTs, they do not use phrases like “half-baked”.
The proximity of the phrase “these critiques” to that literature does suggest attribution, and hence feels like a misrepresentation.
I would recommend an amendment to the paragraph that distinguishes between the people cited and the point you’re trying to make, so that it is more clearly your interpretation.