Which evaluation measures should providers pick to boost access and participation?

Most higher education providers in England are now in the process of preparing their new Access and Participation Plan (APP) for the Office for Students (OfS), following a trial which involved 40 institutions – known as the “first wave”.

The University of Reading was part of this pilot group, working under renewed and elevated expectations of evaluation and of the generation of robust data to evidence “what works”.

The OfS Standards of Evidence provide guidance on collecting high-quality evidence to inform our understanding of whether, and how, our programmes are working.

As part of our recent APP submission, we as the evaluation team were responsible for working with intervention and activity leads in our institution to clarify the intermediate and long-term outcomes of our programmes, and in particular how these would be measured.

An important question that we faced was how to maintain the integrity of validated evaluation measures – while acknowledging the practicalities of data collection in the real world.

The journey so far

As a “first wave” institution, we did not have an abundance of time when developing our APP and choosing the evaluation outcomes and corresponding measurement instruments.

In an ideal world, we would have had longer to discuss and develop our intermediate outcomes and approach to evaluation – but this is the real world we are working in, and we hope our experiences will be of value to “second wave” institutions.

For colleagues who oversee or have responsibilities in evaluation, our experience is that it is critical to have buy-in from the top: in our case, the Access and Participation Committee, which writes the APP.

Our institutional strategy for evaluation was guided by the OfS evaluation self-assessment tool, which prompted us to review and redevelop our evaluation mechanisms and structures so that evaluation becomes a collective responsibility, through capacity building and empowerment.

For example, we created a new Access and Participation Evaluation Subcommittee, appointed “evaluation leads” for our intervention strategies, and ringfenced an evaluation workload for colleagues who are delivering our plan.

We are committed to publishing our evaluations, starting with reflection reports, as we prepare to deliver our new evaluation plans that will enable us to produce data for high-quality evaluations.

We want to ensure that evaluations will be meaningful and allow for reflexive programme development, whilst also capturing the impact of programmes that are complex and sometimes difficult to collect data on.

Choosing instruments – pick and mix?

We worked with a number of intervention and activity leads to decide on the appropriate outcome indicators and instruments. In many cases, the intermediate outcomes are best measured using surveys.

For some outcomes, validated measures such as TASO’s Access and Success Questionnaire (ASQ) or the Toolkit for Access and Participation Evaluation (TAPE) already exist. Great, we thought: these will be perfect for measuring the outcomes.

However, activity leads raised practical considerations that pointed in a different direction. Often, they felt that specific questions picked from different scales would together give a more accurate measure of the programme than any one measure in its entirety.

For example, an activity with the aim of increasing knowledge about higher education options wanted to use the ASQ questions on university expectations (covering “I am thinking about going to university in the future”) and knowledge of what university would be like, but also to capture the possibility that the activity increased the students’ knowledge, after which they might decide that university is not the right route for them. Therefore, the activity lead wanted to include extra questions from TAPE, such as “I know enough about higher education to decide whether to go or not.”

We liken this to being in a “pick and mix” sweet shop, where multiple measurement scales (in “standardised bags” of items, or sweets) are available for users to choose from. However, as we were reminded in the recent TASO webinar on the newly validated ASQ, the danger of picking our favourite sweets from different bags (items from different scales) is that it invalidates the measurement scale for a particular concept.

So, what do you do when sweets are not designed to be sold separately, but the entire bag won’t work for your party?

What have we learned about sweets?

Each institution will have different evaluation needs. However, if we were to offer any advice based on our experiences, it would be to set up your bespoke “evaluation sweet shop”, with clear guidance about what can and cannot be eaten separately.

Wherever possible, to maintain statistical rigour, measurement instruments should be used as whole surveys and as intended by the developer. Depending on the research and evaluation expertise of your colleagues, this might be something that practitioners have not come across, so it’s important not to assume that everyone knows why this matters when choosing their sweets.

However, in reality, if using whole validated surveys is going to lead to excessively long surveys or inaccurate intermediate measures, then flexibility and common sense are required. If sub-scales can be used rather than a whole measure, do this and acknowledge the caveats. What we want to avoid is a complete pick and mix of single questionnaire items, as by doing so we erode the internal consistency of the measure and limit what we can say about results.
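To illustrate what “eroding internal consistency” can look like, here is a minimal sketch using Cronbach’s alpha, one common internal-consistency statistic. This is our own choice of illustration rather than a method prescribed by the ASQ or TAPE, and the responses and item names below are invented, not real survey data.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items.

    `items` is a list of lists: one inner list per questionnaire item,
    each holding the responses of the same respondents in the same order.
    """
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per respondent across all items.
    totals = [sum(responses) for responses in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Invented 1-5 responses from six respondents (illustration only).
# Three items from the same hypothetical sub-scale, which move together:
q1 = [1, 2, 3, 4, 5, 4]
q2 = [2, 2, 3, 4, 5, 5]
q3 = [1, 3, 3, 5, 5, 4]
# Two hypothetical items picked from unrelated scales:
x1 = [2, 3, 2, 4, 3, 4]
x2 = [4, 3, 2, 3, 2, 4]

alpha_subscale = cronbach_alpha([q1, q2, q3])
alpha_pick_and_mix = cronbach_alpha([q1, x1, x2])

print(f"Whole sub-scale: {alpha_subscale:.2f}")
print(f"Pick and mix:    {alpha_pick_and_mix:.2f}")
```

With these made-up numbers the intact sub-scale scores much higher than the mixed set, which is the pattern to expect when items were not designed to measure the same concept together.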

It is possible to create a new measure for a concept and validate this yourself. Whether this is feasible and realistic is another matter. Although there are some areas (for example, career progression) where the sector would benefit from more validated measures, we also want to avoid having a glut of instruments to measure very similar concepts, as this only makes cross-sector comparisons and the initial choice of instrument more challenging.

As a final option, in some of our evaluations we have used validated scales or sub-scales (which can be analysed separately and in their entirety), and added some “pick and mix” questions to cover outcomes not fully captured by the validated instrument. In this way, we almost combine the “standardised bags” with “pick and mix”, but we ensure that analysis of the “pick and mix” items does not over-claim: they serve as contextual information rather than as the basis for statistical claims.
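As a rough sketch of how this separation might look at the analysis stage, the example below scores a validated sub-scale only when every one of its items has been answered, and reports an added item as a simple descriptive percentage. The item codes, the agreement threshold and the rule for missing answers are all assumptions made for illustration; a real instrument’s scoring guidance should take precedence.

```python
from statistics import mean

# Hypothetical item codes, not the real ASQ or TAPE identifiers.
VALIDATED_SUBSCALE = ["exp_1", "exp_2", "exp_3"]
CONTEXTUAL_ITEM = "know_enough_to_decide"


def subscale_score(response, items=VALIDATED_SUBSCALE):
    """Mean of the sub-scale items, or None if any item is unanswered.

    Treating an incomplete sub-scale as unscored is an assumption here.
    """
    values = [response.get(item) for item in items]
    if any(value is None for value in values):
        return None
    return mean(values)


def percent_agree(responses, item, threshold=4):
    """Descriptive share of respondents answering at or above `threshold`."""
    answered = [r[item] for r in responses if r.get(item) is not None]
    if not answered:
        return None
    return 100 * sum(value >= threshold for value in answered) / len(answered)


# Invented responses on a 1-5 agreement scale (illustration only).
responses = [
    {"exp_1": 4, "exp_2": 5, "exp_3": 4, "know_enough_to_decide": 5},
    {"exp_1": 2, "exp_2": 3, "exp_3": 2, "know_enough_to_decide": 4},
    {"exp_1": 3, "exp_2": None, "exp_3": 4, "know_enough_to_decide": 2},
    {"exp_1": 5, "exp_2": 4, "exp_3": 5, "know_enough_to_decide": 4},
]

# Validated sub-scale: analysed on its own and in its entirety.
scores = [subscale_score(r) for r in responses]
# Added item: reported descriptively, as context only.
agree = percent_agree(responses, CONTEXTUAL_ITEM)

print("Sub-scale scores:", scores)
print("Percent agreeing with the added item:", agree)
```

Keeping the two kinds of item in separate functions makes it harder to fold a “pick and mix” question into a scale score by accident.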

We must balance the need for measurement instruments to be used in a way that retains their statistical integrity and robustness, with the practical and real considerations of gathering evaluation data in real-world, complex settings.

We must avoid getting carried away like children in a sweet shop – as is often the case with the sometimes competing priorities of research and practice, compromise is key to creating a satisfactory solution for all stakeholders.