This article is more than 3 years old

The data science behind Clearing Plus

For Clearing Plus to be trusted, it must be transparent and clear about how matches are made. Paul Chandler shows us around.

Paul Chandler is Principal Data Scientist at UCAS

As with any new service that uses individuals’ data to draw conclusions, it’s important to be transparent about what’s under the bonnet and to explain how results are generated.

Life-changing decisions could be made on the basis of the suggestions generated. By explaining the data science, we want to make sure Clearing Plus justifies the trust that will be placed in it, and to reassure both students and admissions teams that UCAS will be closely monitoring how it works in practice.

How it works

From early July, those not holding an offer or place can see their individual list of matched courses in Track (their online UCAS account) by clicking a button. From there, they can easily send an expression of interest to their chosen universities. After a conversation, the student can decide whether to officially add them to their application. As ever, admissions teams have the final say over who they admit onto their courses.

Clearing Plus works by suggesting courses that are typically favoured by similar applicants, and that the student is eligible for.

Two critical factors are involved:

  • Available courses and a university’s own recruitment criteria.
  • A match score of students and courses based on historical acceptances.

The first of these is very simple to imagine. University of X wants to recruit to their physics course, and therefore submits physics to Clearing Plus, stipulating that it is only visible to applicants with a confirmed A level grade B in maths. They will then receive the details of all unplaced applicants who have clicked on their course to register interest. Applicants won’t see the course if they don’t have the required B (or higher) grade, so admissions teams can have confidence in those registering interest. This means that the applicant’s achieved regulated grade is used, as it would be in any other year.
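The eligibility rule described above behaves like a simple filter applied before any ranking. A minimal sketch, assuming a hypothetical A level grade scale and course records listing `(subject, minimum grade)` requirements; it illustrates the rule and is not UCAS's implementation:

```python
# Hypothetical A level grade scale, best grade first (U = unclassified).
A_LEVEL_ORDER = ["A*", "A", "B", "C", "D", "E", "U"]


def meets_requirement(achieved: dict[str, str], subject: str, min_grade: str) -> bool:
    """True if the confirmed grade in `subject` is `min_grade` or better."""
    grade = achieved.get(subject)
    if grade is None:
        return False  # no grade in the required subject, so the course stays hidden
    # A lower index in A_LEVEL_ORDER means a better grade.
    return A_LEVEL_ORDER.index(grade) <= A_LEVEL_ORDER.index(min_grade)


def visible_courses(achieved: dict[str, str], courses: list[dict]) -> list[str]:
    """Names of the courses whose every (subject, min_grade) requirement is met."""
    return [
        course["name"]
        for course in courses
        if all(meets_requirement(achieved, s, g) for s, g in course["requirements"])
    ]


courses = [
    {"name": "Physics, University of X", "requirements": [("Maths", "B")]},
    {"name": "History, University of Y", "requirements": [("History", "C")]},
]
print(visible_courses({"Maths": "A", "Physics": "C"}, courses))
# ['Physics, University of X']
```

Because only confirmed grades are checked, an applicant with a C in maths never sees the physics course, which is what lets admissions teams trust the expressions of interest they receive.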

Participation and relevancy

The widening participation opportunities are obvious. Admissions teams can also choose to use POLAR (Participation of Local Areas) and SIMD (Scottish Index of Multiple Deprivation) as part of their criteria to reach underrepresented applicants more effectively, helping them to achieve a diverse student population and supporting this important agenda for all of us working in higher education.
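Widening participation criteria can be pictured as an extra, optional filter alongside the grade requirement. A minimal sketch, assuming quintile-based thresholds (quintile 1 being the lowest HE participation under POLAR, or the most deprived areas under SIMD) and assuming, purely for illustration, that an applicant qualifies if they meet any threshold the provider has set; the function and parameter names are hypothetical:

```python
from typing import Optional


def in_widening_target(
    polar: Optional[int],
    simd: Optional[int],
    max_polar: Optional[int] = None,
    max_simd: Optional[int] = None,
) -> bool:
    """Hypothetical widening-participation filter.

    polar / simd: the applicant's quintile (1 = lowest HE participation for
    POLAR, 1 = most deprived for SIMD), or None if it does not apply.
    max_polar / max_simd: the provider's thresholds, or None if not set.
    Assumption: meeting ANY threshold the provider has set is enough.
    """
    if max_polar is None and max_simd is None:
        return True  # the provider set no widening-participation criteria
    if max_polar is not None and polar is not None and polar <= max_polar:
        return True
    if max_simd is not None and simd is not None and simd <= max_simd:
        return True
    return False


print(in_widening_target(polar=1, simd=None, max_polar=2))  # True
print(in_widening_target(polar=4, simd=None, max_polar=2))  # False
print(in_widening_target(polar=None, simd=1, max_polar=2, max_simd=1))  # True
```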

The second of these is slightly harder to conceptualise and is commonly referred to as ‘the algorithm’. The output of the algorithm is a match score (between 0 and 1) representing the relevancy of each course to an applicant. It informs the order that available courses will be displayed to each student – a good match would be top of the list.
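Once the available courses each have a match score, producing the list is a descending sort. A minimal sketch with made-up course names and scores, used only to show the ordering:

```python
def rank_courses(scores: dict[str, float]) -> list[str]:
    """Order course names by match score, best match first."""
    for course, score in scores.items():
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"match score for {course!r} must be between 0 and 1")
    # sorted() is stable even with reverse=True, so equal scores keep input order.
    return sorted(scores, key=scores.get, reverse=True)


scores = {
    "Physics, University of X": 0.42,
    "Engineering, University of Z": 0.87,
    "Mathematics, University of Y": 0.65,
}
print(rank_courses(scores))
# ['Engineering, University of Z', 'Mathematics, University of Y', 'Physics, University of X']
```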

The match score is derived from the following pieces of information:

  • The grade/subject profile of the applicant at Level 3 or equivalent (initially A levels, BTECs, Scottish Highers, Access to HE Diplomas and International Baccalaureates, with more qualifications to be added in future).
  • Any of the applicant’s main scheme course choices.
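One way to picture these two inputs is as a single feature vector per applicant. A minimal sketch, assuming a simple binary encoding and a hypothetical fixed vocabulary of feature names; the article does not describe the exact encoding, so the format here is illustrative:

```python
def encode_applicant(
    grades: list[tuple[str, str, str]],
    main_scheme_choices: list[str],
    vocabulary: list[str],
) -> list[float]:
    """Binary feature vector: 1.0 where the applicant has that feature, else 0.0.

    grades: (qualification, subject, grade) tuples, e.g. ("A level", "Maths", "A").
    main_scheme_choices: subjects of the applicant's main scheme choices.
    vocabulary: hypothetical fixed, ordered list of feature names.
    """
    active = {f"{q}|{s}|{g}" for q, s, g in grades}
    active |= {f"choice|{c}" for c in main_scheme_choices}
    return [1.0 if name in active else 0.0 for name in vocabulary]


vocabulary = [
    "A level|Maths|A",
    "A level|Physics|B",
    "BTEC|Engineering|Distinction",
    "choice|Physics",
    "choice|History",
]
vector = encode_applicant(
    [("A level", "Maths", "A"), ("BTEC", "Engineering", "Distinction")],
    ["Physics"],
    vocabulary,
)
print(vector)  # [1.0, 0.0, 1.0, 1.0, 0.0]
```

Mixed qualifications need no special handling in this encoding: A level and BTEC grades simply switch on different features of the same vector.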

Clusters

These pieces of information are used to create a ‘cluster’ which groups a student with peers of similar characteristics from previous cycles, so that the historical outcomes of ‘people like them’ can be seen. We’ve been asked whether the algorithm allows for ‘mixed qualifications’. It does, because it is the combination of qualifications that is used to create the applicant cluster. For example, if a student with both A level and BTEC grades enters Clearing, they will see the most strongly related courses that have a good record of accepting their combination (or one very similar to it).
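The article does not say which clustering method is used. As an illustration only, the sketch below assumes k-means-style clusters described by centroids and assigns an applicant to the cluster whose centroid is nearest in Euclidean distance; the cluster names, features and centroid values are invented:

```python
import math


def nearest_cluster(vector: list[float], centroids: dict[str, list[float]]) -> str:
    """Return the id of the centroid closest to `vector` (Euclidean distance)."""
    return min(centroids, key=lambda cid: math.dist(vector, centroids[cid]))


# Hypothetical features: [A level Maths, A level Physics, BTEC Engineering]
centroids = {
    "maths-heavy A level": [0.9, 0.8, 0.1],
    "BTEC engineering": [0.2, 0.1, 0.9],
    "mixed A level + BTEC": [0.8, 0.1, 0.8],
}
print(nearest_cluster([1.0, 0.0, 1.0], centroids))  # mixed A level + BTEC
```

An applicant holding both A level maths and a BTEC in engineering lands in the cluster built from past applicants with a similar mix, rather than being forced into a single-qualification group.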

Alongside this, every course has been grouped into ‘course clusters’ using the same pieces of information:

  • The proportion of applicants with each grade/subject profile at Level 3.
  • The proportion of applicants with specific other applications in the main scheme.
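The course-side inputs are proportions rather than counts. A minimal sketch of building one course's profile, assuming each historical applicant is represented as a set of hypothetical feature names covering grade/subject profiles and other main scheme applications:

```python
from collections import Counter


def course_profile(applicants: list[set[str]]) -> dict[str, float]:
    """Map each feature to the proportion of the course's applicants who have it."""
    if not applicants:
        return {}
    # Each applicant is a set, so a feature counts at most once per applicant.
    counts = Counter(feature for features in applicants for feature in features)
    return {feature: count / len(applicants) for feature, count in counts.items()}


history = [
    {"A level|Maths|A", "choice|Physics"},
    {"A level|Maths|B", "choice|Physics"},
    {"A level|Maths|A", "choice|Engineering"},
]
profile = course_profile(history)
print(round(profile["A level|Maths|A"], 2), round(profile["choice|Engineering"], 2))
# 0.67 0.33
```

Because the values are proportions, a course with 30 applicants and one with 3,000 can produce the same profile and end up in the same course cluster.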

Each combination of applicant cluster and course cluster has a match score. This means each individual applicant and specific course also has a match score. The advantage of clusters is that relevant recommendations can be made without attempting to exactly replicate previous cycles. Pertinent links can be made that may not have been thought of before because the data can show that “people like you were accepted to courses like this”.
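Because scores are stored per pair of clusters, an individual applicant's score for a specific course is a lookup. A minimal sketch with an invented score table and course-to-cluster mapping; returning 0.0 for a pair with no history is an assumption:

```python
def course_scores(
    applicant_cluster: str,
    course_clusters: dict[str, str],
    pair_scores: dict[tuple[str, str], float],
    default: float = 0.0,
) -> dict[str, float]:
    """Score every course by looking up (applicant cluster, course cluster)."""
    return {
        course: pair_scores.get((applicant_cluster, cluster), default)
        for course, cluster in course_clusters.items()
    }


pair_scores = {
    ("mixed A level + BTEC", "engineering-type"): 0.81,
    ("mixed A level + BTEC", "design-type"): 0.55,
}
course_clusters = {
    "Mechanical Engineering, University A": "engineering-type",
    "Product Design, University B": "design-type",
    "Civil Engineering, University C": "engineering-type",
    "Philosophy, University D": "humanities-type",
}
scores = course_scores("mixed A level + BTEC", course_clusters, pair_scores)
print(scores)
```

The two engineering courses receive the same score because they sit in the same course cluster, so a course the applicant had never considered can surface alongside one they had.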

Agnosticism

Courses are grouped with other courses that have a similar student intake profile, regardless of the size of the university or college. The match score is also agnostic of the number of applicants: a key factor in the matching process is the likelihood of similar applicants being offered, and accepting, a place.
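Being agnostic of the number of applicants means working with rates rather than raw counts. A minimal sketch, assuming the score for a pairing is estimated as the share of similar past applicants who were offered and accepted a place; the article does not give the exact formula, and the cluster names and figures are invented:

```python
def acceptance_rate(applications: int, acceptances: int) -> float:
    """Share of similar applicants who were offered, and accepted, a place."""
    if applications <= 0:
        return 0.0  # no history for this pairing
    return acceptances / applications


# Hypothetical history for one applicant cluster: (applications, acceptances)
history = {
    "cluster P": (1000, 200),
    "cluster Q": (50, 30),
}
by_count = sorted(history, key=lambda k: history[k][1], reverse=True)
by_rate = sorted(history, key=lambda k: acceptance_rate(*history[k]), reverse=True)
print(by_count)  # ['cluster P', 'cluster Q']
print(by_rate)   # ['cluster Q', 'cluster P']
```

Ranking by raw acceptances would favour cluster P simply because it is bigger; ranking by rate puts cluster Q first because similar applicants were more likely to end up there. With very small counts a real system might also need some smoothing, which this sketch leaves out.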

We are naturally aware of the questions people have about algorithms and the bias they can introduce or perpetuate. Clearing Plus could be seen as reinforcing existing patterns of acceptances, such as men being more likely to choose STEM subjects. However, because matches are based on clusters of students previously placed on courses, using the factors mentioned earlier (e.g. grades, not sex), students will discover courses which may not have been on their radar in the past but which they are qualified to succeed on. As with all parts of the admissions process, we’ll be monitoring, measuring, and reporting on how Clearing Plus works to expand student choice.

This matching process is more in line with the online services and industry standards that applicants are used to in their daily lives but, crucially, it is one that could have more impact on their lives than any other to date, setting them up for their step into higher education and possibly their future careers.

This article is published in association with UCAS. 

8 responses to “The data science behind Clearing Plus”

  1. Thank you for this piece, which is reassuring to those of us with concerns about how CP might be implemented, particularly in light of the ways in which it was initially framed by the government.

    It’s important for the sector that the algorithm’s assumptions remain as transparent as possible, and this is a good first step.

  2. This whole process is impossible to navigate and is leaving dyslexic students like myself behind. College is closed and there is no support available for me.
    I don’t want to slip through the net.
    I have worked hard to get a predicted distinction in my foundation level 3 extended in art and design.
    I’m struggling to apply and to read what I have to do.
    Where is the help now?

  3. Thanks for this insight into the service Paul which is very helpful.

    Important for providers will be the balance between the two ‘forces’ described within this system. The first force is the data-led ‘discovery’ of possible links between providers and students when neither is aware of them. This is driven mainly by the level 3 grade/subject mix and its similarity to historic recruitment at the course (i.e. what providers actually do rather than what they aspire to do). This is in the vein of the previous PMDS/DCS systems and can lead to those rewarding “I had never even heard of X at Y but it is perfect for me!” / “We wouldn’t normally have considered a nursing applicant but we were pleasantly surprised on how suitable they were” conversations. But also, with about the same frequency, to bemusement and bafflement on both sides.

    The second broad force is more akin to a directory look-up. This is seen in those elements where the provider’s and applicant’s expressed preferences are used (so, required minimum grade/subject, building on the existing HE subject choices, POLAR specs etc). This has a different philosophy, more “I know what I want, now just help me efficiently pick it out of these tens of thousands of courses/applicants”. For providers, it is closer to commercial email segment selection. This force acts to optimise (and reinforce) existing beliefs about good applicants and courses.

    Neither approach is right or wrong, but they are different. The first will probably be better for broadening intakes (and guarding against provider intake grade optimism), whilst the second will probably be more efficient in terms of contact conversion, but might risk cutting the pool down unfairly. Providers might want to use the system in a way that effectively plumps for one of these two underlying logics, to avoid the risk of forces pulling in opposite directions. If majoring on this service, going through the algorithm code itself to see how it all plays out, though taxing, is the way to gain transparency on what it will do.

  4. Can you clarify what Chinese walls will exist between this service and those that UCAS Media sells to universities to target candidates? Will universities be charged either for the core of this service or to effectively ‘boost their signal’?

  5. We made the decision to make Clearing Plus free, to maximise choice for students and reach for providers. For a small fee, providers can add their logo and short description, though this doesn’t improve their relevancy score.

    There’s no relationship between the Clearing Plus algorithm and UCAS Media services, they exist side-by-side, as we know students use multiple channels and sources of information and advice during Clearing. UCAS Media mailings are commissioned for a fee by providers based on their specified target criteria rather than a common algorithm.

  6. If the algorithm is so good then why not send providers some data about the numbers of unplaced applicants that they would have been matched to in 2019, their match ranking and for which courses (and how many of those applicants ended up with them in 2019). There must be a backup of the db from A level results day 2018 that this could be modelled on and it would help give some more understanding of how the algorithm works for various courses.
