Significant attention has recently been focused on the growth in the percentage of university students gaining first class honours degrees.
Though the media and politicians have raised concerns about the maintenance of standards across higher education for years, this latest round of scrutiny was prompted by the proportion of such degrees awarded nationally rising from 16% in 2011 to 29% in 2018.
According to the latest data published by the Office for Students (OfS), the percentage of students awarded first class degrees reached 40% at one institution. Awards of upper second class classifications have also increased, but it is the rise in first class awards in particular that has caught the eye. For many, this trend could devalue UK degrees in the eyes of employers, parents and students, and damage the reputation of UK higher education around the world.
Tales of the unexplained
OfS analysis of this trend for UK-domiciled graduates who qualified in the academic years 2011 to 2018 suggested that some of the growth could be ascribed to increases in entry qualifications. Nonetheless, there was an element of growth which, it argued, could not be explained. The OfS concluded that 13.9 percentage points of the first class degree attainment evident in 2018 remained ‘unexplained’, a rise of 2.4 percentage points from the previous year.
Both the methodology of the OfS study and its conclusion came in for some trenchant criticism within the higher education sector. Though this rise in “good” degree outcomes at many institutions may be attributable to factors such as the improved teaching of more talented students, it is hard not to share the OfS’ view that artefactual causes – relating to the ways in which marking systems are designed and applied – may be behind at least some of the upward trajectory.
Assessment is so fundamental to what we do that we need to stand back to see it clearly. This is the opportunity offered by the UK Standing Committee on Quality Assurance's proposal that universities publish a clear Degree Outcomes Statement specifying the criteria required for different classes. This is important: the HE sector runs a criterion-based assessment system. Firsts are not rationed according to a predetermined allocation; they are awarded for meeting the criteria. Awards are not benchmarked to some point in time when you could count the students on your course who got firsts on one hand.
Bridging the gap
There's a big challenge ahead: if we tackle the participation gap then we have to do the same with the awarding gap. And unless there is overall grade improvement, closing the gaps that currently exist means students in well represented demographic groups will see worse outcomes.
But even then we still have some inherent problems, including the 0-100 problem. In our criterion-based system the numbers have taken over, as if they have a life of their own. That might be acceptable in the middle of the scale, but the extremes of the range, 0-40 and 70-100, are poorly understood and rarely used. In aggregation systems this can have artefactual effects. These are also the areas where disciplinary differences are greatest.
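As a numerical illustration (my figures, entirely hypothetical): when markers in practice use only a compressed band of a 0-100 scale, averaging can understate a consistently first-class profile, while a single uncapped low mark drags the aggregate down disproportionately.

```python
# Hypothetical illustration of the artefactual effect of a compressed
# 0-100 scale: four pieces of uniformly first-class work, marked in the
# narrow band markers typically use above the first-class boundary.

marks = [72, 74, 71, 75]            # "first-class" marks as typically awarded
avg = sum(marks) / len(marks)
print(avg)                          # 73.0 -- a "low first" on average

# A single uncapped outlier has an outsized effect under aggregation:
with_zero = marks + [0]             # one non-submission scored at 0
print(sum(with_zero) / len(with_zero))  # 58.4 -- drops two classification bands
```

The asymmetry is the point: marks above 70 are compressed into a few points, while a zero uses the full depth of the scale.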
So, to counter this, over the last two years my institution, Nottingham Trent University (NTU), has moved away from the non-linear 0-100 marking scale completely. It has introduced grade-based assessment (high, mid or low first, 2:1, 2:2, etc.) with a linear numerical system (0-16) to represent each grade. Grading ensures marks are based entirely on comparing the qualities of student work with written descriptors of the assessment criteria. We are not alone: other universities are using similar scales.
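One plausible reconstruction of such a linear scale, consistent with the details mentioned in this piece (a mid 2:2 sitting at 8, high/mid/low bands within each class, and marginal/mid/low fails), might look like the following. This mapping is my own sketch, not NTU's published scheme.

```python
# Sketch of a linear 0-16 grade-point scale. The exact labels and
# positions are assumptions reconstructed from the article, not an
# official NTU specification.
GRADE_POINTS = {
    0: "Zero / non-submission",
    1: "Low fail", 2: "Mid fail", 3: "Marginal fail",
    4: "Low 3rd", 5: "Mid 3rd", 6: "High 3rd",
    7: "Low 2:2", 8: "Mid 2:2", 9: "High 2:2",
    10: "Low 2:1", 11: "Mid 2:1", 12: "High 2:1",
    13: "Low 1st", 14: "Mid 1st", 15: "High 1st",
    16: "Exceptional 1st",
}

def grade_label(points: int) -> str:
    """Return the written descriptor behind a grade point."""
    return GRADE_POINTS[points]

print(grade_label(8))   # Mid 2:2
```

Because each step on the scale is one grade band, averaging grade points does not suffer the compression artefacts of a 0-100 scale where the extremes are rarely used.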
From scores to classes
Another potential contribution to unexplained grade inflation is the way calculations are used to aggregate students' scores across modules to generate overall degree classifications. Rumours swirl everywhere about exercises that map out favourable aggregation systems. We have refined the algorithm used to calculate degree awards, ensuring that it is students who have performed at a first class standard on the majority of their credits who are awarded a first class classification. The University has also removed the power of examination boards to make discretionary classification decisions for students on the classification borderline.
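The "majority of credits" principle can be sketched in a few lines. This is a hypothetical illustration, not NTU's actual algorithm: the thresholds assume the 0-16 grade-point scale described earlier, with first-class grades at 13 and above.

```python
# Hypothetical sketch of a credit-majority classification rule: a class
# is awarded only when more than half the credits carry grades at that
# standard. Thresholds (13 = first, 10 = 2:1, etc.) are my assumptions
# on the 0-16 scale, not NTU's published algorithm.

def classify(modules):
    """modules: list of (credits, grade_point) pairs on the 0-16 scale."""
    total = sum(credits for credits, _ in modules)

    def credits_at(threshold):
        return sum(c for c, gp in modules if gp >= threshold)

    # Award the highest classification for which the student performed
    # at that standard on the majority of their credits.
    for threshold, label in [(13, "First"), (10, "2:1"), (7, "2:2"), (4, "Third")]:
        if credits_at(threshold) > total / 2:
            return label
    return "Fail"

# 70 of 120 credits at first-class standard yields a first:
print(classify([(20, 14), (20, 13), (30, 15), (20, 12), (30, 11)]))  # First
```

Unlike a weighted mean of marks, a rule of this shape cannot be gamed by a few very high outliers compensating for weak performance elsewhere.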
Our 2019 finalists were the first cohort to experience the fully revised marking and classification system, so its impact on outcomes has not been known until now. The outcomes for NTU's 2019 graduating cohort include a 7.1% reduction in the number of first class degrees awarded compared with the previous year.
This reduction holds true across the full graduating cohort profile. Of course, some of this decrease may relate to the academic ability of students in the two years, as there are always year-on-year variations, but this could not account for the scale of the change. The relationship between NTU's findings and the OfS' analysis of unexplained growth also suggests that the OfS' analysis was accurate in principle, even though, at least at NTU, it seems to have exaggerated the impact of artefactual factors.
Basis in fact
The impact of the changes introduced at NTU shows both that the concerns articulated had some basis in fact, and that some of those concerns can be addressed by looking at artefactual factors inherent in the numbers we assign to performance and the way we calculate final marks. As other institutions look at their Degree Outcomes Statements, and perhaps take similar steps, their degree data will show a similar trend in the future.
We still believe that we have many factors that explain grade improvement: a focus on better teaching, increased support for students, and intakes of students with higher previous attainment.
But, we are always prepared to respond to well-informed challenges. The OfS report on grade inflation raised issues of legitimate concern. We have made changes to our approach to assessment to maintain trust in – and reassurance about – the value of an NTU degree and continue to ensure the quality of NTU first class awards. We will be confident in making our degree outcome statement.
Students who graduate with first class honours really have performed consistently at an outstanding level. We believe that students welcome these steps as they ensure those most important elements of any educational system: openness, transparency and, above all, fairness.
Thanks for the article Mike, really interesting to read about the work you have done with Nottingham Trent's classification system. A question which came to me while reading this: you've written primarily about 'artefactual' causes of inflation, which I've interpreted as shifts over time in how humans interpret standards when awarding numerical grades. However, you've made changes both to the way grades are determined (i.e. markers awarding a grade against clear criteria rather than a number) and to your algorithm. How have you come to the conclusion that the OfS' analysis 'exaggerated the impact of artefactual factors', given that you've changed both your approach to grading and the arithmetic in the algorithm?
Iain,
You're right – we can't be sure of the extent of the impact of any one factor. Having made the change to the marking scheme, we couldn't continue with the old algorithm, so we can't control for its effects. I'm sympathetic to criticisms of the limits of the OfS' ability to 'explain' (or leave unexplained) factors – and those factors (better teaching, better assessment, increased effort from students) probably account for the gap between what the OfS thinks it found and what we have taken out with the change.
Don't league table measures and rankings also have a part to play in explaining what's happened, as recruitment has become more competitive?
Mike, I applaud the decision to move to grade-based marking – if only more institutions could be encouraged to do so – and to making a holistic decision against specific criteria. I also applaud an algorithm that looks for consistent performance across modules. But given that, I am unclear as to why you have attached a 16 point numerical scale to the grades. Even against a single criterion it is arguably not possible to distinguish quality to a sixteenth degree. And if you are using the algorithm rather than simple mathematical aggregation, surely you have just unnecessarily introduced an increased likelihood of marker inconsistency?
Chris,
We've gone for 16 points because they fit onto the honours system (Exceptional 1st, High 1st, Mid 1st, Low 1st, High 2:1, Mid 2:1, etc.), which accords with a lot of practice. What we're trying to do is reassert the primacy of the grade – "this is a mid 2:2", not "this is a 55" (or the 8 that sits behind that grade in our algorithm system).
What I've seen of the use of the assessment matrix is that it supports a categorical grade for each assessment piece. It should be easier for teams to be consistent around grades for assessments on the 16 points than on the 0-100 scale.
So you have high, mid and low for 3rds?
Chris, yes – and we have marginal, mid and low fails. Marginal fails can be condoned if the student is passing overall.
Thanks Mike – very interesting. Can you say more about how the aggregation algorithm works, and how it differs from the previous one? Have you modelled how the most recent cohort would have fared under the old algorithm?