The post-election regrouping of the government provides a surprising and welcome opportunity to rethink the TEF before it does too much damage, before the shaping of behaviours and unplanned consequences become entrenched.
Let’s consider what might happen if the government took that opportunity, replacing TEF 2 (2017), whose first results will be announced – along with celebrations and wailing – tomorrow, with a much better TEF 3 (2018) for implementation next time round.
TEF 3 would need to start from three assumptions. First, for better and for worse we live in an era of performance management in higher education. This is dictated ultimately by public accountability, and it is an unanswerable requirement. We need performance management, not to create false differentiation between providers in order to make an artificial quasi-market look real, but because outcomes in fundamental areas matter. This means that the new public management genie won’t be pushed back into the collegial bottle, whether in research or in relation to teaching and learning. However, systems of performance management can be better or worse and should be judged by the effects they have. The task of system design is to render any new teaching and learning performance measures as valid as possible in the light of educational and social objectives, and with the least negative consequences.
Second, in the UK we are excellent at collecting comprehensive data on higher education, but less good at interpreting this data. Correlations are all too quickly turned into causes and simplistic explanations abound, such as ‘higher education generates graduate earnings’. Simple mantras can generate major distortions of the real pattern of disciplinary and institutional performance, when they become systematised into reputation judgments and funding outcomes.
Third, UK higher education is closely regulated by a potent mechanism for measuring research outcomes, the REF, while global university reputation is driven above all by standing in research publication and citation. Research-related rankings shape the sector. If we truly want the education function to gather a status equivalent to research, to genuinely place students at the centre, we need a viable method for assessing teaching and learning, one that can evolve and be improved over time (and keep a step ahead of the inevitable gaming).
Creating the virtuous circle
The key to achieving the best possible and least damaging performance management system is to create a virtuous circle between real outcomes, performance measures and the resulting competitive position. Any ranking worth its salt must be grounded in real outcomes. Improvements in real performance, in a comparative sense, must lead to new behaviours that generate improvements in the relative position of the institution or discipline. Otherwise the virtuous circle has not been established, and there is no incentive to improve.
For example, in the case of measures of university reputation based on surveys of academic staff, as used in some ranking systems, the virtuous circle is absent. These rankings, which reward the established university names regardless of actual effort, are not worth the paper on which they are printed. In a ranking system like this, universities have incentives to boost their marketing, not their research or teaching and learning.
Yet measures of research performance based on the volume of high-citation papers produced by a university do tell us about real outcomes, and arguably that kind of ranking has driven growth in the quantity and quality of research. Likewise the quality assessment in the REF fosters improvements in research, all else being equal, and this is a primary reason why UK universities achieve stellar research outcomes from a modest national outlay on research and development. Perhaps the principal problem with the REF impact measure is that it is unclear whether the evidence base captures real impacts, rather than simulations of impact, and thus it is unclear whether the virtuous circle has been created.
The limitations of TEF’s proxies
In TEF 2 there is no virtuous circle. The bottom line is that there is no guarantee that positive effects on outcomes and behaviour will be generated, leading to better teaching and learning in UK higher education. This is because, as designed, the indicators used in TEF 2 do not measure teaching and learning. They measure proxies, and there is no necessary relation between the proxy outcomes and the actual teaching and learning outcomes. TEF 2 is a competition game that, despite the best intentions, is decoupled from real classrooms and real learning achievement.
Graduate employment rates and salaries tell us about many things at once. They are governed by a mix of elements: the social backgrounds of students and their continuing impact on the career through social networking, the formal educational experience, the extra-curricular experience, the macro-economy and the local micro-economy, and so on. Any statistical attempt to separate the different causal elements, which continually affect each other, and isolate the singular ‘effects’ of teaching and learning, is ultimately artificial and assumption-driven.
This means that not only do outcomes related to graduate employment fail to identify educational effects accurately, but comparisons of performance over time, or between different institutions or different disciplines, have no intrinsic merit either. This data has secured a hold on the public imagination based on the mantra that ‘education causes earnings’, but educationists know that this mantra is seriously misleading. Graduate outcomes are very important, and providers do affect those outcomes, but graduate returns data should not be hijacked for the purpose of false measures of what has been learned.
The use of student satisfaction data is also problematic. There is a long history of studies in social choice theory about the conditions under which inter-personal comparisons of welfare are valid, and the NSS does not meet those conditions. Student satisfaction surveys are important in themselves, telling us something we need to know; it’s just that we cannot validly use them to measure something other than satisfaction. A key problem here is that by importing student satisfaction measures into the TEF problematic, with its financial drivers, we create an incentive for institutions, and their individual disciplines, to dumb down the academic demands of programmes so as to boost student survey outcomes. The American study by Richard Arum and Josipa Roksa, published as ‘Academically Adrift’, found some evidence that this had happened following the longer use of student satisfaction surveys and student ratings of academic faculty in the United States.
In short, two of the principal measures to be used in TEF 2 will not lead to a necessary improvement in teaching and learning, and they are likely to have perverse effects. These are very serious weaknesses.
Making TEF better
What then should TEF 3 do? Inescapably, TEF 3 needs to be based on solid, well-worked-through measures of actual student learning outcomes, including value-added measures. Proxies won’t work and there are no short cuts. Generic measures of academic learning can tell us relatively little because in higher education learning occurs through immersion in complex domains of knowledge.
Measuring learning outcomes within one discipline in one institution is straightforward. The challenge is to devise a system which enables comparability across disciplines and between different kinds of institution, operating in varying contexts. This is difficult but not impossible. In Europe, the collaborative Tuning project on learning outcomes is currently testing measures of learning across a range of discipline clusters, and in different kinds of institutions, in several countries. It is not beyond the wit of UK higher education to do something similar. What is needed is hard, patient graft over time to devise measures of teaching and learning that will work, with the minimum negative effects on either student performance or inter-institutional fairness, and that are well piloted before they are implemented.