[Anaxagoras] when he speaks…he thinks…
fire made of fires, and water of waters…
Concedes not any void in things
Nor limit to cutting bodies down…
Right here remains
A certain slender means to skulk from truth…
Who holds that all things lurk co-mixed with all
While that one only comes to view
(Lucretius, De rerum natura, translated by Leonard, 1916)
The name of the pre-Socratic philosopher Anaxagoras has come down to us almost exclusively as the source of the type example of a particular logical fallacy, the Fallacy of Division. Anaxagoras’ material philosophy was one in which materials were infinitely divisible into smaller parts, all sharing the properties of the whole. The parts of water are wet, those of fire burn. However much you divide it up, the characteristics persist.
Lucretius, in his De rerum natura, arguing for the much more familiar atomic view of materials, makes fun of this, just as we moderns might. He explains that this simple view is “a slender means to skulk from truth”, that the world is so much more complex, with “things…co-mixed with all”, and that it is downright silly to expect that when we divide things the parts keep the same characteristics, that “one only comes to view”.
Except that, of course, as researchers we can be just as guilty of the same logical fallacy in the practice of research. The worst excesses of the pseudo-science of the 19th and 20th centuries, eugenics as well as later forms of scientifically justified racism, rested on the assumption that differences in the properties of populations can tell us something about the qualities of individuals. We don’t need to look too far in our fractured and polarised society to see the same assumptions playing out with terrible consequences.
Statistical thinking
One of our best tools to combat these failures is the rigorous application of statistical thinking: a critical analysis of the measurements we make and the questions that we ask. We should immediately understand that, when the variance in a measurement overwhelms the difference between means, prediction about individuals is impossible. We should constantly ask whether the measurement we are making actually addresses the question we are asking. And we should constantly test those questions for our own bias and assumptions.
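To make that first point concrete, here is a minimal sketch in Python. The numbers are entirely invented for illustration: two simulated populations whose means genuinely differ, but whose within-group variance swamps that difference. The group-level difference is real and easy to detect; predicting which group any one individual belongs to is barely better than a coin flip.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hypothetical populations (invented numbers): a small difference
# in means, but within-group variance that dwarfs it.
mean_a, mean_b, sd = 100.0, 102.0, 15.0
a = rng.normal(mean_a, sd, 100_000)
b = rng.normal(mean_b, sd, 100_000)

# The population-level difference is easy to detect...
print(f"difference in means: {b.mean() - a.mean():.2f}")

# ...but predicting group membership for an individual, classifying
# by the midpoint between the two means, barely beats a coin flip.
midpoint = (mean_a + mean_b) / 2
accuracy = ((a < midpoint).mean() + (b >= midpoint).mean()) / 2
print(f"individual prediction accuracy: {accuracy:.1%}")  # roughly 52-53%
```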
To fail at this constant process of rigour, testing and criticism is not merely poor scholarship. It is scholarship that is easily co-opted by political winds and the imposition of outside power. And yet we have done precisely this to ourselves.
The history of the rising importance of citations as a measurement of research is fascinating. It is interesting to speculate whether a different measurement would have gained the same kind of currency had it been made available instead. One way to read the history is that there was a sudden and desperate need for some measure, any measure, to help quantitatively track and manage the vast scale of researchers and research outputs as they expanded globally in the decades following the Second World War. Deciding on resource allocation at that scale required some form of summary quality measure. It just happened to be citation counts that were available.
What do citation metrics measure?
The question of what it is that citations actually measure only really arose after the data became readily available. In a comprehensive review from 2008, Bornmann and Daniel track these studies, showing how interest in understanding what citations could mean emerged only in the 1970s, after the publication of the first Science Citation Index in 1963. Prior to that, citation counts had been used, but mostly as a speculative proxy for more important things.
What the best of these studies show is that, on average, citation counts in aggregate tend to correlate with other evaluations of impact, influence, or prestige. As a proxy, citation counts seem to be useful, at least for evaluations of some populations of researchers or outputs. Of course, at the same time, there are many examples, both anecdotal and from more rigorous analyses, where these correlations and associations break down. There is another logical fallacy, that of composition, which is to hold that the properties of the parts must be the properties of the whole. The idea that a single individual who doesn’t fit invalidates any understanding of the whole is just as silly a position.
We try to justify the use of citations as proxies using correlations that are statistical properties of populations, not of individuals. The data is questionable enough as it is. But to use citations to evaluate individual works or researchers is not just, as many critics have argued, a statistical error given the scale of population differences and variance; it is also a version of the Fallacy of Division. It’s not just that citation distributions are skewed and long-tailed, or that they are poor at predicting the value of a specific article or book. To even try is to commit a logical fallacy so well understood that it has a name and millennia of discussion behind it.
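A toy simulation makes the distinction concrete. The model below is invented, not real bibliometric data: each group (a field, say, or a journal) is assigned an underlying “merit” level, and individual articles draw citations from a noisy, long-tailed distribution around it. The aggregate signal is strong; the individual signal is not.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented model: each group (a field, a journal) has an underlying
# "merit" level; each article draws its citation count from a noisy,
# long-tailed (log-normal) distribution around that level.
n_groups, n_articles = 50, 200
merit = rng.uniform(0, 2, n_groups)                 # group-level signal
noise = rng.normal(0, 1.2, (n_groups, n_articles))  # article-level noise
citations = np.exp(merit[:, None] + noise)          # skewed, long-tailed

# At the aggregate level, mean citations track merit closely...
group_means = citations.mean(axis=1)
print(np.corrcoef(merit, group_means)[0, 1])        # typically above 0.9

# ...but at the level of individual articles the same signal is
# swamped by the noise of the long tail.
per_article_merit = np.repeat(merit, n_articles)
print(np.corrcoef(per_article_merit, citations.ravel())[0, 1])  # ~0.2-0.3
```

Exactly the same data generates both numbers; only the level of aggregation changes. That is the Fallacy of Division in miniature.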
The Loch Ness Monster
If we had a causal model of the referencing behaviour of individual researchers that showed how individual behaviours would lead to those correlations at the population level, there could be some justification. But after 50 years of effort, Blaise Cronin, arguably the leading scholar in the area, concluded that the “…quest for a theory of citation [is] about as likely to succeed [as] the search for the Loch Ness Monster”. Not only are the statistics flawed, but no rigorous and comprehensive theoretical link can be made between what we are measuring and what we want to evaluate. If we are to be principled in applying what is valuable in citation counts as a tool, then we must restrict ourselves to evaluations at a level of aggregation where those associations are shown to be sufficiently robust.
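How robust is “sufficiently robust”? One way to probe this, sketched below with a simulated long-tailed population rather than real citation data, is to resample aggregates of increasing size and watch how quickly the noise in the aggregate mean dies away.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated long-tailed "citation" population (invented, not real data).
population = np.exp(rng.normal(0, 1.2, 1_000_000))
true_mean = population.mean()

# Resample aggregates of increasing size and measure how far their
# means typically sit from the true population mean.
for size in (1, 10, 100, 1000):
    means = rng.choice(population, (10_000, size)).mean(axis=1)
    rel_error = np.abs(means - true_mean).mean() / true_mean
    print(f"aggregate size {size:>4}: typical relative error {rel_error:.0%}")
```

The exact figures mean nothing; the shape of the decline is the point. The proxy only stabilises once the aggregate is large, and below that threshold it should not be used at all.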
As a community of researchers we are at our worst when we fail to apply, to our own processes, the tools we use to critically analyse our research claims. The darkest parts of our history have been associated with this form of logical error: assuming that the properties of an aggregation can be attributed to its individuals, assuming that the available measurements are good proxies of those properties, and above all failing to question why it is just those properties that we have focussed on and whether that focus is the result of a biased perspective.
Do we really want to continue to do that to ourselves as researchers?
(Cameron Neylon’s briefing paper “The Complexities of Citation: How theory can support effective policy and implementation” is available from the Jisc repository)