Data ethics: what if data predicts a student will leave?

There is a reasonably well-known philosophical puzzle called “the trolley problem”.

It’s a moral puzzle that goes along these lines: there is a trolley thundering down a hill out of control, heading towards a group of five workers. The workers are oblivious and will remain so until it is too late.

However, there is a fork in the track leading to a solitary worker – that worker is equally oblivious to the tram and will also remain oblivious until it is too late. Next to you is a lever that controls the points that will divert the tram onto this branch line.

The question is: do you pull the lever or allow the tram to continue on its current course? Most people, when questioned, say they will pull the lever, justifying it as acting for the greater good. The thing about the trolley problem is that there are variations that really make you question your judgement. What if there were no branch line, but pushing someone in front of the tram would be sufficient to stop it?

A ‘thinking machine’ capable of doing accidental harm, the self-driving car presents us with an example of a constant trolley problem. Of course, these systems are designed to have safety at their core to avoid doing harm, and yet we are forced to hard-code these choices; we have to tell these morally neutral machines that one choice outweighs the other, that there is a value judgement to be made. We have to code them to hit an animal rather than harm a human driver, or drive off a cliff rather than hit a group of bystanders, or hit a single person to save multiple occupants.

What should universities do?

At the conference of AMOSSHE, the student services organisation, we discussed the case of a university that had been presented with its very own trolley problem. This university was using some incredibly sophisticated data analytics and machine learning that, it had found, could predict the likelihood of a student dropping out. Now, this wasn’t just a case of ‘student x is first in family and from a low socio-economic group, so has a 10% chance of dropping out’; this was data analytics saying this student will drop out: the data predicts this, we’ve modelled it, tested it and observed it to be true, and the observation is repeatable. It’s difficult to express just how compelling this was – this university was literally saying “What do we do?”

So, what would you do? Do you accept the algorithm as fact and put in revolving doors on enrolment? Of course not, and I’m not suggesting for a moment that the university in question was suggesting this, but what are the decisions that need making?

Choosing the individual

You could use this data to intervene and to trigger processes, ignore the supposed certainty of the predictive algorithm and go for the hope that this is the one case that is different – that you can do something that the algorithm couldn’t account for and this will be the thing that tips the scales. But what if the level of sophistication of the algorithm was such that it could say “if you direct attention to this student, they will still fail to engage and, furthermore, your intervention will divert resources away from five other students with less intensive support needs who, as a result of this diversion, are now predicted to drop out”? Faced with this prediction, would you still be able to choose the individual? Or do you believe that if you save the one, you can save each one of the five? What if for each individual saved, you were faced with five more at risk?

Choices and obligations

So, do we instead choose to ignore the information we have been given? Do we accept that we can only act on what we know to be the moral choice and let fate decide the rest? Are we not equally morally obligated to consider all the information we have available to us, no matter how unpleasant this prospect may be? To extend our self-driving car comparison: when faced with the ability to choose the lesser of two evils, do we tell the ‘AI’ to switch off and leave things to fate?

At Wonkhe’s ‘Secret Life of Students’ conference, we heard how the power of data can challenge the assumptions that we make of others. We also heard that, when dealing with data, we need to be cautious that we do not lose sight of the individual; that there is no homogeneous ‘student voice’ or ‘student experience’; there are ‘student voices’ and ‘student experiences.’

At this point in time, our analytics struggle with the messy nuances of our chaotic and unpredictable individualism but excel at our more predictable group behaviours. Conversely, our brains are limited in their ability to comprehend big data and see its patterns, but are far better at understanding what the individual in front of us is thinking.

The issue facing us is that our analytics are improving exponentially; the more data they have, the more accurate the prediction. But how long is it before our systems decide that the lever at the side of the track is just an unnecessary variable?

7 responses to “Data ethics: what if data predicts a student will leave?”

  1. An interesting argument and one that has come up in a couple of the institutions I have worked in. Too often the resource has been piled into chasing those who have completely disengaged early on and are unlikely to be brought back at the expense of people who are more engaged but struggling and where targeted support would benefit them and probably get them to graduation.

    I would take issue with whether any algorithm is sufficiently reliable to accurately predict who will drop out, and there is a lot to be said for the experienced staff members working in the engagement area, whose judgement may be as reliable, if not more so.

    However, a certain amount of this is driven by our outdated applications process, which sees too many students in institutions or on courses that they are not suited to or do not like. Deal with that problem and then there may be more resource available to deal with all the potential drop-outs that the machine predicts.

  2. There is almost nothing about the premise set up in the article that rings true. The supposed certainty of the analytics sounds more like it belongs in the movie “Minority Report” than the world we live in in 2019. Nor does the choice about how or whether to intervene bear any resemblance to “The Trolley Problem”. We serve our students by ensuring there is always a commitment to act on, and not just analyse, data that we think might help. We may not alter the outcome every time but we will have discharged our duty as educators. In acting, we learn how to intervene more effectively with the next individual and also understand how we calibrate our overall approach at an organisational level (as highlighted in David Ealey’s comment).

    At the risk of giving the premise too much credence, why save 5 at the expense of 1 or 1 at the expense of 5 when there’s no reason why you can’t at least try to save all 6?

  3. It seems to me that a “the algorithm is right” approach could be a self-fulfilling prophecy. If the algorithm “learns” that students in situation A will leave even with support, while students in situation B will stay if supported, and support is then given on that basis, then the obvious result is that more students in A will leave in future and more in B will stay – “proving” the algorithm right and reinforcing its calculation the next time round.

    There is a major potential for this to result in discrimination, especially given the opacity of AI decisions and the difficulty in determining “why” a given decision was made: if it were to turn out that, e.g., most of the students in situation A were black and most of the students in situation B were white, the self-fulfilling issue would replicate and amplify any existing discrimination.

  4. Beyond the data in the real world…. Perhaps the experience of a student entering University and dropping out is as valid an experience for them to receive, as it is for the student who enters and leaves with a first class degree.

  5. Analysing data about a person and using AI technology/systems to create new ‘data’ which may be ‘suspect’ (ref: David Ealey) seems to have some reputational risk associated with it under GDPR (and damages if incorrect?). If this is personal data which is deemed ‘sensitive’ under the GDPR classification rules, are there not serious policy issues regarding its handling and sharing, however well-meaning? Insurance issues? Mental health assessments made by a staff member might be considered personal and confidential, not to be used without permission. Perhaps it is not GDPR-compliant to use such data ‘as fact’ and input to an AI algorithm or application? Has this set of issues already been handled in HE policies anywhere?
