Can algorithms ever be fair?
Algorithms and data are increasingly influencing the justice system. Professor Sofia Olhede and Dr Patrick Wolf explain the basics all lawyers should understand.
What is an algorithm?
An algorithm is a list of rules that are followed automatically, in step-by-step order, to solve a problem.
When considering their consequences, algorithms cannot be separated from the data used to operate them. Indeed, dirty data can trip up the cleanest algorithm, leading to questions of fairness and bias in automated decision-making that we shall explore.
Machine learning is a category of algorithm that allows software applications to become more accurate at predicting outcomes by learning from data, without being explicitly programmed to do so.
As researchers in 2017 summed up:
"Machine learning can impact people with legal or ethical consequences when it is used to automate decisions in areas such as insurance, lending, hiring, and predictive policing. In many of these scenarios, previous decisions have been made that are unfairly biased against certain subpopulations, for example those of a particular race, gender, or sexual orientation. Since this past data may be biased, machine learning predictors must account for this to avoid perpetuating or creating discriminatory practices."
Algorithms in decision making
Recent years have seen considerable debate about algorithm-assisted decisions and fairness, especially in legal and policy contexts. See the 2017 UK parliamentary inquiry, Algorithms in decision-making.
The watershed moment was perhaps the deployment of a commercial system in the US called Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), designed to provide risk assessments to assist the criminal justice system. Public attention swiftly followed a 2016 ProPublica story about COMPAS titled "Machine Bias: There's software used across the country to predict future criminals. And it's biased against blacks." This led to a wider, ongoing debate about bias and fairness in the context of automated decision-making.
Bias and statistics
Bias has a recognised technical meaning in the field of statistics. We start with a population about which we wish to draw conclusions (such as the population of all criminal offenders in a given jurisdiction, and their average likelihood of re-offending). We then have a sample of data collected from that population, and an algorithm that calculates a conclusion from these data as a set of self-executing steps.
If the algorithm and the data sampling mechanism are systematically "off", so that the calculation is not correct "on average" with respect to the entire population, then we say the resulting conclusions are biased.
Typical bias examples:
- selection bias - drawing conclusions, say, on the basis only of offenders who were apprehended, rather than all offenders
- reporting bias - where, by polling offenders directly, we might expect them to under-report their own likelihood to re-offend
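Selection bias can be made concrete with a small simulation. The sketch below is purely illustrative: the population, the "propensity" values and the assumption that higher-propensity offenders are more likely to be apprehended are all invented for the example, and the function name is hypothetical.

```python
import random
import statistics


def simulate_selection_bias(n=100_000, seed=0):
    """Compare the true average re-offence propensity with the average
    seen only among apprehended offenders (all numbers are invented)."""
    rng = random.Random(seed)
    population = []
    apprehended = []
    for _ in range(n):
        propensity = rng.random()  # hypothetical propensity in [0, 1]
        population.append(propensity)
        # Assumption: higher-propensity offenders are more likely to be caught.
        if rng.random() < propensity:
            apprehended.append(propensity)
    true_mean = statistics.fmean(population)
    sample_mean = statistics.fmean(apprehended)
    return true_mean, sample_mean, sample_mean - true_mean


true_mean, sample_mean, bias = simulate_selection_bias()
print(f"average over all offenders:      {true_mean:.3f}")
print(f"average over apprehended only:   {sample_mean:.3f}")
print(f"bias (sample minus population):  {bias:+.3f}")
```

Under these assumptions the apprehended-only average sits well above the population average however many records are collected: the calculation is "off on average", which is what the statistical definition of bias describes.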
Bias can be insidious and hard to recognise - even more so as data become dirtier and algorithms more complex. Vivid examples include mistaking smartphone prevalence for pothole prevalence when potholes are reported through smartphones, or associating CEOs only with white men. A moment's reflection will convince you that bias can be present for two reasons. First, we are not getting a complete picture of the entire population from which we wish to draw conclusions. Second, humans have drawn biased conclusions before, and algorithms are picking up on these and simply codifying them into practice.
What is fair?
Even setting bias aside, how might we characterise an algorithmic conclusion as "fair"? To begin to answer this question, we must reconcile the dictionary definition of fairness, which gives us the concept of equal treatment of those in the population about which conclusions are drawn, devoid of favouritism or discrimination, with a mathematical definition applicable to algorithms.
For an algorithm (a set of self-executing steps), assessing the fairness of treatment would come back to the mathematical principle underlying the direction of the self-executing steps. Usually this principle can be cleanly stated as an algorithmic design criterion, such as: "Assign a likelihood-of-re-offence score that is maximally consistent with known re-offences in a certain set of historical data." This principle is the same one by which websites recommend new products for you to try, but in the context of the justice system we can see quite clearly how issues of algorithmic bias and fairness might arise!
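To see what such a design criterion can look like in practice, here is a minimal sketch. It assumes a tiny, invented set of historical records and a single input (number of prior convictions); the function names are hypothetical. For each category, the score that is most consistent with the recorded outcomes, in the maximum-likelihood sense, is simply the observed re-offence rate.

```python
import math
from collections import defaultdict

# Hypothetical historical records: (number of prior convictions, re-offended?)
history = [(0, 0), (0, 0), (0, 1), (0, 0),
           (1, 0), (1, 1), (1, 1), (1, 0),
           (2, 1), (2, 1), (2, 1), (2, 0)]


def fit_scores(records):
    """Score each category by its observed re-offence rate, the value that
    maximises the likelihood of the recorded outcomes for that category."""
    counts = defaultdict(lambda: [0, 0])  # category -> [re-offences, total]
    for priors, reoffended in records:
        counts[priors][0] += reoffended
        counts[priors][1] += 1
    return {k: hits / total for k, (hits, total) in counts.items()}


def log_loss(records, scores):
    """Average negative log-likelihood: lower means closer to the history."""
    eps = 1e-9
    total = 0.0
    for priors, reoffended in records:
        p = min(max(scores[priors], eps), 1 - eps)
        total -= math.log(p) if reoffended else math.log(1 - p)
    return total / len(records)


scores = fit_scores(history)
flat = {k: 0.5 for k in scores}  # a baseline that ignores the input entirely
print("fitted scores:", scores)
print("loss of fitted scores:", round(log_loss(history, scores), 4))
print("loss of flat baseline:", round(log_loss(history, flat), 4))
```

The fitted scores match the historical records more closely than the flat baseline, but only the records: if those records were collected or decided in a biased way, the scores reproduce that bias faithfully.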
Algorithmic design criteria
In the discussion surrounding COMPAS and criminal risk assessments, squaring the dictionary and mathematical definitions of "fair" quickly led to new and newly relevant questions for algorithm researchers to consider.
Even the simplest notions of fairness must, once formulated mathematically, be understood at the level of individuals affected by algorithmic conclusions and decisions, and at the level of groups of individuals with common characteristics.
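One common group-level check is to compare error rates between groups, for example the share of people who did not go on to re-offend but were nevertheless flagged as high risk. The sketch below uses invented records and hypothetical group labels "A" and "B"; it illustrates the kind of calculation involved, not any particular system's method.

```python
from collections import defaultdict

# Hypothetical records: (group, flagged high risk?, actually re-offended?)
records = (
    [("A", True, True)] * 3 + [("A", True, False)] * 2 +
    [("A", False, True)] * 1 + [("A", False, False)] * 4 +
    [("B", True, True)] * 3 + [("B", True, False)] * 1 +
    [("B", False, True)] * 2 + [("B", False, False)] * 4
)


def false_positive_rates(rows):
    """Per group: among people who did NOT re-offend, the share flagged."""
    flagged = defaultdict(int)
    non_reoffenders = defaultdict(int)
    for group, was_flagged, reoffended in rows:
        if not reoffended:
            non_reoffenders[group] += 1
            flagged[group] += was_flagged  # True counts as 1
    return {g: flagged[g] / non_reoffenders[g] for g in non_reoffenders}


fpr_by_group = false_positive_rates(records)
for group, rate in sorted(fpr_by_group.items()):
    print(f"group {group}: false positive rate {rate:.2f}")
```

A gap between the two rates is a statement about groups. It says nothing, on its own, about whether any one individual within either group was treated appropriately, which is why both levels need to be considered.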
Much recent research has focused on how algorithmic design criteria might need adjustments to avoid taking account of legally protected characteristics that we might find in data, such as gender or race. As gender and race are often associated with other measured variables that may not be legally protected, doing so rapidly becomes very complex.
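The difficulty with associated variables can be sketched in a few lines. In this invented example a decision rule never sees the protected characteristic; it looks only at the area a person lives in. The 80%/20% split between areas, the group labels and the function name are all assumptions made for illustration.

```python
import random


def flag_rates_by_group(n=50_000, seed=1):
    """Apply a rule that uses only area of residence, then report how often
    each (hidden) protected group ends up flagged. All numbers are invented."""
    rng = random.Random(seed)
    flagged = {0: 0, 1: 0}
    totals = {0: 0, 1: 0}
    for _ in range(n):
        group = rng.randrange(2)  # protected characteristic, hidden from the rule
        # Assumption: 80% of group 1 live in the "north" area, 20% of group 0.
        in_north = rng.random() < (0.8 if group == 1 else 0.2)
        flag = in_north  # the rule looks at area only
        totals[group] += 1
        flagged[group] += flag
    return {g: flagged[g] / totals[g] for g in totals}


flag_rates = flag_rates_by_group()
print(f"group 0 flagged: {flag_rates[0]:.1%}")
print(f"group 1 flagged: {flag_rates[1]:.1%}")
```

Even though the protected characteristic was removed from the inputs, the two groups are flagged at very different rates, because area of residence acts as a stand-in for it.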
It might well be "fair" and appropriate in some abstract sense to adjust algorithmic conclusions systematically for a given group – such as offenders in a certain age bracket – whereas adjusting conclusions differently for individuals within the group might be seen as unfair. Such group-level adjustments are also possible within the setting of algorithmic decision-making.
Biased samples of data can obviously lead to unfairness because we inherit the bias in the data. But what is "fair" in a more abstract setting? This is more contentious, and arises in settings where groups of individuals might claim unequal treatment under the law. For example, city-dwellers might be less likely (on the basis, say, of police presence per capita) to be apprehended for a given class of criminal act than country-dwellers.
Should an algorithm then account for this, and treat apprehended offenders differently if they live in different-sized locales? More data gives us more options, and potentially the ability to give a more nuanced response in any particular decision about an individual, just as we would expect experienced judges to do in criminal law.
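One way an algorithm could "account for this", sketched here under strong assumptions, is to weight each apprehended offender by the inverse of the chance of being apprehended in their locale. The apprehension probabilities below are invented and assumed to be known exactly, which would rarely be true in practice; whether such an adjustment should be made at all is the open question, not something the code settles.

```python
import random

# Assumed (hypothetical) apprehension probabilities per locale.
APPREHENSION_PROB = {"city": 0.2, "country": 0.4}


def city_share(n=100_000, seed=2):
    """Estimate the share of offenders who are city-dwellers, first naively
    from apprehended offenders, then with inverse-probability weights."""
    rng = random.Random(seed)
    apprehended = []
    for _ in range(n):
        # Assumption: offenders are split evenly between city and country.
        locale = "city" if rng.random() < 0.5 else "country"
        if rng.random() < APPREHENSION_PROB[locale]:
            apprehended.append(locale)
    raw = sum(loc == "city" for loc in apprehended) / len(apprehended)
    weights = [1 / APPREHENSION_PROB[loc] for loc in apprehended]
    city_weight = sum(w for w, loc in zip(weights, apprehended) if loc == "city")
    weighted = city_weight / sum(weights)
    return raw, weighted


raw_share, weighted_share = city_share()
print(f"city share among apprehended offenders: {raw_share:.2f}")
print(f"city share after reweighting:           {weighted_share:.2f}")
```

In this toy setting the raw data suggest that about a third of offenders live in cities, while the reweighted figure is close to the one-half built into the simulation. The correction is only as good as the assumed probabilities.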
The questions remain as to when and whether more nuanced responses ought to be applied. A judiciary can rely on experience, but we have yet to formalise algorithmic decision-making to the point where it can take into account reason, common sense and precedent, as well as a modicum of prudent personal wisdom, assessment and judgement.
Views expressed in our blogs are those of the authors and do not necessarily reflect those of the Law Society.
Chouldechova, A. (2017) Fair Prediction with Disparate Impact: A study of bias in recidivism prediction instruments. Big Data 5 (2), 153-163.
Kleinberg, J.; Mullainathan, S. and Raghavan, M. (2017) Inherent Trade-Offs in the Fair Determination of Risk Scores. Proceedings of Innovations in Theoretical Computer Science (ITCS).
Kusner, M.; Loftus, J.; Russell, C. and Silva, R. (2017) Counterfactual Fairness. Advances in Neural Information Processing Systems (NIPS) 30.
Dwork, C.; Hardt, M.; Pitassi, T.; Reingold, O. and Zemel, R. S. (2012) Fairness through awareness. In Goldwasser, S. (ed) Innovations in Theoretical Computer Science, pp. 214-226. New York, NY: Association for Computing Machinery.