Table of Contents
This section goes into more detail on testing for bias and applying definitions of fairness to the datasets. It therefore requires a higher level of technical comprehension.
We train simple models on each dataset to serve as a baseline before we try any of the possible interventions. You can run our analysis yourself on Binder, for the finance use case click here and the recruiting use case click here.
We start by looking at the Adult dataset for the finance use case.
Demographic parity requires that we treat all demographic groups equally. We start by investigating the disparity between the sexes. We do this with box plots of the model scores. A higher score means the model thinks the individual is more likely to be a high earner. It's clear that there is a major disparity between men and women, with men being awarded systematically higher scores. This model therefore does not achieve demographic parity by some margin.
We do the same thing for race. Again we see that there is a major disparity between races, with White and Asian individuals systematically receiving higher scores, and Black and Native Americans receiving much lower scores.
Conditional demographic parity requires that all demographic groups are treated equally, once certain legitimate risk factors are taken into account. We will take hours worked per week as the legitimate risk factor. We bin individuals according to how many hours per week they work, and compare the distribution of scores within those bins first between the two sexes. As we can see in each bin women receive lower scores than the men. In other words, the model believes that women who work the same number of hours as men are less likely to be high-earners.
We repeat this, grouping individuals by race, and again see that the disadvantaged demographics are less likely to be high-earners, even if they work the same number of hours.
Equalised odds requires that we treat groups equally once outcomes are taken into account. That is to say, among high earners the model predictions are similar or the same for all demographic groups. Likewise among low earners the model treats different groups equally. Once again we compare score distributions using box plots. We see that high earning women generally receive lower scores, and low earning women consistently receive lower scores.
We see a similar picture when comparing the different races. White and Asian high earners are likely to receive higher scores than high earners of other races, and similarly for low earners.
We notice something interesting when measuring equalised odds, which is that women under the baseline model actually receive higher accuracy predictions than men. This is ultimately because women are far less likely to be high earners, and so the model mostly predicts that they are low earners with confidence. This is reflected in the box plot above where we see the score distribution for low earning women is extremely narrow and concentrated around the low scores.
The question therefore is who is being disadvantaged by this model? On the one hand we might argue that women, who receive systematically lower scores are disadvantaged, as they have less access to the positive, privileged outcome. On the other hand, the classification accuracy for women is better, which we might argue advantages them. Specifically the higher classification accuracy means that they are less likely to be marketed to or granted a credit card or a loan which they cannot afford, and hence less at risk of default and the negative consequences that come with that. This observation once again underlines the need for understanding the context of the problem, and making a careful determination of what unfairness actually means.
Next we repeat a similar analysis on the synthetic recruiting data.
As we did for the Adult data, we use box plots to visualise the disparity between the sexes. In this dataset women are also systematically given lower scores, indicating the model believes they are less likely to be hired.
We see a similar patter when splitting by race, with White individuals systematically more likely to be hired.
For the recruiting data we use years of experience as the legitimate risk factor. We bin individuals according to how many years of experience they have, and compare the distribution of scores within those bins between the two sexes. As we can see in each bin women receive lower scores than men, showing that given a man and a woman with the same experience, the model will still be less confident that the woman should be hired.
Repeating this for race we again see a similar pattern, Black applicants are less likely to be hired even if they have the same amount of experience.
Equalised odds requires that we treat groups equally once outcomes are taken into account. Here we plot the distributions of model predictions for successful applicants and unsuccessful applicants.
Once again we see a similar picture when comparing White and Black applicants, with Black successful applicants generally receiving lower scores and Black unsuccessful also receiving lower scores.
We trained simple models without any attempt to mitigate bias, instead only optimising for performance. On both datasets we observe that the model fails to meet the requirements of all definitions of fairness we test for. Hence we should not expect to be able to achieve these notions of fairness without intervention.
However we also observe that there are some clear tensions between different notions of fairness. For example, while disadvantaged groups receive lower scores on average, they also tend to be classified more accurately. Determining fairness in this context requires us to balance the harm caused by incorrect decisions with the advantage conferred by a positive prediction. Striking such a balance is highly context dependent, and must be done through careful consideration of the task and possible impacts rather than through application of any particular algorithm.
Next we will investigate a number of bias mitigation algorithms from the literature, their open source implementations, and the results of using them to train unbiased models.