Machine learning, a subset of artificial intelligence (AI), is a process in which algorithms are trained to make decisions or predictions based on the data they are given. Machine learning has become increasingly popular in medical devices in recent years, especially in diagnostics, where it is being used to diagnose heart disease or detect cancer in mammography.
While AI clearly has a lot of potential, it is also hugely dependent on data. As the saying goes, you are what you eat: these AI systems are only as good as the data they have been fed.
The data we select for machine learning must be representative, of good quality and large enough to reduce bias. Bias in AI occurs when an algorithm gives some features more weight than others in order to generalise across a large dataset with many other features. This can cause the algorithm to ‘learn the wrong thing’ by failing to consider all aspects of the data.
As a result, developers can unwittingly code their own bias into their algorithms.
How to avoid bias in machine learning
Removing bias completely is impossible, but it is important to be aware of where it might come from.
When collecting data, it is important to be aware of how the data was originally sourced and in which environment. Data gathered in one city might not be representative of a country as a whole, for example, so it is important to have good coverage of different ethnic groups, genders, and ages. If the intent is to create an algorithm to diagnose heart disease, the data must include people with and without heart disease and cover all genders, ages, and races within the target group.
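One simple way to make this concrete is to measure how each demographic group is represented before training begins. The sketch below is a minimal illustration using hypothetical patient records and made-up field names (`sex`, `age_group`, `has_disease`); a real dataset would have its own schema.

```python
from collections import Counter

# Hypothetical patient records; in practice these come from your dataset.
records = [
    {"sex": "F", "age_group": "60+",   "has_disease": True},
    {"sex": "M", "age_group": "40-59", "has_disease": False},
    {"sex": "F", "age_group": "40-59", "has_disease": True},
    {"sex": "M", "age_group": "60+",   "has_disease": True},
]

def coverage(records, field):
    """Return the share of records falling into each category of `field`."""
    counts = Counter(r[field] for r in records)
    total = len(records)
    return {category: n / total for category, n in counts.items()}

print(coverage(records, "sex"))        # share of each sex in the data
print(coverage(records, "age_group"))  # share of each age band
```

If one group's share is far below its share of the target population, that is a signal to collect more data for that group before training.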
Bias can also be introduced when pre-processing the data before use, for example when removing or replacing invalid or duplicate entries: data a developer considers irrelevant may in fact be important, and could accidentally be discarded. It is key to carefully consider how any piece of data could influence the algorithm before deleting it, to ensure the outcome is as representative as possible.
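One practical safeguard is to keep the rows you drop, rather than deleting them silently, so you can audit whether the removals skew toward a particular group. A minimal sketch, again with hypothetical fields (`sex`, `bp` for a blood-pressure reading):

```python
def drop_invalid(records, is_valid):
    """Split records into kept and removed, keeping the removed for audit."""
    kept, removed = [], []
    for r in records:
        (kept if is_valid(r) else removed).append(r)
    return kept, removed

# Hypothetical records; one has a missing blood-pressure reading.
records = [
    {"sex": "F", "bp": None},
    {"sex": "M", "bp": 120},
    {"sex": "F", "bp": 135},
]

kept, removed = drop_invalid(records, lambda r: r["bp"] is not None)

# Audit the removed rows: if they cluster in one group, imputing the
# missing values may be safer than deleting the rows outright.
print(len(kept), len(removed))  # 2 1
```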
Data labelling is another key area where bias might occur in machine learning development. For example, a person might be asked to look at a picture and label it depending on what is in the picture, to help the AI understand different images. It is important to have a diverse group of people carry out such labelling, in order to avoid introducing the bias of one person into the dataset.
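With multiple annotators, disagreements between labellers become visible and can be flagged for review instead of one person's judgement silently becoming ground truth. A small sketch of that idea, using invented image IDs and labels:

```python
from collections import Counter

# Hypothetical labels from three annotators for the same images.
labels = {
    "img_001": ["tumour", "tumour", "benign"],
    "img_002": ["benign", "benign", "benign"],
}

def resolve(votes):
    """Return the majority label plus a flag when annotators disagree."""
    (label, count), = Counter(votes).most_common(1)
    return label, count < len(votes)  # True means worth a second look

for img, votes in labels.items():
    label, disputed = resolve(votes)
    print(img, label, "needs review" if disputed else "unanimous")
```

Images where annotators disagree are exactly the ones where a single labeller's bias would otherwise have gone unchecked.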
When applying modelling techniques to the data, it's important to be aware of the false positives and false negatives the model might produce, especially when diagnosing health conditions. Returning to the heart disease example, if the model shows higher false positive or false negative rates for women than for men, the data might not be covering each gender equally. By splitting the test data by gender, you can check whether the false positive and false negative rates are higher for women than for men, or the reverse, and then improve your model or explore different modelling techniques to address this.
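The per-group check described above can be sketched in a few lines. The labels and predictions here are hypothetical placeholders standing in for a real test set:

```python
def error_rates(y_true, y_pred):
    """False-positive and false-negative rates for binary labels (1 = disease)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if p and not t)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    negatives = sum(1 for t in y_true if not t)
    positives = sum(1 for t in y_true if t)
    return (fp / negatives if negatives else 0.0,
            fn / positives if positives else 0.0)

# Hypothetical test-set labels and predictions, split by gender.
groups = {
    "women": ([1, 0, 1, 0], [0, 0, 1, 1]),  # (true labels, predictions)
    "men":   ([1, 0, 1, 0], [1, 0, 1, 0]),
}

for name, (y_true, y_pred) in groups.items():
    fpr, fnr = error_rates(y_true, y_pred)
    print(f"{name}: false-positive rate={fpr:.2f}, false-negative rate={fnr:.2f}")
```

A large gap between the two groups' rates is the signal to revisit the training data or the modelling technique.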
Even with a good quality dataset, however, your algorithm can still be biased. This bias can be unconscious and go unnoticed, especially when working with ‘black box’ algorithms whose exact calculations you cannot follow, leaving no way to tell whether they are biased, or why.
As an example, when Google released word2vec, which transforms words into vectors to capture the correlations and connections between them, they did not realise for two years that their neural network was biased. Representing words as vectors means you can add, subtract and multiply them: entering ‘Paris – France + Italy’ into the word2vec algorithm would give you Rome, but entering ‘doctor – man + woman’ would give you nurse. The neural network had been trained on an enormous dataset gathered from media and the internet, so the problem was not a lack of data, but that the data carried society's unconscious bias.
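The vector arithmetic itself is straightforward to illustrate. The sketch below uses tiny hand-crafted three-dimensional vectors purely to show the mechanics; real word2vec embeddings have hundreds of dimensions learned from text, and that learned geometry is where the bias lives.

```python
import numpy as np

# Tiny made-up "embeddings" for illustration only; real word2vec
# vectors are learned from large text corpora.
vecs = {
    "paris":  np.array([1.0, 0.0, 1.0]),
    "france": np.array([1.0, 0.0, 0.0]),
    "italy":  np.array([0.0, 1.0, 0.0]),
    "rome":   np.array([0.0, 1.0, 1.0]),
    "london": np.array([0.1, 0.0, 1.1]),
}

def nearest(query, exclude):
    """Return the word whose vector has the highest cosine similarity to `query`."""
    def cos(a, b):
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return max((w for w in vecs if w not in exclude),
               key=lambda w: cos(vecs[w], query))

# paris - france + italy lands nearest to rome
query = vecs["paris"] - vecs["france"] + vecs["italy"]
print(nearest(query, exclude={"paris", "france", "italy"}))  # rome
```

In a trained model, the same arithmetic that recovers ‘Rome’ from capital cities also reproduces whatever associations, fair or biased, were present in the training text.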
Another example of unconscious bias in society is that women are less likely to be diagnosed with heart disease because medical students are taught less about female symptoms than about male symptoms. This is further amplified when looking at heart disease data, which is dominated by male data, with women making up only 25% of participants across 31 landmark clinical trials for congestive heart failure between 1987 and 2012. If an AI model were trained on this data to diagnose heart disease, it would most likely pick up on this bias and be quicker to diagnose heart disease in men, and possibly fail to diagnose it in women.
Sexist bias was also found in Amazon’s recruitment algorithm, which was supposed to automate their recruiting process. The algorithm was designed to go through job applications and rate the applicants using AI models. However, Amazon soon realised it showed bias against women, as it penalised resumes that included the word ‘women’s’ by downgrading their score. The Amazon developers had used data going back 10 years, which contained bias against women because the technology industry was dominated by men, with men forming 60% of Amazon’s employees during this period.
Consciously including more women in heart disease clinical trials, or adding female symptoms to the training data for an algorithm that diagnoses heart attacks, could help counter unconscious bias within hospitals. Similarly, by teaching its AI models to value more gender-equal terminology, Amazon could work towards a more gender-balanced team.
The ultimate goal would be to de-bias society altogether, but until that happens, we can at least try to reduce gender bias in our machine learning algorithms. As we continue to incorporate machine learning algorithms into our way of life, it will be important to have diverse teams of developers that are aware of bias and have processes in place for identifying and mitigating it before releasing their algorithms to the market.