AI and codified sexism

Social biases and oppressions are dangerously perpetuated through data.

In 2015, software engineer Jacky Alciné drew attention to the fact that Google Photos’ image-recognition algorithm had classified him and his black friends as “gorillas”. Google said it was “appalled” by the mistake and promised to fix it. However, when WIRED investigated more than two years later whether any fixes had been made, it found instead that Google had simply censored searches for “gorilla”, “chimp” and “monkey”. It is difficult to imagine how a world-leading corporation in commercial AI had been unable to fix this error. The episode shows the ease and scale at which AI codifies discrimination.

Because technology is built by humans, it should come as no surprise that artificial intelligence systems inherit our biases. As many black box AI systems are trained on biased data, the growing use of AI threatens the equality that women worldwide have worked so hard to achieve. Put simply, algorithms that learn from discriminatory data reproduce the same patterns in their outputs.

If sexism is embedded in data, AI systems will not only display the same prejudice in their outputs but also amplify these patterns of oppression at scale. As most tech entrepreneurs, programmers and computer scientists are male and/or white, it is no surprise that women, particularly women of colour, are marginalised in datasets. For example, according to Caroline Criado Perez, a British feminist journalist and activist, seatbelts, headrests and airbags in cars have been designed using data based on the male body, leaving women 47% more likely to be seriously injured and 17% more likely to die than a man in a similar accident. Similarly, when biased AI is used in facial recognition systems, resume-screening tools and consumer credit decisions, its outputs reinforce the gender bias that already exists in society. It is also important to note that AI systems are optimised for efficiency, and that being ‘antisexist’ and ‘antiracist’ is extremely difficult to code for and, more importantly, unprofitable in the well-oiled machine of capitalism.

This issue of codified inequality has also come to the attention of cyberfeminists such as Joy Buolamwini, a computer scientist and digital activist. Buolamwini’s MIT-based research exposed racial and gender bias in AI services sold by major corporations such as IBM, Amazon and Microsoft. More specifically, she found that the image datasets used to build and test various facial recognition algorithms were 80% white and 75% male. As a result, the systems were over 99% accurate at classifying the gender of lighter-skinned male faces, while for darker-skinned women accuracy fell as low as roughly 65%. This study highlights the ingrained and complex nature of bias within AI. It shows that focusing on gender alone is unlikely to address the intersecting biases, such as race, with which it is so closely intertwined.

Fortunately, all is not lost. There are measures we can take to limit the prejudice that emerges. First, Buolamwini suggests creating more inclusive coding practices. According to the World Economic Forum, only 22% of AI professionals are women. Equality and accessibility are important: those who code determine how effective an algorithm is and, more importantly, who it works best for. A diverse team whose members can check each other’s blind spots, and who are dedicated to breaking cycles of oppression, is crucial if we are to change the current trajectory of AI bias.

The next step involves considering how we code: what data scientist Cathy O’Neil, author of ‘Weapons of Math Destruction’, calls a data integrity check. This ensures that context is taken into account and that unbalanced datasets are not used, preventing AI from widening the gender gap. Take Amazon’s recruiting tool as an example. The system was designed to use AI to find the best job candidates, rating resumes from one to five stars. Training it on resumes previously submitted to Amazon, and on which of those applicants had been successful, seemed reasonable. However, by failing to account for men’s dominance within the tech industry, and perhaps their bias in choosing successful applicants, Amazon’s system taught itself a gendered preference and penalised resumes that included the word “women’s”, as in “women’s chess club captain”. A data integrity check therefore involves consciously considering context, and how input data, whether pictures, credit histories or criminal records, is shaped by ongoing marginalisation within society.

Another issue to address is the definition of success within an AI system. Does the output take equality and freedom into account, or is it optimised only for the most efficient and profitable result? Along with this, O’Neil raises the importance of accuracy. Black box AI systems do not reveal their internal processes, so there is little to no accountability for their outputs. This is especially destructive when their results are treated as objective, because they can create life-threatening realities for people based on proxies and data that may be unrelated to the decision at hand. Judges in Idaho and Colorado, for example, have used machine-generated risk scores to guide sentencing decisions. One such model, the LSI-R (Level of Service Inventory-Revised), includes a lengthy questionnaire for prisoners to fill out. Whilst some questions, such as “How many previous convictions have you had?”, are directly related to the criminal offence and the circumstances of the individual, others, such as when the person was first involved with the police, discriminate against minority groups, who we know are overpoliced. Questions about an offender’s upbringing, family or friends are not relevant to the criminal case or sentencing. Unfortunately, these answers are then used to gauge recidivism risk. A higher score can mean a longer sentence, which feeds a loop of incarceration and poverty that ultimately raises the same person’s recidivism risk if they ever encounter the legal system again.

The creation of these feedback loops seems obvious, and yet little is done to break them. This is largely because there is much money to be made from inequality. Corporations cannot be trusted to keep themselves in check, so the responsibility for keeping AI transparent and accountable lies first with us. This is not to say that AI cannot be used for good. Consider the work of Mira Bernstein, a mathematician with a PhD from Harvard University. For Made in a Free World, a non-profit organisation, she built an AI model that detects slave labour. Its goal is to help companies rid their products and services of components built with slave labour. The key element here is that the model only points to suspicious places and leaves the rest to human investigators. Whatever the investigators discover is then fed back to Bernstein, so that she can continuously improve the model. What we see here is a good AI model, one that does not overreach or create destructive realities for vulnerable individuals.

Ultimately, there is much progress to be made, and the role of cyberfeminists, data activists and politically aware mathematicians within the AI sphere is becoming more and more crucial. The online world is not the neutral utopia it presents itself as; it is inextricably intertwined with the sexist oppression of the physical world. Failing to interrogate the biases inscribed within cyberspace only hinders the ability of cyberfeminists to challenge the patriarchy. It is through the web that feminists can continue to connect women from all over the world and work to overcome the male-dominated landscape of both the physical world and the internet.