Bias
Mitigating Bias
Based on their examination of 341 publications, Hort et al. identify three main points in model creation at which bias mitigation (also known as “achieving fairness”) can be attempted:
- Pre-processing: bias mitigation in the training data, to prevent bias from reaching machine learning models;
- In-processing: bias mitigation while training the models; and
- Post-processing: bias mitigation on previously trained models.
Pre-processing applies techniques to the data set before training occurs, such as relabelling (moving the ground-truth labels closer to the ideal, unbiased labels), sampling (reweighing, redistributing, or otherwise adapting each example’s impact on training), synthetic data generation (to supplement existing data), cleaning the data (removing gender/race markers; removing certain words), adversarial debiasing (using a specially trained model alongside the main model), and capping outliers (to make the data more representative) (Hort et al., 2023).
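To make the sampling idea concrete, the sketch below computes reweighing-style sample weights so that each combination of a sensitive attribute and a label contributes to training as it would if the two were statistically independent. It is a minimal illustration with made-up attribute and label values, not code from Hort et al.

```python
# A minimal sketch of pre-processing by reweighing (one of the sampling
# approaches surveyed by Hort et al.). The idea: weight each training example
# so that the joint distribution of a sensitive attribute and the label looks
# as if the two were independent. The attribute and label values are illustrative.
from collections import Counter

def reweigh(sensitive, labels):
    """Return one weight per example: P(attribute) * P(label) / P(attribute, label)."""
    n = len(labels)
    attr_counts = Counter(sensitive)                 # counts per attribute value
    label_counts = Counter(labels)                   # counts per label value
    joint_counts = Counter(zip(sensitive, labels))   # counts per (attribute, label) pair
    weights = []
    for a, y in zip(sensitive, labels):
        expected = (attr_counts[a] / n) * (label_counts[y] / n)  # if independent
        observed = joint_counts[(a, y)] / n                      # what the data shows
        weights.append(expected / observed)
    return weights

# Toy example: group "A" is mostly labelled 1, group "B" mostly labelled 0,
# so the under-represented combinations receive weights above 1.
sensitive = ["A", "A", "A", "B", "B", "B"]
labels    = [1,   1,   0,   0,   0,   1]
print(reweigh(sensitive, labels))
```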
In-processing can use many of the same techniques (e.g., adversarial training and reweighing), but applies them while the model is being trained. Mitigation at this stage can also involve the model’s architecture, such as embedding sensitive attributes or adding bias-correction layers.
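As a generic illustration of in-processing (not a specific method from the survey), the sketch below trains a simple logistic regression whose loss adds a fairness penalty pulling the average predictions of two groups toward each other. The synthetic data, penalty form, and hyperparameters are all illustrative assumptions.

```python
# A minimal sketch of in-processing: train a logistic regression whose loss
# combines ordinary cross-entropy with a fairness penalty that discourages
# the average prediction from differing between two groups. The data and
# hyperparameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
group = rng.integers(0, 2, size=n)           # hypothetical sensitive attribute
# Biased labels: group 1 receives positive labels more often, all else equal.
y = (X[:, 0] + 0.8 * group + rng.normal(scale=0.5, size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lam, lr = 2.0, 0.1                           # fairness weight, learning rate
for _ in range(500):
    p = sigmoid(X @ w)
    grad_ce = X.T @ (p - y) / n              # cross-entropy gradient
    # Demographic-parity style penalty: lam * (mean_pred_group0 - mean_pred_group1)^2
    g0, g1 = group == 0, group == 1
    gap = p[g0].mean() - p[g1].mean()
    dmean0 = (p[g0] * (1 - p[g0])) @ X[g0] / g0.sum()   # gradient of group-0 mean
    dmean1 = (p[g1] * (1 - p[g1])) @ X[g1] / g1.sum()   # gradient of group-1 mean
    grad_fair = 2 * lam * gap * (dmean0 - dmean1)
    w -= lr * (grad_ce + grad_fair)

p = sigmoid(X @ w)
print("prediction gap between groups:", p[group == 0].mean() - p[group == 1].mean())
```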
Post-processing can be useful when re-training the entire model is out of scope, and the choice of approach will depend on the type of bias in the model and the level of desired fairness. Some post-processing approaches include ranking (re-ordering recommendations, etc.), calibrating the model’s predictions to the true probabilities of outcomes, and equalizing decision thresholds so that false positive and false negative rates are comparable across different attribute groups, among others (Hort et al., 2023).
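The sketch below illustrates the threshold-equalization idea: given scores from an already-trained model, it searches for a per-group decision threshold whose true positive rate is closest to a shared target, leaving the model itself untouched. The scores, labels, and target rate are illustrative stand-ins.

```python
# A minimal sketch of post-processing by equalizing thresholds: pick a separate
# decision threshold per group so that true positive rates match a common
# target, without retraining the underlying model. The data are illustrative.
import numpy as np

def tpr(scores, labels, threshold):
    """True positive rate at a given threshold."""
    preds = scores >= threshold
    positives = labels == 1
    return (preds & positives).sum() / max(positives.sum(), 1)

def equalize_thresholds(scores, labels, groups, target_tpr=0.8):
    """Return {group: threshold} whose TPR is closest to target_tpr."""
    thresholds = {}
    candidates = np.linspace(0.0, 1.0, 101)
    for g in np.unique(groups):
        mask = groups == g
        best = min(candidates,
                   key=lambda t: abs(tpr(scores[mask], labels[mask], t) - target_tpr))
        thresholds[g] = best
    return thresholds

# Toy scores standing in for an already-trained model's output; group 1
# receives systematically lower scores for the same label.
rng = np.random.default_rng(1)
groups = rng.integers(0, 2, size=500)
labels = rng.integers(0, 2, size=500)
scores = np.clip(0.5 * labels + rng.normal(0.3, 0.2, size=500) - 0.15 * groups, 0, 1)
print(equalize_thresholds(scores, labels, groups))
```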
Companies and researchers have been working on bias mitigation for years; after releasing GPT-3 (ChatGPT’s precursor) in summer 2020, OpenAI determined that it could “curtail GPT-3’s toxic text by feeding the program roughly 100 encyclopedia-like samples of writing by human professionals on topics like history and technology but also abuse, violence, and injustice” (Johnson, 2021). Nonetheless, when ChatGPT was first released, OpenAI’s CEO, Sam Altman, suggested that people could “thumbs down” racist and sexist ChatGPT output in order to “improve” the tech. This led many to express dismay that a multi-billion-dollar company was relying on its users to address such fundamental problems. Steven T. Piantadosi, head of the computation and language lab at the University of California, Berkeley, said, “What’s required is a serious look at the architecture, training data and goals…That requires a company to prioritize these kinds of ethical issues a lot more than just asking for a thumbs down” (Alba, 2022).
Earlier in this section, we talked about a workshop where GPT-3 was tested on generating text about religions using the prompt “Two ___ walk into a…”. The results showed that GPT-3 rarely mentioned violence when talking about other religions but generated something violent nine out of 10 times when prompted about Muslims. Abid et al. demonstrated that using positive adjectives in adversarial (re)training reduced the number of violence mentions about Muslims by 40 percentage points (Abid et al., 2021).
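As a rough illustration of how positive adjectives might be injected at the prompt level (a sketch of the general idea, not Abid et al.’s actual setup), the snippet below prepends a short positive statement about a group before generation; the adjective list and the generate() call are hypothetical placeholders.

```python
# A rough illustration of injecting positive adjectives into a prompt before
# generation. The adjective list and generate() are hypothetical placeholders,
# not the authors' experimental code.
POSITIVE_ADJECTIVES = ["hard-working", "generous", "thoughtful"]

def debias_prompt(group: str, prompt: str) -> str:
    """Prepend a short positive statement about the group to the prompt."""
    preamble = f"{group} are {', '.join(POSITIVE_ADJECTIVES)}. "
    return preamble + prompt

prompt = debias_prompt("Muslims", "Two Muslims walk into a")
print(prompt)
# In practice the modified prompt would then be passed to an LLM, e.g.:
# completion = generate(prompt)   # generate() is a stand-in for a real API call
```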
However, feeding the model fact-based articles and injecting positive text are not the only bias mitigation techniques. In 2021, Facebook AI researchers prompted chatbots to produce insults, profanity, and even hate speech, which human workers labelled as unsafe; those labelled examples were then used to train the models to recognize toxic speech (Johnson, 2021).
Rather than attempting to reduce bias in an extant tool, some groups are choosing to build their own. Latimer (named after African American inventor Lewis Latimer) is an LLM designed to mitigate bias and build equity, offering “a more racially inclusive language model experience” (Clark, 2023). Latimer builds on Meta’s Llama 2 model and OpenAI’s GPT-4, emphasizing African American history and culture in the datasets, thereby integrating “the historical and cultural perspectives of Black and Brown communities” (Clark, 2023).
Among the numerous mysteries about how LLMs function is that the models tend to generate more toxic output as they get bigger; OpenAI researchers say they don’t understand why that is (Johnson, 2021). Experiences in February 2024 with Google’s new Gemini tool showed that bias mitigation is not as straightforward or effective as it might seem, as two headlines from the time suggest: “Opinion: Female popes? Google’s amusing AI bias underscores a serious problem” and “Google Left in ‘Terrible Bind’ by Pulling AI Feature After Right-Wing Backlash.”