5 Ways To Avoid Bias in Machine Learning Models


Machine intelligence has been dubbed the “holy grail” of the computing world and could pave the path to the “singularity”: the point at which machines match human intelligence. At present, however, that singularity is far from achievable. The saying “There are three kinds of lies: lies, damned lies, and statistics” may be an exaggeration, but statistics can paint a skewed picture. Put differently, the way statistical results are interpreted can produce falsehoods, and machine learning (ML) decisions, which are statistical analyses at heart, are prone to bias. 

Let us first look at how machine learning aids decision-making to understand how bias might creep in. 

How Do ML Models Make Decisions?

Many ML models use neural networks (neural nets) to interpret statistical data. A neural net comprises multiple decision-making nodes linked together, which analyze input data to generate a corresponding output. At the start of training, the connections between nodes are typically given small random weights, so no node yet carries more importance than another. Each example in the training data is paired with a predetermined, expected output. To make the model's output match it, the neural net undergoes incremental changes via a feedback loop, with its component nodes changing their relative importance for more accurate decision-making. 

It is this weighting assigned to each node that shapes the decision flow. After many iterations of the feedback loop, the ML model arrives at a balance of node weights that automatically generates the desired decisions when fed test data it has not seen before. At this stage, the ML model is considered “trained.” 
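To make the feedback loop concrete, here is a minimal sketch of a single artificial node (essentially logistic regression) trained with plain gradient descent in pure Python. A real neural net stacks many such nodes in layers; the function names, the learning rate, and the logical-AND toy dataset are illustrative assumptions, not part of any particular framework.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash a weighted sum into a 0..1 'confidence'."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, inputs):
    """Return the node's yes/no decision for one input."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0


def train_neuron(samples, epochs=5000, learning_rate=0.5, seed=0):
    """Train one sigmoid node with a feedback loop (stochastic gradient descent)."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    # Start with small random weights, so no input matters more than another yet.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_inputs)]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            z = sum(w * x for w, x in zip(weights, inputs)) + bias
            # Feedback: compare the output with the predetermined expected output...
            error = sigmoid(z) - target
            # ...and nudge each weight in proportion to its input's contribution.
            weights = [w - learning_rate * error * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * error
    return weights, bias


# Toy training data: each input is paired with its expected output (logical AND).
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_neuron(AND_DATA)
print([predict(weights, bias, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

Note that the node learns whatever the paired outputs encode: if the expected outputs in the training data were skewed, the same loop would faithfully learn the skew.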

This has plenty of applications in the real world, from analyzing visual traffic data for routing traffic to facial recognition in security. 


The Risk of Bias Creeping Into the Decision-making Process 

In reality, it is difficult to make ML replicate human decision-making processes effectively and ethically. A massive challenge for ML training and implementation is the risk of bias. A recent study revealed that nearly half of industry professionals rank bias among the two biggest challenges in AI/ML today. Just 15% of AI/ML teams are currently addressing this issue. Consider the case of Microsoft’s experimental AI chatbot, Tay. Microsoft designed Tay to interact with 18- to 24-year-olds on Twitter, learn conversational speech from those interactions, and respond in a near-human way. Within hours, online miscreants had trained Tay to repeat racist, sexist, and antisemitic slurs, taking advantage of its content-neutral algorithms. 

Source: ML study by Carnegie Mellon University

I don’t blame the neural net itself: the negative inputs caused Tay to become biased because no preventive mechanisms were in place. 


How Does Bias Creep into ML?

There are several ways bias can be introduced to the ML decision-making process: 

  • The human factor – Machine learning tends to mimic human behavior (however crudely), and human behavior is often biased. For example, more primitive forms of facial recognition technology are biased towards recognizing Caucasian and male faces. A recent study by the National Institute of Standards and Technology (NIST) found that, across providers, facial recognition was most accurate for “lighter male” faces and least accurate for “darker female” faces. 

Source: NIST study findings published by Harvard University

  • Poor quality of training data – If the training data is incomplete or does not provide a proper balance in the range of data supplied, then the neural net will be trained in a similarly skewed manner. 
  • Model performance mismatch – This is what happens when ML training data does not match the data the model will be tested on (real-world or test data). For example, let’s say you are training an ML model to recognize animals, and the images you have used for training are relatively low resolution.

Source: freeCodeCamp

But when you test the ML model in a real-world scenario, it will receive far higher-resolution images, which are difficult to identify based on the training it has received. Model performance mismatch leads to incorrect decisions and a high risk of bias. 
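One way to catch a model performance mismatch before it causes harm is to compare the distribution of a feature in the training data with the same feature in real-world data. The sketch below uses the Population Stability Index (PSI), a common drift measure; it is a technique we are swapping in for illustration, not one the article prescribes. The feature (image width in pixels), the sample values, the bin edges, and the 0.25 alert threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges):
    """Fraction of values falling into each bin defined by sorted edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI between a training ('expected') and live ('actual') sample of one feature."""
    psi = 0.0
    for e, a in zip(bin_shares(expected, edges), bin_shares(actual, edges)):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) for empty bins
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical feature: image width in pixels, training set vs. real-world inputs.
train_widths = [32, 48, 64, 64, 32, 48, 64, 32]
live_widths = [512, 1024, 768, 64, 1024, 512, 768, 1024]
edges = [50, 100, 500, 900]

psi = population_stability_index(train_widths, live_widths, edges)
print(f"PSI = {psi:.2f}")
if psi > 0.25:  # rule-of-thumb threshold for a major shift
    print("Training and real-world data differ; expect a performance mismatch.")
```

A PSI near zero means the two samples look alike; in practice you would compute it for each important input feature, not just one.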

Obviously, we need a way to detect bias, and upon detection, remove or negate it. While this isn’t easy, you can take a few actions to lessen the impact of bias. 
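A simple starting point for detection, and the kind of breakdown behind findings such as the NIST results, is to report accuracy separately for each group instead of as one overall number. The following is a minimal sketch in pure Python; the group labels and evaluation results are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} so disparities are visible instead of averaged away."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    """Difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical evaluation results for a face-matching model.
y_true = [1, 1, 0, 1, 0, 1, 1, 0]
y_pred = [1, 1, 0, 1, 1, 0, 1, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
print(per_group)                    # {'A': 1.0, 'B': 0.25}
print(max_accuracy_gap(per_group))  # 0.75
```

Here the overall accuracy is 62.5%, which hides the fact that the model is perfect for group A and wrong three times out of four for group B.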


5 Best Practices to Minimize Bias in ML

There are several steps you can take when developing and running ML algorithms that reduce the risk of bias. 

1. Choose the correct learning model 

The two most common types of learning model each have their own pros and cons. In a supervised model, the training data is controlled entirely by the stakeholders who prepare the dataset. Ensure this group of stakeholders is diverse and equitably formed, and that its members have received unconscious bias training. An unsupervised model, on the other hand, depends on the neural network itself to detect bias trends. This means bias prevention techniques must be factored into the model, so that its output is not simply a mirror of its input data and the neural network learns to distinguish between what’s biased and what isn’t. 

2. Use the right training dataset 

Machine intelligence, in its current state, is only as good as its training data. The training data you feed into the neural network must be comprehensive and balanced, reflect real-world conditions such as demographic composition, and be free of human prejudices. A good rule of thumb is to avoid reusing datasets in contexts they were not built for. For example, data from an area with an ethnically diverse population cannot be applied to a region with a predominantly single-race population, and vice versa. 
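Before reusing a dataset in a new region, one quick sanity check is to compare its group composition with reference shares for the population the model will actually serve. This is a minimal sketch; the group labels, the reference shares (think census-style figures), and the 5-percentage-point tolerance are hypothetical assumptions.

```python
from collections import Counter


def composition_gaps(dataset_groups, reference_shares):
    """For each group, dataset share minus real-world (reference) share."""
    counts = Counter(dataset_groups)
    n = len(dataset_groups)
    return {g: counts.get(g, 0) / n - share for g, share in reference_shares.items()}


def underrepresented(gaps, tolerance=0.05):
    """Groups whose share of the dataset falls short by more than `tolerance`."""
    return sorted(g for g, gap in gaps.items() if gap < -tolerance)


# Hypothetical: group labels in a training set vs. shares in the deployment region.
training_groups = ["X"] * 75 + ["Y"] * 20 + ["Z"] * 5
target_region = {"X": 0.5, "Y": 0.3, "Z": 0.2}

gaps = composition_gaps(training_groups, target_region)
print(underrepresented(gaps))  # ['Y', 'Z']
```

A check like this only covers the groups you thought to label; it cannot reveal imbalances along dimensions the dataset does not record.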

3. Perform data processing mindfully 

Machine intelligence involves three types of data processing: pre-processing, in-processing, and post-processing. In pre-processing, when you prepare datasets, bias can creep in during formatting before the data is fed into the neural network, so any data that could introduce bias should be excluded at this step. In in-processing, the data is manipulated as it passes through the neural network itself, so the weighting of the neural nodes must be correct to prevent a biased output. Finally, in post-processing, ensure that no bias is introduced when the output is interpreted for human consumption. 
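As one concrete example of a pre-processing fix, the sketch below implements "reweighing" (a technique described by Kamiran and Calders, swapped in here for illustration rather than taken from the article). Instead of deleting data, it assigns each training example a weight so that group membership and the outcome label look statistically independent in the weighted data. The weight for a (group, label) pair is P(group) × P(label) / P(group, label). The group and label values are hypothetical, and the resulting weights are assumed to be passed to a training routine that accepts per-sample weights.

```python
from collections import Counter


def reweighing_weights(groups, labels):
    """Weight each example by P(group) * P(label) / P(group, label)."""
    n = len(labels)
    group_counts = Counter(groups)
    label_counts = Counter(labels)
    joint_counts = Counter(zip(groups, labels))
    return [
        (group_counts[g] * label_counts[y]) / (n * joint_counts[(g, y)])
        for g, y in zip(groups, labels)
    ]


def weighted_positive_rate(weights, groups, labels, group):
    """Share of a group's total weight that sits on positive (1) labels."""
    total = sum(w for w, g in zip(weights, groups) if g == group)
    positive = sum(w for w, g, y in zip(weights, groups, labels) if g == group and y == 1)
    return positive / total


# Hypothetical historical data: group A gets positive labels 3/4 of the time, B only 1/4.
groups = ["A"] * 4 + ["B"] * 4
labels = [1, 1, 1, 0, 1, 0, 0, 0]

weights = reweighing_weights(groups, labels)
print([round(w, 2) for w in weights])  # [0.67, 0.67, 0.67, 2.0, 2.0, 0.67, 0.67, 0.67]
```

After reweighing, both groups have a weighted positive rate of 0.5, so a model trained on the weighted data is no longer rewarded for learning the historical gap between A and B.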

4. Monitor real-world performance across the ML lifecycle 

No matter how carefully you choose the learning model or vet the training data, the real world can throw up unexpected challenges. Never treat an ML model as “trained” and finished, with no further monitoring required. Also, try to use real-world data for testing ML wherever possible, so that bias can be detected and corrected before it affects human lives negatively.
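Ongoing monitoring can be as simple as tracking accuracy per group over a sliding window of recent, verified predictions and raising an alert when the gap grows. This is a minimal sketch; the class name, window size, gap threshold, and minimum sample count are hypothetical choices, and it assumes the true outcome for each prediction eventually becomes known (in production it often arrives with a delay).

```python
from collections import deque


class GroupAccuracyMonitor:
    """Track per-group accuracy over recent predictions and flag widening gaps."""

    def __init__(self, window=100, max_gap=0.1, min_samples=20):
        self.window = window            # how many recent outcomes to keep per group
        self.max_gap = max_gap          # largest acceptable best-vs-worst accuracy gap
        self.min_samples = min_samples  # ignore groups with too little evidence
        self.history = {}               # group -> deque of 1 (correct) / 0 (wrong)

    def record(self, group, correct):
        self.history.setdefault(group, deque(maxlen=self.window)).append(int(correct))

    def accuracies(self):
        return {
            g: sum(h) / len(h)
            for g, h in self.history.items()
            if len(h) >= self.min_samples
        }

    def alert(self):
        acc = self.accuracies()
        if len(acc) < 2:
            return False
        return max(acc.values()) - min(acc.values()) > self.max_gap


monitor = GroupAccuracyMonitor(window=50)
for _ in range(50):
    monitor.record("A", True)
    monitor.record("B", True)
print(monitor.alert())  # False: both groups are served equally well
for _ in range(20):
    monitor.record("B", False)  # real-world drift starts hurting group B
print(monitor.alert())  # True: B's recent accuracy has fallen to 0.6
```

The sliding window is a deliberate choice: it lets the monitor react to recent drift instead of being diluted by months of earlier, healthy history.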

5. Make sure that there are no infrastructural issues

Apart from data and the human factor, the infrastructure itself can cause bias. For example, if you rely on data collected via electronic or mechanical sensors, equipment problems can introduce bias. This is often the hardest type of bias to detect; it needs careful consideration and may require investment in up-to-date digital and technology infrastructure. 
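Some equipment problems leave recognizable fingerprints in the data itself. The sketch below checks one sensor's readings for three common symptoms: missing values, physically implausible values, and long runs of identical values (often a stuck or frozen sensor). The function name, the plausible range, and the repeat threshold are hypothetical and would need tuning for real hardware; a clean result does not prove the sensor is calibrated correctly.

```python
def sensor_issues(readings, low, high, max_repeats=5):
    """Return human-readable problems found in one sensor's readings.

    readings: numeric samples in time order, with None for missing values.
    low/high: physically plausible range for this sensor (supplied by you).
    """
    issues = []
    missing = sum(r is None for r in readings)
    if missing:
        issues.append(f"missing readings: {missing}")
    values = [r for r in readings if r is not None]
    out_of_range = [v for v in values if not low <= v <= high]
    if out_of_range:
        issues.append(f"out-of-range readings: {len(out_of_range)}")
    # A long run of identical values often means a stuck or frozen sensor.
    run = longest = 1 if values else 0
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    if longest > max_repeats:
        issues.append(f"longest run of identical values: {longest} (possibly stuck)")
    return issues


# Hypothetical temperature sensor (degrees Celsius).
readings = [21.5, 21.7, None, 22.0, 22.0, 22.0, 22.0, 22.0, 22.0, 85.0]
for issue in sensor_issues(readings, low=-10, high=50):
    print(issue)
```

Readings from a sensor that fails these checks can be quarantined before they reach the training data, rather than silently skewing it.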

These five best practices should form the starting point in the discussion around bias in machine learning. 

Depending on the application, the algorithmic structure, and the statistical model, other options can be used to evaluate potential bias conditions and correct them. Another important measure is introducing the study of ethics as part of technical education so that programmers, data scientists, and business leaders approach ML with an acute understanding of its risks. 
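As an example of such an evaluation option, two widely used group-fairness measures compare how often each group receives a positive outcome: the demographic parity difference and the disparate impact ratio. The sketch below computes both in pure Python; the loan-approval scenario and data are hypothetical, and the 0.8 threshold is the "four-fifths" rule of thumb from US employment guidelines, not a universal standard.

```python
def selection_rate(predictions, groups, group):
    """Share of positive (1) predictions received by one group."""
    outcomes = [p for p, g in zip(predictions, groups) if g == group]
    return sum(outcomes) / len(outcomes)


def parity_report(predictions, groups, reference_group, other_group):
    """Demographic parity difference and disparate impact ratio for two groups."""
    ref = selection_rate(predictions, groups, reference_group)
    other = selection_rate(predictions, groups, other_group)
    return {
        "parity_difference": ref - other,  # 0.0 means equal selection rates
        "disparate_impact": other / ref if ref else float("nan"),  # 1.0 means parity
    }


# Hypothetical loan-approval predictions (1 = approved) for two groups.
predictions = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = parity_report(predictions, groups, reference_group="A", other_group="B")
print(report)
if report["disparate_impact"] < 0.8:  # "four-fifths" rule of thumb
    print("Group B is approved far less often than group A; investigate.")
```

Which metric is appropriate depends on the application: equal selection rates are not always the right goal, which is exactly why the choice of measure should be discussed rather than defaulted.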

What are your recommendations for reducing bias risk in ML? Let us know your thoughts on LinkedIn, Twitter, or Facebook. We would love to hear from you!