How to Prevent Data Discrimination


Algorithms can be less biased than humans and improve outcomes. However, algorithmic bias can never be completely eradicated, since data is collected by, and data models are trained by, humans. Here are a few ways in which data discrimination can be minimized.

Bias in algorithms is a hot topic due to the risk of imparting our ‘human condition’ onto machine learning models. While technology has evolved, this ‘human condition’ has remained unchanged for millennia. It encompasses undesirable behaviors such as the affect heuristic (basing decisions on emotion instead of reason) and confirmation bias (seeking out information that supports preconceived ideas).

Thinkers as far back as Plato discussed a ‘World of Ideal Forms’ as the basis from which our imperfect world is derived, even suggesting that the world we live in is incapable of reaching a utopian state. This may mean algorithmic bias is never completely eradicated, since data is collected, and data models are trained, by imperfect humans.

Don’t Panic! Algorithms Can Be Better Than a Reflection of Ourselves

The important message to consider is that algorithms can be less biased than humans (do as we say, not as we do). As with self-driving cars, the realistic objective is not to eliminate accidents altogether, but to reduce the number of road accidents, which are typically caused by human error (according to the NHTSA, 94% of all road accidents are caused by human error).

If algorithms can be less biased than humans and improve outcomes in areas like mortgage applications and finding the right job candidate, as has already been demonstrated to some degree, why worry?

Many cases picked up by the media are the result of unintended consequences rather than deliberate design. Safeguards clearly need to be in place for these eventualities, but also for algorithms with ulterior motives, purposely created to be exploitative or to favor certain types of outcomes.

How Do We Create the Transparency Needed to Mediate on Objectivity?

IBM has produced the AI Fairness 360 toolkit, which can create an ‘integrity ranking’ for algorithms by identifying bias. This sort of technology can assess bias at the data gathering, model training, and even the real-time decision-making stages, suggesting mitigation accordingly.
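To make this concrete, the snippet below is a minimal sketch of the kind of group-fairness check a toolkit like AI Fairness 360 automates, written in plain pandas rather than against the toolkit’s own API. The column names, the tiny dataset, and the 0.8 ‘four-fifths’ threshold are illustrative assumptions, not figures from this article.

```python
# Minimal sketch of a group-fairness check of the sort fairness toolkits
# automate. Column names ("gender", "approved"), the data, and the 0.8
# threshold (the common "four-fifths" rule of thumb) are illustrative.
import pandas as pd

def disparate_impact(df: pd.DataFrame, protected: str, privileged, outcome: str) -> float:
    """Ratio of favorable-outcome rates: unprivileged group / privileged group."""
    priv_rate = df.loc[df[protected] == privileged, outcome].mean()
    unpriv_rate = df.loc[df[protected] != privileged, outcome].mean()
    return unpriv_rate / priv_rate

df = pd.DataFrame({
    "gender":   ["M", "M", "F", "F", "F", "M", "F", "M"],
    "approved": [1,   1,   0,   1,   0,   1,   0,   1],
})

di = disparate_impact(df, protected="gender", privileged="M", outcome="approved")
print(f"Disparate impact: {di:.2f}")
if di < 0.8:  # widely used rule of thumb, not a legal test
    print("Potential adverse impact against the unprivileged group")
```

A real deployment would run checks like this at each of the stages mentioned above, not just once on a finished model.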

A concern for business is that a sub-optimal real-time decision can cause negative consequences, but retrospective decisions in areas such as fraud may come too late and harm businesses even more, not just financially but reputationally. Take airport security, for example: it is an inconvenience for many of us, but it puts passengers’ minds at ease and is worth the pain to avoid catastrophes, such as terrorism, that can cause the loss of whole businesses and, more significantly, human lives.

However, how fair is the fairness algorithm? In policing, racial prejudices impacting data by causing disproportionate numbers of arrests within certain ethnic groups may be identified by the fairness toolkit and compensated for accordingly. Yet if algorithms are altered to compensate for this original bias, over time this could result in another ethnic group becoming disproportionately affected by a negative feedback loop, unless a diverging trend is picked up at an early stage.

The algorithms in the fairness toolkit require an advanced understanding of how the data can be misused (i.e., knowing what the potential discriminatory variables are). Criteria such as race or religion may be obvious. Less obvious is that a customer’s address can act as a proxy for an area where households are perceived to have low incomes, which could affect decision making on credit approvals.

Gathering data to obtain this understanding in the first place may also be an obstacle. If legislation prevents data gatherers from collecting certain information, an algorithm may still learn the inherent bias through underlying correlated data during training; the bias is then present but unidentified, and consequently impossible for the fairness toolkit to correct.
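As an illustration of how such hidden correlations can be surfaced, the sketch below checks whether a seemingly neutral feature predicts a protected attribute. The feature, the protected attribute, the data, and the 0.9 dominance threshold are all hypothetical.

```python
# Illustrative check for proxy variables: does a "neutral" feature predict a
# protected attribute? All column names, data, and thresholds are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "postcode_area": ["A", "A", "B", "B", "A", "B", "A", "B"],
    "ethnicity":     ["X", "X", "Y", "Y", "X", "Y", "X", "X"],
})

# Share of each protected group within each postcode area; a strongly skewed
# table suggests the feature can act as a proxy for the protected attribute.
proxy_table = pd.crosstab(df["postcode_area"], df["ethnicity"], normalize="index")
print(proxy_table)

# Flag areas dominated by a single group (0.9 chosen purely for illustration).
suspect = proxy_table.max(axis=1) > 0.9
print("Potential proxy areas:", list(proxy_table.index[suspect]))
```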

While the toolkit is a promising development, its limitations mean the following steps should still be taken to mitigate bias before a corrective tool is ever needed:

1. Ensure Data Quality

Scarcity of data and poor data quality can prevent a data initiative from ever getting off the ground. If an organization is not bought into a data-centric vision, insight that could give it an edge over the competition is misreported or squandered sitting on individuals’ C: drives. Implementing and enforcing processes around data collection mitigates this risk and provides a solid foundation.
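As a loose illustration of what ‘processes around data collection’ can mean in practice, the sketch below runs a few automated quality gates before data is allowed into model training. The required columns, the sample data, and the checks themselves are illustrative assumptions.

```python
# Minimal sketch of automated data-quality gates run before training.
# Required columns, data, and checks are illustrative assumptions.
import pandas as pd

def quality_report(df: pd.DataFrame, required_columns: list[str]) -> dict:
    """A few basic gates: schema, completeness, duplicates, and volume."""
    return {
        "missing_columns": [c for c in required_columns if c not in df.columns],
        "null_fraction": df.isna().mean().round(3).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        "row_count": len(df),
    }

# Hypothetical extract: one duplicate row, missing incomes, no postcode column.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "income": [52000, None, None, 47000],
})

report = quality_report(df, required_columns=["customer_id", "income", "postcode"])
print(report)
```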

2. Harvest a Wide Variety of Training Data

Bias can be present due to poor training data sets. For instance, training facial recognition technology only on white male faces obtained from an internet search engine leads to poor results: the algorithm ‘overfits’ this type of face and performs badly when analyzing non-white or female faces. Although customer data may be the only applicable data source for an organization to train with, training algorithms on entire customer portfolios rather than limiting models to a single customer will incorporate a greater level of diversity and likely improve machine learning predictions.
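One practical way to spot this kind of skew is to evaluate the model per demographic group rather than only in aggregate. The sketch below uses synthetic data and a hypothetical group label; overall accuracy can look healthy while the under-represented group performs noticeably worse.

```python
# Sketch of a per-group evaluation that surfaces the skew a narrow training
# set causes. Data, group labels, and the model are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
group = rng.choice(["group_a", "group_b"], size=400, p=[0.9, 0.1])
# Labels depend on features differently per group, mimicking a skewed portfolio.
y = (X[:, 0] + (group == "group_b") * X[:, 1] > 0).astype(int)

model = LogisticRegression().fit(X, y)
pred = model.predict(X)

print("Overall accuracy:", round(accuracy_score(y, pred), 3))
for g in ["group_a", "group_b"]:
    mask = group == g
    print(g, "accuracy:", round(accuracy_score(y[mask], pred[mask]), 3))
```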

3. Use a Diverse Group of Data Scientists

As with using a wide variety of data, having a wide variety of people create the algorithmic models goes some way towards mitigating bias. Using the same facial recognition example, white male developers are the ones most likely to subconsciously select training data of white male faces that benefits themselves. Employing a collaborative mix of people therefore allows these subconscious biases to cancel each other out.

4. Be Transparent

Most algorithms currently produce a decision with no explanation of how they arrived at it. Unless the result is clearly wrong based on human knowledge and experience, there is no way for a human to intuitively judge whether the result is correct (an algorithm arriving at the correct answer for the wrong reasons is also undesirable). **Allowing visibility into how algorithms work, through methods such as explanation steps, will increase trust between business and consumer as it generates greater understanding.**
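Full explanation steps are model- and vendor-specific, but even a simple feature-importance report goes some way towards opening up a black box. The sketch below uses scikit-learn’s permutation importance on a synthetic credit-style dataset; the feature names and data are illustrative assumptions.

```python
# One simple explanation step: report which features most affect the model's
# decisions. Feature names and data here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(1)
feature_names = ["income", "loan_amount", "tenure_months", "postcode_index"]
X = rng.normal(size=(300, 4))
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(int)  # decision driven by two features

model = RandomForestClassifier(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Show, per feature, how much shuffling it degrades the model's accuracy.
for name, score in sorted(zip(feature_names, result.importances_mean),
                          key=lambda item: item[1], reverse=True):
    print(f"{name:15s} importance {score:.3f}")
```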

5. Test, Test, and Test Again

New data is constantly being created, which is why testing can never be overlooked. If testing is not practiced on an ongoing basis, an algorithm may begin to overfit or underfit over time, making it unsuitable either through bias or through providing no actionable insight. Keep on top of algorithms by reviewing them regularly to ensure they remain fit for purpose.
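One lightweight way to keep this testing ongoing is to compare the model’s score on each new batch of data against the baseline it achieved at deployment, flagging any significant drop for review. The model, the simulated drift, and the tolerance in the sketch below are illustrative assumptions.

```python
# Sketch of an ongoing test: compare recent performance against the score at
# deployment and flag a drop. Model, data, and threshold are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)

# Training-time data and baseline score.
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] > 0).astype(int)
model = LogisticRegression().fit(X_train, y_train)
baseline = accuracy_score(y_train, model.predict(X_train))

# A later batch where the underlying relationship has shifted (simulated drift).
X_new = rng.normal(loc=0.5, size=(200, 3))
y_new = (X_new[:, 1] > 0).astype(int)
current = accuracy_score(y_new, model.predict(X_new))

print(f"Baseline accuracy {baseline:.2f}, current accuracy {current:.2f}")
if current < baseline - 0.10:  # tolerance chosen for illustration only
    print("Performance drop detected: review the data and retrain if necessary")
```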

As with technological advances throughout history, we need to examine how we implement algorithms in various situations and measure the outcomes they produce.

Identifying and addressing bias in the humans who develop algorithms, and not least in the data used to train them, will help to ensure that machine learning and artificial intelligence-driven systems benefit the ‘many’, not just the ‘few’ who would exploit them.