AI Bias and Data Quality

Artificial intelligence is not immune to human bias. Regardless of where AI is headed in terms of self-propagating algorithms and fully automated decision-action processes, the initial AI architecture is human-generated. 

For better or for worse, humans have both implicit and explicit biases. Our brains are processing multiple environmental inputs (i.e., data) even while we’re sleeping. To cope, we use heuristics (in general, “rules of thumb”) to speed up decision making. Over time, those heuristics can degrade into biases, where we start overgeneralising in favour of one thing over another. So, it’s not surprising that AI follows the same downhill path.

Data quality plays a key role in mitigating the infiltration of biases. But what is “high-quality data”?

Is the data both accurate and relevant?

This requires laying out an initial question or set of questions centred on a problem (or a problem set) you’re attempting to solve. Bias can manifest at any stage of the data collection, processing, and analytics cycle. As such, you need to maintain awareness of the likely bias “leaks” that occur while you’re determining data accuracy and relevancy. By virtue of sifting and cleaning data, you’re also making conscious (and possibly unconscious or subconscious) decisions about data accuracy and relevancy. 

Is the data complete and valid? 

Completeness and validity are by-products of your problem definition, which, once again, opens up another “bias” trajectory. In practice, complete and valid data can mean, quite simply, that there are no missing or null values within the relevant population parameters, and that the data measures what you intended it to quantify. At each stage of our data quality decision making, we need to differentiate between an actual bias and simply focusing on a particular problem/solution goal.
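To make the two checks above concrete, here is a minimal sketch in Python using only the standard library. The field names, the sample records, and the age range rule are all hypothetical, chosen purely for illustration; what counts as “valid” depends on your own problem definition, which is exactly where bias can creep in.

```python
from typing import Any, Callable

Record = dict[str, Any]


def completeness(records: list[Record], fields: list[str]) -> dict[str, float]:
    """Return the share of records with a non-missing value for each field."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is not None) / total
        for field in fields
    }


def invalid_records(
    records: list[Record], rules: dict[str, Callable[[Any], bool]]
) -> list[int]:
    """Return indices of records whose present values fail a validity rule."""
    bad = []
    for i, record in enumerate(records):
        for field, rule in rules.items():
            value = record.get(field)
            # Missing values are a completeness issue, not a validity issue.
            if value is not None and not rule(value):
                bad.append(i)
                break
    return bad


# Hypothetical sample data: one missing age, one impossible age, one missing region.
records = [
    {"age": 34, "region": "north"},
    {"age": None, "region": "south"},
    {"age": -5, "region": "north"},
    {"age": 51, "region": None},
]

# Hypothetical rule: the analyst decides which ages are plausible.
rules = {"age": lambda v: 0 <= v <= 120}

completeness_report = completeness(records, ["age", "region"])
invalid = invalid_records(records, rules)
```

Note that the rule itself is a human decision. Choosing 0 to 120 as the plausible age range is a reasonable focus for one problem and a source of bias for another, so it is worth recording why the rule was chosen.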

Thus, conscientious data quality discernment should strike a balance between the need to “define” (i.e., identifying a problem and searching the solution space) and the relevance and applicability of that definition with respect to the data and our own possible biases. The persistent self-analytical question to ask, before we rush to offload such tasks to AI, is “why did I choose to do X instead of Y?”

Mitigating AI Bias: Human Intervention

As AI takes on increasingly complex data science tasks, such as defining data accuracy, relevancy, and completeness, and then automatically preparing the data for analysis, we’ll still need humans to run “bias” interference. 

Ideally, this will be a multi-stage and ongoing system that includes cataloguing our “known-known” biases (which will need to be updated on a periodic basis) and how they tend to surface over time. AI will then have a starting reference point for bias “self-checking” and, ultimately, self-redirection; this is another hallmark of the intelligence that many in the AI realm are aiming for.

Discovery of new or unintended biases based on data selection decisions is also crucial to AI bias mitigation.  
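One simple way to look for unintended bias from a data selection decision is to compare how each group is represented before and after the decision is applied. The sketch below is an assumption about how such a check might look, not an established method; the `region` and `income` fields and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Any

Record = dict[str, Any]


def group_shares(records: list[Record], key: str) -> dict[str, float]:
    """Return each group's share of the records, keyed by the given field."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()} if total else {}


def selection_shift(
    before: list[Record], after: list[Record], key: str
) -> dict[str, float]:
    """Return the change in each group's share (after minus before)."""
    shares_before = group_shares(before, key)
    shares_after = group_shares(after, key)
    return {
        group: shares_after.get(group, 0.0) - share
        for group, share in shares_before.items()
    }


# Hypothetical raw data: one southern record has no income value.
raw = [
    {"region": "north", "income": 52000},
    {"region": "north", "income": 48000},
    {"region": "south", "income": None},
    {"region": "south", "income": 31000},
]

# A routine cleaning decision: drop rows with missing income.
cleaned = [r for r in raw if r["income"] is not None]

# A positive value means the group gained share; negative means it lost share.
shift = selection_shift(raw, cleaned, "region")
```

Here, dropping incomplete rows looks neutral but reduces the share of southern records. A shift like this does not prove bias on its own; it flags a decision that a human should review.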

But it cannot stop there. We must continue to analyse our AI-driven systems and retain the power to adjust them, especially if we intend to automate vital functions that can have life-or-death consequences. While AI has great potential as a more objective analytical system, it is not, and will not be, a perfect solution to every human problem.
