As we increase our reliance on machine learning and on automated systems built on usage data and consumer insights, one thing researchers need to work to avoid is embedding unconscious bias. That bias is often already present in their source data, and such systems can amplify it further.
For example, if you were looking to create an algorithm to help identify top candidates for an open position at a company, you might logically use the company's existing employees as the base data source for that process. The system you create would then be skewed by that input. If most current employees are male, male applicants might be weighted more heavily in the results, while under-representation of people of certain backgrounds or races could also sway the output.
Given this, it's important for AI researchers to stay aware of such bias and mitigate it where possible, in order to maximize opportunity and remove pre-existing leanings from input data sets.
That's where new research from Google comes in. This week, Google launched its Know Your Data (KYD) dataset exploration tool, which enables researchers to identify existing biases within their base data collections so that they can address them.
In Google's example, which uses image caption data, the tool enables researchers to examine their datasets for patterns such as the prevalence of male and female images within a certain category. Through this, research teams may be able to weed out bias at the source by improving their input data, thereby reducing the impact of harmful, embedded stereotypes and pre-existing leanings.
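To make that kind of check concrete, here is a minimal sketch in Python of counting how often gendered words appear in captions that mention a given category. It is not how KYD itself works and does not use any Google API. The function name, the word lists, and the sample captions are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical word lists (an assumption for this sketch); a real audit
# would need a much more careful choice of terms.
MALE_TERMS = {"man", "men", "boy", "boys", "he", "his"}
FEMALE_TERMS = {"woman", "women", "girl", "girls", "she", "her"}


def gender_counts(captions, category):
    """Count captions that mention `category` alongside male or female terms."""
    counts = Counter()
    for caption in captions:
        # Split the caption into a set of lowercase words.
        words = set(re.findall(r"[a-z]+", caption.lower()))
        if category not in words:
            continue
        counts["total"] += 1
        if words & MALE_TERMS:
            counts["male"] += 1
        if words & FEMALE_TERMS:
            counts["female"] += 1
    return counts


# Made-up captions, for illustration only.
captions = [
    "A man riding a bicycle down the street",
    "A woman riding a bicycle in the park",
    "Two men fixing a bicycle",
    "A boy next to a bicycle",
    "A dog sleeping on a couch",
]

counts = gender_counts(captions, "bicycle")
print(counts["male"], counts["female"], counts["total"])
```

A lopsided count like this would not prove a dataset is biased on its own, but it could flag a category worth a closer look before the data is used for training.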
That matters. At present, the KYD system is fairly limited in how it can extract and measure data examples, but it points to an improved future for such analysis, which could help to lessen the impacts of bias within machine learning systems.
And given that more and more of our interactions and transactions are being guided by such processes, we need to be doing all we can to combat these concerns, and ensure equal representation and opportunity through these systems.
We have a long way to go on this, but it's an important step for Google's research, and for broader algorithmic analysis.
You can read Google's full overview of its evolving KYD system here.