Actively Identifying Bias in Data Sets Can Mitigate Analytical Errors, Skewed Outcomes, and Negative Consequences in Machine Learning
It is vital to actively identify bias in data sets before and after entering them into machine learning algorithms, according to presenters at the EdTech Maryland Meetup held online on Oct. 21.
Andrew Hampton, Assistant Professor of Psychology at Christian Brothers University, stressed that machine learning has enormous potential to democratize education, but he cautioned that programmers must identify biases in the data so that they aren’t perpetuated by the algorithms that power artificial intelligence (AI).
“It’s important to recognize the potential for these systems to perpetuate biases, unless we consciously identify and address these biases in the data,” Hampton said. “Machine learning algorithms are naive and will find patterns that may not be appropriate. They may use things like race and socio-economic status to make predictions without knowing the implications or context of this data.”
Phil Horwitz, Chief Architect at JBS Custom Software Solutions, agreed that the quality of the data is at the heart of the matter and that biases are almost always present before any machine learning takes place. He noted that undersampling and oversampling in the data cause most of the issues. He also warned that data sets with wide disparities in the number of examples per category can distort AI system outputs.
Horwitz gave the example of a credit card fraud system he worked on, where he had to prepare the data because of the wide disparity between the large number of false fraud cases reported and the small number of true fraud cases.
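The article does not say how Horwitz prepared the fraud data. As a rough illustration only, the sketch below shows one common way to measure a class disparity and reduce it by randomly duplicating examples from the smaller class. The function names, labels, and toy data are hypothetical and are not taken from his system.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Return the size of the largest class divided by the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until
    every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows belonging to this class in the original data.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 95 non-fraud records and 5 fraud records.
labels = ["not_fraud"] * 95 + ["fraud"] * 5
rows = [{"amount": i} for i in range(100)]

ratio_before = imbalance_ratio(labels)
balanced_rows, balanced_labels = random_oversample(rows, labels)
ratio_after = imbalance_ratio(balanced_labels)
```

Duplicating rows adds no new information, so a step like this is usually applied only to training data, with evaluation done on data that keeps the original class proportions.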
“Ultimately these machine learning systems are going into a production environment to solve real world problems, and there could be negative consequences if we implement naively,” Horwitz said.
Both presenters also recognized that most of these systems are designed primarily by white men, which poses an additional challenge to eliminating bias.
“We are usually the ones putting these systems together, so we need to recognize our biases and be cognizant that these aren’t passed on,” Hampton said.
The online discussion was moderated by Ed Mullin, Senior Management & Fractional CIO/CTO Consultant at Think Systems, Inc., who likened the current AI and machine learning industry to the early days of clinical research, which improved as it matured by adopting standards and best practices.
Horwitz observed that the industry is improving rapidly by learning from past mistakes. He said more tools and resources are released nearly every week to help identify and mitigate bias in data sets and avoid mistakes that have been made in the past.
“We need to leverage these new tools and resources to ensure we don’t repeat naive implementation mistakes from the past,” Horwitz said. “A lot of progress can be made with increased awareness of these tools and resources.”
Horwitz encouraged the use of both human review and AI and machine learning tools to detect bias in data sets.
Realizing AI and Machine Learning Benefits
Both presenters agreed that as the influence of machine learning in the educational ecosystem evolves and expands, consideration must be given to a broad range of novel ethical and practical issues, including identifying and mitigating bias.
However, the presenters remained confident that these issues can be addressed and that the potential benefits of AI and machine learning are worth it.
These systems help identify the most efficient path for each individual student to move through an educational environment by predicting what a student should learn next. The systems also learn to optimize the interface for each user.
“Machine learning helps us take massive steps forward to realize efficiency in educational environments,” Hampton said. “Ideally these adaptive systems will deliver the right content to the right student at the right time, no matter the student’s background.”
These systems can also evaluate curriculum strengths and weaknesses to optimize the effectiveness of a course and determine where help and support would best be provided.
“It goes both ways,” Horwitz said. “They’re not just helping students; they’re also helping improve content and curriculum.”
“This is an opportunity to radically democratize the educational system,” Hampton said. “The opportunity is essentially here, but we need to make sure we keep the best interests of students as the priority.”