The growth of social media and the advancement of mobile technology have created exponentially more ways to create and share information. Advanced data tools such as AI and data science are increasingly employed to process and analyze this data. Artificial Intelligence (AI) combines computer science with robust datasets and models to facilitate automated problem-solving. Machine Learning (ML), a subfield of AI that uses statistical techniques to enable computers to learn without explicit programming, trains models on data inputs to produce actions and responses for users. This data is being leveraged to make critical decisions surrounding governmental strategy, public assistance eligibility, medical care, employment, insurance, and credit scoring.
As one of the largest technology companies in the world, Amazon relies heavily on AI and ML for storing, processing, and analyzing data. Yet in 2015, even with its size and technical sophistication, the company discovered bias in its hiring algorithm. The algorithm favored men because the data set it referenced was drawn from applicants over the previous 10 years, a sample that contained far more men than women.
Bias was also found in COMPAS (Correctional Offender Management Profiling for Alternative Sanctions), an algorithm used by US court systems to predict offender recidivism. Analysis of the data used, the chosen model, and the algorithm overall showed that it produced false positives for almost half (45%) of African American offenders, compared with 23% of Caucasian American offenders.
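The disparity described above is a difference in false positive rates between groups: among people who did not reoffend, how often the model still labeled them high risk. A minimal sketch of that calculation, using entirely hypothetical records rather than the actual COMPAS data:

```python
# Hedged sketch: computing per-group false positive rates, the disparity
# measure reported for COMPAS. The records below are hypothetical.
from collections import defaultdict

# Each record: (group, actually_reoffended, predicted_high_risk)
records = [
    ("A", False, True), ("A", False, True), ("A", False, False), ("A", True, True),
    ("B", False, True), ("B", False, False), ("B", False, False), ("B", True, True),
]

def false_positive_rates(records):
    """FPR per group = share predicted high-risk among those who did NOT reoffend."""
    fp = defaultdict(int)   # false positives per group
    neg = defaultdict(int)  # actual non-reoffenders per group
    for group, reoffended, predicted_high in records:
        if not reoffended:
            neg[group] += 1
            if predicted_high:
                fp[group] += 1
    return {g: fp[g] / neg[g] for g in neg}

print(false_positive_rates(records))
# Group A: 2/3 ≈ 0.67 vs. Group B: 1/3 ≈ 0.33 — the kind of gap an audit flags
```

A model can have identical overall accuracy across groups while still showing a gap like this, which is why audits compare error rates per group rather than a single aggregate score.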
Without protocols and regulations to enforce checks and balances for the responsible use of AI and ML, society is on a slippery slope toward bias based on socioeconomic class, gender, race, and even access to technology. Without clean data, algorithms can intrinsically create bias simply through the use of inaccurate, incomplete, or poorly structured data sets. Avoiding bias starts with accurately assessing the quality of the dataset, which should be:
- Clean and consistent
- Representative of a balanced data sample
- Clearly structured and defined by fair governance rules and enforcement
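One piece of this checklist, representativeness, lends itself to a simple automated check. The sketch below (names and the tolerance threshold are illustrative assumptions, not a standard) flags groups whose share of the data deviates from an equal-representation target:

```python
# Hedged sketch: flag a dataset whose group representation deviates too far
# from a uniform target share. The tolerance value is an assumption.
from collections import Counter

def representation_report(labels, tolerance=0.10):
    """Compare each group's observed share to a uniform target share."""
    counts = Counter(labels)
    target = 1 / len(counts)  # equal share across the groups present
    report = {}
    for group, n in counts.items():
        share = n / len(labels)
        report[group] = {
            "share": round(share, 2),
            "balanced": abs(share - target) <= tolerance,
        }
    return report

# A hiring dataset skewed toward one gender, like the example above:
applicants = ["men"] * 80 + ["women"] * 20
print(representation_report(applicants))
# men: share 0.8, balanced False; women: share 0.2, balanced False
```

A real pipeline would compare against population or applicant-pool baselines rather than a uniform split, but even a check this simple would have surfaced the 80/20 skew before training.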
Defining AI Data Bias
The problem with applying Artificial Intelligence to major decisions is the opportunity for bias to cause significant disparities for vulnerable groups and underserved communities. Part of the problem is the volume and processing methods of Big Data, but there is also the potential for data to be used intentionally to perpetuate discrimination, bias, and unfair outcomes.
“What starts as a human bias turns into an algorithmic bias,” states Gartner. In 2019, Harvard researchers defined algorithmic bias as the application of an algorithm that compounds existing inequities in socioeconomic status, race, ethnic background, religion, gender, disability, or sexual orientation and amplifies inequities in health systems. Gartner also describes four types of algorithmic bias:
- Amplified Bias: systemic or unintentional bias in processing data used in training machine learning algorithms.
- Algorithm Opacity: black-box processing of end-user data, whether intrinsic to the model or intentional, raises concern about the integrity of decision-making.
- Dehumanized Processes: views on replacing human intelligence with ML and AI are highly polarized, especially when used to make critical, life-changing decisions.
- Decision Accountability: there exists a lack of sufficient reporting and accountability from organizations using Data Science to develop strategies to mitigate bias and discrimination.
A study by Pew Research found that “at a broad level,” 58% of Americans feel that computer programs will always reflect some level of human bias, although 40% think these programs can be designed to be bias-free. That may be true for data about shipments in a supply chain or predictions of when your car needs an oil change, but human demographics, behaviors, and preferences can be fluid and subject to change based on data points that may not be reflected in the data sets being analyzed.
Chief data and analytics officers and decision-makers must challenge themselves by ingraining bias prevention throughout their data processing algorithms. This can be easier said than done, considering the volume of data that many organizations process to achieve business goals.
The Big Cost of Bias
The discovery of data disparities and algorithmic manipulation that favor certain groups and reject others has severe consequences. Due to the severity of the impact of bias in Big Data, more organizations are prioritizing bias mitigation in their operations. InformationWeek surveyed companies on the impact of AI bias from flawed algorithms. It found bias related to gender, age, race, sexual orientation, and religion. The damages to the businesses themselves included:
- Lost Revenue (62%)
- Lost Customers (61%)
- Lost Employees (43%)
- Paying legal fees due to lawsuits and legal actions against them (35%)
- Damage to their brand reputation and media backlash (6%)
Solving Bias in Big Data
Regulation of bias and other issues created by using AI or poor-quality data is at different stages of development, depending on where you are in the world. For example, the EU's Artificial Intelligence Act, now in the works, will identify, analyze, and regulate AI bias.
However, the true change starts with business leaders who are willing to do the legwork of ensuring diversity and responsible usage and governance remain at the forefront of their data usage and policies. “Data and analytics leaders must understand responsible AI and the measurable elements of that hierarchy — bias detection and mitigation, explainability, and interpretability,” Gartner states. Attention to these elements supports a well-rounded approach to finding, solving, and preventing issues surrounding bias in data analytics.
Lack of attention to building public trust and confidence can be highly detrimental to data-dependent organizations. Implement these strategies across your organization as a foundation for the responsible use of Data Science tools:
- Educate stakeholders, employees, and customers on the ethical use of data including limitations, opportunities, and responsible AI.
- Establish a process of continuous bias auditing using interdisciplinary review teams that discover potential biases and ethical issues with the algorithmic model.
- Mandate human interventions along the decision-making path in processing critical data.
- Encourage collaboration with governmental, private, and public entities, thought leaders, and associations on current and future regulatory compliance and planning, and further education in areas where bias is frequently present.
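The continuous-auditing and human-intervention steps above can be combined into a recurring check: compute an outcome-rate gap across groups and route the batch to a review team when the gap exceeds a policy threshold. A minimal sketch, where the data, group names, and the 0.20 threshold are all illustrative assumptions:

```python
# Hedged sketch: a recurring bias audit that compares approval rates across
# groups and flags the batch for human review when the gap exceeds a
# policy threshold. All values here are illustrative.
from collections import defaultdict

def approval_rates(decisions):
    """decisions: list of (group, approved) pairs."""
    approved = defaultdict(int)
    total = defaultdict(int)
    for group, ok in decisions:
        total[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / total[g] for g in total}

def audit(decisions, max_gap=0.20):
    rates = approval_rates(decisions)
    gap = max(rates.values()) - min(rates.values())
    # needs_review triggers the human-intervention step in the list above
    return {"rates": rates, "gap": round(gap, 2), "needs_review": gap > max_gap}

decisions = ([("X", True)] * 7 + [("X", False)] * 3 +
             [("Y", True)] * 4 + [("Y", False)] * 6)
print(audit(decisions))
# Group X approved at 0.70, group Y at 0.40: the 0.30 gap exceeds 0.20,
# so needs_review is True and the batch goes to the interdisciplinary team
```

The single-number gap used here is a deliberately crude metric; a production audit would track several fairness measures over time, since no one metric captures every form of disparity.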
Minimizing bias in big data requires taking a step back to discover how it happens and which preventive measures and strategies are effective and scalable. The solution may need to be as big as Big Data itself to surmount the shortcomings present today and certain to increase in the future. These strategies are an effective way to stay informed, measure success, and connect with the right resources for current and future algorithmic and analytics-based bias mitigation.