Balancing AI Bias with Ethical Data Collection

In this article, Adonis Celestine of Applause discusses how to ensure training data for AI is ethically collected, along with the legislation and consumer concerns surrounding AI.

We’re not even halfway through the year, but artificial intelligence (AI) is already making its case for being the biggest thing to happen in 2023 – and beyond. The popular AI chatbot ChatGPT (short for generative pre-trained transformer) set records for consumer adoption earlier this year, reaching 100 million active users just two months after launching.

AI applications like ChatGPT have the potential to help people with their jobs and in their everyday lives. Everyone sees the possibilities, and the AI market is expected to reach $407 billion by 2027. Despite all of the excitement, however, there are fears and concerns about AI and the datasets used to train these algorithms. OpenAI does not reveal what datasets ChatGPT is trained on, citing competitive reasons, but how data is collected and used for AI has become a hot topic for ethical concerns over privacy and bias.

Legislation Around AI 

The widespread adoption of AI technology like ChatGPT has happened very quickly, but legislation is still playing catch-up. However, there are a few instances where straightforward regulations are already in effect. Where AI was developed with data governed and protected by privacy laws – rather than intellectual property regulations or copyright laws – the Federal Trade Commission (FTC) is already stepping in. In instances where an organization violated privacy in collecting data or where consumers did not consent to their data being used to train AI algorithms, the FTC has issued fines and even ordered the deletion and destruction of the offending algorithms and associated data.

The destruction of algorithms is a steep price to pay for organizations that have violated privacy laws while building their own AI applications. In these instances, companies must surrender all the data they used to train their algorithms and, just as importantly, risk falling behind in a quickly expanding and incredibly competitive market. Privacy violations can set organizations back months or years as they recreate algorithms using ethically collected data.

Adding to the pain, building this software is an expensive process. Software developers are paid well for their work, especially those working on AI and machine learning. Organizations that violate privacy laws have likely lost a huge initial investment and potentially damaged customer trust and, consequently, their reputation.

Meanwhile, in Europe, the proposed European AI Act, which is still being negotiated, could impose fines of up to 30 million euros or 6% of a company’s annual revenue for violations. This law will apply to all companies that operate in the European Union and will likely influence legislation across the rest of the world.

See More: How ChatGPT Could Spread Disinformation Via Fake Reviews

Consumer Concern

The explosion of AI has not come without concern from the consumers using this technology. A survey from Applause earlier this year looked into the matter and found that more than half of respondents thought AI should be regulated depending on its use, while more than one-third thought it should always be regulated.

Another concern involving AI is bias. AI outputs can be faulty or biased when algorithms are not trained properly or are trained on too little data. When asked about bias in AI technology, respondents to the same survey shared their concerns:

  • 19% of respondents were very concerned about bias in AI
  • 41% of respondents were somewhat concerned, depending on the situation
  • 26% of respondents were slightly concerned

The same survey also uncovered some interesting findings on the performance of AI so far. Three out of 10 said they were somewhat or extremely dissatisfied with their experience with AI – but nearly one-third said they would use AI chatbots more frequently if these bots responded more accurately.

From the perspectives of both consumers and legislators, it is clear that there are concerns about the development, use, and future of AI technology. Given these concerns, the best thing organizations can do when developing AI technology is to ensure the data used to train their algorithms is collected ethically and legally.

See More: Is Open-Source Data Science the Key to Unbiased AI?

Ethically Collecting Training Data

If your company is looking to join the wave of organizations building AI applications, you’ll need to make sure you have the data needed to train your AI algorithm. And you’ll need to make sure that data is ethically collected. Here are some tips to ensure this is the case:

  1. Ask participants if they have opted in: When sourcing data, it is critical to make sure participants have explicitly agreed that their data can be collected and used to train AI algorithms. When buying data from a third party, buyers need to ensure this condition has been met as well.
  2. Review terms and conditions and privacy policies as they relate to AI training use cases: If you plan to use your own customers’ data for AI training purposes, they will need to know that, along with how the data will be used (for example, to improve your products).
  3. Keep bias in mind: When reviewing your data, be conscious of bias and the need to reflect the diversity of your audience and customers. Make sure your data includes samples from people across different races, genders, ages, demographics, and disabilities.
  4. Consider synthetic balanced data: Another option for ethically sourcing training data involves techniques like SMOTE (Synthetic Minority Oversampling Technique), which can help balance out datasets and reduce bias – see the sketch after this list.
  5. Purpose limitation and disposal: Only collect data that is necessary, and use it only for the purposes customers consented to. Retaining data longer than required poses a data security risk, so a data management and disposal strategy is essential for any data collection activity.

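To make points 3 and 4 concrete, here is a minimal sketch of checking a dataset’s class balance and then rebalancing it with SMOTE. It assumes the open-source scikit-learn and imbalanced-learn Python libraries, and the deliberately skewed 90/10 toy dataset is an illustrative assumption, not real customer data.

    # Minimal sketch: inspect class balance, then rebalance with SMOTE.
    # Assumes scikit-learn and imbalanced-learn are installed
    # (pip install scikit-learn imbalanced-learn); the 90/10 split is
    # an illustrative assumption, not real customer data.
    from collections import Counter

    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import make_classification

    # Build a deliberately imbalanced toy dataset (~90% class 0, ~10% class 1).
    X, y = make_classification(
        n_samples=1_000,
        n_features=10,
        weights=[0.9, 0.1],
        random_state=42,
    )
    print("Before SMOTE:", Counter(y))

    # SMOTE synthesizes new minority-class samples by interpolating between
    # existing minority samples and their nearest neighbors.
    X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
    print("After SMOTE:", Counter(y_balanced))  # classes are now equal in size

The same distribution check applies to demographic attributes: measure how each group is represented before training, and oversample – or, better, collect more real samples from – underrepresented groups.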
The past few months have proven that AI technology is not just a fad. It’s here to stay and has already demonstrated some incredible use cases for supporting individuals and organizations in accomplishing their goals. The potential is exciting as AI technology becomes even more widely adopted and more companies look to build and deploy AI applications. However, there are very real concerns and consequences to consider. 

Alleviating those concerns and avoiding consequences starts with the datasets companies use to build AI. These datasets must be ethically and legally collected and used, and diversity and bias must be considered when building a product that is accessible and useful for all. 

How are you using ethical data collection to manage AI data bias? Share with us on Facebook, Twitter, and LinkedIn. We’d love to hear from you!
