Interview: Kathy Baxter, Architect of Ethical AI Practice at Salesforce

I recently caught up with Kathy Baxter, one of the elite AI ethicists in the world and Salesforce’s Principal Architect of Ethical AI, to discuss the good, the bad and the ugly of ethical data collection. Kathy develops research-informed best practices to educate Salesforce employees, customers, and the industry on the development of responsible AI. She collaborates and partners with external AI and ethics experts to continuously evolve Salesforce policies, practices, and products. Prior to Salesforce, she worked at Google, eBay, and Oracle in User Experience Research. She received her MS in Engineering Psychology and BS in Applied Psychology from the Georgia Institute of Technology.

insideBIGDATA: What are some of the most important considerations when approaching ethical data collection?

Kathy Baxter: Ethical data collection encompasses several different components. Most importantly, it’s critical that companies recognize what data is actually necessary. It’s become almost habitual for large companies to scoop up every piece of data they can get their hands on in case they might need it for analytical and/or forecasting purposes. In this case, whatever data is collected, it’s important that it is secure and anonymized to prevent data breaches.

Next, it’s important to always ensure you have informed consent from the individual providing the information. In my opinion, organizations need to do a better job when it comes to communicating the data they are collecting and how it will be used. In fact, a recent study from Salesforce found that over half of Americans don’t trust organizations to collect or use their data ethically.

It’s also important to know the origin of the data you are using. What many don’t realize is that several data sets frequently used to train AI models were collected illicitly online without permission. Just because you can get access to a data set doesn’t mean you are legally or ethically permitted to use it.

One final component I want to touch on is the importance of representativeness and accuracy. In order to have data sets that accurately represent the whole population, you need to understand the origins of your data to know what kinds of bias may be present. Who is included (represented), who is not, and why? You also need to know if the data you are using is first-hand (e.g., user-supplied demographics and interests, or behavioral data such as what a user clicked on or searched for) or based on inferences (e.g., guessing someone’s demographics or interests based on other signals).

insideBIGDATA: Why is ethical data collection important? Why should other organizations care?

Kathy Baxter: The most prominent issue is that organizations are using consumer data to make decisions that can amplify existing societal bias and cause real harm. There are several examples of technology magnifying harmful bias. For instance, just look at AI-based hiring tools that only recommend white men or facial recognition technology that has resulted in the arrests of innocent people of color.

When it comes to collecting and analyzing data, AI technologies often make decisions based on the wrong factors. This perpetuates bias in the model, putting both the organization and the consumer at risk. For the success of businesses and the good of society, we need data to be accurate, and that means eliminating as much bias as possible.

Organizations that are value-driven and build a culture of ethics will be better positioned to collect data accurately and ethically. Every individual with access to customer data must be educated on how to handle it ethically.

insideBIGDATA: Which ethical frameworks are effective? Have you seen this go wrong?

Kathy Baxter: Most of today’s biggest tech companies have instilled ethical frameworks in order to address issues that may arise in terms of collecting and processing personal data. It’s great to see these measures in place because without them, organizations run the risk of harming both their business and their consumers. However, addressing ethical issues is not an easy task, and I find that one of the major pitfalls of ethical frameworks is practicality. To make sure your organization is actually making strides towards ethical data collection, it’s important to establish a set of metrics and/or measurements to work against because, let’s face it, actions speak louder than words. And just as importantly, it is critical that executives create the right incentive structure to reward ethical data collection and use, as well as meaningful consequences when unethical data handling is identified.

insideBIGDATA: Can you share best practices for ethical data collection?

Kathy Baxter: First and foremost, I recommend staying informed and following regulations such as the EU’s GDPR, California’s Consumer Privacy Act (CCPA), and India’s Personal Data Protection legislation that is expected to be enacted early this year. It’s also important to remember that access to your customers’ data is a privilege, not your right. As an organization, you should limit your data collection to only what you absolutely need to make business decisions and do everything in your power to keep it secure and anonymized to prevent data breaches. If there are only some cases where more data is needed, consider progressive consent (i.e., begin by asking for the least data possible then ask for access to more data once it is needed). Consumers will be more likely to grant access to more data once a trusted relationship has been developed and value to the customer has been demonstrated.
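The progressive consent idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the scope names in `TIERS`, the `ConsentRecord` class, and the `collect` helper are hypothetical and do not come from Salesforce or any specific product.

```python
from dataclasses import dataclass, field

# Hypothetical consent scopes, ordered from least to most data requested.
TIERS = ["email", "purchase_history", "location"]


@dataclass
class ConsentRecord:
    """Tracks which data scopes a single user has agreed to share."""
    granted: set = field(default_factory=set)

    def needs_prompt(self, scope: str) -> bool:
        """True if the user has not yet granted this scope."""
        return scope not in self.granted

    def grant(self, scope: str) -> None:
        """Record consent for one scope; unknown scopes are rejected."""
        if scope not in TIERS:
            raise ValueError(f"unknown scope: {scope}")
        self.granted.add(scope)


def collect(record: ConsentRecord, scope: str, value):
    """Keep a value only if the user has consented to that scope."""
    if record.needs_prompt(scope):
        return None  # caller should ask for consent at the point of need
    return {scope: value}
```

In this sketch a feature that needs location data would call `needs_prompt("location")` and ask the user only at that moment, rather than requesting every scope at sign-up.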

Carefully control where data is stored and who has access to it, audit that access, and maintain a centralized view of how the data is being used. Too often multiple individuals and groups within a company have access to a customer’s data and there is no coordination over how that data is used or updated.
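As a rough illustration of controlling and auditing access in one place, the Python sketch below wraps a data store so that every read is checked and logged. The `AuditedStore` class, its in-memory storage, and the `purpose` field are assumptions made for this example, not a description of any real system.

```python
from datetime import datetime, timezone


class AuditedStore:
    """Hypothetical in-memory store that logs every read of customer data."""

    def __init__(self, allowed_users):
        self._data = {}
        self._allowed = set(allowed_users)
        self.log = []  # single, centralized record of who accessed what

    def put(self, customer_id, record):
        self._data[customer_id] = record

    def read(self, user, customer_id, purpose):
        """Return a record if the user is allowed; log the attempt either way."""
        allowed = user in self._allowed
        self.log.append({
            "when": datetime.now(timezone.utc),
            "user": user,
            "customer": customer_id,
            "purpose": purpose,
            "allowed": allowed,
        })
        if not allowed:
            raise PermissionError(f"{user} may not read customer data")
        return self._data.get(customer_id)
```

Because denied attempts are logged as well as successful ones, the log gives a single view of how the data is being requested and used.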

Be transparent with your customers about what data you are collecting and give customers control to edit or delete it. Take steps to ensure the data you have is accurate and you are respecting user preferences.

Make decisions based on data you have directly collected from users (e.g., demographics, interests) or observed (e.g., searches, clicks). However, don’t make assumptions about customers based on a single data point or action. If someone is actually interested in a product/topic, they will demonstrate that interest over time, as opposed to a one-time search or purchase (e.g., a baby shower gift for a coworker, your child borrowing your phone to search for something) or an accidental click.
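One way to avoid acting on a single data point is to require that an interest show up on several distinct days before treating it as real. The Python sketch below shows the idea; the function name `sustained_interests` and the three-day threshold are illustrative assumptions, not recommended values.

```python
from collections import defaultdict
from datetime import date


def sustained_interests(events, min_days=3):
    """Return topics a user engaged with on at least `min_days` distinct days.

    `events` is an iterable of (topic, day) pairs built from observed
    behavior such as searches or clicks.
    """
    days_by_topic = defaultdict(set)
    for topic, day in events:
        days_by_topic[topic].add(day)
    return {t for t, days in days_by_topic.items() if len(days) >= min_days}


events = [
    ("baby gifts", date(2021, 3, 1)),     # one-off search, e.g. a coworker's shower
    ("running shoes", date(2021, 3, 1)),
    ("running shoes", date(2021, 3, 8)),
    ("running shoes", date(2021, 3, 20)),
]
print(sustained_interests(events))  # {'running shoes'}
```

The one-time "baby gifts" search is ignored, while the repeated "running shoes" activity counts as a demonstrated interest.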

Be more inclusive and avoid bias. Beware of making decisions based solely on demographics. Even if personalization or targeting based on stereotypes ends up being correct most of the time (e.g., targeting makeup ads at women), you may be missing a large portion of potential customers (e.g., men, non-binary, transgender individuals interested in makeup) or unknowingly causing offense. Once you identify bias in your business processes or decision making, you need to eliminate it from the process before using that data to train other AI systems. How? Focus on three core areas: employee education, product development and customer empowerment.

insideBIGDATA: What advice do you have for other organizations about AI ethics?

Kathy Baxter: For the success of businesses and the greater good of society, we need AI to be accurate, and that means eliminating as much bias as possible. Organizations have a responsibility to ensure fair and accurate data sets — it’s an ongoing effort that requires awareness and commitment. Too many companies are making inferences on sensitive consumer information that the consumer never wanted shared. Although there’s no universal fix, here are four strategies organizations should keep in mind:

  • Identify underlying biases in systems and processes. Examine the decisions your systems are making based on sensitive variables.
  • Interrogate the assumptions made about data that was collected. To determine if you are making decisions based on unfair criteria like race, gender, geography, or income, you need fairness through awareness. That means collecting sensitive variables to see correlations in the data but not actually making decisions based on those sensitive variables.
  • Engage with people who may be affected by the technology. Those who work closely to implement AI technologies must consider direct feedback from their customers to understand what impacts the technology could have on any given group of people and society as a whole.
  • Don’t revert to the “move fast and break things” mindset. It’s more important than ever before to allow time for mindful and productive thinking to ensure AI technologies are accurate and fair. This will allow organizations to create a more complete, inclusive and safe product offering.
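The "fairness through awareness" strategy in the list above can be sketched in Python: the sensitive variable is collected only to audit outcomes, not to make the decision. The function names and the sample data below are hypothetical, and the ratio of lowest to highest selection rate is just one simple metric chosen for illustration.

```python
from collections import Counter


def selection_rates(records):
    """Share of positive decisions per group.

    `records` is an iterable of (group, approved) pairs. The group label is
    used only for this audit, never as an input to the decision itself.
    """
    totals, positives = Counter(), Counter()
    for group, approved in records:
        totals[group] += 1
        positives[group] += int(approved)
    return {g: positives[g] / totals[g] for g in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means parity)."""
    highest = max(rates.values())
    return min(rates.values()) / highest if highest else 1.0


# Made-up audit sample: decisions paired with a sensitive attribute.
audit = [("A", True), ("A", True), ("A", False), ("A", True),
         ("B", True), ("B", False), ("B", False), ("B", False)]
rates = selection_rates(audit)
print(rates, disparity_ratio(rates))
```

A ratio well below 1.0, as in this made-up sample, would be a signal to interrogate the data and the decision criteria rather than proof of unfairness on its own.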

insideBIGDATA: The COVID-19 pandemic created an influx of data (healthcare, personal) and accelerated many digital transformation projects. What will be important to consider ethically as many businesses adapt to handle this amount of new data?

Kathy Baxter: The COVID-19 pandemic has shown that consumers are becoming increasingly reliant on digital channels, forcing many organizations to shift their business models to an all-digital landscape — thus, accelerating the growth of worldwide data. Organizations are trying to move quickly to adapt. But moments of crisis don’t permit you to move fast at all costs.

That said, although there is more new data than ever before, organizations need to ensure that as they adapt and digitalize, ethics practices are incorporated into efforts to scale from the ground up. It is more important than ever to:

  • Co-locate ethics practices within existing and new infrastructure, processes and frameworks
  • Incorporate ethics into your organization’s incentive structure
  • Instill ethics-by-design knowledge across your workforce
