Ethical Hacking Practices Prove Successful in Building Trusted AI Products

[Image: nadia_snopek / Adobe Stock]

Orlando Lugo

Sarah Tan

September 17, 2024 6 min read

Co-authored by Hannah Cha, Orlando Lugo, and Sarah Tan

At Salesforce, our Responsible AI & Technology team employs red teaming practices to improve the safety of our AI products by testing for malicious use, intentional integrity attacks, benign misuse, and identifying responsible AI issues.

Recently, we engaged in both internal and external red teaming exercises to identify potential risks within our products and mitigate them accordingly. Leveraging the extensive subject matter expertise of employee developers, internal bug bounties identify vulnerabilities in AI models and applications. Furthermore, engaging external vendors helps to cover additional risk surfaces in our products that internal red teaming may have missed. Utilizing diverse red teaming practices is crucial for covering a broad risk spectrum and improving resilience towards evolving vulnerabilities.

What is a Bug Bounty and Why is it Important?

In a bug bounty, individuals, either internal or external, are incentivized by the organizer to uncover and report potential vulnerabilities to an organization before they can be exploited by bad actors. Salesforce was one of the first enterprise companies to establish a Bug Bounty Program, and continues to scale its impact through initiatives like live hacking events to bolster security against AI-related threats.

Recently, we conducted an internal bug bounty where employees submitted reports of suspected vulnerabilities they found in our developer tool, Agentforce for Developers (Formerly known as Einstein for Developers), an AI-powered developer tool to assist in the development of Apex code and Lightning Web Components (JavaScript, CSS, and HTML). To incentivize participation, we offered a cash prize for the highest-scoring submission based on our bug bounty rubric that prioritized broad bias, safety, and privacy considerations.

The findings were insightful and helped inform the next iteration of product improvements to further mitigate potential bias. Some submissions, for example, explored how the tool might inadvertently prioritize certain customer attributes, such as gender when developing code to predict purchasing behavior. Orlando Lugo, Responsible AI Product Manager, described the implications of deploying such code: “If such code were deployed, it could lead to biased decision-making in Salesforce applications, potentially affecting fairness and inclusivity.”

Raaghavv Devgon, Product Vulnerability Engineer, had the highest scoring submission, naming him the winner of the A4D Bug Bounty and prize money.

In-House Bug Bounty Leads to Action & Employee Morale

To expand on initial vulnerabilities identified in the bug bounty event, the Responsible AI & Technology team conducted additional red teaming based on researched AI trust & safety dimensions, which include truthfulness, robustness, safety, fairness, privacy, and ethics, to explore risk surfaces that the submissions didn’t sufficiently cover. This was done by manually creating adversarial prompts for each of these dimensions, and evaluating whether the model’s responses presented potential vulnerabilities.
Upon further collaboration with the A4D Product team and AI Research team, we expanded on the initially broad AI trust & safety dimensions to create a narrower responsible AI criteria for A4D. These criteria included more product-specific dimensions to evaluate the model on, such as toxicity, sensitive content leaks, or bias against protected groups. Again, we manually created prompts for each of these dimensions, and evaluated model responses. Through this iterative testing, we identified potential risks and subsequently refined our AI systems and guardrails to ensure that generated outputs promote safe outcomes. As a result of these new guardrails, we were able to reduce problematic outputs from adversarial prompts by 90%.

Consequently, conducting internal red teaming encourages and incentivizes employees to identify and critically engage with responsible AI issues, biases, or potential harm in our products prior to their release. Devgon states that the “bug bounty is a critical component of vulnerability management, and it incentivizes employees to [invest their time and expertise] to find critical bugs.”

Similarly, Salesforce employees have indicated they want to become more involved in improving our AI systems. To further encourage their involvement, we have also done lightweight, internal red teaming, where employees are encouraged to report any responsible AI bugs encountered as they try out various products. Internal red teaming allows employees to make a difference while leveraging the diverse perspectives within the company to uncover a wider range of responsible AI concerns.

Our Learnings From External Red Teaming Exercises

In addition to organizing internal red teaming events and in line with our White House AI Voluntary Commitments, Salesforce engaged an external vendor to perform various penetration tests to broaden our risk spectrum. Leveraging third parties with wide-ranging expertise and a global perspective can be helpful as they engage diverse perspectives, helping to uncover a broader range of risks. Sarah Tan, Responsible AI Director, explained: “External red teaming complements internal red teaming; both to uncover blind spots that might be missed with just external or internal red teaming alone.” We outsourced testing for Agentforce for Developers to the vendor, which simulated attacks on our product using adversarial prompts with the intention of making the product generate biased or toxic outputs.

We recommend businesses that engage external partners for adversarial testing keep in mind the following:

Ensure the vendor has a successful track record of adversarial testing and has worked in your field before.
Domain expertise is essential: A deep understanding of internal domains, products, and terminology, along with the necessary development skills to automate and scale the generation of meaningful and useful prompts, is required to ensure alignment with the product’s goals and objectives.
Provide a list of what “good” and “bad” quality adversarial prompts look like to align expectations between you and the vendor.
Sample their prompts and outputs in the red teaming process, creating moments of review and feedback along the way.
Narrow the scope of the testing to a specific product suite or select risk areas. Stress testing a smaller surface area drives more and better-quality results than trying to “break” everything at once.

Looking Forward

Our internal and external red teaming practices have been instrumental in identifying and mitigating vulnerabilities in our AI-powered tools, ensuring that they meet our high responsible AI standards. By leveraging the diverse perspectives of our own employees and external subject matter experts, we have been able to identify numerous potential risks and mitigate them accordingly, many times before our products ever go to market.
As we continue innovating, we remain committed to responsible AI development and fostering an inclusive and equitable experience for our users. In subsequent blog posts, we will go into further detail about additional internal red teaming exercises and their findings.

___________

Acknowledgements: Hannah Cha interned at Salesforce in Summer 2024, working on the A4D Bug Bounty and external vendor engagement alongside Orlando Lugo and Sarah Tan. Kathy Baxter worked on the external vendor engagement. Special thanks to our collaborators from the A4D team and AI Research, including Walter Harley, Ananya Jha, Pooja Reddivari, Yingbo Zhou, Young Mo Kang, Mahesh Kodli, and many other contributors throughout the course of this project.

Note: Salesforce employees exposed to harmful content can seek support from Lyra Health, a free benefit for employees to seek mental health services from licensed clinicians affiliated with independently owned and operated professional practices. Additionally, the Warmline, an employee advocacy program for women (inclusive of all races and ethnicities), Black, Indigenous, and Latinx employees who represent all gender identities, and members of the LGBTQ+ communities offers employees 1:1 confidential conversations with advocates and connects employees to resources to create a path forward.

An AI agent interacts with users on a computer screen, surrounded by icons and chat bubbles.

AI in Customer Experience: How to Stay Ahead

11 min read

In a World of AI Agents, Who’s Accountable for Mistakes?

6 min read

Orlando Lugo Product Manager, Ethical & Humane Use

Orlando is the Product Inclusion Lead in Salesforce’s Office of Ethical and Humane Use of Technology. His current mission is to help make Salesforce products more equitable and inclusive from design and development to configuration and use. Before joining Salesforce, Orlando worked in The Seattle Read More

More by Orlando

Sarah Tan Researcher Principal, Salesforce

Sarah Tan is the Director of Responsible AI at Salesforce and holds a Visiting Scientist position at Cornell University. She co-founded the Trustworthy ML Initiative and serves as President of Women in Machine Learning (WiML). Sarah earned her PhD in Statistics from Cornell University, with further Read More

More by Sarah