Quick Take: Generative AI has most recently captured the public’s attention for its applications in the workplace. Salesforce’s AI Research team, however, is also exploring how it can be applied to solve problems in other fields and in society. This story details how Salesforce partnered with an academic institution and a biomedical company to apply an AI language model to protein design, with fascinating results.
“AI – especially generative AI – is having such a moment right now,” Salesforce Director of AI Research Nikhil Naik recently said during a Blazing Trails podcast interview.
The “moment” is thanks to products like ChatGPT, which have generated thousands of headlines, along with questions and ethical concerns, over the past few months. And while using this type of technology to write articles, essays, and not-so-great songs is all the rage right now, the work that Naik and others have been doing for the past five years within Salesforce’s AI Research program has some bigger applications.
For example, they’ve built CodeGen, a large-scale language model that turns conversational language into development code. It’s an exciting application of technology for the workplace, but its impact can go even further.
Part of our goal with AI research is to apply Salesforce AI to problems that can have a broader impact on society.
Nikhil Naik, Director of AI Research, Salesforce
Through their AI for Society initiative, the team is also applying AI research to some of society’s biggest challenges. So far, Salesforce Research work has ranged from implementing AI for more equitable and balanced economic policies, to using computer vision AI for tracking great white sharks, all the way to determining optimal treatment paths for breast cancer patients via artificial intelligence.
From patterns to proteins
It’s mind-bending stuff, but the work is actually built around a simple strategy. “We identify AI techniques that we are very good at, and then identify problems where the AI could be applied,” said Naik.
This approach recently led to the development of ProGen, a Salesforce AI language model trained on the world’s largest protein database.
Yes, you read that right — Salesforce trained its AI models on proteins.
While that might sound like a stretch – taking technology largely used today to develop chatbots and automated user flows to design proteins – the team had uncovered a commonality between the two use cases that made a lot of sense.
“AI models ingest a large amount of text and they learn to predict the next word that might come after a given word,” said Naik. “And just by training using this pretty simple method, you can train an AI algorithm to generate very realistic language about any topic that you might be interested in. And what we realized is that the same technology can be applied to generating proteins.”
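To make the “predict the next word” idea concrete, here is a minimal sketch in Python. It is not how ProGen or any large language model is actually built: it is a toy word-pair counter, and the function names `train_bigram` and `predict_next` and the sample sentence are made up for illustration. It only shows the basic principle Naik describes, which is learning from text which word tends to follow another.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    following = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1
    return following


def predict_next(model, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    counts = model.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical training text, chosen only to keep the example small.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" only once
```

Real language models replace the simple counts with a neural network and look at far more than one previous word, but the training signal is the same: guess what comes next.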
And if you can develop novel proteins, the Salesforce Research team believed, that could eventually open the door to new medicines, vaccines, or sustainability innovations — to name a few.
So, in 2020, Naik and his team set out to apply generative AI, especially large language models and their associated techniques, to the problem of protein design. Why protein design? It’s an area where creation and research can be done at exponential speed, meaning “we can accelerate the discovery of novel drugs and useful industrial chemicals,” according to Naik.
Creating an ‘amino alphabet’
The Salesforce team created an “alphabet” using amino acids, the building blocks of all proteins. Those “letters” come together to form proteins. Then, the same way you can train a large language model to predict the next word and generate sentences in English, the team trained a large language model on a database of 280 million protein sequences to generate novel proteins.
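As a rough illustration of the “alphabet” idea, the sketch below treats each of the 20 standard amino acids (written with their usual one-letter codes) as a token, the way a language model treats words. The article does not describe ProGen’s actual vocabulary or data format, so the mapping, the helper names (`encode`, `decode`, `training_pairs`), and the short example sequence are all assumptions made for illustration.

```python
# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed token mapping: each letter gets an integer ID by position.
TOKEN_TO_ID = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
ID_TO_TOKEN = {i: aa for aa, i in TOKEN_TO_ID.items()}


def encode(sequence):
    """Turn a protein sequence such as 'MKTAY' into a list of integer IDs."""
    return [TOKEN_TO_ID[aa] for aa in sequence.upper()]


def decode(ids):
    """Turn a list of integer IDs back into a protein sequence string."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


def training_pairs(ids):
    """Build (context, next token) pairs, the raw material for next-token prediction."""
    return [(ids[:i], ids[i]) for i in range(1, len(ids))]


# A made-up five-letter fragment, not a real protein.
fragment = "MKTAY"
ids = encode(fragment)

print(ids)
print(decode(ids))
for context, target in training_pairs(ids):
    print(decode(context), "->", ID_TO_TOKEN[target])
```

Each pair asks the model the same question a text model gets: given the letters so far, which letter comes next?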
Naik and team were excited by their progress, but they didn’t have the capabilities to test whether their AI language model for generating proteins would be able to actually create something useful. So, they teamed up with the Fraser Lab at UCSF and medical startup Tierra Biosciences to vet their research.
The Salesforce Research team first sent Tierra about 100 AI-generated proteins to synthesize as test-tube versions. Tierra’s results indicated that the proteins were functional, and they were then sent to the Fraser Lab at UCSF for further research.
The Fraser Lab compared the artificial proteins to proteins found in nature. “The lab tests showed that we can design proteins that are 60-70% dissimilar to anything ever seen in nature, but that are still functioning proteins, containing biological activity. And that is an important scientific milestone for the future of drug discovery and industrial chemical design,” Naik said.
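The article does not say how dissimilarity was measured. One simple and widely used way to compare two protein sequences is percent identity: the share of positions where both sequences have the same amino acid. The sketch below assumes the two sequences are already aligned and the same length, which real comparisons cannot take for granted, and both example sequences are invented for illustration.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes pre-aligned, equal-length sequences")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Two made-up ten-letter fragments that match at 4 of 10 positions.
natural = "MKTAYIAKQR"
designed = "MRSAYLGEHR"

identity = percent_identity(natural, designed)
print(f"identity: {identity:.0f}%, dissimilarity: {100 - identity:.0f}%")
```

Under this measure, the invented pair is 40% identical and therefore 60% dissimilar, which is the low end of the range Naik cites.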
Another compelling finding from the trials: 73% of ProGen-created proteins were biologically active, compared with only 59% of naturally occurring proteins.
Ethical by design
Given the serious implications of this work, it was clear to Naik and everyone involved that applying this technology in an ethical way was incredibly important.
The team worked in lockstep with Salesforce’s Ethical AI Council and the Office of Ethics under Salesforce’s Chief Ethical and Humane Use Officer, Paula Goldman, and every step of the process underwent ethical reviews “that help us deploy AI in a careful manner.”
Given the tremendous opportunities and challenges emerging in the space, Goldman and her teams built on Trusted AI Principles to help guide the process and, specifically for ProGen, to identify protocols that “should be put in place to ensure safe usage and limitation of unintended harmful effects.”
Proteins are just the start
Simple and straightforward work, right? Maybe not, but it’s worth pointing out that the ability to design proteins never before seen in nature is truly groundbreaking and could potentially be used for medicine and other domains.
Naik noted that, since those experiments, researchers have already built on his team’s work to show its applications in a variety of domains. “I think in the near future, we will see an explosion of research and commercial activity in this space.” And that is already happening, as Naik and his Salesforce AI Research team attempt to identify potential treatments for rheumatoid arthritis, multiple sclerosis, and other neurological and autoimmune disorders by leveraging their work with ProGen.
That is the true endgame — less about the proteins created and more about the approach Naik and his team used, and how that can be applied elsewhere. How can a similar approach be applied to global challenges like sustainability or climate change or food supply? That’s where the true power of AI can really expand, according to Naik.
“This is just the beginning for the use of AI,” Naik proudly said. “We are just at the beginning of this revolution.”
Go deeper:
- See how Salesforce is building Generative AI we can trust.
- Read 5 guidelines for responsible generative AI development.
- Go in-depth on how AI can generate new proteins.
- Read how AI-designed proteins can potentially find new medical treatments.