
What Is Retrieval Augmented Generation (RAG)?

How to take generative AI prompts to the next level with retrieval augmented generation, or RAG.

Ari Bendersky

In 2023, Canada-based Algo Communications faced a challenge. The company was poised for rapid growth, but it couldn’t train customer service representatives (CSRs) quickly enough to keep up with its expansion. To close the gap, the company turned to a novel solution: generative AI.

Algo adopted a large language model (LLM) to help onboard new CSRs faster. To train them to answer complex customer questions accurately and fluently, Algo knew it needed something more robust than an off-the-shelf LLM, which is typically trained on the public Internet and lacks the specific business context needed to answer those questions. Enter retrieval augmented generation, better known simply as RAG.

By now, many of us have already used a generative AI LLM through chat apps like OpenAI’s ChatGPT or Google’s Gemini (formerly Bard) to help write an email or craft clever social media copy. But getting the best results isn’t always easy — especially if you haven’t nailed the fine art and science of crafting a great prompt.

Here’s why: An AI model is only as good as what it’s taught. For it to thrive, it needs the proper context and reams of factual data — and not generic information. An off-the-shelf LLM is not always up to date, nor will it have trustworthy access to your data or understand your customer relationships. That’s where RAG can help.

RAG is an AI technique that allows companies to automatically embed their most current and relevant proprietary data directly into their LLM prompt. And we’re not just talking about structured data like a spreadsheet or a relational database. We mean retrieving all available data, including unstructured data: emails, PDFs, chat logs, social media posts and other types of information that could lead to a better AI output.
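To make that idea concrete, here is a minimal sketch in Python of what putting retrieved data “directly into the prompt” can look like. The retrieve_company_context function is a hypothetical placeholder for whatever search mechanism surfaces your own emails, PDFs or chat logs; the part to notice is how retrieved snippets are stitched into the prompt before the LLM ever sees the question.

```python
# A minimal sketch of prompt augmentation, the "augmented" part of RAG.
# `retrieve_company_context` is a hypothetical stand-in for a real
# retrieval step (for example, a vector database query).

def retrieve_company_context(question: str) -> list[str]:
    # Placeholder: a real system would search company data here.
    return [
        "Chat log (2023-11-02): Model X routers ship with firmware 4.2...",
        "Support email: firmware 4.2 resolves the dropped-connection issue...",
    ]

def build_augmented_prompt(question: str) -> str:
    # Join the retrieved snippets and place them ahead of the question,
    # instructing the model to ground its answer in that context.
    context = "\n".join(retrieve_company_context(question))
    return (
        "Answer the customer's question using ONLY the company data below.\n"
        "--- Company data ---\n" + context + "\n"
        "--- Question ---\n" + question
    )

print(build_augmented_prompt("Why does my Model X keep dropping its connection?"))
```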


How does retrieval augmented generation work?

In a nutshell, RAG helps companies retrieve and use their data from various internal sources for better generative AI results. Because the source material comes from your own trusted data, it helps reduce hallucinations and other incorrect outputs, even if it can’t eliminate them entirely. Bottom line: responses grounded in your own data are far more likely to be relevant and accurate.

To achieve this improved accuracy, RAG works in conjunction with a specialised type of database, called a vector database, which stores data in a numeric form (known as embeddings) that AI can search efficiently, and retrieves the most relevant pieces when prompted.

“RAG can’t do its job without the vector database doing its job,” said Ryan Schellack, director of AI product marketing at Salesforce. “The two go hand in hand. When you see a company talk about supporting retrieval augmented generation, they are at minimum supporting two things: a vector store for storing information and then some type of machine-learning search mechanism designed to work against that type of data.”
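As a toy illustration of those two pieces, the sketch below pairs an in-memory vector store with a cosine-similarity search in Python. The embed function is a hypothetical stand-in for a real embedding model, which would map text to vectors that capture meaning; nothing here reflects how Salesforce or any vendor actually implements it.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Hypothetical stand-in: hash characters into a fixed-size vector so the
    # example runs without a model. Real embeddings capture semantic meaning.
    vec = np.zeros(64)
    for i, ch in enumerate(text.lower()):
        vec[(i + ord(ch)) % 64] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

class VectorStore:
    """Minimal in-memory vector store with cosine-similarity search."""

    def __init__(self):
        self.texts: list[str] = []
        self.vectors: list[np.ndarray] = []

    def add(self, text: str) -> None:
        # Store both the raw text and its numeric representation.
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Vectors are unit-length, so a dot product is cosine similarity.
        scores = np.array(self.vectors) @ embed(query)
        top = np.argsort(scores)[::-1][:k]
        return [self.texts[i] for i in top]

store = VectorStore()
store.add("Firmware 4.2 fixes dropped connections on Model X routers.")
store.add("Invoices are emailed on the first business day of each month.")
print(store.search("My Model X keeps losing its connection", k=1))
```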

Working in tandem with a vector database, RAG can be a powerful tool for generating better LLM outputs, but it’s not a silver bullet. Users must still understand the fundamentals of writing a clear prompt.


Faster response times to complex questions

After adding a tremendous amount of unstructured data to its vector database, including chat logs and two years of email history, Algo Communications started testing the technology in December 2023 with a few of its CSRs. They worked on a small sample set: about 10% of the company’s product base. It took about two months for the CSRs to get comfortable with the tool. Once company leadership saw CSRs gain greater confidence in answering in-depth questions with the assistance of RAG, the company began rolling the tool out more widely.

“Exploring RAG helped us to understand we were going to be able to bring in so much more data,” said Ryan Zoehner, vice president, commercial operations for Algo Communications. “It was going to allow us to break down a lot of those really complex answers and deliver five- and six-part responses in a way that customers knew [there] was someone technically savvy answering them.”

Within two months of adding RAG, Algo’s customer service team was completing cases more quickly and efficiently, helping them move on to new enquiries 67% faster. RAG now touches 60% of the company’s products, and that coverage will continue to expand. Algo also started feeding new chat logs and conversations into the database, reinforcing the solution with even more relevant context. RAG has allowed Algo to cut its onboarding time in half as well, enabling it to grow faster.

“RAG is making us more efficient,” Zoehner said. “It’s making our employees happier with their job and is helping us onboard everything faster. What has made this different from everything else we tried to do with LLMs is it allowed us to keep our brand, our identity and the ethos of who we are as a company.”

With RAG providing Algo’s CSRs with an AI assist, the team has been able to dedicate more time to adding a human touch to customer interactions.

“It allows our team to spend that extra little bit on making sure the response is landing the right way,” Zoehner said. “That humanity allows us to bring our brand across everything. It also gives us quality assurance across the board.”