
It’s no secret that great AI requires data. From asking an AI model to build a marketing plan to having it summarize postings in a Slack channel at work, AI can’t be successful without data.  

But not all data is created equal, and to get the right results from generative AI, companies need more than what’s found in spreadsheets and other structured data formats.

Raveendrnathan Loganathan, EVP, Software Engineering

Within every business, there’s also a massive amount of information trapped in “unstructured” data, such as documents, images, audio and video recordings, and social media feeds. This unstructured data could be highly valuable, providing businesses with AI insights that are more accurate and comprehensive because they are grounded in customer information. But many organizations lack the technical wherewithal to see, access, integrate, and make use of their unstructured data in any trusted way. 

It can be a real Catch-22. Here, Raveendrnathan Loganathan, EVP of Engineering for Salesforce Data Cloud, shares how businesses can effectively tap into unstructured content, gather knowledge, index the data efficiently, and pull insights with retrieval-augmented generation (RAG).

Q. First things first – tell us more about unstructured data.

Traditionally, companies have handled data that is found in tabular format with rows and columns. That includes customer engagement data gathered through our CRM applications. However, businesses have vital information in other formats like PDFs, emails, audio files, videos, images, and more. The data in those formats is called unstructured, and there’s huge value in unlocking it. And, thanks to the power of large language models (LLMs) and generative AI, we can now do just that.

Q. What’s an example of how businesses should tap into that unstructured data to enhance customer experiences and business outcomes?

Imagine a customer who needs help with a recent purchase. Typically they start the conversation with the company’s chatbot, which is powered by AI. For the experience to be both relevant and positive, the entire exchange needs to be grounded in that specific customer’s data: their recent product purchase, their warranty information, any past conversations they’ve had, and more. The LLM should also be tapping into company data, such as the latest learnings from other customers who have bought similar products and internal knowledge base articles.

What’s challenging is that some of the information might reside in transactional databases (structured information), while the rest might be in unstructured files, such as warranty contracts or knowledge base articles. And all of that data, whether it’s structured or unstructured, is wrapped in various levels of privacy rules and governance to protect customer information. Both types of data need to be accessed, and the right data needs to be used; otherwise, the exchange with the chatbot is at best frustrating and at worst inaccurate.

Q. You mentioned chatbots — how well do consumer AI chatbots make use of both structured and unstructured data? 

Consumer AI chatbots have gone through transformations over the past few years, from rule-based to AI-powered chatbots with natural language processing (NLP) and machine learning, and now generative AI-powered chatbots with LLMs. 

While unstructured and structured data are contributing more and more to training models, consumer models aren’t infused with real-time and contextual information. So, for example, an LLM won’t be able to answer questions about events that transpired over the last day.


Obtaining the best and most accurate AI responses requires augmenting LLMs with proprietary, real-time, structured, and unstructured data from within a company’s own applications, warehouses, and data lakes. That’s why infusing AI with relevant business and company knowledge is important, and there’s a lot more work that remains to be done in this field.

Q. OK, so companies need both types of data. How should they bring both structured and unstructured data into the AI mix? 

An effective way of making those models more accurate is with RAG. RAG typically enables companies to use their structured and unstructured proprietary data to make generative AI more contextual, timely, trusted, and relevant. It lets an LLM that was trained on public domain data be augmented with a company’s private enterprise knowledge, ensuring greater accuracy, consistency, and relevancy. 
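
To make the idea of augmenting an LLM concrete, here is a minimal sketch of the last step of RAG: placing retrieved company data into the prompt before the model sees the question. The function name `build_grounded_prompt`, the snippet fields `source` and `text`, and the prompt wording are all illustrative assumptions, not part of any Salesforce product. The actual model call is left out.

```python
def build_grounded_prompt(question, snippets, max_snippets=3):
    """Assemble a prompt that asks the model to answer from retrieved data only.

    `snippets` is a list of dicts with hypothetical keys "source" and "text",
    assumed to be already ranked by relevance by some retrieval step.
    """
    selected = snippets[:max_snippets]
    context_lines = [
        f"[{i}] ({s['source']}) {s['text']}"
        for i, s in enumerate(selected, start=1)
    ]
    context = "\n".join(context_lines) if context_lines else "(no relevant records found)"
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )


if __name__ == "__main__":
    retrieved = [
        {"source": "warranty.pdf", "text": "The warranty covers parts for 24 months."},
        {"source": "orders table", "text": "Order 1001 was delivered on 2024-03-02."},
    ]
    print(build_grounded_prompt("Is my purchase still under warranty?", retrieved))
```

Numbering the snippets and naming their sources is one simple way to let the model, and the person reading the answer, trace a response back to the data it was grounded in.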

Q. Is RAG a product or a process? How would a company go about implementing it?

Think of it as a concept that improves a user’s trust in an AI model because the model is pulling from recent, relevant data. 

Let’s imagine that a seller is creating a quote for a sales opportunity and needs information from a previous quote — which is in PDF format. To unlock what’s in that PDF, the AI needs a query pipeline that connects it to the PDF, breaks the document down into logical chunks, converts the text and images to numbers, and then indexes those numbers in a database suitable for the model to scan and search when necessary.
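
The steps described above (break the document into chunks, convert them to numbers, index them) can be sketched in a few lines. This is a toy illustration under stated assumptions: the PDF text is assumed to be already extracted, the `embed` function is a simple word-hashing stand-in for a real embedding model, and names such as `chunk_text`, `IndexedChunk`, and `build_index` are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass

DIMENSIONS = 64  # toy vector size; real embedding models use far larger vectors


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> list[float]:
    """Toy embedding: hash each word into a bucket, then L2-normalize.

    A stand-in for a learned embedding model, used only to keep this runnable.
    """
    vector = [0.0] * DIMENSIONS
    for word in text.lower().split():
        token = word.strip(".,;:!?()\"'")
        if not token:
            continue
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % DIMENSIONS
        vector[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm else vector


@dataclass
class IndexedChunk:
    source: str          # where the chunk came from, e.g. a file name
    text: str            # the chunk itself
    vector: list[float]  # its numerical representation


def build_index(documents: dict[str, str], max_words: int = 40) -> list[IndexedChunk]:
    """Chunk every document, embed each chunk, and collect the results."""
    index = []
    for source, text in documents.items():
        for chunk in chunk_text(text, max_words):
            index.append(IndexedChunk(source, chunk, embed(chunk)))
    return index
```

In practice the list returned by `build_index` would be stored in a vector database rather than kept in memory, and chunk boundaries would usually follow the document's logical sections rather than a fixed word count.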

Now, the information from the PDF is ready to be used. But that’s only half the equation. That data should only be used when it’s relevant. So let’s say a sales manager wants to find every quote that has a specific termination clause. RAG will need to pull the right data from their opportunities table (which is structured data) and combine it with content from PDF quotes that have similar clauses (unstructured data). Then, using the right security and privacy filters, it can generate an accurate list of quotes with those clauses for the manager. Without that query pipeline, the model isn’t pulling from all the right sources and the manager won’t get a complete list of quotes. 
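
The sales manager scenario can be sketched as follows: filter the structured rows by what the user is allowed to see, then keep only the quotes whose unstructured text is close in meaning to the query. Everything here is illustrative: the table rows, the three-dimensional vectors, the `owner_team` permission model, and the 0.8 threshold are made-up assumptions chosen to keep the example small.

```python
import math

# Structured data: rows from a hypothetical opportunities table.
OPPORTUNITIES = [
    {"quote_id": "Q-1", "account": "Acme", "owner_team": "west"},
    {"quote_id": "Q-2", "account": "Globex", "owner_team": "east"},
    {"quote_id": "Q-3", "account": "Initech", "owner_team": "west"},
]

# Unstructured data: chunks of quote PDFs with made-up 3-dimensional embeddings.
QUOTE_CHUNKS = [
    {"quote_id": "Q-1", "text": "Either party may terminate with 30 days notice.",
     "vector": [0.9, 0.1, 0.0]},
    {"quote_id": "Q-2", "text": "Termination requires 30 days written notice.",
     "vector": [0.8, 0.2, 0.1]},
    {"quote_id": "Q-3", "text": "Payment is due within 45 days of invoice.",
     "vector": [0.1, 0.9, 0.2]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_quotes(query_vector, user_teams, min_score=0.8):
    """Return visible opportunity rows whose quote text is similar to the query."""
    # Security filter: only rows owned by one of the user's teams are visible.
    visible = {row["quote_id"]: row for row in OPPORTUNITIES
               if row["owner_team"] in user_teams}
    results = []
    for chunk in QUOTE_CHUNKS:
        row = visible.get(chunk["quote_id"])
        if row is None:
            continue  # the user may not see this quote, so skip it entirely
        score = cosine(query_vector, chunk["vector"])
        if score >= min_score:
            results.append({**row, "clause": chunk["text"], "score": score})
    return sorted(results, key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    termination_query = [1.0, 0.0, 0.0]  # pretend embedding of "termination clause"
    for hit in find_quotes(termination_query, user_teams={"west"}):
        print(hit["quote_id"], hit["account"], hit["clause"])
```

Applying the permission filter before scoring means restricted content never enters the candidate set, which is one straightforward way to keep governance rules in front of the model rather than behind it.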

Q. What is Salesforce’s approach to unstructured data?

We’re making AI easier and more useful for customers by making searching their data easier. The new Data Cloud Vector Database — which builds on Data Cloud — goes further than today’s method of searching CRM through keywords and allows for vector embedding, which means it combines structured and unstructured data seamlessly and transforms it into a numerical representation. This vector embedding format makes it easier for an AI system to process the data, compare it against queries, and respond with a useful answer, meaning customers get more relevant content. 

That’s not all, though. With our Search Index functionality, customers can ingest data, derive knowledge, vectorize, index, and serve unstructured content from a variety of sources such as Salesforce clouds, hyperscaler storage systems, third-party applications, or zero copy partners like Amazon, Snowflake, Google, Databricks, IBM, and more. Customers also benefit from our vector database’s unique multimodality, which means they can interact with an AI chatbot in Spanish and receive an accurate response, even if the knowledge base articles are in English, and help videos are in Japanese. In all of this, we’re giving customers something more than just information — we’re giving them context to unlock another layer of knowledge.


Q. How can customers use RAG?

Compiling a company’s unstructured and structured data ensures customers have the most relevant information for any enterprise scenario. 

For example, when a furniture company customer calls a service agent for help assembling their purchase, RAG can index and search call logs, social media posts, and knowledge articles to see how other customers solved the issue. This information is then fed to the agent’s Service Cloud console nearly instantly. 

Financial institutions can use RAG to provide real-time information on market or financial data to their employees, who can take that information and blend it with a customer’s own unique banking needs to give them actionable advice based on their situation.

In fact, Salesforce customer Royal Bank of Canada (RBC) U.S. Wealth Management is exploring the use of RAG technology to improve internal processes and provide accurate and up-to-date information to advisors and other employees. The system offers contextual assistance, ensuring personalized support, and continuously learns and improves. RBC anticipates benefits such as improved efficiency, enhanced knowledge sharing, more accurate information, and improved decision-making within the organization.

Q. How does Salesforce improve search with RAG? 

CRM search is based on keywords. The new Data Cloud Vector Database goes further — it also enables customers to perform semantic searches and retrieve information in sales or service workflows based on meaning or intent. Customers can also combine keyword and vector search to enable a hybrid search experience, which gives them more relevant content. This is a big step forward for enterprises. We’re giving customers something more than just information — we’re giving context to unlock another layer of knowledge. 
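
One common way to combine keyword and vector results into a single hybrid ranking is reciprocal rank fusion (RRF). The sketch below illustrates that general technique; it is not a description of how Data Cloud implements hybrid search. The function names, the simple word-overlap keyword scorer, and the constant `k=60` (a conventional default for RRF) are assumptions, and the vector ranking is supplied by hand rather than computed.

```python
def keyword_rank(query, documents):
    """Rank document ids by how many query words each document contains."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    # Highest overlap first; ties broken alphabetically by id.
    return [doc_id for _, doc_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list.

    Each id earns 1 / (k + rank) from every list it appears in, so items
    ranked well by both keyword and vector search rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    articles = {
        "kb-1": "reset your password",
        "kb-2": "assemble the desk",
        "kb-3": "recover account access when locked out",
    }
    by_keyword = keyword_rank("reset password", articles)
    by_meaning = ["kb-3", "kb-1"]  # assumed output of a semantic (vector) search
    print(reciprocal_rank_fusion([by_keyword, by_meaning]))
```

Because RRF works on ranks rather than raw scores, it does not need keyword scores and vector similarities to be on the same scale, which is why it is a popular starting point for hybrid search.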

Q. What should companies do to prepare their data for AI?

First, you have to know where all of your data is and understand its quality. Is it stored in a data lake? Is it trapped in different applications across the company? Is it good enough for your generative AI models? 

Second, you have to ensure your data is fresh, relevant, and retrievable, so you can combine structured and unstructured data for the best outputs.

Finally, you have to activate that data across your applications and build the right pipelines so RAG can pull that data when prompted and provide the answers you need. 

The Einstein 1 Platform and Data Cloud are solutions that help enterprises navigate these steps. Using these tools together, customers can build integrated, federated, intelligent, and actionable solutions across every customer touchpoint while also reducing complexity. It’s the foundation every company needs to succeed in the new AI era.
