
Semih Yavuz
Semih Yavuz is a Research Director at Salesforce AI Research, leading a team focused on improving the factuality, groundedness, and reasoning capabilities of large language models in knowledge-intensive applications. His work involves developing state-of-the-art embedding and re-ranker models for knowledge retrieval across diverse domains, including code, multi-modal, and multilingual contexts, while refining retrieval-augmented generation (RAG) by enhancing how LLMs consume and integrate knowledge in complex reasoning. His team pushes the boundaries of research to develop accurate, scalable, and reliable AI systems and to drive product impact with them in the CRM domain.


SFR-Embedding-Mistral marks a significant advancement in text-embedding models, building on the solid foundations of E5-mistral-7b-instruct and Mistral-7B-v0.1.

World’s #1 CRM introduces its first sales LLM
Sales reps are constantly on the move, transitioning from one customer site to another, with meetings scheduled back-to-back. The demands of managing a complex pipeline…

TL;DR: We trained a series of 7B LLMs named XGen-7B with standard dense attention on sequence lengths of up to 8K for up to 1.5T tokens. We also fine-tuned the models on public-domain…
Lead Author: Xi Ye
TL;DR: We propose RnG-KBQA, a Rank-and-Generate approach for Question Answering over Knowledge Bases, which enables answering natural language questions over large-scale knowledge bases. Our approach is capable of answering…
TL;DR: We propose controllable counterfactuals (CoCo) to evaluate dialogue state tracking (DST) models on novel scenarios, revealing significant performance drops of up to 30.8% for state-of-the-art DST models. Using CoCo for…