
Shafiq Joty
Senior Director, Research

Shafiq (raihanjoty.github.io) directs the NLP group's work on large language modeling (LLM) and generative AI. Some of his group's recent projects include SFR-RAG, SFR-Judge, SFR-RAG-Agent, and xGen. He is also a tenured Associate Professor (currently on leave) in the School of Computer Science and Engineering (SCSE) at NTU, and he was a founding manager of the Salesforce Research Asia (Singapore) lab. His research has contributed to 35+ patents and 170+ papers in top-tier NLP and ML conferences and journals. He served as a PC chair of SIGDIAL-2023, on the best paper award committees of ICLR-23 and NAACL-22, and as a (senior) area chair for major NLP and ML conferences.


SFR-Embedding-Mistral marks a significant advancement in text-embedding models, building upon the solid foundations of E5-mistral-7b-instruct and Mistral-7B-v0.1.
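For readers who want to try the model, a minimal retrieval sketch is below. It assumes the checkpoint is published on the Hugging Face Hub as Salesforce/SFR-Embedding-Mistral and that it follows the E5-style instruction-prefixed query format; both assumptions should be checked against the model card.

```python
from sentence_transformers import SentenceTransformer, util

# Hub id assumed from the release name; a 7B embedding model needs a large GPU.
model = SentenceTransformer("Salesforce/SFR-Embedding-Mistral")

# Instruction-prefixed query, as E5-style embedding models expect (format assumed).
query = ("Instruct: Given a web search query, retrieve relevant passages.\n"
         "Query: what is retrieval augmented generation?")
docs = [
    "Retrieval Augmented Generation grounds LLM outputs in retrieved documents.",
    "Mistral-7B is a dense decoder-only language model.",
]

q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(docs, convert_to_tensor=True)

# Rank documents by cosine similarity to the query.
print(util.cos_sim(q_emb, d_emb))
```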

As the development and deployment of large language models (LLMs) accelerate, evaluating model outputs has become increasingly important. The established method of evaluating responses typically involves recruiting and training human evaluators, having them…

Retrieval Augmented Generation (RAG) has not only gained steam as one of the most heavily invested areas of research in generative AI but has also attracted considerable popularity and commercialization opportunities. RAG is typically applied…
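To make the pattern concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. The toy lexical retriever and the llm() stub are illustrative placeholders, not any particular Salesforce system.

```python
# Minimal RAG sketch: retrieve supporting passages, then condition
# the generator on them.

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    q_words = set(query.lower().split())
    ranked = sorted(corpus,
                    key=lambda p: len(q_words & set(p.lower().split())),
                    reverse=True)
    return ranked[:k]

def llm(prompt: str) -> str:
    """Placeholder for a call to any instruction-tuned LLM."""
    return f"(model answer grounded in {prompt.count('Passage')} retrieved passages)"

corpus = [
    "RAG augments a language model with passages fetched from an external index.",
    "Dense retrievers embed queries and passages into a shared vector space.",
    "Mistral-7B-v0.1 is a dense decoder-only transformer.",
]

query = "How does retrieval augmented generation work?"
passages = retrieve(query, corpus)
prompt = "\n".join(f"Passage: {p}" for p in passages) \
         + f"\n\nQuestion: {query}\nAnswer:"
print(llm(prompt))
```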

TL;DR: With CodeChain, a pretrained large language model (LLM) can solve challenging coding problems by integrating modularity into its generated samples and can self-improve by employing a chain of self-revisions on representative sub-modules. CodeChain can…
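The loop below sketches that idea under stated simplifications: sub-modules are extracted with a regex, "clustering" is frequency-based deduplication, and the best sample is picked by a stand-in heuristic rather than by test pass rate. It illustrates the shape of the chain of self-revisions, not the released CodeChain implementation.

```python
import random
import re
from collections import Counter

def extract_submodules(program: str) -> list[str]:
    """Split a program into its top-level function definitions (toy extraction)."""
    parts = re.split(r"(?=^def )", program, flags=re.M)
    return [p.strip() for p in parts if p.strip().startswith("def ")]

def cluster_representatives(submodules: list[str], k: int = 3) -> list[str]:
    """Toy 'clustering': keep the k most frequently sampled sub-modules.
    CodeChain clusters sub-module embeddings; this frequency count stands in."""
    return [m for m, _ in Counter(submodules).most_common(k)]

def code_chain(problem: str, llm, rounds: int = 3, n_samples: int = 8) -> str:
    """Chain of self-revisions: each round conditions the model on
    representative sub-modules mined from the previous round's samples."""
    prompt = problem
    best = ""
    for _ in range(rounds):
        samples = [llm(prompt) for _ in range(n_samples)]
        best = max(samples, key=len)  # stand-in for selection by test pass rate
        reps = cluster_representatives(
            [m for s in samples for m in extract_submodules(s)])
        prompt = (problem
                  + "\n\nReuse these sub-modules where helpful:\n"
                  + "\n\n".join(reps))
    return best

# Demo with a stub "LLM" that emits small modular programs.
def stub_llm(prompt: str) -> str:
    helpers = ["def parse(s):\n    return s.split()",
               "def solve(xs):\n    return sum(xs)",
               "def fmt(x):\n    return str(x)"]
    return "\n\n".join(random.sample(helpers, 2))

print(code_chain("Sum the numbers in a string.", stub_llm))
```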

TL;DR: We trained a series of 7B LLMs named XGen-7B with standard dense attention on up to 8K sequence length for up to 1.5T tokens. We also fine-tuned the models on public-domain…
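A minimal inference sketch follows. The Hub id Salesforce/xgen-7b-8k-base and the trust_remote_code tokenizer flag reflect the public model card and should be verified there.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Base 8K-context checkpoint; the custom tokenizer requires trust_remote_code.
tok = AutoTokenizer.from_pretrained("Salesforce/xgen-7b-8k-base",
                                    trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Salesforce/xgen-7b-8k-base",
                                             torch_dtype=torch.bfloat16)

inputs = tok("The XGen-7B models were trained on sequences of up to",
             return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=32)
print(tok.decode(out[0]))
```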