Natural language has long served as a powerful bridge between humans and machines, enabling intuitive interaction and precise control over AI systems. While fields such as image editing, audio synthesis, and video generation thrive on abundant data–text pairs for training powerful generative models, many critical domains face a significant challenge: the scarcity of labeled data. Complex fields such as molecular research, motion generation, and time series often lack sufficient textual annotations, limiting the effectiveness and potential of current generative AI methods.
To overcome this critical gap, our team at Salesforce Research introduces Text2Data, an innovative framework designed to generate high-quality data under textual control even when labeled text-data pairs are scarce. Text2Data effectively addresses the complexities inherent in low-resource scenarios, making generative AI accessible to more specialized, challenging applications.
Why Text2Data Matters
Current methods typically rely on substantial labeled training data to achieve effective text-to-data control. In practice, however, labeling is often costly or impractical, which restricts supervised learning and limits the use of advanced generative models for text-to-data generation tasks. When generative models are trained with limited labeled data, issues such as poor generation quality, overfitting, bias, and lack of diversity arise. Traditional strategies, such as data augmentation and semi-supervised learning, often fall short, due to the nuances and ambiguities of natural language, computational inefficiency, or catastrophic forgetting, where previously learned information deteriorates as new data is introduced.
How Text2Data Works

Figure 1: Overview of Text2Data. The model first leverages unlabeled data (blue module) to discern the overall data distribution, yielding the optimal set of model parameters Θ. It is then fine-tuned on labeled data (red module) via constraint optimization, which yields parameters in Θ ∩ Θ′, where Θ′ is the optimal parameter set obtained when the model is fine-tuned without the constraint.
As illustrated in Figure 1, Text2Data introduces a two-step approach leveraging powerful unsupervised diffusion models:
- Unsupervised Distribution Mastery: Using unlabeled data, Text2Data first captures the inherent data distribution without any textual annotations, laying a robust foundation for the subsequent fine-tuning (a training-loop sketch of this stage follows this list).
- Controllable Fine-tuning: The model is then carefully fine-tuned on the limited textual labels through a novel constraint-optimization strategy. This strategy keeps the model's parameters close to the originally learned distribution, effectively mitigating catastrophic forgetting.
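To make the first stage concrete, here is a minimal PyTorch-style sketch of a standard denoising-diffusion training loop on unlabeled data. The model interface (num_steps, q_sample, a cond argument) is illustrative and stands in for whatever diffusion backbone is used; it is not our released implementation:

```python
import torch
import torch.nn.functional as F

def diffusion_loss(model, x0, text_emb=None):
    """Denoising loss: predict the noise injected at a random timestep.

    `model` is assumed to expose `num_steps`, a forward process `q_sample`,
    and a denoiser call `model(x_t, t, cond=...)` (illustrative interface).
    With text_emb=None this is the unconditional loss L(θ); with a text
    embedding it becomes the text-conditioned loss L′(θ).
    """
    t = torch.randint(0, model.num_steps, (x0.shape[0],), device=x0.device)
    noise = torch.randn_like(x0)
    x_t = model.q_sample(x0, t, noise)      # forward process q(x_t | x_0)
    pred = model(x_t, t, cond=text_emb)     # denoiser ε_θ
    return F.mse_loss(pred, noise)

def pretrain(model, unlabeled_loader, optimizer):
    """Stage 1: capture the data distribution p(x) from unlabeled data."""
    for x0 in unlabeled_loader:
        optimizer.zero_grad()
        diffusion_loss(model, x0).backward()
        optimizer.step()
```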
From a theoretical perspective, our method casts controllable fine-tuning as the following constrained optimization of the learning objective, where L′(θ) is the text-conditioned generation loss on labeled data and L(θ) is the unconditional loss used during pre-training:
min_θ L′(θ)
s.t. L(θ) ≤ ξ,
ξ = L(θ̂), where θ̂ = argmin_θ L(θ)
- The first line is the main learning objective of a controllable generative model.
- The second and third lines keep the parameters from deviating far from the original parameter space learned during pre-training; a penalty-based sketch of how this constraint can be enforced follows this list.
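One practical way to enforce this constraint during fine-tuning is a Lagrangian-style penalty that activates only when the unconditional loss exceeds its pre-training level ξ. The sketch below reuses the illustrative diffusion_loss defined above; the fixed penalty weight lam is a hypothetical simplification chosen for illustration:

```python
import torch

def finetune_step(model, x0, text_emb, optimizer, xi, lam=1.0):
    """Stage 2: controllable fine-tuning with a constraint-style penalty.

    Minimizes the text-conditioned loss L′(θ) while penalizing any increase
    of the unconditional loss L(θ) beyond the pre-training level ξ (`xi`),
    which discourages catastrophic forgetting.
    """
    cond_loss = diffusion_loss(model, x0, text_emb)   # L′(θ) on labeled data
    uncond_loss = diffusion_loss(model, x0)           # L(θ) on the same batch
    penalty = torch.relu(uncond_loss - xi)            # nonzero only if the constraint is violated
    loss = cond_loss + lam * penalty
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```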
Key Innovations
- Unique Constraint-Based Optimization: Prevents overfitting and keeps the fine-tuned model faithful to the distribution learned during pre-training, crucially preserving prior knowledge.
- Theoretical Backing: Confidence bounds on the constrained objective establish the method's reliability and effectiveness across low-resource environments.
- Comprehensive Experimentation: Across molecules, human motion, and time series, Text2Data showcases its versatility and superiority over current baselines.
Proven Results

Figure 2: Controllability evaluation on the molecule dataset for different proportions of paired training data. The green solid line corresponds to Text2Data; the two dashed lines are baselines, blue for EDM and orange for EDM-finetune. Properties of generated molecules are predicted by a classifier ϕc, and MAE is computed between the properties of generated molecules and the intended properties. Lower MAE indicates better performance.

Figure 3: Visualization of generated molecules as the polarizability specified in the textual description increases from “very low” to “very high”.
Across diverse datasets—including molecules (QM9), human motions (HumanML3D), and financial time series—Text2Data consistently demonstrates enhanced controllability and superior data quality compared to existing methods. Notably, it outperforms baseline diffusion models significantly in scenarios with sparse textual annotations, underscoring its potential to revolutionize AI applications in specialized domains.
For example, Figure 2 illustrates how the MAE between the properties of generated molecules and the intended properties evolves as the proportion of labeled training data rises. Text2Data outperforms EDM-finetune and EDM (the baseline model) on all properties by a remarkable margin. We also depict, in Figure 3, the molecules generated as the text descriptor for polarizability shifts from “very low” to “very high”. Polarizability (α) indicates a molecule's inclination to form an electric dipole moment under an external electric field, so as α rises we expect molecules with less symmetrical forms, as evidenced in Figure 3. This trend supports the validity of the molecules generated by Text2Data and its fine-grained controllability. Additional experimental results on more modalities can be found in our paper.
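For concreteness, here is a minimal sketch of how such a controllability metric can be computed, assuming a trained property predictor phi_c with a hypothetical interface (an illustration, not the evaluation code from our paper):

```python
import numpy as np

def controllability_mae(generated_samples, intended_props, phi_c):
    """MAE between predicted properties of generated samples and the
    properties requested in the text prompts.

    `phi_c` is a hypothetical property predictor mapping one generated
    sample to a scalar property value (e.g., polarizability α).
    """
    predicted = np.array([phi_c(sample) for sample in generated_samples])
    intended = np.array(intended_props)
    return float(np.mean(np.abs(predicted - intended)))
```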
Conclusion
Text2Data represents a significant leap forward in generative AI, particularly in low-resource scenarios. By effectively leveraging both unlabeled and labeled data, it addresses the critical challenge of data scarcity and enhances the controllability and quality of generated data. This innovative framework not only opens new avenues for research and application in specialized domains but also sets a new standard for generative AI models. As we continue to refine and expand Text2Data, we are confident that it will play a pivotal role in advancing the capabilities of AI systems across a wide range of industries and applications.
Explore More
- Read our paper
- Check out our code on GitHub
- Check out more AI Research blogs
- Salesforce AI Research Website
- Follow us on X: @SFResearch, @Salesforce
Acknowledgments
Full Author List: Shiyu Wang, Yihao Feng, Tian Lan, Ning Yu, Yu Bai, Ran Xu, Huan Wang, Caiming Xiong, Silvio Savarese