Author: Akhilesh Deepak Gotmare
TL;DR: CodeRL is a new framework for program synthesis through holistic integration of pretrained language models and deep reinforcement learning. By utilizing unit test feedback as part of model training and inference, and…
We use smaller language models as generative classifiers to guide generation from larger language models. We show that this method can make generations friendlier, reduce bias and toxicity, and achieve zero-shot controllable generation of unseen topics.
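The guidance idea in the summary above can be illustrated with a minimal sketch. It assumes a simplified, single-step setting: a smaller class-conditional model scores each candidate next token under a desired and an undesired attribute, Bayes' rule with equal class priors turns those scores into a classification probability, and that probability reweights the larger model's next-token distribution. The function name `guided_next_token_probs`, the `omega` strength parameter, and the toy numbers are all hypothetical, chosen for illustration rather than taken from the authors' implementation.

```python
import math


def guided_next_token_probs(base_logprobs, desired_logprobs, undesired_logprobs, omega=1.0):
    """Reweight a large LM's next-token distribution with a generative classifier.

    All three inputs are lists of log-probabilities over the same vocabulary:
    - base_logprobs: from the large language model being guided
    - desired_logprobs / undesired_logprobs: from the smaller class-conditional
      model, conditioned on the desired and undesired attribute respectively
    omega (hypothetical knob) controls how strongly the classifier steers.
    """
    weighted = []
    for lp_base, lp_pos, lp_neg in zip(base_logprobs, desired_logprobs, undesired_logprobs):
        # Bayes' rule with equal class priors: log P(desired | token)
        log_posterior = lp_pos - math.log(math.exp(lp_pos) + math.exp(lp_neg))
        # Combine the base model's score with the classifier's score
        weighted.append(lp_base + omega * log_posterior)

    # Normalize back to a probability distribution (stable softmax)
    peak = max(weighted)
    total = sum(math.exp(w - peak) for w in weighted)
    return [math.exp(w - peak) / total for w in weighted]


# Toy three-token vocabulary with made-up probabilities
vocab = ["great", "awful", "okay"]
base = [math.log(p) for p in (0.3, 0.5, 0.2)]
desired = [math.log(p) for p in (0.6, 0.05, 0.35)]
undesired = [math.log(p) for p in (0.1, 0.7, 0.2)]

guided = guided_next_token_probs(base, desired, undesired, omega=1.0)
for token, prob in zip(vocab, guided):
    print(f"{token}: {prob:.3f}")
```

In this toy example the base model prefers "awful", but after reweighting the guided distribution prefers "great". Setting `omega=0` recovers the base distribution unchanged, which makes the steering strength easy to ablate.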