My research centers on explaining the decisions of deep networks and correcting them through human feedback, with the goal of making models more interpretable, transparent, and unbiased.
TL;DR: We find that current self-supervised learning approaches suffer from poor visual grounding and receive an improper supervisory signal when trained on complex scene images. We introduce CAST to improve visual grounding during…