Summary

We investigate NVIDIA's Triton (formerly TensorRT) Inference Server as a way of hosting Transformer language models. The post is roughly divided into two parts: (i) instructions for setting up your own inference server,…
Most neural architectures for machine translation use an encoder-decoder model built from either convolutional or recurrent layers. The encoder layers map the input sequence to a latent representation, and the decoder, in turn, uses this latent representation to generate the target sequence.
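To make the encoder-decoder idea concrete, here is a minimal recurrent sketch in PyTorch; the vocabulary and layer sizes are placeholders rather than values from the post, and a real system would add attention and beam-search decoding on top of this skeleton.

```python
# Minimal encoder-decoder sketch (hypothetical sizes, PyTorch assumed).
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab=8000, tgt_vocab=8000, emb=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        # Encoder: maps the source tokens to a latent representation.
        self.encoder = nn.GRU(emb, hidden, batch_first=True)
        # Decoder: starts from the latent state and produces target-side states.
        self.decoder = nn.GRU(emb, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        _, latent = self.encoder(self.src_emb(src))       # latent: (1, batch, hidden)
        dec_out, _ = self.decoder(self.tgt_emb(tgt), latent)
        return self.out(dec_out)                          # logits over the target vocabulary

model = Seq2Seq()
src = torch.randint(0, 8000, (2, 10))   # dummy source batch
tgt = torch.randint(0, 8000, (2, 12))   # dummy (shifted) target batch
logits = model(src, tgt)                # shape: (2, 12, 8000)
```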