Shashi Kumar

I am a Lead ML Engineer at Level AI, where I work on Automatic Speech Recognition (ASR) and related machine learning problems.

Previously, I was at Samsung R&D Institute India - Bangalore, where I worked on ASR, Speech Enhancement, and Speaker Adaptation under the supervision of Dr. Shakti P. Rath. I also participated in the DiCOVA 2021 challenge, organized as part of Interspeech 2021, which aimed to detect COVID-19 from cough sounds. On the final leaderboard, our team ranked 5th out of 29 teams.

I completed my undergraduate degree at the Indian Institute of Technology (IIT) Guwahati with a major in Electronics and Communication Engineering (ECE). For my Bachelor's thesis project, I worked on Image Aesthetics and Emotion Analysis in Natural Images under the supervision of Prof. Amit Sethi. In my second year, I interned at Chubu University, Japan under the supervision of Prof. Yuji Iwahori. I have also worked on multiple problems in the NLP domain.

Email  /  CV  /  Google Scholar  /  LinkedIn

Research

I am deeply interested in machine learning research and its applications across domains. At present, I am working on several directions: improving the factorization of latent spaces in generative models, improving ASR performance under varied acoustic conditions, and modelling conversational text as DAGs for better understanding.

Improved Far-Field Speech Recognition Using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
arXiv, 2022
pdf

We propose joint training of an acoustic model (AM) with joint-VAE-based speech enhancement. Approximations made in the original joint VAE formulation are relaxed, and their effect on Word Error Rate (WER) is analyzed.
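
A minimal sketch of what such joint training can look like, assuming an MSE enhancement target and a cross-entropy senone loss; the module names and the interpolation weight alpha are illustrative, not the paper's exact formulation:

```python
import torch
import torch.nn as nn

# Minimal sketch of joint enhancement + acoustic-model training.
# `Enhancer` (a VAE-based front end) and `AcousticModel` are placeholders,
# and alpha is an assumed interpolation weight.

class JointModel(nn.Module):
    def __init__(self, enhancer: nn.Module, acoustic_model: nn.Module):
        super().__init__()
        self.enhancer = enhancer
        self.acoustic_model = acoustic_model

    def forward(self, far_field_feats):
        enhanced = self.enhancer(far_field_feats)  # enhanced features
        return enhanced, self.acoustic_model(enhanced)  # plus AM logits

def joint_loss(enhanced, clean_target, am_logits, senone_labels, alpha=0.5):
    # Enhancement term pulls enhanced features toward close-talk targets;
    # the AM term is a standard cross-entropy over senone labels.
    enh_loss = nn.functional.mse_loss(enhanced, clean_target)
    am_loss = nn.functional.cross_entropy(am_logits, senone_labels)
    return alpha * enh_loss + (1 - alpha) * am_loss
```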

Speaker Normalization Using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
Interspeech, 2021
YouTube Video / pdf

We propose to map Speaker-Independent (SI) features to a Speaker-Normalized (SN) space, choosing the CMLLR-normalized space as the SN space. With this mapping, we achieve WER comparable to Speaker Adaptive Training (SAT) methods in a single decoding pass.

Whisper Speech Enhancement Using Joint Variational Autoencoder for Improved Speech Recognition
Vikas Agrawal, Shashi Kumar, Shakti P. Rath
Interspeech, 2021
YouTube Video / pdf

To counter the lack of formants and other cues in whispered speech, we propose to map whispered speech to normal speech and train the AM jointly with this enhancement model. We show a significant improvement in WER.

SRIB Submission to Interspeech 2021 DiCOVA Challenge
Vishwanath Pratap Singh*, Shashi Kumar*, Ravi Shekhar Jha*, Abhishek Pandey
arXiv, 2021
Challenge Link / Leaderboard

The main aim of this challenge was to detect COVID-19 from cough sounds. Our submission, an ensemble of multiple models along with segment- and frame-level handcrafted features, ranked 5th out of 29 teams.
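
As a rough illustration of score-level fusion across an ensemble (the model names and the simple averaging rule are assumptions for illustration, not our exact submission):

```python
import numpy as np

# Sketch: average per-recording COVID probabilities from several classifiers.
def ensemble_probability(model_probs):
    """Fuse a list of per-model probability arrays by simple averaging."""
    return np.mean(np.stack(model_probs, axis=0), axis=0)

# e.g. hypothetical probabilities from three models on five recordings
p_lstm = np.array([0.9, 0.2, 0.4, 0.7, 0.1])
p_cnn  = np.array([0.8, 0.3, 0.5, 0.6, 0.2])
p_gbm  = np.array([0.7, 0.1, 0.6, 0.8, 0.3])
fused = ensemble_probability([p_lstm, p_cnn, p_gbm])
```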

Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition
Shashi Kumar, Shakti P. Rath
Interspeech, 2019
pdf

We propose a more general loss based on a non-zero-mean, heteroscedastic-covariance distribution for the residual variables in the MLE formulation of regression, along with suitable architectures for this loss. Overall, we show a significant improvement in WER on the AMI SDM set.
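
To make the idea concrete, here is a minimal sketch assuming a diagonal covariance: instead of plain MSE, which corresponds to zero-mean residuals with a fixed variance, the network also predicts a per-dimension log-variance and is trained with the Gaussian negative log-likelihood. The paper's exact parameterization may differ:

```python
import torch

def heteroscedastic_nll(pred_mean, pred_log_var, target):
    # Gaussian negative log-likelihood with a predicted, input-dependent
    # (heteroscedastic) diagonal covariance; plain MSE is recovered for
    # zero-mean residuals with a fixed variance (constants dropped).
    var = torch.exp(pred_log_var)
    nll = 0.5 * (pred_log_var + (target - pred_mean) ** 2 / var)
    return nll.mean()
```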

Joint Distribution Learning in the Framework of Variational Autoencoders for Far-Field Speech Enhancement
Mahesh K. Chelimilla, Shashi Kumar, Shakti P. Rath
ASRU, 2019
link

We propose novel modifications to the conventional VAE to model the joint distribution of far-field and close-talk features through a common latent space. We show a significant improvement in WER on the AMI SDM set.
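
A minimal sketch of the general structure, assuming a single encoder over far-field features and two decoders sharing one latent code; layer sizes and details are illustrative, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

# Sketch of a joint VAE: one encoder maps far-field features to a shared
# latent space; two decoders reconstruct far-field and close-talk features
# from the same latent code.
class JointVAE(nn.Module):
    def __init__(self, feat_dim=40, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU())
        self.to_mean = nn.Linear(128, latent_dim)
        self.to_log_var = nn.Linear(128, latent_dim)
        self.decode_far = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                        nn.Linear(128, feat_dim))
        self.decode_close = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                          nn.Linear(128, feat_dim))

    def forward(self, far_field):
        h = self.encoder(far_field)
        mean, log_var = self.to_mean(h), self.to_log_var(h)
        # Reparameterization trick: sample z from the shared latent space.
        z = mean + torch.randn_like(mean) * torch.exp(0.5 * log_var)
        return self.decode_far(z), self.decode_close(z), mean, log_var
```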

PCB Defect Classification Using Logical Combination of Segmented Copper and Non-copper Part
Shashi Kumar, Yuji Iwahori, M. K. Bhuyan
CVIP, 2017
pdf

We propose handcrafted features to classify defects in PCBs. The proposed approach has been deployed in actual industrial use.


Huge thanks to Jon Barron for the template.