Shashi Kumar

Ph.D. Student — EPFL & Idiap Research Institute, Switzerland

I am a third-year Ph.D. student at École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, and the Idiap Research Institute, Martigny, Switzerland, advised by Prof. Andrea Cavallaro and Prof. Petr Motlíček. My research focuses on efficient adaptation of foundation models across speech and language. Before my Ph.D., I worked in industry building and deploying production ASR systems at Samsung Research and Level AI.

Efficient Adaptation of Foundation Models

Parameter-efficient fine-tuning with factorized latent spaces. My work on FVAE-LoRA (NeurIPS 2025) factorizes task-relevant and residual features in low-rank adapters to improve robustness across modalities.
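To give a sense of the adapter setting this work builds on, here is a minimal sketch of a plain low-rank (LoRA-style) adapter in PyTorch. This illustrates only the generic low-rank update, not FVAE-LoRA's latent-space factorization; all names (`LoRALinear`, `r`, `alpha`) are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with rank r << min(d_in, d_out)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A: small random init; B: zeros, so training starts from the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only `A` and `B` are trained, the number of tunable parameters scales with the rank `r` rather than the full weight matrix.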

Speech & Language with LLMs

Understanding how LLMs can be coupled with speech encoders for ASR. I study robustness under domain shift, prompt sensitivity, and modality compression in SpeechLLM architectures.

Multitask & Unified Speech Models

Building single models that jointly handle transcription, speaker change detection, endpointing, and entity recognition, replacing fragile cascaded pipelines (TokenVerse, EMNLP 2024; TokenVerse++, ASRU 2025).

Sequence Alignment with Optimal Transport

A differentiable sequence-alignment framework based on 1D optimal transport, enabling the model to learn alignments and perform ASR end-to-end.
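What makes the 1D case attractive is that optimal transport there has a closed form: the monotone (sorted) coupling is optimal, so no linear program is needed. The toy function below illustrates that fact for equal-size empirical distributions; it is a generic illustration of 1D OT, not the alignment framework itself.

```python
import numpy as np

def wasserstein_1d(x: np.ndarray, y: np.ndarray, p: float = 1.0) -> float:
    """p-Wasserstein distance between two 1D empirical distributions
    with the same number of samples. In 1D the optimal coupling simply
    matches order statistics, so sorting solves the transport problem."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

Since sorting-based matching admits smooth relaxations, costs of this form can be made differentiable and used as training objectives.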

For a full list, see my Google Scholar profile.

2025 NeurIPS ★ Idiap PhD Paper Award
Latent Space Factorization in LoRA
S. Kumar, Y. Kaloga, J. Mitros, P. Motlicek, and I. Kodrasi
Proposed FVAE-LoRA, a parameter-efficient fine-tuning method that separates task-relevant and residual features, improving robustness across image, text, and audio tasks.
2025 arXiv Preprint
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Y. Kaloga*, S. Kumar*, P. Motlicek, and I. Kodrasi
Introduced a novel differentiable sequence-alignment framework based on 1D optimal transport, enabling end-to-end ASR with learned alignments.
2025 ICASSP — SALMA Workshop ★ Best Paper Award
Performance Evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
S. Kumar, I. Thorbecke, S. Burdisso, E. Villatoro-Tello, M. K E, K. Hacioglu, P. Rangappa, P. Motlicek, A. Ganapathiraju, and A. Stolcke
Evaluated SLAM-ASR across domain shifts and speech perturbations, identifying robustness gaps and providing guidance for reliable LLM-based ASR.
2024 EMNLP
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
S. Kumar, S. Madikeri, J. Pablo Zuluaga Gomez, I. Thorbecke, E. Villatoro-Tello, S. Burdisso, P. Motlicek, K. Pandia D S, and A. Ganapathiraju
A transducer-based framework unifying transcription, speaker change detection, endpointing, and NER in a single model, outperforming cascaded pipelines.
2026 ICASSP
Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection
S. Burdisso, E. Villatoro-Tello, S. Kumar, S. Madikeri, A. Carofilis, P. Rangappa, M. K E, K. Hacioglu, P. Motlicek, and A. Stolcke
2025 ICASSP
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
S. Kumar, S. Madikeri, J. Pablo Zuluaga Gomez, E. Villatoro-Tello, I. Thorbecke, P. Motlicek, M. K E, and A. Ganapathiraju
Best Paper Award — SALMA Workshop at ICASSP 2025
Idiap PhD Paper Award — for "Latent Space Factorization in LoRA" (NeurIPS 2025)
Reviewer for Interspeech 2025 and ACL Rolling Review (ARR) cycles.
Presented at NeurIPS 2025, Interspeech 2025, and EMNLP 2024.
Aug 2023 – Present

Doctor of Philosophy (Ph.D.)

EPFL, Lausanne & Idiap Research Institute, Martigny, Switzerland
Advisors: Prof. Andrea Cavallaro, Prof. Petr Motlíček · Expected: Aug 2027
Aug 2013 – May 2017

Bachelor of Technology (B.Tech.)

Indian Institute of Technology (IIT) Guwahati, India
Electronics and Communication Engineering (ECE)