Research
I am deeply interested in Machine Learning research and its applications across multiple domains. At present, I am working on improving the factorization of latent spaces for generative models,
improving ASR performance under different conditions, and modelling conversational text as DAGs for better understanding.
|
|
Improved far-field speech recognition using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
arXiv, 2022
pdf
We propose joint training of the acoustic model (AM) with joint VAE based speech enhancement. The approximations made in the original joint VAE formulation are relaxed, and their effect on Word Error Rate (WER) is analyzed.
|
|
Speaker Normalization Using Joint Variational Autoencoder
Shashi Kumar, Shakti P. Rath, Abhishek Pandey
Interspeech, 2021
YouTube Video / pdf
We propose to map Speaker Independent (SI) features to a Speaker Normalized (SN) space, for which the CMLLR-normalized space is chosen. Additionally, we achieve WER similar to that of Speaker Adaptive Training (SAT) methods in single-pass decoding.
|
|
Whisper Speech Enhancement Using Joint Variational Autoencoder for Improved Speech Recognition
Vikas Agrawal, Shashi Kumar, Shakti P. Rath
Interspeech, 2021
YouTube Video / pdf
To counter the lack of formants and related cues in whisper speech, we propose to map whisper speech to normal speech and train the AM jointly with this enhancement model. We show a significant improvement in WER.
|
|
SRIB Submission to Interspeech 2021 DiCOVA Challenge
Vishwanath Pratap Singh*, Shashi Kumar*, Ravi Shekhar Jha*, Abhishek Pandey
arXiv, 2021
Challenge Link / Leaderboard
The main aim of this challenge was to detect COVID-19 from cough sounds. Our submission, which used an ensemble of multiple models along with segment-level and frame-level handcrafted features, ranked 5th out of 29 teams.
|
|
Far-Field Speech Enhancement Using Heteroscedastic Autoencoder for Improved Speech Recognition
Shashi Kumar, Shakti P. Rath
Interspeech, 2019
pdf
We propose a more general loss that assumes a distribution with non-zero mean and heteroscedastic covariance for the residual variables in the regression MLE estimate. We also propose suitable architectures for the final loss. Overall, we show a significant improvement in WER on the AMI SDM set.
|
|
Joint Distribution Learning in the Framework of Variational Autoencoders for Far-Field Speech Enhancement
Mahesh K. Chelimilla, Shashi Kumar, Shakti P. Rath
ASRU, 2019
link
We propose novel modifications to the conventional VAE to model the joint distribution of far-field and close-talk features over a common latent space. We show a significant improvement in WER on the AMI SDM set.
|
|
PCB Defect Classification Using Logical Combination of Segmented Copper and Non-copper Part
Shashi Kumar, Yuji Iwahori, M. K. Bhuyan
CVIP, 2017
pdf
We propose handcrafted features to classify defects in PCBs. The proposed approach is deployed in actual industrial use.