Shashi Kumar

Ph.D. Student — EPFL & Idiap Research Institute, Switzerland

I am a third-year Ph.D. student at École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, and the Idiap Research Institute, Martigny, Switzerland, advised by Prof. Andrea Cavallaro and Prof. Petr Motlíček. My research focuses on efficient adaptation of foundation models across speech and language. Before my Ph.D., I worked in industry building and deploying production ASR systems at Samsung Research and Level AI.

Efficient Adaptation of Foundation Models

Parameter-efficient fine-tuning with factorized latent spaces. My work on FVAE-LoRA (NeurIPS 2025) factorizes task-relevant and residual features in low-rank adapters to improve robustness across modalities.
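To give a sense of the adapter setting this work builds on, here is a minimal sketch of a plain low-rank (LoRA-style) adapter in PyTorch. This illustrates only the generic low-rank update, not FVAE-LoRA's latent-space factorization; all names (`LoRALinear`, `r`, `alpha`) are illustrative.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update:
    y = W x + (alpha / r) * B A x, with rank r << min(d_in, d_out)."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A: small random init; B: zeros, so training starts from the base model.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only `A` and `B` are trained, the number of tunable parameters scales with the rank `r` rather than the full weight matrix.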

Speech & Language with LLMs

Understanding how LLMs can be coupled with speech encoders for ASR. I study robustness under domain shift, prompt sensitivity, and modality compression in SpeechLLM architectures.

Multitask & Unified Speech Models

Building single models that jointly handle transcription, speaker change detection, endpointing, and entity recognition, replacing fragile cascaded pipelines (TokenVerse, EMNLP 2024; TokenVerse++, ASRU 2025).

Sequence Alignment with Optimal Transport

A differentiable sequence-alignment framework based on 1D optimal transport, enabling the model to learn alignments and perform ASR end-to-end.
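What makes the 1D case attractive is that optimal transport there has a closed form: the monotone (sorted) coupling is optimal, so no linear program is needed. The toy function below illustrates that fact for equal-size empirical distributions; it is a generic illustration of 1D OT, not the alignment framework itself.

```python
import numpy as np

def wasserstein_1d(x: np.ndarray, y: np.ndarray, p: float = 1.0) -> float:
    """p-Wasserstein distance between two 1D empirical distributions
    with the same number of samples. In 1D the optimal coupling simply
    matches order statistics, so sorting solves the transport problem."""
    xs, ys = np.sort(x), np.sort(y)
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```

Since sorting-based matching admits smooth relaxations, costs of this form can be made differentiable and used as training objectives.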

For a full list, see my Google Scholar profile.

2025 NeurIPS ★ Idiap PhD Paper Award
Latent Space Factorization in LoRA
S. Kumar, Y. Kaloga, J. Mitros, P. Motlicek, and I. Kodrasi
Proposed FVAE-LoRA, a parameter-efficient fine-tuning method that separates task-relevant and residual features, improving robustness across image, text, and audio tasks.
2025 arXiv Preprint
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Y. Kaloga*, S. Kumar*, P. Motlicek, and I. Kodrasi
Introduced a novel differentiable sequence-alignment framework based on 1D optimal transport, enabling end-to-end ASR with learned alignments.
2025 ICASSP — SALMA Workshop ★ Best Paper Award
Performance Evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
S. Kumar, I. Thorbecke, S. Burdisso, E. Villatoro-Tello, M. K E, K. Hacioglu, P. Rangappa, P. Motlicek, A. Ganapathiraju, and A. Stolcke
Evaluated SLAM-ASR across domain shifts and speech perturbations, identifying robustness gaps and providing guidance for reliable LLM-based ASR.
2024 EMNLP
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR
S. Kumar, S. Madikeri, J. Pablo Zuluaga Gomez, I. Thorbecke, E. Villatoro-Tello, S. Burdisso, P. Motlicek, K. Pandia D S, and A. Ganapathiraju
A transducer-based framework unifying transcription, speaker change detection, endpointing, and NER in a single model, outperforming cascaded pipelines.
2026 ICASSP
Reducing Prompt Sensitivity in LLM-based Speech Recognition Through Learnable Projection
S. Burdisso, E. Villatoro-Tello, S. Kumar, S. Madikeri, A. Carofilis, P. Rangappa, M. K E, K. Hacioglu, P. Motlicek, and A. Stolcke
2025 ICASSP
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models
S. Kumar, S. Madikeri, J. Pablo Zuluaga Gomez, E. Villatoro-Tello, I. Thorbecke, P. Motlicek, M. K E, and A. Ganapathiraju
Best Paper Award — SALMA Workshop at ICASSP 2025
Idiap PhD Paper Award — for "Latent Space Factorization in LoRA" (NeurIPS 2025)
Reviewer for Interspeech 2025 and ACL Rolling Review (ARR) cycles.
Presented at NeurIPS 2025, Interspeech 2025, and EMNLP 2024.
Aug 2023 – Present

Doctor of Philosophy (Ph.D.)

EPFL, Lausanne & Idiap Research Institute, Martigny, Switzerland
Advisors: Prof. Andrea Cavallaro, Prof. Petr Motlíček · Expected: Aug 2027
Aug 2013 – May 2017

Bachelor of Technology (B.Tech.)

Indian Institute of Technology (IIT) Guwahati, India
Electronics and Communication Engineering (ECE)