Analyzing Whisper’s Representation Learning of Prosodic Stress Patterns
- Samuel S Sohn (Rutgers University)
- Kavindya Dalawella (Rutgers University)
- Sten Knutsen (Rutgers University)
- Karin Stromswold (Rutgers University)
Abstract
Prosody serves as a critical cue in spoken language, disambiguating meaning, signaling communicative intent, and shaping linguistic interpretation. While recent work has shown that large speech models can accurately annotate stress, less is known about how different stress types are internally represented. This paper investigates how OpenAI's Whisper organizes prosodic stress in its latent space by analyzing representations from models fine-tuned on phrasal, lexical, and contrastive stress productions. Using minimal pairs with controlled segmental overlap, we quantify segmental and suprasegmental structure via silhouette analysis and examine cross-stress relationships using Fused Gromov-Wasserstein distances and manifold visualization. We find a principled dissociation: phrasal stress is encoded primarily segmentally, exhibiting weak and variable suprasegmental structure, while lexical and contrastive stress show strong, shared suprasegmental organization. These representational asymmetries explain observed patterns of transfer and interference across stress types in downstream classification performance. Overall, Whisper's representations reflect an implicit optimization for the most statistically reliable acoustic cues, paralleling accounts of statistical learning in phonological acquisition. Our results demonstrate how large speech models can serve as tools for probing the structure of prosodic representations and offer computational evidence for distinctions among stress types central to phonological theory.
Keywords: prosodic stress, automatic speech recognition, representation learning, acoustic realization
How to Cite:
Sohn, S. S., Dalawella, K., Knutsen, S. & Stromswold, K. (2026) “Analyzing Whisper’s Representation Learning of Prosodic Stress Patterns”, Proceedings of the Annual Meetings on Phonology 2(1). doi: https://doi.org/10.7275/amphonology.3707