Self-Supervised Speech Representations in a Pre-train Speech Model Represent Key Rapid Automatized Naming Variability in Autism
Abstract
Individuals with autism experience significant yet nuanced difficulties with pragmatic language, such as differences in speech prosody (e.g., rate, rhythm, intonation) and speech-gaze coordination, which are challenging to measure quantitatively without powerful tools. Fine-grained, comprehensive characterization of speech and related cognitive-linguistic skills is therefore important for informing intervention strategies grounded in a clearer etiological understanding of pragmatic differences in autism. This study used Hidden-unit Bidirectional Encoder Representations from Transformers (HuBERT), a state-of-the-art, self-supervised, pre-trained speech model, to represent speech and gaze differences during a rapid automatized naming (RAN) task in autistic individuals relative to non-autistic controls. Average HuBERT distance metrics were tested, using Pearson’s correlations, for associations with acoustic (e.g., speech rate), performance-based (e.g., naming time), and gaze (e.g., regressions) metrics of RAN, to examine the potential link between HuBERT distance measures and the attentional coordination of speech and gaze. Analyses revealed that the HuBERT distance metric was significantly correlated with more speech errors, slower speech rate, longer naming time, and more visual regressions. Together, the findings demonstrate the utility of self-supervised, pre-trained speech models such as HuBERT for capturing meaningful variability in the cognitive-linguistic patterns of autism without the need for pre-defined acoustic features or speech-to-text alignment.
Keywords: autism, self-supervised speech, speech, HuBERT
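The analysis summarized in the abstract (an average distance over speech representations, then Pearson's correlations with RAN measures) can be illustrated with a minimal sketch. The abstract does not specify how the distance is computed, so the cosine distance to a reference vector used here is an assumption, and the function names `mean_cosine_distance` and `pearson_r` are hypothetical. Extracting HuBERT embeddings is not shown; each frame is assumed to already be a list of floats.

```python
import math


def mean_cosine_distance(frames, reference):
    """Average cosine distance (1 - cosine similarity) from each frame
    vector to a reference vector.

    ASSUMPTION: cosine distance to a single reference is an illustrative
    choice, not the distance definition used in the study.
    """
    if not frames:
        raise ValueError("frames must be non-empty")

    def cosine_distance(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
        if norm == 0:
            raise ValueError("cannot compare a zero vector")
        return 1.0 - dot / norm

    return sum(cosine_distance(f, reference) for f in frames) / len(frames)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Centered cross-product and centered sums of squares.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(ss_x * ss_y)
```

In use, one distance value per participant would be paired with that participant's RAN measure (for example, naming time), and `pearson_r(distances, naming_times)` would give the correlation coefficient; a significance test would be a separate step not covered by this sketch.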
How to Cite:
Ethridge, S., Lau, J., Chernyak, B. R., Voigt, R., Goldrick, M., Keshet, J. & Losh, M., (2025) “Self-Supervised Speech Representations in a Pre-train Speech Model Represent Key Rapid Automatized Naming Variability in Autism”, Society for Computation in Linguistics 8(1): 42. doi: https://doi.org/10.7275/scil.3173