Emergent Gestural Scores in a Recurrent Neural Network Model of Vowel Harmony
- Caitlin Smith (Johns Hopkins University)
- Charlie O'Hara (University of Southern California)
- Eric Rosen (Johns Hopkins University)
- Paul Smolensky (Johns Hopkins University)
Abstract
In this paper, we present the results of neural network modeling of speech production. We introduce GestNet, a sequence-to-sequence, encoder-decoder neural network architecture that translates a string of input symbols into sequences of vocal tract articulator movements. We train our models to produce movements of the lip and tongue body articulators consistent with a pattern of stepwise vowel height harmony. Though we provide our models with no linguistic structure, they reliably learn this harmony pattern. In addition, probing these models reveals evidence of emergent linguistic structure. Specifically, we examine patterns of encoder-decoder attention (the degree of influence of specific input segments on model outputs) and find that they resemble the patterns of gestural activation assumed within the Gestural Harmony Model, a model of harmony built upon the representations of Articulatory Phonology. This result is significant because it lends support to one of the central claims of the Gestural Harmony Model: that harmony results from harmony-triggering gestures extending to overlap the gestures of surrounding segments.
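To make the architecture concrete, below is a minimal sketch of the kind of sequence-to-sequence, encoder-decoder RNN with encoder-decoder attention described above. It is not the authors' GestNet implementation: the GRU units, dot-product attention variant, layer sizes, and all names here are illustrative assumptions. The decoder emits a continuous position for each articulator (e.g., lip and tongue body) at every time step, and the per-step attention weights are the quantity one would probe for the gestural-score-like patterns the paper reports.

```python
# Minimal sketch (NOT the authors' GestNet code) of a sequence-to-sequence
# encoder-decoder RNN with encoder-decoder attention. An encoder reads a
# string of input segment symbols; a decoder emits a sequence of continuous
# articulator positions. All sizes, names, and the attention variant are
# assumptions for illustration.
import torch
import torch.nn as nn

class GestNetSketch(nn.Module):
    def __init__(self, n_symbols=20, emb_dim=16, hid_dim=32, n_articulators=2):
        super().__init__()
        self.embed = nn.Embedding(n_symbols, emb_dim)
        self.encoder = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.GRUCell(n_articulators + hid_dim, hid_dim)
        self.out = nn.Linear(hid_dim, n_articulators)

    def forward(self, symbols, n_steps):
        # Encode the input segment string.
        enc_states, h = self.encoder(self.embed(symbols))  # (B, T, H)
        h = h.squeeze(0)                                   # (B, H)
        y = torch.zeros(symbols.size(0), self.out.out_features)
        outputs, attn_weights = [], []
        for _ in range(n_steps):
            # Dot-product encoder-decoder attention: how strongly each
            # input segment influences the current articulator output.
            scores = torch.bmm(enc_states, h.unsqueeze(2)).squeeze(2)  # (B, T)
            alpha = torch.softmax(scores, dim=1)
            context = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)
            # Condition the decoder on the previous output and the context.
            h = self.decoder(torch.cat([y, context], dim=1), h)
            y = self.out(h)
            outputs.append(y)
            attn_weights.append(alpha)
        # attn_weights is what one would probe for gestural-score-like
        # patterns of per-segment influence over decoder time.
        return torch.stack(outputs, 1), torch.stack(attn_weights, 1)

model = GestNetSketch()
symbols = torch.randint(0, 20, (1, 5))   # one input string of 5 segments
traj, attn = model(symbols, n_steps=10)  # 10 articulator time steps
print(traj.shape, attn.shape)            # (1, 10, 2) and (1, 10, 5)
```

Plotting the returned attention matrix over decoder time steps, one row per input segment, would be the analogue of inspecting the attention-based gestural scores examined in the paper.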
Keywords: vowel harmony, gestural phonology, neural network, speech production, RNN
How to Cite:
Smith, C., O'Hara, C., Rosen, E. & Smolensky, P. (2021) “Emergent Gestural Scores in a Recurrent Neural Network Model of Vowel Harmony”, Society for Computation in Linguistics 4(1), 61-70. doi: https://doi.org/10.7275/qyey-4j04