Modeling unsupervised phonetic and phonological learning in Generative Adversarial Phonology
- Gašper Beguš (University of Washington)
Abstract
This paper models phonetic and phonological learning as a dependency between the random latent space and generated speech data in the Generative Adversarial Network architecture, and proposes a methodology to uncover the network’s internal representations that correspond to phonetic and phonological features. A Generative Adversarial Network (Goodfellow et al. 2014; implemented as WaveGAN for acoustic data by Donahue et al. 2019) was trained on an allophonic distribution in English, where voiceless stops surface as aspirated word-initially before stressed vowels, except when preceded by a sibilant [s]. The network successfully learns the allophonic alternation: the network’s generated speech signal contains the conditional distribution of aspiration duration. Additionally, the network generates innovative outputs for which no evidence is available in the training data, suggesting that the network segments the continuous speech signal into units that can be productively recombined. The paper also proposes a technique for establishing the network’s internal representations. We identify latent variables that directly correspond to the presence of [s] in the output. By manipulating these variables, we actively control the presence of [s], its frication amplitude, and the spectral shape of the frication noise in the generated outputs.
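The latent-variable manipulation described above can be illustrated with a minimal sketch. The toy linear "generator" below is a stand-in for the paper's actual WaveGAN generator (which maps a 100-dimensional uniform latent vector to a raw waveform); the latent index chosen here is hypothetical, purely for illustration of the technique of fixing one latent dimension at increasingly extreme values while holding the rest constant.

```python
import numpy as np

# Toy sketch of latent-space manipulation (NOT the paper's WaveGAN model).
# WaveGAN samples z uniformly from (-1, 1)^100 and outputs a raw waveform;
# here a random linear map followed by tanh stands in for G(z).
rng = np.random.default_rng(0)
latent_dim, out_len = 100, 16384
W = rng.normal(scale=0.05, size=(out_len, latent_dim))  # toy generator weights

def generate(z):
    """Toy generator G(z): linear map squashed by tanh."""
    return np.tanh(W @ z)

z = rng.uniform(-1, 1, latent_dim)

# Manipulate a single latent variable: fix dimension 5 (a hypothetical
# index; the paper identifies the relevant dimensions empirically) at
# values beyond the training interval and regenerate, keeping all other
# dimensions constant. In the paper, this controls the presence of [s].
for value in (-4.0, 0.0, 4.0):
    z_edit = z.copy()
    z_edit[5] = value
    x = generate(z_edit)
    print(f"z[5] = {value:+.1f}  mean |amplitude| = {np.abs(x).mean():.4f}")
```

The key design point is that only one latent dimension is varied at a time, so any systematic change in the output (e.g. frication noise appearing or disappearing) can be attributed to that single variable.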
Keywords: artificial intelligence, neural networks, generative adversarial networks, language acquisition, speech, phonetic learning, phonological learning, voice onset time, allophonic distribution
How to Cite:
Beguš, G., (2020) “Modeling unsupervised phonetic and phonological learning in Generative Adversarial Phonology”, Society for Computation in Linguistics 3(1), 138-148. doi: https://doi.org/10.7275/nbrf-1a27