Paper

Modeling unsupervised phonetic and phonological learning in Generative Adversarial Phonology

Author
  • Gašper Beguš (University of Washington)

Abstract

This paper models phonetic and phonological learning as a dependency between the random space and generated speech data in the Generative Adversarial Network architecture, and proposes a methodology to uncover the network's internal representations that correspond to phonetic and phonological features. A Generative Adversarial Network (Goodfellow et al. 2014; implemented as WaveGAN for acoustic data by Donahue et al. 2019) was trained on an allophonic distribution in English, where voiceless stops surface as aspirated word-initially before stressed vowels, except when preceded by the sibilant [s]. The network successfully learns the allophonic alternation: its generated speech signal contains the conditional distribution of aspiration duration. Additionally, the network generates innovative outputs for which no evidence is available in the training data, suggesting that it segments the continuous speech signal into units that can be productively recombined. The paper also proposes a technique for establishing the network's internal representations. We identify latent variables that directly correspond to the presence of [s] in the output. By manipulating these variables, we actively control the presence of [s], its frication amplitude, and the spectral shape of the frication noise in the generated outputs.
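The probing technique sketched in the abstract, holding a trained generator fixed and sweeping individual latent variables to see which acoustic property each one controls, can be illustrated in miniature. The snippet below is a hypothetical sketch, not the paper's implementation: a fixed random linear map stands in for the trained WaveGAN generator, and `probe_latent_dim` is an illustrative helper name, but the procedure (fix all latent dimensions except one, sweep that one, measure the output) mirrors the manipulation described above.

```python
import numpy as np

# Toy stand-in "generator": maps a 100-dim latent vector z to a 1-D
# "waveform". In the paper this role is played by the trained WaveGAN
# generator; a fixed random linear map is used here purely to
# illustrate the probing procedure.
rng = np.random.default_rng(0)
W = rng.normal(size=(100, 1024))

def generate(z):
    """Map a latent vector z (shape (100,)) to a waveform (shape (1024,))."""
    return z @ W

def probe_latent_dim(dim, values, z_base):
    """Hold every latent variable fixed except `dim`, sweep it over
    `values`, and record the RMS amplitude of each generated output.

    This mirrors the paper's technique of manipulating single latent
    variables to test which property of the output they control
    (e.g. the presence and amplitude of [s]-frication).
    """
    amplitudes = []
    for v in values:
        z = z_base.copy()
        z[dim] = v
        x = generate(z)
        amplitudes.append(float(np.sqrt(np.mean(x ** 2))))
    return amplitudes

z0 = np.zeros(100)  # neutral baseline latent vector
amps = probe_latent_dim(dim=7, values=[0.0, 1.0, 2.0, 4.0], z_base=z0)
```

With a zero baseline the toy output scales linearly in the swept variable, so RMS amplitude grows monotonically across the sweep; in the actual model, the analogous sweep reveals latent variables whose values govern the frication noise of [s] in the generated audio.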

Keywords: artificial intelligence, neural networks, generative adversarial networks, language acquisition, speech, phonetic learning, phonological learning, voice onset time, allophonic distribution

How to Cite:

Beguš, G. (2020) “Modeling unsupervised phonetic and phonological learning in Generative Adversarial Phonology”, Society for Computation in Linguistics 3(1), 138-148. doi: https://doi.org/10.7275/nbrf-1a27

Published on
01 Jan 2020