The stability of segmental properties across genre and corpus types in low-resource languages

Authors
  • Uriel Cohen Priva (Brown University)
  • Shiying Yang (Brown University)
  • Emily Strand (Brown University)

Abstract

Are written corpora useful for phonological research? Word frequency lists for low-resource languages have become ubiquitous in recent years (Scannell, 2007). For many languages there is a direct correspondence between their written forms and their alphabets, but it is not clear whether written corpora can adequately represent language use. We use 15 low-resource languages and compare several information-theoretic properties across three corpus types. We show that despite differences in origin and genre, estimates in one corpus are highly correlated with estimates in other corpora.

Keywords: corpus, phonology, low-resource, stability

How to Cite:

Cohen Priva, U., Yang, S. & Strand, E. (2020) “The stability of segmental properties across genre and corpus types in low-resource languages”, Society for Computation in Linguistics 3(1), 1-9. doi: https://doi.org/10.7275/fttf-fq95

Published on
01 Jan 2020