Extended Abstract

Reconciling Historical Data and Modern Computational Models in Corpus Creation

Author
  • Joseph Rhyne (Cornell University)

Abstract

Historical linguistics has benefited greatly from digital corpora, and modern computational models for corpus creation have achieved unprecedented success. However, these models are essentially incompatible with the limited data available for historical languages: training neural-network taggers requires far more data than exists for such languages. To address this problem, this paper develops an approach to historical corpus creation that draws on methods for low-resource languages, such as model transfer (Fang and Cohn 2017), and exploits the relationships between historical languages and their modern descendants. Here, we achieve first-pass part-of-speech (POS) tagging as one stage of a pipeline for historical corpus creation.
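The model-transfer idea above can be illustrated with a deliberately simplified sketch. It is not the author's pipeline, and it is not the method of Fang and Cohn (2017), which transfers a neural tagger. Instead, it substitutes a most-frequent-tag lexical tagger trained on a modern descendant language and applies that tagger to historical tokens through a historical-to-modern dictionary. All word forms, tags, the dictionary, and the function names (`train_lexical_tagger`, `transfer_tag`) are hypothetical and invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: a tiny POS-tagged corpus in a modern "source" language
# (invented romanized forms loosely modeled on Slavic, for illustration only).
MODERN_TAGGED = [
    [("kniga", "NOUN"), ("ležit", "VERB")],
    [("kniga", "NOUN"), ("novaja", "ADJ")],
    [("stol", "NOUN"), ("stoit", "VERB")],
]

# Hypothetical historical-to-modern dictionary: the kind of cross-lingual
# resource that model transfer can exploit between a language and its descendant.
HIST_TO_MODERN = {"kŭniga": "kniga", "stolŭ": "stol", "ležitŭ": "ležit"}


def train_lexical_tagger(tagged_sents):
    """Learn the most frequent tag per word form, plus a global fallback tag."""
    counts = defaultdict(Counter)
    all_tags = Counter()
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word][tag] += 1
            all_tags[tag] += 1
    lexicon = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    fallback = all_tags.most_common(1)[0][0]
    return lexicon, fallback


def transfer_tag(hist_tokens, lexicon, fallback, dictionary):
    """Map each historical token to a modern form, then tag it with the modern model."""
    tags = []
    for tok in hist_tokens:
        modern = dictionary.get(tok, tok)  # unmapped tokens pass through unchanged
        tags.append(lexicon.get(modern, fallback))  # unknown forms get the fallback tag
    return tags


lexicon, fallback = train_lexical_tagger(MODERN_TAGGED)
print(transfer_tag(["kŭniga", "ležitŭ", "neznamo"], lexicon, fallback, HIST_TO_MODERN))
```

With this toy data the call prints `['NOUN', 'VERB', 'NOUN']`. The last token has neither a dictionary nor a lexicon entry, so it receives the most frequent tag overall. A realistic system would likely replace the lookup table with learned cross-lingual representations and the lexicon with a trained tagger.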

Keywords: Historical Linguistics, Corpus Linguistics, Natural Language Processing, Low-Resource Language, Model Transfer, Slavic, Computational Linguistics

How to Cite:

Rhyne, J., (2020) “Reconciling Historical Data and Modern Computational Models in Corpus Creation”, Society for Computation in Linguistics 3(1), 470-473. doi: https://doi.org/10.7275/dnn5-xk94

Published on 01 Jan 2020