Reconciling Historical Data and Modern Computational Models in Corpus Creation
- Joseph Rhyne (Cornell University)
Abstract
Historical linguistics has been greatly aided by digital corpora, and modern computational models for corpus creation have achieved unprecedented success. However, these models are essentially incompatible with limited historical data: the amount of data needed to train neural network taggers is not available for historical languages. To address this problem, this paper develops an approach to historical corpus creation that uses methods for low-resource languages, such as model transfer (Fang and Cohn 2017), and exploits the relationships between past languages and their modern descendants. Here, we achieve first-pass part-of-speech (POS) tagging within a pipeline for historical corpus creation.
Keywords: Historical Linguistics, Corpus Linguistics, Natural Language Processing, Low-Resource Language, Model Transfer, Slavic, Computational Linguistics
How to Cite:
Rhyne, J. (2020) “Reconciling Historical Data and Modern Computational Models in Corpus Creation”, Society for Computation in Linguistics 3(1), 470-473. doi: https://doi.org/10.7275/dnn5-xk94