Reconciling Historical Data and Modern Computational Models in Corpus Creation

Joseph Rhyne

doi:10.7275/dnn5-xk94

Extended Abstract

Author

Joseph Rhyne (Cornell University)

Abstract

Historical linguistics has been greatly aided by digital corpora, and modern computational models for corpus creation have achieved unprecedented success. However, these models are essentially incompatible with limited historical data: the amount of data needed to train neural network taggers is not available for historical languages. To address this problem, this paper develops an approach to historical corpus creation that uses methods for low-resource languages, such as model transfer (Fang and Cohn 2017), and exploits the relationships between past languages and their modern descendants. With this approach, we achieve first-pass POS tagging in a pipeline for historical corpus creation.

Keywords: Historical Linguistics, Corpus Linguistics, Natural Language Processing, Low-Resource Language, Model Transfer, Slavic, Computational Linguistics

How to Cite:

Rhyne, J. (2020) “Reconciling Historical Data and Modern Computational Models in Corpus Creation”, Society for Computation in Linguistics 3(1), 470-473. doi: https://doi.org/10.7275/dnn5-xk94

Published on
2020-01-01

License

Creative Commons Attribution 4.0

Publication details

Pages: 470-473
Submitted on: 2019-10-15
