Automating Gloss Generation in Interlinear Glossed Text
- Angelina McMillan-Major (University of Washington, Seattle)
Abstract
Interlinear Glossed Text (IGT) is a rich data type that linguists produce to present an analysis of a language's semantic and grammatical properties. I combine linguistic knowledge and statistical machine learning to develop a system for automatically annotating low-resource language data. I train a generative system for each language using on the order of 1000 IGT instances. The input to the system is the morphologically segmented source language phrase and its English translation. The system outputs the predicted linguistic annotation for each morpheme of the source phrase. The final system is tested on held-out IGT sets for Abui [abz], Chintang [ctn], and Matsigenka [mcb] and achieves 71.7%, 80.3%, and 84.9% accuracy, respectively.
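To make the input and output described above concrete, here is a minimal Python sketch of one way an IGT instance and a gloss accuracy score could be represented. The names `IGTInstance` and `gloss_accuracy` are hypothetical, and the sketch assumes that accuracy is counted per morpheme; the abstract does not specify either point.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IGTInstance:
    """One IGT item (hypothetical layout, not the paper's data format)."""
    morphemes: List[str]   # morphologically segmented source phrase (input)
    translation: str       # English translation (input)
    glosses: List[str]     # one gloss label per morpheme (output to predict)


def gloss_accuracy(predicted: List[List[str]], gold: List[List[str]]) -> float:
    """Fraction of morphemes whose predicted gloss equals the gold gloss.

    Assumes per-morpheme scoring, which is an assumption of this sketch.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same phrases")
    correct = 0
    total = 0
    for pred_phrase, gold_phrase in zip(predicted, gold):
        if len(pred_phrase) != len(gold_phrase):
            raise ValueError("each phrase needs one gloss per morpheme")
        total += len(gold_phrase)
        correct += sum(p == g for p, g in zip(pred_phrase, gold_phrase))
    return correct / total if total else 0.0
```

Scoring is pooled over all morphemes in the held-out set rather than averaged per phrase, so longer phrases carry proportionally more weight in this sketch.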
Keywords: documentary tools, machine learning, IGT
How to Cite:
McMillan-Major, A. (2020) “Automating Gloss Generation in Interlinear Glossed Text”, Society for Computation in Linguistics 3(1), 338-349. doi: https://doi.org/10.7275/tsmk-sa32