Paper

Automating Gloss Generation in Interlinear Glossed Text

Author
  • Angelina McMillan-Major (University of Washington, Seattle)

Abstract

Interlinear Glossed Text (IGT) is a rich data type produced by linguists for the purposes of presenting an analysis of a language's semantic and grammatical properties. I combine linguistic knowledge and statistical machine learning to develop a system for automatically annotating low-resource language data. I train a generative system for each language using on the order of 1000 IGT. The input to the system is the morphologically segmented source language phrase and its English translation. The system outputs the predicted linguistic annotation for each morpheme of the source phrase. The final system is tested on held-out IGT sets for Abui [abz], Chintang [ctn], and Matsigenka [mcb] and achieves 71.7%, 80.3%, and 84.9% accuracy, respectively.
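
To make the input and output described above concrete, the following is a minimal sketch of how one IGT item and a gloss-accuracy score might be represented. It is not the paper's implementation: the class name `IGT`, its field names, and the function `morpheme_accuracy` are hypothetical, the Spanish example is illustrative and not drawn from the Abui, Chintang, or Matsigenka data, and scoring accuracy per morpheme is an assumption about how the reported figures are computed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IGT:
    """One interlinear glossed text item (hypothetical field names)."""
    morphemes: List[List[str]]  # segmented source phrase: words, then morphemes
    glosses: List[List[str]]    # gold annotation, one label per morpheme
    translation: str            # free English translation


def morpheme_accuracy(gold: List[IGT], predicted: List[List[List[str]]]) -> float:
    """Fraction of morphemes whose predicted gloss matches the gold gloss.

    Assumes each prediction has the same word and morpheme counts as its gold item.
    """
    correct = 0
    total = 0
    for item, pred in zip(gold, predicted):
        for gold_word, pred_word in zip(item.glosses, pred):
            for gold_label, pred_label in zip(gold_word, pred_word):
                correct += int(gold_label == pred_label)
                total += 1
    return correct / total if total else 0.0


# Illustrative item: Spanish "perro-s corr-en", "The dogs run."
example = IGT(
    morphemes=[["perro", "s"], ["corr", "en"]],
    glosses=[["dog", "PL"], ["run", "3PL"]],
    translation="The dogs run.",
)

# A hypothetical system output that mislabels the final morpheme.
prediction = [[["dog", "PL"], ["run", "3SG"]]]
score = morpheme_accuracy([example], prediction)
```

In this sketch the system would receive `morphemes` and `translation` as input and produce a structure shaped like `glosses`; the score is then the share of morpheme labels it gets right.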

Keywords: documentary tools, machine learning, IGT

How to Cite:

McMillan-Major, A. (2020) “Automating Gloss Generation in Interlinear Glossed Text”, Society for Computation in Linguistics 3(1), 338-349. doi: https://doi.org/10.7275/tsmk-sa32

Published on
01 Jan 2020