Multiple alignments of inflectional paradigms
Abstract
Most models of inflectional morphology rely at their core on the identification of recurrent and diverging material across inflected forms. Across theoretical frameworks, this can be expressed in terms of morpheme segmentation, rules, processes, patterns or analogies. Finding these recurrences in large structured lexicons is an important step in empirical computational morphology, where analyses are induced bottom-up from inflected forms. This can be done by aligning all the forms in each paradigm, an instance of Multiple Sequence Alignment (MSA), a task well known in other fields such as evolutionary biology and historical linguistics. In this paper, we present the specific problems which arise when aligning inflected forms, provide a simple alignment format, define evaluation measures and compare two implemented methods on 13 inflectional lexicons. Our intent is to provide the conditions for the interoperability of future systems, and for incremental improvements in this fundamental step for quantitative morphology.
Keywords: inflection, paradigms, stem, marker, multiple sequence alignment, MSA, alignment, LCS, longest common subsequence, quantitative, typology
How to Cite:
Beniamine, S. & Guzmán Naranjo, M. (2021) “Multiple alignments of inflectional paradigms”, Society for Computation in Linguistics 4(1), 216-227. doi: https://doi.org/10.7275/ymc0-p491