Paper

Meaning-Informed Low-Resource Segmentation of Agglutinative Morphology

Author
  • Caleb Belth (University of Utah)

Abstract

Morphological segmentation is both an interesting acquisition problem and an important task for natural language processing. Most current computational approaches either use supervised machine learning, which tends to lead to the best-performing models, or operate over bare surface forms of words. However, the empirical conditions of language acquisition seem to fall somewhere in between: children do not have access to pre-segmented input, yet their knowledge of morphological structure develops alongside semantic knowledge. Inspired by this, we suggest a simple computational model, which builds on experimental evidence that children can strip a suffix off of closely related word forms. The model is unsupervised, but is able to exploit features to identify how differences between closely related surface forms are marked. Trained on hundreds to a few thousand words from languages with agglutinative morphology, the resulting model outperforms an unsupervised model that does not exploit such features, and in some settings even outperforms a supervised model trained on both features and ground-truth segmentations.
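
The core intuition, that a marker can be found by contrasting closely related word forms, can be illustrated with a small sketch. This is not the paper's model: the function names (`suffix_by_contrast`, `collect_suffixes`), the lexicon layout, and the Turkish-like toy data are all assumptions made for illustration. The sketch pairs forms that share a stem meaning and differ in exactly one feature, then treats the string left over after their common prefix as a candidate marker for that feature.

```python
from os.path import commonprefix


def suffix_by_contrast(base, derived):
    """Return what `derived` adds after the prefix it shares with `base`."""
    shared = commonprefix([base, derived])
    return derived[len(shared):]


def collect_suffixes(lexicon):
    """Collect candidate suffixes per feature (hypothetical helper).

    `lexicon` maps a surface form to (stem_meaning, frozenset_of_features).
    Two forms are compared only if they share a stem meaning and the second
    carries exactly one feature more than the first.
    """
    markers = {}
    items = list(lexicon.items())
    for form_a, (stem_a, feats_a) in items:
        for form_b, (stem_b, feats_b) in items:
            if stem_a != stem_b:
                continue
            extra = feats_b - feats_a
            # Require form_b's features to be form_a's plus one new feature.
            if len(extra) == 1 and feats_a <= feats_b:
                suffix = suffix_by_contrast(form_a, form_b)
                if suffix:
                    feature = next(iter(extra))
                    markers.setdefault(feature, set()).add(suffix)
    return markers


# Toy data for illustration only; not taken from the paper's experiments.
lexicon = {
    "ev": ("house", frozenset()),
    "evler": ("house", frozenset({"PL"})),
    "evlerde": ("house", frozenset({"PL", "LOC"})),
}
markers = collect_suffixes(lexicon)
print(markers)
```

On this toy lexicon the sketch associates `ler` with the plural feature and `de` with the locative feature. It ignores stem alternations, allomorphy, and prefixes, all of which a real segmentation model would need to handle.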

Keywords: morphological segmentation, agglutinative morphology, low-resource learning

How to Cite:

Belth, C. (2024) “Meaning-Informed Low-Resource Segmentation of Agglutinative Morphology”, Society for Computation in Linguistics 7(1), 96–106. doi: https://doi.org/10.7275/scil.2134

Published on
24 Jun 2024
Peer Reviewed