Meaning-Informed Low-Resource Segmentation of Agglutinative Morphology

Caleb Belth

doi:10.7275/scil.2134

Paper

Author

Caleb Belth (University of Utah)

Abstract

Morphological segmentation is both an interesting acquisition problem and an important task for natural language processing. Most current computational approaches either use supervised machine learning (which tends to lead to the best-performing models) or operate over bare surface forms of words. However, the empirical conditions of language acquisition seem to fall somewhere in between: children do not have access to pre-segmented input, yet their knowledge of morphological structure develops alongside semantic knowledge. Inspired by this, we suggest a simple computational model, which builds on experimental evidence that children can strip a suffix off of closely-related word forms. The model is unsupervised, but is able to exploit features to identify how differences between closely-related surface forms are marked. Trained on hundreds to a few thousand words from languages with agglutinative morphology, the resulting model outperforms an unsupervised model that does not exploit such features, and in some settings even outperforms a supervised model trained on both features and ground-truth segmentations.

Keywords: morphological segmentation, agglutinative morphology, low-resource learning

How to Cite:

Belth, C. (2024) “Meaning-Informed Low-Resource Segmentation of Agglutinative Morphology”, Society for Computation in Linguistics 7(1), 96–106. doi: https://doi.org/10.7275/scil.2134

Published on
2024-06-24

Peer Reviewed

License

Creative Commons Attribution 4.0

Publication details

Pages: 96–106
Submitted on: 2024-06-10
Accepted on: 2024-06-17
