
Phonologically Informed Edit Distance Algorithms for Word Alignment with Low-Resource Languages

Authors
  • Richard T McCoy (Johns Hopkins University)
  • Robert Frank (Yale University)

Abstract

We present three methods for weighting edit distance algorithms using linguistic information. These methods base their penalties on (i) phonological features, (ii) distributional character embeddings, or (iii) differences between cognate words. We also introduce a novel method for evaluating edit distance through the task of low-resource word alignment, in which a word's edit-distance neighbors in a high-resource pivot language are used to inform alignments for the low-resource language. On this task, the cognate-based scheme outperforms both our other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns.
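The abstract describes edit distances whose substitution penalties come from linguistic information rather than the flat 0/1 cost of Levenshtein distance. The Python below is a minimal sketch of that general idea, not the authors' implementation. It assumes that each character is one segment, that insertions and deletions have a fixed cost of 1, and that a pluggable substitution-cost function is the only place where linguistic weighting enters. The feature table `FEATURES`, the Jaccard-style `feature_cost`, and the `nearest_pivot_words` helper are all hypothetical illustrations. The paper's actual phonological, embedding-based, and cognate-based weights, and its alignment procedure, may differ.

```python
from typing import Callable, Dict, FrozenSet, List

# A substitution cost maps a pair of segments to a penalty in [0, 1].
SubCost = Callable[[str, str], float]


def weighted_edit_distance(a: str, b: str, sub_cost: SubCost,
                           indel_cost: float = 1.0) -> float:
    """Dynamic-programming edit distance with a pluggable substitution cost."""
    n, m = len(a), len(b)
    # dp[i][j] holds the cheapest cost of turning a[:i] into b[:j].
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, m + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,                         # deletion
                dp[i][j - 1] + indel_cost,                         # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),   # match/substitution
            )
    return dp[n][m]


def levenshtein_cost(x: str, y: str) -> float:
    """Baseline: every mismatch costs 1, as in plain Levenshtein distance."""
    return 0.0 if x == y else 1.0


# HYPOTHETICAL toy feature table, for illustration only (not from the paper).
FEATURES: Dict[str, FrozenSet[str]] = {
    "p": frozenset({"labial", "stop"}),
    "b": frozenset({"labial", "stop", "voiced"}),
    "t": frozenset({"coronal", "stop"}),
    "d": frozenset({"coronal", "stop", "voiced"}),
}


def feature_cost(x: str, y: str) -> float:
    """Hypothetical phonological cost: fraction of features that differ.

    Segments missing from FEATURES fall back to the Levenshtein cost of 1.
    """
    if x == y:
        return 0.0
    fx, fy = FEATURES.get(x), FEATURES.get(y)
    if fx is None or fy is None:
        return 1.0
    return len(fx ^ fy) / len(fx | fy)


def nearest_pivot_words(word: str, pivot_vocab: List[str],
                        sub_cost: SubCost, k: int = 1) -> List[str]:
    """HYPOTHETICAL helper: the k pivot-language words closest to `word`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(pivot_vocab,
                    key=lambda w: (weighted_edit_distance(word, w, sub_cost), w))
    return ranked[:k]


if __name__ == "__main__":
    vocab = ["bada", "pasta", "kilo"]
    print(weighted_edit_distance("pata", "bada", levenshtein_cost))  # 2.0
    print(weighted_edit_distance("pata", "bada", feature_cost))      # about 0.667
    print(nearest_pivot_words("pata", vocab, levenshtein_cost))      # ['pasta']
    print(nearest_pivot_words("pata", vocab, feature_cost))          # ['bada']
```

In this toy setup the two cost functions choose different pivot neighbors for "pata". Plain Levenshtein distance prefers "pasta" (one insertion). The feature-weighted distance prefers "bada", because p/b and t/d differ only in voicing. Embedding-based or cognate-based penalties would plug into the same `sub_cost` slot.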

Keywords: edit distance, word alignment, low-resource, cognate detection

How to Cite:

McCoy, R. T. & Frank, R., (2018) “Phonologically Informed Edit Distance Algorithms for Word Alignment with Low-Resource Languages”, Society for Computation in Linguistics 1(1), 102-112. doi: https://doi.org/10.7275/R5251GC0

Published on
01 Jan 2018