
Stranger than Paradigms: Word Embedding Benchmarks Don't Align With Morphology

Authors
  • Timothee Mickus (University of Helsinki)
  • Maria Copot (Ohio State University)

Abstract

Word embeddings have proven a boon in NLP in general, and computational approaches to morphology in particular. However, methods to assess the quality of a word embedding model only tangentially target morphological knowledge, which may lead to suboptimal model selection and biased conclusions in research that employs word embeddings to investigate morphology.
In this paper, we empirically test this hypothesis by exhaustively evaluating 1,200 French models with varying hyperparameters on 14 different tasks.
Models that perform well on morphology tasks tend to differ from those that succeed on more traditional benchmarks.
An especially critical hyperparameter appears to be the negative sampling distribution smoothing exponent: our study suggests that the common practice of setting it to 0.75 is not appropriate, since its optimal value depends on the type of linguistic knowledge being tested.
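For readers unfamiliar with this hyperparameter: in skip-gram with negative sampling, negative examples are drawn from a unigram distribution raised to a smoothing exponent (commonly 0.75). The sketch below is purely illustrative, not the authors' code; it shows how the exponent reshapes the sampling distribution between raw frequency (exponent 1.0) and uniform (exponent 0.0):

```python
# Illustrative sketch (hypothetical helper, not from the paper):
# the negative sampling distribution P_neg(w) ∝ count(w)**alpha,
# where alpha is the smoothing exponent (word2vec's default is 0.75).
from collections import Counter

def negative_sampling_probs(tokens, alpha=0.75):
    """Return P_neg(w) proportional to count(w)**alpha, normalized to sum to 1."""
    counts = Counter(tokens)
    weights = {w: c ** alpha for w, c in counts.items()}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}

# Toy corpus: one frequent word, two rare words.
corpus = ["the"] * 8 + ["cat"] + ["sat"]
print(negative_sampling_probs(corpus, alpha=1.0))   # raw unigram frequencies
print(negative_sampling_probs(corpus, alpha=0.75))  # common default: dampens frequent words
print(negative_sampling_probs(corpus, alpha=0.0))   # uniform over the vocabulary
```

Lower exponents boost the chance of sampling rare words as negatives, which is one plausible reason the optimal value would differ between semantic benchmarks and morphology tasks that hinge on rarer inflected forms.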

Keywords: word embeddings, distributional semantics, morphology

How to Cite:

Mickus, T., & Copot, M. (2024). “Stranger than Paradigms: Word Embedding Benchmarks Don't Align With Morphology”, Society for Computation in Linguistics 7(1), 173–189. doi: https://doi.org/10.7275/scil.2142

Published on
24 Jun 2024
Peer Reviewed