Stranger than Paradigms: Word Embedding Benchmarks Don't Align With Morphology
Abstract
Word embeddings have proven a boon in NLP in general, and computational approaches to morphology in particular. However, methods to assess the quality of a word embedding model only tangentially target morphological knowledge, which may lead to suboptimal model selection and biased conclusions in research that employs word embeddings to investigate morphology.
In this paper, we empirically test this hypothesis by exhaustively evaluating 1,200 French models with varying hyperparameters on 14 different tasks.
Models that perform well on morphology tasks tend to differ from those that succeed on more traditional benchmarks.
An especially critical hyperparameter appears to be the negative sampling distribution smoothing exponent: our study suggests that the common practice of setting it to 0.75 is not appropriate, as its optimal value depends on the type of linguistic knowledge being tested.
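For readers unfamiliar with the hyperparameter in question: in word2vec-style negative sampling, negative examples are drawn from a unigram distribution raised to a smoothing exponent (conventionally 0.75), which redistributes probability mass from frequent toward rare words. The following minimal sketch (pure Python, not the paper's code) illustrates the effect of the exponent:

```python
from collections import Counter

def negative_sampling_distribution(corpus_tokens, alpha=0.75):
    """Smoothed unigram distribution used to draw negative samples.

    alpha=0.75 is the conventional word2vec setting; alpha=1.0 keeps raw
    frequencies, and alpha=0.0 yields a uniform distribution.
    """
    counts = Counter(corpus_tokens)
    weights = {w: c ** alpha for w, c in counts.items()}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}

# Toy corpus: "a" is frequent, "b" is rare.
tokens = ["a"] * 8 + ["b"]
smoothed = negative_sampling_distribution(tokens, alpha=0.75)
raw = negative_sampling_distribution(tokens, alpha=1.0)
# Smoothing gives the rare word relatively more probability mass.
```

Lowering the exponent flattens the distribution further, while raising it toward 1.0 makes negative samples track raw corpus frequency, which is why its value interacts with the kind of linguistic signal a model captures.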
Keywords: word embeddings, distributional semantics, morphology
How to Cite:
Mickus, T., & Copot, M. (2024). “Stranger than Paradigms: Word Embedding Benchmarks Don't Align With Morphology”. Society for Computation in Linguistics 7(1), 173–189. doi: https://doi.org/10.7275/scil.2142