BLiMP: A Benchmark of Linguistic Minimal Pairs for English

Authors
  • Alex Warstadt (New York University)
  • Alicia Parrish (New York University)
  • Haokun Liu (New York University)
  • Anhad Mohananey (New York University)
  • Wei Peng (New York University)
  • Sheng-Fu Wang (New York University)
  • Samuel R. Bowman (New York University)

Abstract

We introduce BLiMP (The Benchmark of Linguistic Minimal Pairs), a human-solvable challenge set for evaluating language models (LMs) that covers a broad range of major grammatical phenomena in English. BLiMP consists of 67 datasets, each containing 1,000 minimal pairs that isolate a specific contrast in syntax, morphology, or semantics. Like GLUE (Wang et al., 2018), BLiMP makes it easy to compare models directly. Evaluating n-gram, LSTM, and Transformer LMs (GPT-2 and Transformer-XL), we find that the Transformer models are strongest overall, achieving (near) human performance on agreement and binding. However, phenomena such as wh-islands and NPI licensing remain challenging even for state-of-the-art LMs.
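
A model is scored on BLiMP by checking whether it assigns higher probability to the acceptable sentence in each minimal pair. The sketch below is a minimal illustration of that criterion (not code released with the paper): it scores both sentences of a pair with GPT-2 through the Hugging Face transformers library. The model choice and the subject-verb agreement example pair are assumptions made for illustration.

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def sentence_log_prob(sentence: str) -> float:
        # Summed token log probability of the sentence under the LM.
        ids = tokenizer.encode(sentence, return_tensors="pt")
        with torch.no_grad():
            out = model(ids, labels=ids)
        # out.loss is the mean cross-entropy over the predicted tokens,
        # so multiply by the number of predicted tokens (length - 1) and
        # negate to recover the summed log probability.
        return -out.loss.item() * (ids.size(1) - 1)

    good = "These casseroles disgust Kayla."   # acceptable (illustrative pair)
    bad = "These casseroles disgusts Kayla."   # unacceptable counterpart
    # The pair counts as solved if the acceptable sentence is more probable.
    print(sentence_log_prob(good) > sentence_log_prob(bad))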

Keywords: acceptability, language model, evaluation, transformer, n-gram

How to Cite:

Warstadt, A., Parrish, A., Liu, H., Mohananey, A., Peng, W., Wang, S.-F., & Bowman, S. R. (2020) "BLiMP: A Benchmark of Linguistic Minimal Pairs for English", Society for Computation in Linguistics 3(1), 437-438. doi: https://doi.org/10.7275/zejz-qs04

Published on
01 Jan 2020