Subject-verb Agreement with Seq2Seq Transformers: Bigger Is Better, but Still Not Best
- Michael A Wilson (Yale University)
- Zhenghao Zhou (Yale University)
- Robert Frank (Yale University)
Abstract
Past work (Linzen et al., 2016; Goldberg, 2019, a.o.) has used the performance of neural network language models on subject-verb agreement to argue that such models possess structure-sensitive grammatical knowledge. We investigate what properties of the model or of the training regimen are implicated in such success in sequence-to-sequence transformer models that use the T5 architecture (Raffel et al., 2019; Tay et al., 2021). We find that larger models exhibit improved performance, especially in sentences with singular subjects. We also find that larger pre-training datasets are generally associated with higher performance, though models trained with less complex language (e.g., CHILDES, Simple English Wikipedia) can show more errors when trained with larger datasets. Finally, we show that a model's ability to replicate psycholinguistic results does not correspondingly improve with more parameters or more training data: none of the models we study displays a fully convincing replication of the hierarchically-informed pattern of agreement behavior observed in human experiments.
Keywords: subject-verb agreement, transformer language models, sequence-to-sequence models, agreement attraction
How to Cite:
Wilson, M. A., Zhou, Z. & Frank, R., (2023) “Subject-verb Agreement with Seq2Seq Transformers: Bigger Is Better, but Still Not Best”, Society for Computation in Linguistics 6(1), 278-288. doi: https://doi.org/10.7275/d5gb-v650