Paper

Subject-verb Agreement with Seq2Seq Transformers: Bigger Is Better, but Still Not Best

Authors
  • Michael A Wilson (Yale University)
  • Zhenghao Zhou (Yale University)
  • Robert Frank (Yale University)

Abstract

Past work (Linzen et al., 2016; Goldberg, 2019, a.o.) has used the performance of neural network language models on subject-verb agreement to argue that such models possess structure-sensitive grammatical knowledge. We investigate what properties of the model or of the training regimen are implicated in such success in sequence to sequence transformer models that use the T5 architecture (Raffel et al., 2019; Tay et al., 2021). We find that larger models exhibit improved performance, especially in sentences with singular subjects. We also find that larger pre-training datasets are generally associated with higher performance, though models trained with less complex language (e.g., CHILDES, Simple English Wikipedia) can show more errors when trained with larger datasets. Finally, we show that a model's ability to replicate psycholinguistic results does not correspondingly improve with more parameters or more training data: none of the models we study displays a fully convincing replication of the hierarchically-informed pattern of agreement behavior observed in human experiments.

Keywords: subject-verb agreement, transformer language models, sequence to sequence models, agreement attraction

How to Cite:

Wilson, M. A., Zhou, Z. & Frank, R. (2023) “Subject-verb Agreement with Seq2Seq Transformers: Bigger Is Better, but Still Not Best”, Society for Computation in Linguistics 6(1), 278-288. doi: https://doi.org/10.7275/d5gb-v650

Published on
01 Jun 2023