Assessing the ability of Transformer-based Neural Models to represent structurally unbounded dependencies
Authors: Jillian K. Da Costa (University at Buffalo), Rui P. Chaves (University at Buffalo)
Filler-gap dependencies are among the most challenging syntactic constructions for computational models at large. Recently, Wilcox et al. (2018) and Wilcox et al. (2019b) provided evidence suggesting that large-scale general-purpose LSTM RNNs have learned such long-distance filler-gap dependencies. In the present work we provide evidence that such models learn filler-gap dependencies only very imperfectly, despite being trained on massive amounts of data. We then compare the LSTM RNN models with more modern state-of-the-art Transformer models, and find that the latter achieve poor-to-mixed degrees of success, despite their sheer size and low perplexity.
Keywords: GPT-2, BERT, XLNet, TransformerXL, Surprisal, Filler-gap Dependencies
How to Cite: Da Costa, J. K. & Chaves, R. P. (2020) “Assessing the ability of Transformer-based Neural Models to represent structurally unbounded dependencies”, Society for Computation in Linguistics. 3(1). doi: https://doi.org/10.7275/3sb6-4g20