On Evaluating the Generalization of LSTM Models in Formal Languages

Mirac Suzgun; Yonatan Belinkov; Stuart M. Shieber

doi:10.7275/s02b-4d91

Authors

Mirac Suzgun (Harvard University)
Yonatan Belinkov (Harvard University)
Stuart M. Shieber (Harvard University)

Abstract

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and have established themselves as a dominant model for language processing. Yet uncertainty remains regarding their language learning capabilities. In this paper, we empirically evaluate the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, on simple formal languages, in particular aⁿbⁿ, aⁿbⁿcⁿ, and aⁿbⁿcⁿdⁿ. We investigate the influence of various aspects of learning, such as training data regimes and model capacity, on generalization to unobserved samples. We find striking differences in model performance under different training settings and highlight the need for careful analysis and assessment when making claims about the learning capabilities of neural network models.
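To make the three languages concrete, the following is a minimal Python sketch, not the authors' code. The helper names (`make_string`, `is_member`, `length_split`), the choice of n ≥ 1, and the length cutoffs used in the split are all assumptions made for illustration; the abstract does not specify them.

```python
def make_string(n, symbols="ab"):
    """Build s1^n s2^n ... for the given symbols, e.g. 'aabb' for n=2, symbols='ab'."""
    return "".join(sym * n for sym in symbols)


def is_member(s, symbols="ab"):
    """Check membership in {s1^n s2^n ... : n >= 1} (n >= 1 is an assumption)."""
    k = len(symbols)
    if len(s) == 0 or len(s) % k != 0:
        return False
    n = len(s) // k
    # A string is in the language exactly when it equals the canonical
    # string built from the same n.
    return s == make_string(n, symbols)


def length_split(train_max, test_max, symbols="ab"):
    """Hypothetical split: train on n in [1, train_max], test on longer n.

    Generalization to unobserved samples can then be probed on strings
    whose n exceeds anything seen during training.
    """
    train = [make_string(n, symbols) for n in range(1, train_max + 1)]
    test = [make_string(n, symbols) for n in range(train_max + 1, test_max + 1)]
    return train, test
```

Passing `symbols="abc"` or `symbols="abcd"` yields samples of aⁿbⁿcⁿ and aⁿbⁿcⁿdⁿ respectively, so the same helpers cover all three languages named above.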

Keywords: LSTM, RNN, Long Short-Term Memory network, CSL, CFL, context sensitive, context free, evaluation, generalization, formal languages, distribution, hidden units, window sizes, windows, uniform, u-shaped, beta binomial

How to Cite:

Suzgun, M., Belinkov, Y. & Shieber, S. M. (2019) “On Evaluating the Generalization of LSTM Models in Formal Languages”, Society for Computation in Linguistics 2(1), 277-286. doi: https://doi.org/10.7275/s02b-4d91

Published on
2019-01-01

License

Creative Commons Attribution 4.0

Publication details

Pages: 277-286
Submitted on: 2018-11-02

File Checksums (MD5)

PDF: c7d714bc9119dc1f8ff7e453d7c6572a
