Unsupervised Formal Grammar Induction with Confidence

Jacob Collard

doi:10.7275/5qfp-sg41

Author

Jacob Collard (Cornell University)

Abstract

I present a novel algorithm for minimally supervised formal grammar induction using a linguistically motivated grammar formalism. This algorithm, called the Missing Link algorithm (ML), is built on classic chart parsing methods but uses a probabilistic confidence measure to keep track of potentially ambiguous lexical items. Because ML uses a structured grammar formalism, each step of the algorithm can be easily understood by linguists, making it ideal for studying the learnability of different linguistic phenomena. The algorithm requires minimal annotation in its training data, yet it can learn nuanced patterns from relatively small training sets and can be applied to a variety of grammar formalisms. Though evaluating an unsupervised syntactic model is difficult, I present an evaluation using the Corpus of Linguistic Acceptability (CoLA) and show state-of-the-art performance.
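The abstract says ML is built on classic chart parsing, and the keywords name combinatory categorial grammar (CCG). For readers unfamiliar with that starting point, the following is a minimal sketch of a generic CKY-style chart parser over CCG categories, restricted to forward and backward application. It is not the Missing Link algorithm: it omits the confidence measure and any learning step, neither of which the abstract specifies. The category encoding, the function names, and the toy lexicon are hypothetical choices made only for illustration.

```python
from itertools import product

# Hypothetical category encoding (illustration only, not the paper's):
# atomic categories are strings such as "S" or "NP"; a functor category is a
# tuple (result, slash, argument), where "/" seeks its argument to the right
# and "\\" (a single backslash) seeks it to the left.


def forward_apply(left, right):
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None


def backward_apply(left, right):
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def cky_parse(words, lexicon):
    """Return the set of categories that span the whole word sequence."""
    n = len(words)
    # chart[i][j] holds every category derivable for words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # An ambiguous word simply contributes all of its lexical categories.
    for i, word in enumerate(words):
        chart[i][i + 1] = set(lexicon.get(word, ()))
    # Fill wider spans from every split point k of smaller spans.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    for rule in (forward_apply, backward_apply):
                        result = rule(left, right)
                        if result is not None:
                            chart[i][j].add(result)
    return chart[0][n]


# Hypothetical toy lexicon: a transitive verb has category (S\NP)/NP.
TRANSITIVE_VERB = (("S", "\\", "NP"), "/", "NP")
lexicon = {
    "Kim": ["NP"],
    "Sandy": ["NP"],
    "sees": [TRANSITIVE_VERB],
}

print(cky_parse(["Kim", "sees", "Sandy"], lexicon))  # {'S'}
```

Here the verb first combines with its object to the right (yielding S\NP) and then with its subject to the left, so `S` appears in the top cell; a scrambled order such as "sees Kim Sandy" yields an empty set. Because each chart cell stores a set, a word with several lexical entries just contributes several candidates. The abstract describes ML as attaching a probabilistic confidence measure to such ambiguous lexical items, whereas this sketch keeps all of them unweighted.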

Keywords: unsupervised learning, formal grammars, probabilistic models, combinatory categorial grammar

How to Cite:

Collard, J., (2020) “Unsupervised Formal Grammar Induction with Confidence”, Society for Computation in Linguistics 3(1), 180-188. doi: https://doi.org/10.7275/5qfp-sg41

Published on
2020-01-01

License

Creative Commons Attribution 4.0

Publication details

Pages: 180-188
Submitted on: 2019-10-14

File Checksums (MD5)

PDF: 5e73d70ffd5ba57d923b32a0a38f4c51
