Unsupervised Formal Grammar Induction with Confidence
- Jacob Collard (Cornell University)
Abstract
I present a novel algorithm for minimally supervised formal grammar induction using a linguistically motivated grammar formalism. This algorithm, called the Missing Link algorithm (ML), builds on classic chart parsing methods but uses a probabilistic confidence measure to keep track of potentially ambiguous lexical items. Because ML uses a structured grammar formalism, each step of the algorithm can be easily understood by linguists, making it well suited to studying the learnability of different linguistic phenomena. The algorithm requires minimal annotation in its training data, yet it can learn nuanced data from relatively small training sets and can be applied to a variety of grammar formalisms. Though evaluating an unsupervised syntactic model is difficult, I present an evaluation using the Corpus of Linguistic Acceptability and show state-of-the-art performance.
Keywords: unsupervised learning, formal grammars, probabilistic models, combinatory categorial grammar
How to Cite:
Collard, J. (2020) “Unsupervised Formal Grammar Induction with Confidence”, Society for Computation in Linguistics 3(1), 180-188. doi: https://doi.org/10.7275/5qfp-sg41