Syntactic Category Learning as Iterative Prototype-Driven Clustering
- Jordan Kodner (University of Pennsylvania)
Abstract
We lay out a model for minimally supervised syntactic category acquisition that combines concepts from standard NLP part-of-speech tagging applications with cognitively motivated distributional statistics. The model assumes a small set of seed words (Haghighi and Klein, 2006), an approach motivated by Pinker's (1984) semantic bootstrapping hypothesis, and repeatedly constructs hierarchical agglomerative clusterings over a growing lexicon. Clustering is performed on the basis of word-adjacent syntactic frames alone (Mintz, 2003), with no reference to word-internal features, an approach that has been shown to yield qualitatively coherent POS clusters (Redington et al., 1998). A prototype-driven labeling process based on tree distance yields results comparable to those of unsupervised algorithms based on complex statistical optimization, while retaining the model's cognitive underpinnings.
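As a rough illustration of the pipeline the abstract describes, the sketch below runs a single clustering pass on a toy corpus: it builds adjacent-word context vectors, clusters them agglomeratively, and labels every word with the category of its nearest seed in the resulting tree. It is not the paper's implementation. The cosine distance, average linkage, edge-count tree distance, toy corpus, seed set, and all function names are assumptions made for this example, and the iterative growth of the lexicon is omitted.

```python
from collections import Counter
from itertools import combinations
import math

def context_vectors(sentences):
    """Count the left- and right-adjacent words of every word type."""
    vecs = {}
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        for i in range(1, len(padded) - 1):
            v = vecs.setdefault(padded[i], Counter())
            v[("L", padded[i - 1])] += 1
            v[("R", padded[i + 1])] += 1
    return vecs

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(x * x for x in a.values()))
            * math.sqrt(sum(x * x for x in b.values())))
    return 1.0 - dot / norm if norm else 1.0

def agglomerate(vecs):
    """Average-linkage agglomerative clustering (an assumed choice).

    Returns a child -> parent map describing the binary merge tree.
    """
    clusters = {w: [w] for w in vecs}  # node id -> member words
    parent = {}
    next_id = 0

    def linkage(a, b):
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(cosine_distance(vecs[x], vecs[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the closest pair; sorting makes tie-breaking deterministic.
        a, b = min(combinations(sorted(clusters, key=str), 2),
                   key=lambda p: linkage(*p))
        node = ("node", next_id)
        next_id += 1
        parent[a] = parent[b] = node
        clusters[node] = clusters.pop(a) + clusters.pop(b)
    return parent

def ancestors(node, parent):
    """Path from a node up to the root, starting with the node itself."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def tree_distance(a, b, parent):
    """Number of tree edges between two leaves via their lowest common ancestor."""
    steps_from_b = {n: i for i, n in enumerate(ancestors(b, parent))}
    for i, n in enumerate(ancestors(a, parent)):
        if n in steps_from_b:
            return i + steps_from_b[n]
    raise ValueError("nodes are not in the same tree")

def label_words(vecs, seeds):
    """Give each word the category of its nearest seed in the merge tree."""
    parent = agglomerate(vecs)
    return {w: seeds[min(sorted(seeds), key=lambda s: tree_distance(w, s, parent))]
            for w in vecs}

# Toy corpus and seed words (hypothetical, for illustration only).
corpus = [s.split() for s in [
    "the dog runs", "the cat runs", "a dog sleeps",
    "a cat sleeps", "the bird sleeps", "a bird runs",
]]
seeds = {"dog": "NOUN", "runs": "VERB", "the": "DET"}
labels = label_words(context_vectors(corpus), seeds)
print(labels)
```

On this toy corpus, words of the same category share identical contexts, so each category forms its own subtree before any cross-category merge, and every unlabeled word ends up closest to the seed of its own category.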
Keywords: syntactic categories, child language acquisition, acquisition, part of speech, POS tagging, low-resource NLP, computational modeling, agglomerative clustering, prototype-driven, semi-supervised, unsupervised
How to Cite:
Kodner, J. (2018) “Syntactic Category Learning as Iterative Prototype-Driven Clustering”, Society for Computation in Linguistics 1(1), 44-54. doi: https://doi.org/10.7275/R5TQ5ZQ4