The Role of Linguistic Features in Domain Adaptation: TAG Parsing of Questions
- Aarohi Srivastava (Yale University)
- Robert Frank (Yale University)
- Sarah Widder (Yale University)
- David Chartash (Yale University)
Abstract
The analysis of sentences outside the domain of the training data poses a challenge for contemporary syntactic parsing. The Penn Treebank corpus, commonly used for training constituency parsers, systematically undersamples certain syntactic structures. We examine parsing performance in Tree Adjoining Grammar (TAG) on one such structure: questions. Because hand-annotating a new training set that includes out-of-domain sentences is expensive, we explore an alternate method that requires considerably less annotation effort. Our method is based on three key ideas. First, pursuing the intuition that “supertagging is almost parsing” (Bangalore and Joshi, 1999), the parsing process is decomposed into two distinct stages, supertagging and stapling. Second, following Rimell and Clark (2008), the supertagger is trained on an extended dataset that includes questions, and the resulting supertags are used with an unmodified parser. Third, to maximize the improvements gained from additional training of the supertagger, the parser is provided with linguistically significant features that reflect commonalities across supertags. This novel combination of ideas leads to an improvement in question parsing accuracy of 13% LAS. This points to the conclusion that adaptation of a parser to a new domain can be achieved with limited data through the careful integration of linguistic knowledge.
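The two-stage setup described in the abstract can be pictured with a short sketch. Everything below is illustrative: the supertag label convention, the feature names, and the `supertagger`/`stapler` callables are assumptions made for exposition, not the paper's actual implementation or feature set.

```python
# Minimal sketch (hypothetical names throughout) of a supertag-then-staple
# pipeline: a supertagger assigns one TAG elementary tree label per word, and
# a separate "stapler" combines those trees into a parse. The feature
# decomposition lets rare supertags (e.g., those needed for questions) share
# information with frequent ones seen in the Penn Treebank.

from typing import Callable, Dict, List


def supertag_features(supertag: str) -> Dict[str, object]:
    """Decompose a supertag label into coarse linguistic features.

    Assumed label convention for this sketch: "t27:S[wh]-NP0-VP".
    """
    root, *rest = supertag.split("-")
    return {
        "root_category": root.split(":")[-1],  # e.g., "S[wh]"
        "has_wh": "wh" in supertag,            # question-related movement
        "subcat": "-".join(rest),              # argument frame, e.g., "NP0-VP"
    }


def parse(sentence: List[str],
          supertagger: Callable[[List[str]], List[str]],
          stapler: Callable[[List[str], List[str], List[Dict[str, object]]], object]):
    """Stage 1: supertag each word. Stage 2: staple the elementary trees.

    In the setup the abstract describes, only the supertagger is retrained on
    question data; the stapler is unmodified but sees the shared features.
    """
    supertags = supertagger(sentence)
    features = [supertag_features(t) for t in supertags]
    return stapler(sentence, supertags, features)


if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end (purely illustrative).
    toy_supertagger = lambda sent: ["t1:S[wh]-NP0-VP", "t2:VP-V", "t3:NP-N"][: len(sent)]
    toy_stapler = lambda sent, tags, feats: list(zip(sent, tags, feats))
    print(parse(["who", "left", "?"], toy_supertagger, toy_stapler))
```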
Keywords: Tree Adjoining Grammar, Syntactic Parsing, Domain Adaptation
How to Cite:
Srivastava, A., Frank, R., Widder, S., & Chartash, D. (2020) “The Role of Linguistic Features in Domain Adaptation: TAG Parsing of Questions”, Society for Computation in Linguistics 3(1), 423-434. doi: https://doi.org/10.7275/7gvd-cq20