Parsing Early Modern English for Linguistic Search

Seth Kulick; Neville Ryant; Beatrice Santorini

doi:10.7275/twww-ef90

Authors

Seth Kulick (University of Pennsylvania)
Neville Ryant (University of Pennsylvania)
Beatrice Santorini (University of Pennsylvania)

Abstract

This work addresses the question of whether the output of a state-of-the-art parser is accurate enough to support research in theoretical linguistics. In order to build reliable models of syntactic change, we aim to eventually parse the 1.5-billion-word Early English Books Online (EEBO) corpus. But since EEBO is not yet parsed, we begin by constructing and testing a parser on the 1.7-million-word Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME). In order to obtain robust results, we define an 8-fold split on PPCEME. We then evaluate the parser with evalb and, more relevantly for us, with a task-specific metric, namely its accuracy in parsing 6 sentence types necessary to track the rise of auxiliary do (as in “They did not come” vs. its historical precursor “They came not”). Retrieving the relevant sentences from the gold and test versions with CorpusSearch queries, we find that the parser's accuracy promises to be sufficient for our purposes. A remaining concern is the variability of the output, which we plan to address with three pieces of future work sketched in the conclusion.

Keywords: parsing, syntax, historical linguistics

How to Cite:

Kulick, S., Ryant, N. & Santorini, B. (2022) “Parsing Early Modern English for Linguistic Search”, Society for Computation in Linguistics 5(1), 143-157. doi: https://doi.org/10.7275/twww-ef90

Published on
2022-02-01

License

Creative Commons Attribution 4.0

Publication details

Pages: 143-157
Submitted on: 2022-01-10

File Checksums (MD5)

PDF: d84364caa7f4d0a49894bb9f71240377
