Script Knowledge Constrains Ellipses in Fragments - Evidence from Production Data and Language Modeling

Robin Lemke; Lisa Schäfer; Heiner Drenhaus; Ingo Reich

doi:10.7275/mpby-zr74

Extended Abstract

Authors

Robin Lemke (Saarland University)
Lisa Schäfer (Saarland University)
Heiner Drenhaus (Saarland University)
Ingo Reich (Saarland University)

Abstract

We investigate the effect of script-based (Schank & Abelson 1977) extralinguistic context on the omission of words in fragments. Data elicited with a production task show that predictable words are omitted more often than unpredictable ones, as predicted by the Uniform Information Density (UID) hypothesis (Levy & Jaeger 2007). We take into account the effects of both linguistic and extralinguistic context on predictability and propose a method for estimating the surprisal of words in the presence of ellipsis. Our study extends previous evidence for UID in two ways: first, we show that not only local linguistic context but also extralinguistic context determines the likelihood of omission; second, we find UID effects on the omission of content words.
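For readers unfamiliar with the term, surprisal is the negative log probability of a word given its context, so predictable words carry low surprisal. The sketch below illustrates only that general definition with a smoothed bigram estimate; it is not the estimation method proposed in the paper, and the function name `bigram_surprisal`, the add-one smoothing, and the toy "cooking pasta" sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_surprisal(corpus, smoothing=1.0):
    """Build a surprisal function from tokenized sentences.

    Hypothetical helper: estimates P(word | prev) with additive smoothing
    and returns surprisal in bits, i.e. -log2 P(word | prev).
    """
    vocab = {w for sent in corpus for w in sent}
    context_counts = Counter()
    bigram_counts = Counter()
    for sent in corpus:
        for prev, word in zip(sent, sent[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1

    def surprisal(prev, word):
        # Additive smoothing keeps unseen bigrams at a finite surprisal.
        p = (bigram_counts[(prev, word)] + smoothing) / (
            context_counts[prev] + smoothing * len(vocab)
        )
        return -math.log2(p)

    return surprisal


# Toy data (invented for this example, not taken from the study).
toy_corpus = [
    ["pour", "the", "pasta", "into", "the", "pot"],
    ["pour", "the", "water", "into", "the", "pot"],
    ["pour", "the", "pasta", "into", "the", "pot"],
]
surprisal = bigram_surprisal(toy_corpus)

# "pot" follows "the" more often than "water" does, so it is less surprising.
print(surprisal("the", "pot"), surprisal("the", "water"))
```

Under UID, the lower-surprisal word in such a comparison would be the more likely candidate for omission. A bigram model captures only local linguistic context, whereas the abstract also attributes predictability to extralinguistic, script-based context.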

Keywords: information theory, fragments, ellipsis, script knowledge, corpus, language modeling

How to Cite:

Lemke, R., Schäfer, L., Drenhaus, H. & Reich, I. (2020) “Script Knowledge Constrains Ellipses in Fragments - Evidence from Production Data and Language Modeling”, Society for Computation in Linguistics 3(1), 441-444. doi: https://doi.org/10.7275/mpby-zr74

Published on
2020-01-01

License

Creative Commons Attribution 4.0

Publication details

Pages: 441-444
Submitted on: 2019-10-15

File Checksums (MD5)

PDF: 099f8700cbc93e8e4bf4e4c2eef127f0
