Computing Ellipsis Constructions: Comparing Classical NLP and LLM Approaches
Abstract
State-of-the-art (SOTA) Natural Language Processing (NLP) technology faces significant challenges with constructions that contain ellipses. Although these constructions are theoretically well documented and understood, there are insufficient cross-linguistic language resources to document, study, and ultimately engineer NLP solutions that can adequately analyze ellipsis constructions. This article describes the typological data set on ellipsis that we created, currently covering seventeen languages. We demonstrate how SOTA parsers based on a variety of syntactic frameworks fail to parse sentences with ellipsis, and that probabilistic, neural, and Large Language Model (LLM) approaches fail as well. We present experiments that focus on detecting sentences with ellipsis, predicting the position of elided elements, and predicting elided surface forms in the appropriate positions. We show that cross-linguistic variation of ellipsis-related phenomena has different consequences for the architecture of NLP systems.
Keywords: ellipsis, LLMs, LFG, dependency parsing, constituent parser, lexical-functional grammar
How to Cite:
Cavar, D., Tiganj, Z., Mompelat, L. V., & Dickson, B. (2024) “Computing Ellipsis Constructions: Comparing Classical NLP and LLM Approaches”, Society for Computation in Linguistics 7(1), 217–226. doi: https://doi.org/10.7275/scil.2147