Universal Dependencies and Semantics for English and Hebrew Child-directed Speech
- Ida Szubert (University of Edinburgh)
- Omri Abend (Hebrew University of Jerusalem)
- Nathan Schneider (Georgetown University)
- Samuel Gibbon (University of Edinburgh)
- Sharon Goldwater (University of Edinburgh)
- Mark Steedman (University of Edinburgh)
Abstract
While corpora of child speech and child-directed speech (CDS) have enabled major contributions to the study of child language acquisition, semantic annotation for such corpora is still scarce and lacks a uniform standard. We compile two CDS corpora—in English and Hebrew—with syntactic and semantic annotations. We employ a methodology that enforces a cross-linguistically consistent representation, building on recent advances in dependency representation and semantic parsing. Our semi-automatic syntactic annotation follows the Universal Dependencies standard (UD; de Marneffe et al., 2021), adapted to suit the CDS genre. To induce semantic forms, we develop an automatic method for transducing UD structures into sentential logical forms (LFs). The two representations have complementary strengths: UD structures are language-neutral and support direct annotation, whereas LFs are neutral as to the syntax-semantics interface, and transparently encode semantic distinctions.
Keywords: language acquisition, child-directed speech, corpus annotation, syntax-semantics interface, Universal Dependencies
How to Cite:
Szubert, I., Abend, O., Schneider, N., Gibbon, S., Goldwater, S. & Steedman, M. (2022) “Universal Dependencies and Semantics for English and Hebrew Child-directed Speech”, Society for Computation in Linguistics 5(1), 235-240. doi: https://doi.org/10.7275/2fhp-bf70