BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
|Item Name:||BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech|
|Author(s):||Martha Palmer, Jena D. Hwang, Aous Mansouri, Claire Bonial, Tim O'Gorman, James Gung|
|LDC Catalog No.:||LDC2021T18|
|Release Date:||November 15, 2021|
|Data Source(s):||discussion forum, telephone conversations, text chat conversations|
|Application(s):||entity extraction, part of speech tagging, question-answering, semantic role labelling|
|License(s):||LDC User Agreement for Non-Members|
|Online Documentation:||LDC2021T18 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Palmer, Martha, et al. BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech LDC2021T18. Web Download. Philadelphia: Linguistic Data Consortium, 2021.|
BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and consists of PropBank annotation of Egyptian Arabic discussion forum (DF), SMS/Chat, and conversational telephone speech (CTS) data.
The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference.
DF data was collected from the web using a manual process. SMS/Chat material was donated or collected via live platforms. CTS data was taken from LDC's Egyptian Arabic CALLHOME and CALLFRIEND telephone collections.
PropBank annotation adds a layer of semantic annotation on top of treebank annotation. In this release, it was applied to the BOLT phrase structure treebank annotation and was carried out in two phases: (1) a frame file was created for each predicate, and (2) the predicate-argument structure was annotated using the frame file as a reference.
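The two phases can be pictured with a minimal Python sketch. Everything in it is an illustrative assumption rather than the format used in this release: the `Roleset` and `Annotation` classes, the `undefined_roles` helper, and the English toy predicate `give.01` are hypothetical stand-ins. It shows only how a frame defined in phase 1 can serve as the reference when checking a phase 2 annotation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Roleset:
    """One sense of a predicate and its numbered roles (hypothetical layout)."""
    roleset_id: str
    roles: Dict[str, str]  # role label -> description, e.g. "ARG0" -> "giver"


@dataclass
class Annotation:
    """One annotated predicate instance (hypothetical layout)."""
    roleset_id: str
    args: Dict[str, str]  # role label -> text of the argument


def undefined_roles(frames: Dict[str, Roleset], ann: Annotation) -> List[str]:
    """Return the argument labels in `ann` that its roleset does not define."""
    roleset = frames[ann.roleset_id]
    return sorted(label for label in ann.args if label not in roleset.roles)


# Phase 1: a frame is written for the predicate (toy English example).
frames = {
    "give.01": Roleset(
        "give.01",
        {"ARG0": "giver", "ARG1": "thing given", "ARG2": "recipient"},
    )
}

# Phase 2: an instance is annotated, with the frame used as the reference.
ann = Annotation(
    "give.01",
    {"ARG0": "the teacher", "ARG1": "a book", "ARG3": "to the student"},
)

print(undefined_roles(frames, ann))
```

Here the annotation uses `ARG3`, which the toy frame does not define, so the check reports that label.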
Annotation files are UTF-8 encoded and are in either plain text or XML format.
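A hedged sketch of reading such files in Python follows. It assumes, without confirmation from the corpus documentation, that XML files carry a `.xml` extension and that any other file can be treated as line-oriented plain text; `load_annotation_file` is a hypothetical helper name.

```python
from pathlib import Path
import xml.etree.ElementTree as ET


def load_annotation_file(path):
    """Load one file: an XML root element, or a list of text lines.

    Assumption: XML files end in ".xml"; everything else is plain text.
    """
    path = Path(path)
    if path.suffix.lower() == ".xml":
        # ElementTree follows the encoding declared in the XML prolog
        # and falls back to UTF-8 when none is declared.
        return ET.parse(str(path)).getroot()
    # Pass the encoding explicitly so Arabic text does not depend on
    # the platform's default locale.
    return path.read_text(encoding="utf-8").splitlines()
```

Passing `encoding="utf-8"` explicitly matters mainly on systems whose default encoding is not UTF-8.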
This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
None at this time.