Emotional Prosody Speech and Transcripts
|Item Name:||Emotional Prosody Speech and Transcripts|
|Author(s):||Mark Liberman, Kelly Davis, Murray Grossman, Nii Martey, John Bell|
|LDC Catalog No.:||LDC2002S28|
|Release Date:||July 23, 2002|
|Sample Type:||2-channel pcm|
|Data Source(s):||microphone speech|
|Application(s):||speech recognition, prosody, pronunciation modeling|
LDC User Agreement for Non-Members
|Online Documentation:||LDC2002S28 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Liberman, Mark, et al. Emotional Prosody Speech and Transcripts LDC2002S28. Web Download. Philadelphia: Linguistic Data Consortium, 2002.|
Emotional Prosody Speech and Transcripts was developed by the Linguistic Data Consortium and contains audio recordings and corresponding transcripts, collected over an eight-month period in 2000-2001 and designed to support research in emotional prosody. The recordings consist of professional actors reading a series of semantically neutral utterances (dates and numbers) spanning fourteen distinct emotional categories, selected after Banse & Scherer's study of vocal emotional expression in German (Banse, R. & Scherer, K. R. 1996. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology, 70, 614-636).
Actor participants were provided with descriptions of each emotional context, including situational examples adapted from those used in the original German study. Flashcards were used to display series of four-syllable dates and numbers to be uttered in the appropriate emotional category.
The Prosody Recordings Project was interested in capturing the aspects of speech (emotion, intonation) that are left out of the written form of a message. In these experiments, simple phrases are expressed in ways that reflect varied contexts. The same phrase might be used to answer different questions, address listeners at different distances from the speaker, or express different emotional states. Actors were used because they are experts at producing this kind of contextual variation in a natural and convincing way.
There are 30 data files: 15 recordings in SPHERE format and their 15 transcripts.
The SPHERE files are encoded as two-channel interleaved 16-bit PCM in high-byte-first (big-endian) order, for a total of 2,912,067,980 bytes (2777 Mbytes), or about nine hours of audio.
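As a rough illustration of the sample layout described above, the following Python sketch splits a buffer of interleaved two-channel, 16-bit big-endian PCM into its two channels. It assumes the SPHERE header has already been stripped from the bytes passed in; the function name `deinterleave_be16` is hypothetical, and which channel corresponds to which microphone is not stated here.

```python
import sys
from array import array


def deinterleave_be16(raw: bytes):
    """Split interleaved 2-channel, 16-bit big-endian PCM into two channels.

    Assumption: `raw` holds sample data only (no SPHERE header).
    """
    if len(raw) % 4 != 0:
        raise ValueError("expected a whole number of 2-channel 16-bit frames")
    samples = array("h")      # signed 16-bit integers in native byte order
    samples.frombytes(raw)
    if sys.byteorder == "little":
        samples.byteswap()    # data is big-endian; convert to native order
    # Interleaved frames alternate channel 0, channel 1, channel 0, ...
    return samples[0::2], samples[1::2]


# Two frames: (1, -2) and (3, -4), written high byte first.
first, second = deinterleave_be16(bytes([0x00, 0x01, 0xFF, 0xFE,
                                         0x00, 0x03, 0xFF, 0xFC]))
```

Each returned channel is an `array` of signed 16-bit samples that can be converted with `tolist()` or passed to other tools.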
The utterances were recorded directly into WAVES+ datafiles, on two channels with a sampling rate of 22.05 kHz. The two microphones used were a stand-mounted boom Shure SN94 and a headset Sennheiser HMD 410.
The original session recordings are provided in their entirety, including informal chit-chat and discussion between each emotion category elicitation task. Time alignment is limited to utterances within the formal elicitation tasks, and miscellaneous regions have been marked as such.
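The stated total of about nine hours can be checked from the byte count, sampling rate, channel count, and sample width given above. The sketch below is a back-of-the-envelope calculation only: it treats the whole byte total as sample data and ignores the SPHERE headers, which are small by comparison. The function name `pcm_duration_hours` is hypothetical.

```python
TOTAL_BYTES = 2_912_067_980   # total size of the SPHERE files, as stated
SAMPLE_RATE_HZ = 22_050       # 22.05 kHz
CHANNELS = 2
BYTES_PER_SAMPLE = 2          # 16-bit PCM


def pcm_duration_hours(total_bytes, sample_rate_hz, channels, bytes_per_sample):
    """Duration of uncompressed PCM audio in hours (headers ignored)."""
    bytes_per_second = sample_rate_hz * channels * bytes_per_sample
    return total_bytes / bytes_per_second / 3600


hours = pcm_duration_hours(TOTAL_BYTES, SAMPLE_RATE_HZ, CHANNELS, BYTES_PER_SAMPLE)
mbytes = TOTAL_BYTES / 2**20  # Mbytes counted as 1,048,576 bytes each

print(f"{hours:.2f} hours, {int(mbytes)} Mbytes")
```

At 88,200 bytes per second of audio, the byte total works out to a little over nine hours, which agrees with the figure quoted for the corpus.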
There are no updates at this time.