General Description
The Occitan Map Task dialogue corpus was started in 2007 with the aim of collecting diverse interrogative intonation data in Occitan. The corpus consists of 6 orthographically transcribed dialogues (approx. 1 hour of recordings). The dialogues were collected in a number of different towns within the Occitan linguistic territory (dialectal map) and cover different regional dialects (for the moment, Languedocien, Limousin, and Gascon).
This project is part of a larger international research initiative that aims to collect dialogue data from different languages following the Map Task standard methodology (HCRC Map Task Corpus Project). Other languages that are extensively documented in the Map Task Corpus Project include English, Swedish, Dutch, Italian, Japanese, and Portuguese.
Language | Dialogues / hours of recordings | Year | Participants | Age | Sex / others |
---|---|---|---|---|---|
American English | 16 dialogues | 1999 | 8 | adults | women |
Australian English | 19 hours | | 204 | adults and young people | men and women |
British English | 36 hours | | 12-14 for each variety | 16 | men and women |
Dutch | 8 conversations | 1999 | 4 | students | |
Italian | 44 dialogues, 350 minutes | | | young adults, children | |
Japanese | 128 dialogues, 22 hours | 1994 | | students | |
Portuguese | 64 dialogues | | 32 / 8 groups of four | students | men and women |
Swedish | 50 minutes, 8000 words | | 4 | 30-40 | 1 man, 3 women |
Catalan | approx. 6 hours | 2005 | 30 | 20-45 | women; friends |
Occitan | approx. 1 hour | 2007 | 30 | 20-90 | men and women; friends or unknown |
Map Task methodology
The Map Task is a collaborative task carried out by pairs of subjects. It is designed to elicit particular linguistic phenomena while allowing control over contextual factors (sonorant segments, information structure, prosodic word structure). Each participant has a map of an imaginary place represented with drawings. One speaker, the instruction-giver (map_giver), has a map with a path marked from a starting point to an end point and gives instructions on how to reach the destination. The other speaker, the instruction-follower (map_follower), has a version of the same map that differs in certain details and has no path marked on it. The follower's goal is to reproduce the giver's path on their own map by asking a series of questions and using the answers received.
The place names on the maps were chosen to contain sonorant consonants and so that the rightmost word carries penultimate stress. Unlike the maps used for the Catalan corpus, these maps could not use written place names, because written Occitan has little social presence and most native speakers are therefore not literate in it. For this reason, drawings were used instead.
This elicitation method provides: a) a broad collection of intonational contours produced under controlled conditions yet in a natural way; b) good interaction between participants; c) dialectal variation that would not surface in a more tightly controlled context.
For each location, it is possible to listen to the Map Task recording and view the corresponding orthographic transcription (transcription criteria).
Speakers
Whenever possible, the speakers were pairs of native speakers who already knew each other, which allowed a degree of trust between the two. At least one of the speakers had to be from the location in question, and that speaker was given the role of instruction-follower.
To keep the contexts as similar as possible across locations, the speakers were given the following instructions:
Recordings
The recordings vary in length from one pair of participants to another. They were audio-only and were always made at one speaker's home, in a quiet setting. The speakers sat facing each other with an object between them that blocked their view of the other participant's map while still allowing eye contact. The recordings were made with a Marantz professional recorder (model PMD660) and two microphones, an AKG C 1000 and a Rode NTG 2.
All of the data, in its different formats, is freely available online for academic and research use. Each recording can be listened to in mp3 format while following the orthographic transcription, or downloaded in .wav format.
Transcription
The complete corpus is orthographically transcribed (transcription criteria).
© 2007-2010 Atlàs interactiu de l'intonacion de l'occitan