Multimedia Corpus of Spoken Bulgarian

The Multimedia Corpus of Spoken Bulgarian is a collection of language resources consisting of a set of digital audio and video data and a corresponding set of transcripts (text files).

The corpus contains recordings of authentic dialogues transcribed in a modified orthography that takes into account some of the features of spontaneous speech. The transcripts also mark non-verbal elements (pauses, noise, laughter, etc.) and include information about the speakers’ facial expressions and gestures. The transcripts and the corresponding audio/video files have been synchronised using the EXMARaLDA system tools. The transcripts provide clickable random access from text to audio: clicking an asterisk in the grey field above the transcript jumps to the corresponding part of the audio. The first four digits of each file number indicate the year of transcription.

The corpus also includes two texts (2013101 and 2013102) with synchronised audio or video and clickable random access from text to audio/video. These pages use recently developed technologies based on HTML5, which as of January 2013 is still a standard under development. For this reason, they may not work as intended in all browsers, especially older versions. The most recent versions of Internet Explorer, Google Chrome and Firefox should display them correctly.
2013001
2013002
2013003
2013004
2013005
2013006
2013007
2013008
2013009
2013010
2013011
2013012
2013013
2013014
2013015
2013016
2013017
2013018
2013019
2013020
2013021
2013022
2013023
2013024
2013025
2013026
2013027
2013028
2013029
2013030
2013031
2013032
2013033
2013034
2013035
2013101
2013102
2014001
2014002
2014003
2014004
2014005
2014006
2014007
2014008
2014009
2014010
2014011
2014012
2014013
2014014
2014015
2014016
2014017
2014018
2014019
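
The file numbers listed above follow the convention described earlier: the first four digits give the year of transcription. The short Python sketch below shows one way to read that year from a file number and to count files per year. The function names `transcription_year` and `count_by_year` are hypothetical and not part of the corpus or of EXMARaLDA. The sketch treats file numbers as plain strings and makes no assumption about the meaning of the remaining digits.

```python
from collections import Counter


def transcription_year(file_number: str) -> int:
    """Return the year encoded in the first four digits of a file number.

    Hypothetical helper: it relies only on the stated convention that the
    first four digits are the year of transcription.
    """
    prefix = file_number[:4]
    if len(prefix) < 4 or not prefix.isdigit():
        raise ValueError(f"not a corpus file number: {file_number!r}")
    return int(prefix)


def count_by_year(file_numbers):
    """Count how many file numbers fall under each transcription year."""
    return Counter(transcription_year(n) for n in file_numbers)


# A few file numbers taken from the list above.
sample = ["2013001", "2013035", "2013101", "2014001", "2014019"]
for year, total in sorted(count_by_year(sample).items()):
    print(year, total)
```

Keeping the file numbers as strings avoids any loss of leading digits and lets the year be read with a simple slice.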

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. Contact: bgspeech [at] gmail.com