
English language corpora

What is a corpus?

In principle, any collection of more than one text can be called a corpus (corpus being Latin for "body", hence a corpus is any body of text). But the term "corpus", when used in the context of modern linguistics, tends most frequently to have more specific connotations than this simple definition. The following list describes the four main characteristics of the modern corpus.

(Tony McEnery & Andrew Wilson)

Features of corpora

Textbooks introducing corpus linguistics

Some English language corpora

Freiburg-LOB (FLOB)
Freiburg-Brown (Frown)
Kolhapur Corpus (India)
Australian Corpus of English (ACE)
Wellington Corpus (New Zealand)
The International Corpus of English - East African component
Lancaster/IBM Spoken English Corpus (SEC)
Wellington Spoken Corpus (New Zealand)
The International Corpus of English - East African component
The Helsinki Corpus of Older Scots
Corpus of Early English Correspondence
The Newdigate Newsletters
Lampeter Corpus
Innsbruck Computer-Archive of Machine-Readable English Texts (ICAMET)

Websites for corpus linguistics

Online corpora

The English-Norwegian Parallel Corpus (password required: contact Jarle Ebeling or Stig Johansson): http://www.hf.uio.no/iba/prosjekt/ The main part of the corpus has English texts with Norwegian translations and Norwegian texts with English translations. There are also extensions to other languages (Swedish, German, Portuguese, Dutch).

LDC-Online: http://www.ldc.upenn.edu/ldc/online/index.html Various corpora of spoken and written language. A password is required, but can be obtained easily, as the University has a subscription.

The COLT Corpus (Corpus of London Teenage Speech): http://kh.hd.uib.no/tactweb/colta.htm This part of the BNC can be searched without a password.

CobuildDirect demonstration form: http://titania.cobuild.collins.co.uk/form.html Allows you to search in the Bank of English, but displays only a limited amount of data.
 

Concordancing programs

MonoConc: a demo-version can be downloaded from http://www.ruf.rice.edu/~barlow/mono.html#mono
WordCruncher
TACT
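
What a concordancer does at its simplest can be sketched in a few lines of Python. This is only an illustration of the keyword-in-context (KWIC) idea, not a description of how any of the programs listed above work internally; the function name kwic and the width parameter are made up for this example.

```python
import re

def kwic(text, term, width=30):
    """Return one keyword-in-context line per whole-word match of term.

    Hypothetical helper for illustration: 'width' is the number of
    characters of context kept on each side of the hit.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the hits line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = ("I could of course get back to London the same night. "
              "It is different in a small place, but in London it is not.")
    for line in kwic(sample, "London", width=25):
        print(line)
```

Because the pattern uses word boundaries, a search for "London" also finds "London's" but not "Londoner". Real concordancers add sorting by left or right context, frequency counts and corpus-specific markup handling.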

Courses/tutorials on the Net

Sample query from the London-Lund Corpus (search term 'London')

<233 A> ^if accommodation . was d/\ifficult#
<234 A> ^I could of course get back to L\ondon# .
<235 A> the ^same [n] n\ight#

1 4:Heading
<441 B> ^I !th\/ink#
<442 B> *it`s ^not in L\ondon#*
<443 A> *what a ^cl/ever* !thing to **!d\o#**

1 4:Heading
<715 (B> will 'students* j\ustify#
<717 B> ^staying in L\ondon#
<718 B> ^spending m\oney you _see#

1 5:Heading
<267 B> of^f/icially#
<268 B> in ^L/ondon#
<269 A> *^n\o#*

1 5:Heading
<271 B> it`s ^d\ifferent {in a ^sm\all _place#}# *-*
<272 B> - - but in ^London
<273 A> **^[\m]#**

1 6:Heading
<91 B> now ^where w\as it# - - >


<92 B> ^trouble 'is I :don`t !{kn\ow north _London} at !\all#
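
The query results above show why a plain string search is not enough for a prosodically transcribed corpus: the word is written "L\ondon" or "L/ondon" whenever a tone mark falls inside it. The Python sketch below strips the markup before searching. The line shape ("<number speaker> text") and the set of symbols treated as markup are assumptions read off the sample, not taken from the corpus manual, and the function names are hypothetical.

```python
import re

# Assumed line shape, inferred from the sample only: "<number speaker> text".
# The optional "(" allows for lines such as "<715 (B> ...".
LINE_RE = re.compile(r"^<(\d+)\s+\(?([A-Za-z]+)>\s*(.*)$")

# Symbols treated as prosodic or overlap markup in this sketch (an assumption).
# Pause marks such as "." and "-" are left in place.
MARKUP_RE = re.compile(r"[\^#/\\!:'_*{}\[\]]")

def parse_tone_unit(line):
    """Return (number, speaker, plain_text), or None if the line does not fit."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    number, speaker, text = m.groups()
    # The sample writes apostrophes as backticks (don`t), so restore them
    # after the stress marks written as ' have been stripped.
    plain = MARKUP_RE.sub("", text).replace("`", "'")
    return int(number), speaker, " ".join(plain.split())

def search(lines, term):
    """Return the parsed tone units whose plain text contains term."""
    hits = []
    for line in lines:
        parsed = parse_tone_unit(line)
        if parsed is not None and term.lower() in parsed[2].lower():
            hits.append(parsed)
    return hits

SAMPLE = [
    r"<233 A> ^if accommodation . was d/\ifficult#",
    r"<234 A> ^I could of course get back to L\ondon# .",
    r"<235 A> the ^same [n] n\ight#",
]

if __name__ == "__main__":
    for number, speaker, text in search(SAMPLE, "London"):
        print(number, speaker, text)
```

Stripping the markup throws away exactly the information the transcription was made to record, so a real tool would keep the original line alongside the plain version and search only the latter.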

Sample from the tagged LOB Corpus (search term 'London')

57 ^ old_JJ boys_NNS ._.
58 ^ one_CD1 of_IN London's_NP$ odder_JJR reunions_NNS took_VBD place_NN last_AP night_NN ._.

A:Press:reportage A09:133
who_WPR started_VBD his_PP$ career_NN *'_*' picking_VBG up_RP pins_NNS in_IN a_AT Paris_NP salon_NN **'_**' and_CC is_BEZ now_RN London's_NP$ leading_JJ couturier_NN ,_, has_HVZ been_BEN chosen_VBN by_IN Katharine_NP Worsley_NP

E:Skills,hobbies E28:76
proposed_VBN might_MD provide_VB the_ATI solution_NN ._.
76 ^ London's_NP$ pure_JJ water_NN ._.
77 ^ bacterial_JJ analysis_NN has_HVZ shown_VBN that_CS during_IN the_ATI

F:Popular lore F18:140
140 ^ then_RN on_IN the_ATI night_NN of_IN March_NR 31_CD ,_, 1958_CD ,_, she_PP3A went_VBD to_IN a_AT Hallowe'en_NP ball_NN at_IN London's_NP$
Dorchester_NP Hotel_NPL with_IN Billy_NP Wallace_NP and_CC other_AP

F:Popular lore F44:85
to_TO talk_VB to_IN a_AT friend_NN in_IN Tin_NP Pan_NP Alley_NPL *-_*-
London's_NP$ Denmark_NP Street_NPL ._.
86 ^ *'_*' what_WDT a_AT shame_NN about_IN Russ_NP Conway_NP leaving_VBG

G:Belle lettres,biog G39:21
two_CD to_IN its_PP$ National_NP Gallery_NPL *-_*- nothing_PN in_IN
comparison_NN with_IN what_WDT he_PP3A did_DOD for_IN London's_NP$ ._.
22 ^ the_ATI tall_JJ house_NN in_IN St\_NPT George's_NP$ Place_NPL ,_,
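
In the tagged sample above every token has the shape word_TAG, so the tags can be separated from the words mechanically. The Python sketch below does this and then picks out the words carrying a given tag. It assumes only what is visible in the sample: tokens are separated by whitespace, the last underscore divides word from tag, and items without an underscore (line numbers and the sentence marker ^) are not tokens. The function names are made up for this example.

```python
def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST underscore, so a word containing "_" stays whole.
        word, sep, tag = token.rpartition("_")
        if sep and word and tag:
            pairs.append((word, tag))
    return pairs

def words_with_tag(pairs, tag):
    """Return the words in pairs that carry exactly the given tag."""
    return [word for word, t in pairs if t == tag]

SAMPLE = ("58 ^ one_CD1 of_IN London's_NP$ odder_JJR reunions_NNS "
          "took_VBD place_NN last_AP night_NN ._.")

if __name__ == "__main__":
    pairs = parse_tagged(SAMPLE)
    print(pairs)
    print(words_with_tag(pairs, "NP$"))
```

With the pairs in hand, a search can be restricted by tag as well as by word form, for instance to find "London's" only where it is tagged NP$.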


