English language corpora
What is a corpus?
In principle, any collection of more than one text can be called a corpus (corpus being Latin for "body", hence a corpus is any body of text). In modern linguistics, however, the term usually carries more specific connotations than this simple definition. The following list describes the four main characteristics of the modern corpus.
Features of corpora
The English-Norwegian Parallel Corpus (password required: contact Jarle Ebeling or Stig Johansson). http://www.hf.uio.no/iba/prosjekt/ The main part of the corpus has English texts with Norwegian translations and Norwegian texts with English translations. There are also extensions to other languages (Swedish, German, Portuguese, Dutch).
LDC-Online: http://www.ldc.upenn.edu/ldc/online/index.html Various corpora of spoken and written language. A password is required but is easy to obtain, as the University has a subscription.
The COLT Corpus (Corpus of London Teenage Speech): http://kh.hd.uib.no/tactweb/colta.htm This part of the BNC can be searched without a password.
CobuildDirect demonstration form: http://titania.cobuild.collins.co.uk/form.html Allows you to search the Bank of English, but displays only a limited amount of data.
Concordancing programs
MonoConc: a demo version can be downloaded from http://www.ruf.rice.edu/~barlow/mono.html#mono
Word Cruncher
TACT
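To give a rough idea of what a concordancing program does, the following is a minimal keyword-in-context (KWIC) sketch in Python. It is not how MonoConc, Word Cruncher, or TACT work internally; the function name `kwic`, the `width` parameter, and the short sample text are all made up for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return keyword-in-context lines for every occurrence of keyword."""
    lines = []
    # Whole-word, case-insensitive match; real concordancers offer more options.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keyword forms a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
    return lines

# Hypothetical sample text, invented for this sketch.
sample = (
    "If accommodation was difficult I could of course get back to London "
    "the same night. It is different in a small place, but in London "
    "students must justify spending money."
)

for line in kwic(sample, "London", width=25):
    print(line)
```

Each hit is printed on its own line with the search term aligned in a central column, which is the layout most concordancers use.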
Courses/tutorials on the Net
Sample from the London-Lund Corpus (search term 'London')
<233 A> ^if accommodation . was d/\ifficult#
<234 A> ^I could of course get back
to L\ondon# .
<235 A> the ^same [n] n\ight#
1 4:Heading
<441 B> ^I !th\/ink#
<442 B> *it`s ^not in L\ondon#*
<443 A> *what a ^cl/ever* !thing to
**!d\o#**
1 4:Heading
<715 (B> will 'students* j\ustify#
<717 B> ^staying in L\ondon#
<718 B> ^spending m\oney you _see#
1 5:Heading
<267 B> of^f/icially#
<268 B> in ^L/ondon#
<269 A> *^n\o#*
1 5:Heading
<271 B> it`s ^d\ifferent {in a ^sm\all
_place#}# *-*
<272 B> - - but in ^London
<273 A> **^[\m]#**
1 6:Heading
<91 B> now ^where w\as it# - - >
Sample from the tagged LOB Corpus (search term 'London')
57 ^ old_JJ boys_NNS ._.
58 ^ one_CD1 of_IN London's_NP$ odder_JJR
reunions_NNS took_VBD place_NN last_AP night_NN ._.
A:Press:reportage A09:133
who_WPR started_VBD his_PP$ career_NN
*'_*' picking_VBG up_RP pins_NNS in_IN a_AT Paris_NP salon_NN **'_**' and_CC
is_BEZ now_RN London's_NP$ leading_JJ couturier_NN ,_, has_HVZ been_BEN
chosen_VBN by_IN Katharine_NP Worsley_NP
E:Skills,hobbies E28:76
proposed_VBN might_MD provide_VB the_ATI
solution_NN ._.
76 ^ London's_NP$ pure_JJ water_NN ._.
77 ^ bacterial_JJ analysis_NN has_HVZ
shown_VBN that_CS during_IN the_ATI
F:Popular lore F18:140
140 ^ then_RN on_IN the_ATI night_NN of_IN
March_NR 31_CD ,_, 1958_CD ,_, she_PP3A went_VBD to_IN a_AT Hallowe'en_NP
ball_NN at_IN London's_NP$
Dorchester_NP Hotel_NPL with_IN Billy_NP
Wallace_NP and_CC other_AP
F:Popular lore F44:85
to_TO talk_VB to_IN a_AT friend_NN in_IN
Tin_NP Pan_NP Alley_NPL *-_*-
London's_NP$ Denmark_NP Street_NPL ._.
86 ^ *'_*' what_WDT a_AT shame_NN about_IN
Russ_NP Conway_NP leaving_VBG
G:Belle lettres,biog G39:21
two_CD to_IN its_PP$ National_NP Gallery_NPL *-_*-
nothing_PN in_IN
comparison_NN with_IN what_WDT he_PP3A did_DOD for_IN
London's_NP$ ._.
22 ^ the_ATI tall_JJ house_NN in_IN St\_NPT George's_NP$
Place_NPL ,_,
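The tagged sample above writes each token as word_TAG. As a minimal sketch of how such lines could be processed, the Python below splits one line from the sample into (word, tag) pairs and picks out the words tagged NP$, which in this sample is attached to possessive proper nouns such as London's. The function name `parse_tagged` is hypothetical, and the sketch assumes tokens are separated by whitespace.

```python
def parse_tagged(line):
    """Split a line of word_TAG tokens into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last underscore so the tag is always the final part.
        word, sep, tag = token.rpartition("_")
        if sep:  # skip untagged items such as sentence numbers or "^"
            pairs.append((word, tag))
    return pairs

# One sentence from the sample, with its two lines joined.
line = ("58 ^ one_CD1 of_IN London's_NP$ odder_JJR reunions_NNS "
        "took_VBD place_NN last_AP night_NN ._.")

pairs = parse_tagged(line)
possessives = [word for word, tag in pairs if tag == "NP$"]
print(possessives)
```

Splitting on the last underscore rather than the first keeps punctuation tokens such as `._.` intact as the pair (".", ".").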