2009 Release
November 2009
The 2009 Release of the Dictionary of Old English Corpus has been produced in part with the support of the Social Sciences and Humanities Research Council of Canada and the National Endowment for the Humanities, an independent federal agency.
The Dictionary of Old English electronic corpus is a complete record of surviving Old English except for some variant manuscripts of individual texts. The Dictionary of Old English is happy to provide copies of the corpus at cost to scholars interested in working with it. The individual scholar must take responsibility for clearing copyright with the editors and publishers of the editions used in his/her own citations of the material. We ask that you not copy and/or (re)distribute the corpus without the written consent of the Dictionary of Old English. This material may not be made available on the Internet, but can be put on webservers that are accessible only to the users within the institution.
A catalogue of the texts included in our corpus is available in Angus Cameron’s “A List of Old English Texts” in A Plan for the Dictionary of Old English, ed. R. Frank and A. Cameron (Toronto: University of Toronto Press, 1973), pp. 25-306. Full bibliographic information (taken from the updated Healey-Venezky List of Texts) is included with each text.
The corpus contains 3060 texts, presented in two formats: TEI-P5 conformant eXtensible Markup Language (XML) and HyperText Markup Language (HTML).
The CD-ROM has the following files and subdirectories:
File or Directory | Description |
---|---|
corpus.html | This is the starting file for viewing the corpus with a web browser. |
about.html | This file contains information about the corpus. |
changes.html | This file describes changes to the corpus. |
html/ | This directory contains the corpus in HTML format. |
html/corpus.css | Stylesheet for rendering the display of the HTML Corpus texts. |
html/st2cn.html | This is the conversion table from Short Title to Cameron Number. |
html/cn2st.html | This is the conversion table from Cameron Number to Short Title. |
html/wordcount.html | This is the table for word counts broken down by text. |
images/ | This directory contains images of special characters used by the corpus in HTML format. |
xml-corpus/ | This directory contains the full corpus in XML format. |
xml-corpus/corpus.xml | This is the main control file for the corpus in XML format. |
xml-corpus/corpus.ent | This is the file which contains the declaration of the entities used in the corpus. |
xml-corpus/textlist.xml | This is the file which contains the declaration of texts in the corpus. |
HTML Corpus Description
The HTML corpus is accessible through the file corpus.html which provides an index to the texts by Short Title.
Each text is contained in a separate HTML file. The Short Title, Short Short Title, and Cameron number are given in the header. Full bibliographic material, encoding, and other miscellaneous information are also found in the header. Each citation starts with a citation number and text reference identifier.
The format of the Text Reference Identifier varies from text to text, and the user must consult the header of the text to determine which system is being used.
Latin or Greek included in the Old English texts and Latin glossed by Old English are rendered in italics. Words which are fragmentary in manuscript or emended by the editor of the text are enclosed by ‘< >’. This may also indicate that there is a problem with the manuscript in the space adjacent to the word. Editorial punctuation has usually been adopted; for most texts it follows modern norms. Text that is originally in runic script is enclosed in double slashes ‘//’.
The special characters have been optimized to be viewed in 11 point characters with Firefox Navigator.
XML Corpus Description
The XML corpus is approximately 62 Megabytes in size and conforms to the TEI-P5 Guidelines in C.M. Sperberg-McQueen and Lou Burnard, eds., TEI P5: Guidelines for Electronic Text Encoding and Interchange (TEI Consortium 2008).
The texts are ordered by their Cameron numbers, alphanumeric codes that consist of one letter followed by a number or numbers, variously identifying the specific texts.
Text Letter Prefix | Text Type |
---|---|
A | Poetry |
B | Prose |
C | Interlinear Glosses |
D | Glossaries |
E | Runic Inscriptions |
F | Inscriptions in the Latin Alphabet |
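The prefix table above can be applied mechanically. The following is a minimal Python sketch, not part of the corpus distribution; the function name `text_type` is hypothetical, and the only assumption about a Cameron number's shape is the one stated above (a single prefix letter followed directly by a digit).

```python
# Text types keyed by Cameron number prefix, copied from the table above.
TEXT_TYPES = {
    "A": "Poetry",
    "B": "Prose",
    "C": "Interlinear Glosses",
    "D": "Glossaries",
    "E": "Runic Inscriptions",
    "F": "Inscriptions in the Latin Alphabet",
}


def text_type(cameron_number):
    """Return the text type for a Cameron number such as 'A1.1'.

    Hypothetical helper: only the prefix letter and the first digit
    after it are checked; the rest of the number is not validated.
    """
    code = cameron_number.strip()
    if len(code) < 2 or code[0] not in TEXT_TYPES or not code[1].isdigit():
        raise ValueError(f"not a recognizable Cameron number: {cameron_number!r}")
    return TEXT_TYPES[code[0]]


print(text_type("A1.1"))
```

The example number "A1.1" is for illustration only; consult textlist.xml for the numbers actually used in the corpus.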
The file textlist.xml provides a list of the texts with the Cameron numbers and their corresponding Short Titles.
The structure is described in detail in the header file (corpus.xml). Each text is enclosed within a <TEI> structure. The value for the id attribute is the letter T followed by the Text Number (e.g. xml:id="T00010"). The first substructure is the <teiHeader>, which includes title, full bibliographic material, encoding, and other miscellaneous information. The <text> structure follows, which contains the <body>. Each citation is an <s> structure within the <text> structure, and the citation number and text reference identifier are contained as attributes.
The format of the Text Reference Identifier (n attribute) varies from text to text, and the user must consult the <encodingDesc> structure of the text to determine which system is being used.
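As an illustration of the structure described above, here is a minimal Python sketch that reads the xml:id of a <TEI> element and lists its <s> citations with their n attributes. Several things in it are assumptions rather than facts about the corpus files: the sample document is invented, its element nesting and the `id` attribute standing in for the citation number are hypothetical, and the use of the TEI namespace is assumed (the lookup matches on local element names, so it works with or without a namespace). The sample contains no corpus character entities; real files that rely on corpus.ent would need those declarations available to the parser.

```python
import xml.etree.ElementTree as ET

# Key under which ElementTree exposes the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented sample, NOT taken from the corpus; attribute values are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="T00010">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body>
    <s n="1.1" id="c1">first citation</s>
    <s n="1.2" id="c2">second citation</s>
  </body></text>
</TEI>"""


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def citations(tei_root):
    """Return (n attribute, text) pairs for every <s> element, at any depth."""
    found = []
    for element in tei_root.iter():
        if local_name(element.tag) == "s":
            found.append((element.get("n"), "".join(element.itertext())))
    return found


root = ET.fromstring(SAMPLE)
print(root.get(XML_ID))
for reference, text in citations(root):
    print(reference, text)
```

Because the format of the n attribute varies from text to text, the sketch returns it as an uninterpreted string.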
Latin or Greek included in the Old English texts and Latin glossed by Old English are enclosed within <foreign> tags. Words which are fragmentary in manuscript or emended by the editor of the text are enclosed by <corr> tags. This tag may also indicate that there is a problem with the manuscript in the space adjacent to the word. Editorial punctuation has usually been adopted; for most texts it follows modern norms. Runic text is enclosed within <hi rend="rune"> tags.
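The correspondence between these tags and the display conventions of the HTML corpus can be sketched in Python as follows. This is a hedged illustration only: the function `render` is hypothetical, the sample citation is invented placeholder text, and the underscores around <foreign> content are an assumed plain-text stand-in for italics. The '< >' and '//' markers follow the HTML corpus description, read here as enclosing the marked text on both sides.

```python
import xml.etree.ElementTree as ET

# Invented placeholder citation, NOT taken from the corpus.
SAMPLE_S = (
    '<s n="1">aaa <corr>bbb</corr> <foreign>ccc</foreign> '
    '<hi rend="rune">ddd</hi> eee</s>'
)


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def render(element):
    """Flatten an element to plain text, marking the three tagged categories."""
    parts = [element.text or ""]
    for child in element:
        inner = render(child)
        name = local_name(child.tag)
        if name == "corr":
            inner = "<" + inner + ">"      # fragmentary or emended word
        elif name == "hi" and child.get("rend") == "rune":
            inner = "//" + inner + "//"    # runic text
        elif name == "foreign":
            inner = "_" + inner + "_"      # assumed stand-in for italics
        parts.append(inner)
        parts.append(child.tail or "")     # text following the child element
    return "".join(parts)


print(render(ET.fromstring(SAMPLE_S)))
```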
The character entities use the standard definition for XML Character Entities. Character entities which have no match in the standard can be found in xml-corpus/corpus.ent. The character entities used in the corpus texts are as follows:
XML Entity | Character |
---|---|
&AElig; | Æ |
&aelig; | æ |
&oelig; | œ |
&ETH; | Ð |
&eth; | ð |
&THORN; | Þ |
&thorn; | þ |
&Aring; | Å |
&Eacute; | É |
&eacute; | é |
&Eogon; | Ę |
&eogon; | ę |
&agrave; | à |
&egrave; | è |
&auml; | ä |
&ouml; | ö |
&uuml; | ü |
&oslash; | ø |
&bstrok; | (glyph image) |
&dstrok; | (glyph image) |
&lstrok; | (glyph image) |
&tbar; | (glyph image) |
&ccedil; | ç |
&amacr; | ā |
&emacr; | ē |
&imacr; | ī |
&omacr; | ō |
&ymacr; | (glyph image) |
&cmacr; | (glyph image) |
&gmacr; | (glyph image) |
&hmacr; | (glyph image) |
&mmacr; | (glyph image) |
&Nmacron; | (glyph image) |
&nmacr; | (glyph image) |
&pmacr; | (glyph image) |
&qmacr; | (glyph image) |
&rmacr; | (glyph image) |
&tmacr; | (glyph image) |
&vmacr; | (glyph image) |
&Agr; | Α |
&Egr; | Η |
&Lambda; | Λ |
&Ngr; | Ν |
&Omega; | Ω |
&omega; | ω |
&Ogr; | Ο |
&Rgr; | Ρ |
&Tgr; | Τ |
&amp; | & |
&lt; | < |
&gt; | > |
&equiv; | ≡ |
&note; | (glyph image) |
&criphia; | (glyph image) |
&crisimon; | (glyph image) |
&lemniscus; | (glyph image) |
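When corpus text is handled as plain strings, entity references of this kind can be expanded with a small lookup. The Python sketch below is an assumption-laden illustration, not the corpus's own tooling: it treats Python's HTML5 entity table as a stand-in for the standard definitions, and the `CUSTOM` mapping is hypothetical (the authoritative definitions of non-standard entities are in xml-corpus/corpus.ent). Only exact names ending in a semicolon are matched; a looser HTML-style decoder would misread &note; as the legacy entity &not followed by "e;".

```python
import html.entities
import re

# Hypothetical mapping for entities outside the standard sets;
# the real definitions are in xml-corpus/corpus.ent.
CUSTOM = {"ymacr": "\u0233"}  # assumed: y with macron

ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def decode_entities(text, custom=CUSTOM):
    """Expand named entity references; leave unknown ones untouched."""

    def replace(match):
        name = match.group(1)
        if name in custom:
            return custom[name]
        # Exact-name lookup only (keys in this table end with ';').
        standard = html.entities.html5.get(name + ";")
        return standard if standard is not None else match.group(0)

    return ENTITY.sub(replace, text)


print(decode_entities("&thorn;&aelig;t &ymacr; &note;"))
```

Unknown references are deliberately left in place rather than dropped, so that missing definitions remain visible in the output.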
Distribution
The Corpus is available on CD-ROM in HTML and XML formats.
We provide copies of the corpus at cost for academic use: $200 US including shipping. (Prices subject to change without notice.) It can be ordered from the DOE online store.
The Dictionary of Old English Web Corpus provides search functions for the Corpus, available online through subscription at the DOE online store.
The Old English Corpus was originally prepared for internal use at the Dictionary of Old English. We hope that other scholars will be able to use the material and we would be grateful to receive comments about errors or problems you discover.
Contact Information
Dictionary of Old English
Room 14285, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario
M5S 3H1
CANADA
Fax: +1 416 978 8835
Email the DOE (please start your subject with: Corpus Information)
DOE Website