Dictionary of Old English

The vocabulary of the first six centuries (C.E. 600-1150) of the English language


The Dictionary of Old English Corpus in Electronic Form 2009

posted on June 15, 2017

2009 Release

November 2009

The 2009 Release of the Dictionary of Old English Corpus has been produced in part with the support of the Social Sciences and Humanities Research Council of Canada and the National Endowment for the Humanities, an independent federal agency.

The Dictionary of Old English electronic corpus is a complete record of surviving Old English except for some variant manuscripts of individual texts. The Dictionary of Old English is happy to provide copies of the corpus at cost to scholars interested in working with it. Individual scholars must take responsibility for clearing copyright with the editors and publishers of the editions used in their own citations of the material. We ask that you not copy or (re)distribute the corpus without the written consent of the Dictionary of Old English. This material may not be made available on the Internet, but it may be placed on web servers that are accessible only to users within the institution.

A catalogue of the texts included in our corpus is available in Angus Cameron’s “A List of Old English Texts” in A Plan for the Dictionary of Old English, ed. R. Frank and A. Cameron (Toronto: University of Toronto Press, 1973), pp. 25-306. Full bibliographic information (taken from the updated Healey-Venezky List of Texts) is included with each text.

The corpus contains 3060 texts, presented in two formats: TEI-P5 conformant eXtensible Markup Language (XML) and HyperText Markup Language (HTML).

The CD-ROM has the following files and subdirectories:

corpus.html              This is the starting file for viewing the corpus with a web browser.
about.html               This is the starting file for viewing the corpus with a web browser.
changes.html             This is the starting file for viewing the corpus with a web browser.
html/                    This directory contains the corpus in HTML format.
html/corpus.css          Stylesheet for rendering the display of the HTML Corpus texts.
html/st2cn.html          This is the conversion table from Short Title to Cameron Number.
html/cn2st.html          This is the conversion table from Cameron Number to Short Title.
html/wordcount.html      This is the table for word counts broken down by text.
images/                  This directory contains images of special characters used by the corpus in HTML format.
xml-corpus/              This directory contains the full corpus in XML format.
xml-corpus/corpus.xml    This is the main control file for the corpus in XML format.
xml-corpus/corpus.ent    This is the file which contains the declaration of the entities used in the corpus.
xml-corpus/textlist.xml  This is the file which contains the declaration of texts in the corpus.

HTML Corpus Description

The HTML corpus is accessible through the file corpus.html, which provides an index to the texts by Short Title.

Each text is contained in a separate HTML file. The Short Title, the Short Short Title, and the Cameron number are in the header. Full bibliographic material, encoding, and other miscellaneous information are also found in the header. Each citation starts with a citation number and a text reference identifier.

The format of the Text Reference Identifier varies from text to text, and the user must consult the header of the text to determine which system is being used.

Latin or Greek included in the Old English texts and Latin glossed by Old English are rendered in italics. Words which are fragmentary in manuscript or emended by the editor of the text are enclosed by ‘< >’. This may also indicate that there is a problem with the manuscript in the space adjacent to the word. Editorial punctuation has usually been adopted; for most texts it follows modern norms. Text that is originally in runic script is enclosed in double slashes ‘//’.

The special characters have been optimized for viewing at 11-point size in Firefox.

XML Corpus Description

The XML corpus is approximately 62 Megabytes in size and conforms to the TEI-P5 Guidelines in C.M. Sperberg-McQueen and Lou Burnard, eds., TEI P5: Guidelines for Electronic Text Encoding and Interchange (TEI Consortium 2008).

The texts are ordered by their Cameron numbers, alphanumeric codes that consist of one letter followed by a number or numbers, variously identifying the specific texts.

Text Letter Prefix  Text Type
A                   Poetry
B                   Prose
C                   Interlinear Glosses
D                   Glossaries
E                   Runic Inscriptions
F                   Inscriptions in the Latin Alphabet
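
As a small illustration of the prefix scheme, the following Python sketch looks up the text type from the first letter of a Cameron number. The function name `text_type` and the sample Cameron-number strings are illustrative assumptions; only the prefix table itself comes from the list above.

```python
# Prefix-to-type table copied from the list above.
TEXT_TYPES = {
    "A": "Poetry",
    "B": "Prose",
    "C": "Interlinear Glosses",
    "D": "Glossaries",
    "E": "Runic Inscriptions",
    "F": "Inscriptions in the Latin Alphabet",
}


def text_type(cameron_number: str) -> str:
    """Return the text type for a Cameron number, based on its letter prefix."""
    prefix = cameron_number.strip()[:1].upper()
    if prefix not in TEXT_TYPES:
        raise ValueError(f"unrecognized Cameron number: {cameron_number!r}")
    return TEXT_TYPES[prefix]


# Illustrative inputs only; consult textlist.xml for the real numbers.
print(text_type("A1.1"))
print(text_type("B1.2.3"))
```

The lookup inspects only the leading letter, so it makes no assumption about how the numeric part of a Cameron number is punctuated.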

The file textlist.xml provides a list of the texts with the Cameron numbers and their corresponding Short Titles.

The structure is described in detail in the header file (corpus.xml). Each text is enclosed within a <TEI> structure. The value for the id attribute is the letter T followed by the Text Number (e.g. xml:id="T00010"). The first substructure is the <teiHeader>, which includes title, full bibliographic material, encoding, and other miscellaneous information. The <text> structure follows, which contains the <body>. Each citation is an <s> structure within the <text> structure, and the citation number and text reference identifier are contained as attributes.

The format of the Text Reference Identifier (n attribute) varies from text to text, and the user must consult the <encodingDesc> structure of the text to determine which system is being used.
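
A minimal Python sketch of reading citations from one text with the standard library follows. It is not the DOE's own tooling. The inline sample document is hypothetical and contains no entity references; real corpus files rely on the entity declarations in corpus.ent, which this sketch does not load. The sample nests <body> inside <text>, as TEI P5 prescribes, but the function searches all descendants and so does not depend on the nesting. Only the n attribute is read, because it is the one attribute name given in this description.

```python
import xml.etree.ElementTree as ET

# The xml:id attribute lives in the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def local_name(tag: str) -> str:
    """Strip a '{namespace}' prefix so tags match with or without a namespace."""
    return tag.rsplit("}", 1)[-1]


def citations(tei_root):
    """Yield (text id, n attribute, citation text) for every <s> element."""
    text_id = tei_root.get(XML_ID)
    for elem in tei_root.iter():
        if local_name(elem.tag) == "s":
            yield text_id, elem.get("n"), "".join(elem.itertext()).strip()


# Hypothetical sample, not an excerpt from the corpus.
SAMPLE = """<TEI xml:id="T00010">
  <teiHeader><title>Hypothetical sample text</title></teiHeader>
  <text><body>
    <s n="1">first citation</s>
    <s n="2">second citation</s>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE)
for row in citations(root):
    print(row)
```

Because the format of the n attribute varies from text to text, the sketch returns it as an opaque string and leaves its interpretation to the caller.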

Latin or Greek included in the Old English texts and Latin glossed by Old English are enclosed within <foreign> tags. Words which are fragmentary in manuscript or emended by the editor of the text are enclosed by <corr> tags. This tag may also indicate that there is a problem with the manuscript in the space adjacent to the word. Editorial punctuation has usually been adopted; for most texts it follows modern norms. Runic text is enclosed within <hi rend="rune"> tags.
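
The tagging described above makes it possible to separate Old English from foreign-language material in a citation. The Python sketch below shows one way to do so with the standard library; the helper names (`local`, `old_english_text`, `tagged`) and the placeholder sample are assumptions for illustration, not part of the corpus.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a '{namespace}' prefix, if any, from an element tag."""
    return tag.rsplit("}", 1)[-1]


def old_english_text(elem) -> str:
    """Concatenate the text under elem, skipping the content of <foreign>."""
    parts = [elem.text or ""]
    for child in elem:
        if local(child.tag) != "foreign":
            parts.append(old_english_text(child))
        parts.append(child.tail or "")  # tail text belongs to the parent
    return "".join(parts)


def tagged(elem, name: str, rend=None):
    """Return the text of descendants called name, optionally filtered by rend."""
    return [
        "".join(e.itertext())
        for e in elem.iter()
        if local(e.tag) == name and (rend is None or e.get("rend") == rend)
    ]


# Hypothetical citation with placeholder words, not corpus text.
SAMPLE = ('<s n="1">word1 <foreign>word2</foreign> word3 '
          '<corr>word4</corr> <hi rend="rune">word5</hi></s>')
s = ET.fromstring(SAMPLE)

oe_only = " ".join(old_english_text(s).split())  # normalize whitespace
print(oe_only)
print(tagged(s, "corr"))
print(tagged(s, "hi", rend="rune"))
```

The text that follows a closing </foreign> tag is kept, since in ElementTree that "tail" text belongs to the enclosing citation and not to the foreign span.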

The character entities use the standard definition for XML Character Entities. Character entities which have no match in the standard can be found in xml-corpus/corpus.ent. The character entities used in the corpus texts are as follows:

XML Entity   Character
&AElig;      Æ
&aelig;      æ
&oelig;      œ
&ETH;        Ð
&eth;        ð
&THORN;      Þ
&thorn;      þ
&Aring;      Å
&Eacute;     É
&eacute;     é
&Eogon;      Ę
&eogon;      ę
&agrave;     à
&egrave;     è
&auml;       ä
&ouml;       ö
&uuml;       ü
&oslash;     ø
&bstrok;     lower-case b with a stroke
&dstrok;     lower-case crossed d or dyet
&lstrok;     lower-case l with a stroke
&tbar;       Old English letter thorn
&ccedil;     ç
&amacr;      ā
&emacr;      ē
&imacr;      ī
&omacr;      ō
&ymacr;      lower-case y with a macron
&cmacr;      lower-case c with a macron
&gmacr;      lower-case g with a macron
&hmacr;      lower-case h with a macron
&mmacr;      lower-case m with a macron
&Nmacron;    upper-case N with a macron
&nmacr;      lower-case n with a macron
&pmacr;      lower-case p with a macron
&qmacr;      lower-case q with a macron
&rmacr;      lower-case r with a macron
&tmacr;      lower-case t with a macron
&vmacr;      lower-case v with a macron
&Agr;        Α
&Egr;        Η
&Lambda;     Λ
&Ngr;        Ν
&Omega;      Ω
&omega;      ω
&Ogr;        Ο
&Rgr;        Ρ
&Tgr;        Τ
&amp;        &
&lt;         <
&gt;         >
&equiv;      ≡
&note;       sign drawn like a miniature numeral 7
&criphia;    criphia sign, drawn as an incomplete circle with a dot in the centre
&crisimon;   monogram for Christ with the Greek letters chi and rho
&lemniscus;  division sign
Character entities used in the corpus texts
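
When working with raw text outside an XML parser, entity references can be replaced with a simple lookup. The Python sketch below covers only a handful of the entities listed above. The Unicode code points chosen for &ymacr;, &gmacr;, and &bstrok; are this sketch's assumptions and are not taken from corpus.ent, which remains the authoritative source for the non-standard entities.

```python
import re

# A partial mapping for illustration; extend it from corpus.ent as needed.
ENTITY_MAP = {
    "AElig": "\u00C6", "aelig": "\u00E6",
    "ETH": "\u00D0", "eth": "\u00F0",
    "THORN": "\u00DE", "thorn": "\u00FE",
    "amacr": "\u0101", "emacr": "\u0113",
    "amp": "&", "lt": "<", "gt": ">",
    "ymacr": "\u0233",   # y with macron (assumed mapping)
    "gmacr": "g\u0304",  # g + combining macron (assumed mapping)
    "bstrok": "\u0180",  # b with stroke (assumed mapping)
}

ENTITY_RE = re.compile(r"&([A-Za-z]+);")


def resolve_entities(text: str) -> str:
    """Replace known entity references in one pass; leave unknown ones as-is."""
    return ENTITY_RE.sub(
        lambda m: ENTITY_MAP.get(m.group(1), m.group(0)), text
    )


# Illustrative input string.
print(resolve_entities("&thorn;&aelig;t"))
print(resolve_entities("&criphia;"))  # not in the partial map, left untouched
```

The substitution runs in a single pass, so an escaped ampersand such as `&amp;thorn;` resolves to the literal text `&thorn;` and is not expanded a second time.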

Distribution

The Corpus is available on CD-ROM in HTML and XML formats.

We provide copies of the corpus at cost for academic use: $200 US including shipping. (Prices subject to change without notice.) It can be ordered from the DOE online store.

The Dictionary of Old English Web Corpus provides search functions for the Corpus, available online through subscription at the DOE online store.

The Old English Corpus was originally prepared for internal use at the Dictionary of Old English. We hope that other scholars will be able to use the material and we would be grateful to receive comments about errors or problems you discover.

Contact Information

Dictionary of Old English
Room 14285, Robarts Library
130 St. George St.
University of Toronto
Toronto, Ontario
M5S 3H1
CANADA
Phone: +1 416 978 8883
Fax: +1 416 978 8835

Email the DOE (please start your subject with: Corpus Information)
DOE Website

Copyright © 2023 The Dictionary of Old English, University of Toronto. All Rights Reserved.