Lexicon Help
Odessa maintains a lexicon: a list of every unique term that occurs in the Odessa document collection. Because of conversion problems between alphabets, misspellings by document authors, and extraction errors, surnames usually appear in many variant spellings. Common name variants include Naass-Naasz-Naas, Veil-Feil, and many others. Some misspellings, however, such as Catha (a misspelling of Käthe = Katherine), are much harder to deal with.

The Odessa Lexicon Browser lets you scan the lexicon for variant spellings. Simply enter a prefix of any term you are interested in. Keep in mind that the lexicon contains more than 300,000 terms, so be as specific as you can when you browse. You must enter at least two characters.
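
As an illustration only, prefix browsing over a large sorted term list could be sketched as follows. This is not how Odessa is known to be implemented; the function name `browse_prefix`, the `min_chars` parameter, and the sample terms are assumptions made for the example.

```python
from bisect import bisect_left


def browse_prefix(sorted_terms, prefix, min_chars=2):
    """Return every term in a sorted list that starts with `prefix`.

    Hypothetical helper: assumes `sorted_terms` is sorted in ascending
    order and that the browser's two-character minimum is enforced here.
    """
    if len(prefix) < min_chars:
        raise ValueError(f"enter at least {min_chars} characters")

    # In a sorted list, all terms sharing a prefix sit next to each other,
    # starting at the first position whose term is >= the prefix.
    index = bisect_left(sorted_terms, prefix)
    matches = []
    while index < len(sorted_terms) and sorted_terms[index].startswith(prefix):
        matches.append(sorted_terms[index])
        index += 1
    return matches


# Sample data only; the real lexicon holds more than 300,000 terms.
terms = sorted(["naas", "naass", "naasz", "nab", "feil", "veil"])
variants = browse_prefix(terms, "naas")
```

A binary search finds the start of the matching run without scanning the whole list, which is why a more specific prefix returns a shorter, more useful result.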

For each term, the browser reports how many files in the document collection contain that term, as well as the total number of occurrences of that term in the collection. Each term is hyperlinked, so you can view the files that contain it. The browser is not intended for searching; its purpose is to help you locate rogue files that contain odd, low-frequency spelling variants.
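
The two counts described above can be sketched as follows, under stated assumptions: the tokenization rule (lowercased runs of word characters), the function names `build_lexicon_stats` and `rare_terms`, and the sample documents are all hypothetical and are not taken from Odessa itself.

```python
import re
from collections import Counter


def build_lexicon_stats(documents):
    """Map each term to (number of files containing it, total occurrences).

    `documents` is assumed to be a dict of file name -> text.
    """
    file_counts = Counter()
    total_counts = Counter()
    for text in documents.values():
        # Assumed tokenization: lowercase, split on non-word characters.
        tokens = re.findall(r"\w+", text.lower())
        total_counts.update(tokens)      # every occurrence counts
        file_counts.update(set(tokens))  # each file counts once per term
    return {term: (file_counts[term], total_counts[term]) for term in total_counts}


def rare_terms(stats, max_files=1):
    """List terms found in at most `max_files` files, in sorted order."""
    return sorted(term for term, (files, _) in stats.items() if files <= max_files)


# Sample data only.
docs = {"a.txt": "Naas Naas Feil", "b.txt": "Naas Naasz"}
stats = build_lexicon_stats(docs)
suspects = rare_terms(stats)
```

Terms that appear in only one or two files are the likeliest candidates for odd spelling variants, so a low file count is a reasonable first filter when hunting for rogue files.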

