Re: Library of Congress diacriticized c

From: Smith,Gary (smithg@oclc.org)
Date: Mon Jul 28 1997 - 15:13:53 EDT


Library of Congress data uses the USMARC character set, which contains only
a very few additions to the set of standard Latin letters. Characters such
as OPEN E would most likely be transliterated or represented by some
specialized transcription like "[open E]". Since most library systems do
not validate combinations of characters (and since authors and printers
rarely feel constrained to use only "valid" characters), it's not terribly
unusual to encounter unexpected combinations.
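
To illustrate the kind of check that most library systems skip, here is a minimal sketch in Python. It assumes the text has already been converted to Unicode (it does not read USMARC), and the function names are hypothetical, chosen only for this example. It pairs each base letter with the combining marks that follow it and reports base letters outside the basic Latin range.

```python
import unicodedata

def split_clusters(text):
    """Pair each base character with the combining marks that follow it.

    Simplification: only characters with a nonzero canonical combining
    class are treated as marks.
    """
    clusters = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) and clusters:
            clusters[-1][1].append(ch)   # attach the mark to the current base
        else:
            clusters.append((ch, []))    # start a new cluster
    return clusters

def describe(text):
    """Return (base name, [mark names]) per cluster, using Unicode names."""
    return [
        (unicodedata.name(base, "?"), [unicodedata.name(m, "?") for m in marks])
        for base, marks in split_clusters(text)
    ]

def non_ascii_bases(text):
    """List base characters that fall outside the basic Latin (ASCII) range."""
    return [base for base, _ in split_clusters(text) if ord(base) > 0x7F]

# Assumed sample data, not taken from the LOC files:
open_e = "\u025b\u0303\u0301"   # open e with tilde and acute
i_candrabindu = "I\u0310"       # I with candrabindu

for sample in (open_e, i_candrabindu):
    print(describe(sample), non_ascii_bases(sample))
```

A real validator would also need a list of combinations considered plausible for each language, which this sketch does not attempt.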

Gary Smith
 ----------
From: unicode
To: Multiple Recipients of
Subject: Re: Library of Congress diacriticized c
Date: Monday, July 28, 1997 14:46

I was able to look at the data (delete www. from the server address).

Interestingly, only standard Latin letters occurred as base letters; none of
the additional Latin letters did. What has happened to combinations like
OPEN E WITH TILDE AND ACUTE (several African languages)? Do they never show
up in the LOC data, or were they filtered out by preprocessing?

Some combinations also look bogus to me (I with candrabindu = probably
I (dotted) with tilde/circumflex).

 --J"org Knappen



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:36 EDT