Date: Fri Sep 27 2002 - 09:44:12 EDT
On 09/26/2002 07:24:08 PM "Murray Sargent" wrote:
>I don't think the idea is that codepage equals language. Rather codepage
>equals a writing system, which consists of one or more scripts (e.g., 6
>scripts for ShiftJIS). As such the codepage is a useful cue in choosing
>an appropriate font for rendering text.
(Murray and I talked about this some at dinner a couple of weeks ago, so
there's some history here.)
I don't think things are quite that simple. A codepage *can* be a useful
cue in choosing an appropriate font (or in choosing typographic preferences
by whatever means). This may well be the case in some instances, such
as Shift JIS. But it's not always the case. For instance, cp1251 doesn't
tell you what language is involved, and isn't sufficient to determine which
italic variants of certain Cyrillic characters are needed. Similarly,
cp1250 doesn't tell you what cultural preferences should apply in relation
to design and alignment of the ogonek diacritic (e.g. Polish and Lithuanian
differ in this regard), or other diacritics (e.g. caron should have a
distinct form for Czech); and cp1252 doesn't tell you about cultural
preferences regarding cedilla (three different forms can be used for
French, but only one is acceptable for Portuguese or Catalan).
That's why I maintain that a codepage is a character set, but not a writing
system. In general, a codepage does not determine a set of rules for
writing; it just provides a vocabulary with which to work.
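To make the character-set point concrete, here is a minimal Python sketch (the sample words are my own illustrative choices, not anything from this thread). Polish and Czech text both round-trip through cp1250, and what comes back is just a sequence of characters. Nothing in it records which language it was, so nothing in it can select Polish-style ogoneks or Czech-style carons.

```python
# Minimal sketch: a codepage is a byte-to-character mapping, not a writing system.
# The sample words are illustrative assumptions; any Polish or Czech text would do.
samples = {
    "Polish": "gęś",        # e with ogonek, s with acute
    "Czech": "žluťoučký",   # letters with caron, including t with caron
}


def round_trip(text: str, codepage: str = "cp1250") -> str:
    """Encode text in the given codepage and decode it again.

    Only the characters survive; no language information is involved.
    """
    return text.encode(codepage).decode(codepage)


for language, word in samples.items():
    result = round_trip(word)
    # The result is a plain str; it does not say whether it was Polish or Czech.
    print(f"{language}: {result!r} round-trips={result == word}")
```

Both strings survive the trip, which is exactly the limit of what the codepage tells you: which characters, not whose typographic conventions.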
>The bottom line is that if text was generated using a particular
>codepage it's likely that the creator of that text intended the text to
>be rendered with a font that supports that codepage.
Of course, fonts can support multiple codepages. Arial, Tahoma and Verdana,
for example, all support codepages 1250, 1251, 1252, 1253, 1254, 1257
and 1258. That doesn't tell you whether they're appropriate for Polish or
Lithuanian or Czech or whatever. Even the fact that they support cp1258
doesn't imply that they are appropriate for Vietnamese: e.g. the default
glyphs in Arial for U+1EA5 and U+1EA7 do not have the diacritics stacked in
the way needed for Vietnamese.
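For reference, the two characters in question each stack a tone mark on top of a circumflex. cp1258 has no precomposed form for either, so text in that codepage stores them as "â" followed by a combining tone mark. The short Python sketch below (the helper name `cp1258_bytes` is hypothetical, and it only handles letters whose canonical decomposition is a cp1258 base letter plus one cp1258 combining mark) shows that sequence. Supporting cp1258 means having "â" and the combining marks; how a font positions the tone mark relative to the circumflex is a glyph-design question the codepage does not answer.

```python
import unicodedata


def cp1258_bytes(ch: str) -> bytes:
    """Hypothetical helper: encode a precomposed Vietnamese letter for cp1258.

    Assumes the letter has a canonical decomposition of the form
    <base letter> <combining mark>, both of which exist in cp1258.
    """
    parts = unicodedata.decomposition(ch).split()  # e.g. ['00E2', '0301']
    if not parts or parts[0].startswith("<"):
        raise ValueError(f"U+{ord(ch):04X} has no canonical decomposition")
    return "".join(chr(int(p, 16)) for p in parts).encode("cp1258")


for ch in ("\u1EA5", "\u1EA7"):
    # Print the code point, its Unicode name, and the cp1258 byte sequence
    # (base letter a-circumflex followed by a combining tone mark).
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), cp1258_bytes(ch))
```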
I'm not saying that codepage information isn't ever useful. Obviously, you
have found it very useful. But the usefulness has limits.
Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
This archive was generated by hypermail 2.1.5 : Fri Sep 27 2002 - 10:45:59 EDT