RE: glyph selection for Unicode in browsers

From: Murray Sargent (murrays@Exchange.Microsoft.com)
Date: Thu Sep 26 2002 - 20:24:08 EDT

Next message: Kenneth Whistler: "Re: Keys. (derives from Re: Sequences of combining characters.)"

Previous message: Kenneth Whistler: "Re: Sequences of combining characters (from Romanization of Cyrillic and Byzantine legal codes)"
Maybe in reply to: Tex Texin: "glyph selection for Unicode in browsers"
Next in thread: jameskass@att.net: "Re: glyph selection for Unicode in browsers"

I don't think the idea is that a codepage equals a language. Rather, a
codepage corresponds to a writing system, which consists of one or more
scripts (e.g., 6 scripts for ShiftJIS). As such, the codepage is a useful
cue in choosing an appropriate font for rendering text. In the RichEdit
edit engine, we use a codepage generalization called a CharRep and break
Unicode plain text into runs, each characterized by a particular CharRep.
We then bind these runs to appropriate fonts for rendering. There are
many additional considerations, so unfortunately this isn't an easy task,
but with enough refinements it works quite well.
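
A rough sketch of the run-splitting idea, for illustration only. The
CharRep labels, the code point ranges, and the font table below are
assumptions made up for this example; they are not RichEdit's actual
tables, and real code has to handle neutral characters (spaces,
punctuation, digits) far more carefully than this does.

```python
from itertools import groupby

# Hypothetical CharRep labels keyed by code point range (illustrative only).
RANGES = [
    (0x0000, 0x024F, "LATIN"),
    (0x0370, 0x03FF, "GREEK"),
    (0x0400, 0x04FF, "CYRILLIC"),
    (0x3040, 0x30FF, "JAPANESE"),  # Hiragana and Katakana
    (0x4E00, 0x9FFF, "CJK"),       # Han ideographs, language still undecided
    (0xAC00, 0xD7A3, "KOREAN"),    # Hangul syllables
]

# Hypothetical CharRep-to-font table; the font choices are placeholders.
FONTS = {
    "LATIN": "Times New Roman",
    "GREEK": "Times New Roman",
    "CYRILLIC": "Times New Roman",
    "JAPANESE": "MS Mincho",
    "KOREAN": "Batang",
}
FALLBACK_FONT = "Arial Unicode MS"


def char_rep(ch):
    """Return the (assumed) CharRep label for a single character."""
    cp = ord(ch)
    for lo, hi, name in RANGES:
        if lo <= cp <= hi:
            return name
    return "DEFAULT"


def split_runs(text):
    """Break text into maximal runs of characters sharing one CharRep."""
    return [(rep, "".join(chars)) for rep, chars in groupby(text, key=char_rep)]


def bind_fonts(text):
    """Pair each run with a font, falling back when the CharRep is unknown."""
    return [(FONTS.get(rep, FALLBACK_FONT), run) for rep, run in split_runs(text)]


if __name__ == "__main__":
    for font, run in bind_fonts("abc\u0410\u0411\u0412\u304b\u306a"):
        print(font, repr(run))
```

Note that a plain Han run comes out here as "CJK" with only the fallback
font; deciding which CJK CharRep it belongs to needs codepage, keyboard,
or heuristic information.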

The bottom line is that if text was generated using a particular
codepage, it's likely that the creator of that text intended it to be
rendered with a font that supports that codepage. For text tagged with
no codepage, we do our best to translate the keyboard language to a
CharRep and proceed as above. When neither keyboard nor codepage info is
available, we use a set of heuristics to break the text into CharRep
runs. Among the many heuristics used are 1) a string containing Kana is
likely to have a Japanese CharRep, and 2) a CJK string that round-trips
through CHT, CHS, or ShiftJIS may well belong to one of those CharReps.
In particular, if a CJK string doesn't round-trip through CHT, it's
probably not Traditional Chinese.
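
The two heuristics above can be sketched roughly as follows. This is a
minimal illustration, not RichEdit's code: it assumes Python's built-in
cp950, cp936, and cp932 codecs as stand-ins for CHT, CHS, and ShiftJIS,
and the function names and CharRep labels are invented for the example.

```python
# Assumed mapping from CharRep label to a Python codec name.
CODEPAGES = [
    ("CHT", "cp950"),       # Traditional Chinese (Big5-based)
    ("CHS", "cp936"),       # Simplified Chinese (GBK)
    ("SHIFTJIS", "cp932"),  # Japanese
]


def contains_kana(text):
    """True if any character is in the Hiragana or Katakana blocks."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


def round_trips(text, codec):
    """True if text survives Unicode -> codepage -> Unicode unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False


def candidate_char_reps(text):
    """Return the CharReps a CJK string could plausibly belong to."""
    if contains_kana(text):
        # Heuristic 1: Kana strongly suggests Japanese.
        return ["JAPANESE"]
    # Heuristic 2: keep only the codepages the string round-trips through.
    return [name for name, codec in CODEPAGES if round_trips(text, codec)]


if __name__ == "__main__":
    print(candidate_char_reps("\u304b\u306a"))  # Kana
    print(candidate_char_reps("\u6211\u4eec"))  # Simplified Chinese "we"
    print(candidate_char_reps("\u6211\u5011"))  # Traditional Chinese "we"
```

A string can pass more than one round-trip test (GBK, for instance,
also covers many traditional forms), so the result is a list of
candidates to be narrowed by other cues rather than a final answer.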

Murray


This archive was generated by hypermail 2.1.5 : Thu Sep 26 2002 - 21:01:26 EDT