RE: glyph selection for Unicode in browsers

From: Murray Sargent (murrays@Exchange.Microsoft.com)
Date: Thu Sep 26 2002 - 20:24:08 EDT

    I don't think the idea is that a codepage equals a language. Rather, a
    codepage corresponds to a writing system, which consists of one or more
    scripts (e.g., 6 for ShiftJIS). As such the codepage is a useful cue in choosing
    an appropriate font for rendering text. In the RichEdit edit engine, we
    use a codepage generalization called a CharRep and break Unicode plain
    text into runs of text each characterized by a particular CharRep. We
    then bind these runs to appropriate fonts for rendering. There are many
    additional considerations, so unfortunately this isn't an easy task. But
    with enough refinements it works quite well.
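
    As a rough illustration of the run-breaking step, here is a minimal
    sketch in Python. It is not RichEdit's code: the labels and the
    char_rep() classifier are hypothetical stand-ins, and a real CharRep
    assignment involves far more than a few code point ranges.

```python
from itertools import groupby

def char_rep(ch: str) -> str:
    """Map one character to a hypothetical CharRep-like label.

    The labels and ranges are illustrative assumptions, not RichEdit's
    actual CharRep values.
    """
    cp = ord(ch)
    if cp < 0x80:
        return "ANSI"
    if 0x0370 <= cp <= 0x03FF:
        return "GREEK"
    if 0x0400 <= cp <= 0x04FF:
        return "CYRILLIC"
    if 0x3040 <= cp <= 0x30FF:        # Hiragana and Katakana blocks
        return "JAPANESE"
    if 0x4E00 <= cp <= 0x9FFF:        # CJK Unified Ideographs, language unknown
        return "CJK_UNRESOLVED"
    return "DEFAULT"

def split_runs(text: str) -> list[tuple[str, str]]:
    """Break text into maximal runs of characters sharing one label."""
    return [(rep, "".join(chars)) for rep, chars in groupby(text, key=char_rep)]
```

    Each (label, run) pair could then be bound to a font that covers that
    label, which is the part that needs the additional refinements.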

    The bottom line is that if text was generated using a particular
    codepage, it's likely that the creator of that text intended the text to
    be rendered with a font that supports that codepage. For text tagged
    with no codepage, we do our best to translate the keyboard language to a
    CharRep and proceed as above. When neither the keyboard nor codepage
    info is available, we use a set of heuristics to break the text into
    CharRep runs. Among the many heuristics used are 1) a string containing
    Kana is likely to have a Japanese CharRep, and 2) a CJK string that
    round-trips through CHT, CHS, or ShiftJIS may well belong to the
    corresponding CharRep. In particular, if a CJK string doesn't round-trip
    through CHT, it's probably not Traditional Chinese.
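
    The two heuristics above can be sketched in Python as follows. This is
    only an approximation under stated assumptions: Python's "big5",
    "gb2312", and "shift_jis" codecs stand in for the CHT, CHS, and ShiftJIS
    codepages (the Windows codepages differ in detail), and the function
    names are made up for the example.

```python
def has_kana(text: str) -> bool:
    """True if the text contains any Hiragana or Katakana character."""
    return any(0x3040 <= ord(ch) <= 0x30FF for ch in text)

def round_trips(text: str, codec: str) -> bool:
    """True if text survives Unicode -> codec -> Unicode unchanged."""
    try:
        return text.encode(codec).decode(codec) == text
    except UnicodeError:
        return False

# Assumed stand-ins for the codepages named in the message.
CANDIDATES = [("CHT", "big5"), ("CHS", "gb2312"), ("ShiftJIS", "shift_jis")]

def guess_char_reps(text: str) -> list[str]:
    """Return the candidate labels that remain plausible for a CJK string."""
    if has_kana(text):
        return ["ShiftJIS"]           # heuristic 1: Kana suggests Japanese
    # Heuristic 2: keep only the codepages the string round-trips through.
    return [name for name, codec in CANDIDATES if round_trips(text, codec)]
```

    A string may survive more than one round trip, so the result is a list
    of candidates rather than a single answer; the failure case is the more
    informative one, as noted for CHT.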

    Murray


