Re: Level of Unicode support required for various languages

From: John H. Jenkins
Date: Mon Oct 29 2007 - 19:49:58 CST


    On Oct 29, 2007, at 6:28 PM, Andrew West wrote:

    > On 29/10/2007, Peter Constable wrote:
    >> I guess I assumed that that was never intended to provide a
    >> substitute for encoding the characters needed for Zhuang text -- it
    >> would be a terrible way to represent Zhuang text, though I suppose
    >> you can argue (as you have done) that it's valid.
    > I'm sure that John has never suggested that IDS sequences should be a
    > substitute for encoding, merely that given what the Unicode Standard
    > currently says, it would be a feasible interim solution.

    TUS is most emphatic on this point: An IDS is *not* the same thing as
    encoding. It should be considered a better-than-nothing stop-gap
    until something appropriate comes along (either an encoded character
    or a registered variation sequence). I suppose that a text in, say,
    Zhuang could use a custom font to hide the fact that most of it
    consists of IDSs, but in such a case Unicode explicitly warns that no
    operation other than display-related ones will likely work. Using an
    IDS in running text is a hack.

    > The question is just what exactly the intent of that paragraph in the
    > Unicode Standard was. It sure sounds to me as if it is suggesting (and
    > Unicode is sanctioning) a mechanism for component-based representation
    > of unencoded ideographs -- if the character was already encoded why
    > would you want the rendering system to render an IDS as a single glyph
    > and treat it as a single unit for editing purposes?

    The intent is to allow systems to represent IDSs using single glyphs,
    if they can and choose to do so, either through on-the-fly composition
    (which will almost certainly be pretty ugly) or through the ligature
    mechanisms available in smart fonts. The latter is more likely. In
    this case someone with a need to represent a particular unencoded
    character (or a set of such) could use a custom font to, at least,
    make their text look decent.
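
To make "treat it as a single unit" concrete, here is a minimal sketch of how a renderer or editor might find where an IDS ends so it can map the whole run to one glyph or one cursor stop. It is an illustration under stated assumptions, not anything TUS prescribes: the function name `ids_end` is hypothetical, only the IDCs U+2FF0..U+2FFB are handled, and every non-IDC character is accepted as a component, which is looser than the real IDS syntax.

```python
# Ideographic Description Characters, U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; the others take two.
TERNARY_IDCS = {0x2FF2, 0x2FF3}
BINARY_IDCS = set(range(0x2FF0, 0x2FFC)) - TERNARY_IDCS


def ids_end(text: str, start: int = 0) -> int:
    """Return the index just past the IDS that begins at text[start].

    Hypothetical helper: any non-IDC character counts as a component,
    so this only checks the prefix-operator structure of the sequence.
    """
    needed = 1  # number of operands still required
    i = start
    while needed > 0:
        if i >= len(text):
            raise ValueError("truncated IDS")
        cp = ord(text[i])
        if cp in BINARY_IDCS:
            needed += 1  # fills one slot, opens two
        elif cp in TERNARY_IDCS:
            needed += 2  # fills one slot, opens three
        else:
            needed -= 1  # a component fills one slot
        i += 1
    return i


# U+2FF0 followed by two components is one three-code-point unit.
sample = "\u2ff0\u6c35\u6728 and more text"
unit = sample[:ids_end(sample)]
```

A smart font would do the equivalent matching in its ligature tables; the point of the sketch is only that the extent of an IDS can be determined from the IDCs alone, without knowing what the composed character looks like.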

    > I guess it must have been written at a time when people didn't worry
    > so much about security and spoofing issues. I would suggest that the
    > UTC should consider removing the offending paragraph at the earliest
    > opportunity, and replace it with a statement that IDS sequences are
    > intended to be rendered as a visible sequence of IDC characters and
    > ideographic components, and not composed into a single glyph. But then
    > maybe it is too late for that now?
    > Andrew

    No, go ahead and file a defect report. I doubt this would change
    because I don't think the problem of spoofing is really serious for
    IDSs. For one thing, for spoofing to work, you'd have to have a
    system which can create decent-looking glyphs on-the-fly from IDSs,
    and they're just too coarse to make that likely. For another, given
    the low utility of IDSs you simply have to state that a string
    containing them isn't valid for whatever purpose. But I'm not going
    to second-guess the UTC and they may think it's a serious enough
    problem to take action.
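
As a rough sketch of the second point, a protocol that cares about spoofing could simply refuse any string containing an IDC. The names `contains_idc` and `validate_identifier` are hypothetical, and the range checked is an assumption: only U+2FF0..U+2FFB.

```python
# Assumed range of Ideographic Description Characters: U+2FF0..U+2FFB.
IDC_FIRST, IDC_LAST = 0x2FF0, 0x2FFB


def contains_idc(text: str) -> bool:
    """True if any code point in text is an IDC in the assumed range."""
    return any(IDC_FIRST <= ord(ch) <= IDC_LAST for ch in text)


def validate_identifier(text: str) -> str:
    """Hypothetical validator: reject identifiers that contain an IDS."""
    if contains_idc(text):
        raise ValueError("identifiers may not contain ideographic description characters")
    return text
```

A check like this does not need to parse the sequence at all; the presence of a single IDC is enough to reject the string.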

    John H. Jenkins
