Re: Level of Unicode support required for various languages

Date: Tue Oct 30 2007 - 04:46:48 CST


    Quoting "John H. Jenkins":

    > On Oct 29, 2007, at 6:28 PM, Andrew West wrote:
    >> On 29/10/2007, Peter Constable wrote:
    >>> I guess I assumed that that was never intended to provide a
    >>> substitute for encoding the characters needed for Zhuang text --
    >>> it would be a terrible way to represent Zhuang text, though I
    >>> suppose you can argue (as you have done) that it's valid.
    >> I'm sure that John has never suggested that IDS sequences should be a
    >> substitute for encoding, merely that given what the Unicode Standard
    >> currently says, it would be a feasible interim solution.
    > TUS is most emphatic on this point: An IDS is *not* the same thing as
    > encoding. It should be considered a better-than-nothing stop-gap until
    > something appropriate comes along (either an encoded character or a
    > registered variation sequence). I suppose that a text in say Zhuang
    > could use a custom font to hide the fact that most of it consists of
    > IDSs, but in such a case Unicode explicitly warns that no operation
    > other than display-related ones will likely work. Using an IDS in
    > running text is a hack.

    Certainly I was only thinking in terms of an interim solution, bearing
    in mind that, as things stand at present, it is likely to take 10 years
    for characters to be encoded. The question then is what the best
    interim solution is.

    >> The question is just what exactly the intent of that paragraph in the
    >> Unicode Standard was. It sure sounds to me as if it is suggesting (and
    >> Unicode is sanctioning) a mechanism for component-based representation
    >> of unencoded ideographs -- if the character was already encoded why
    >> would you want the rendering system to render an IDS as a single glyph
    >> and treat it as a single unit for editing purposes?
    > The intent is to allow systems to represent IDSs using single glyphs,
    > if they can and choose to do so, either through on-the-fly composition
    > (which will almost certainly be pretty ugly) or through the ligature
    > mechanisms available in smart fonts. The latter is more likely. In
    > this case someone with a need to represent a particular unencoded
    > character (or a set of such) could use a custom font to, at least, make
    > their text look decent.

    The intent would seem to be to allow for representation through smart fonts.
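
    For a font or editor to treat an IDS as a single unit, it first has to
    find where the sequence ends. A minimal sketch of that step, assuming
    only the twelve IDCs U+2FF0..U+2FFB (U+2FF2 and U+2FF3 take three
    operands, the rest take two). The function name ids_end is made up for
    illustration, and the sketch accepts any character as an operand,
    ignoring whatever further restrictions the standard places on operands
    or length:

```python
# The twelve Ideographic Description Characters, U+2FF0..U+2FFB.
# U+2FF2 and U+2FF3 take three operands; the others take two.
TERNARY_IDCS = {"\u2FF2", "\u2FF3"}


def is_idc(ch):
    """True if ch is one of the IDCs U+2FF0..U+2FFB."""
    return "\u2FF0" <= ch <= "\u2FFB"


def ids_end(text, start=0):
    """Return the index just past the IDS that begins at `start`.

    Returns None if text[start] is not an IDC or the sequence is
    truncated. Hypothetical helper; operands are not validated.
    """
    if start >= len(text) or not is_idc(text[start]):
        return None
    pending = 1  # number of operands still to be read
    i = start
    while pending > 0:
        if i >= len(text):
            return None  # ran out of text before the IDS was complete
        ch = text[i]
        pending -= 1  # this character fills one operand slot
        if is_idc(ch):
            # A nested IDC opens new operand slots of its own.
            pending += 3 if ch in TERNARY_IDCS else 2
        i += 1
    return i
```

    The span text[start:ids_end(text, start)] is then what a cursor or
    ligature lookup would treat as one unit.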

    >> I guess it must have been written at a time when people didn't worry
    >> so much about security and spoofing issues. I would suggest that the
    >> UTC should consider removing the offending paragraph at the earliest
    >> opportunity, and replace it with a statement that IDS sequences are
    >> intended to be rendered as a visible sequence of IDC characters and
    >> ideographic components, and not composed into a single glyph. But then
    >> maybe it is too late for that now ?
    >> Andrew
    > No, go ahead and file a defect report. I doubt this would change
    > because I don't think the problem of spoofing is really serious for
    > IDSs. For one thing, for spoofing to work, you'd have to have a system
    > which can create decent-looking glyphs on-the-fly from IDSs, and
    > they're just too coarse to make that likely. For another, given the
    > low-utility of IDSs you simply have to state that a string containing
    > them isn't valid for whatever purpose. But I'm not going to
    > second-guess the UTC and they may think it's a serious enough problem
    > to take action.

    As pointed out elsewhere in this thread, there is a big difference
    between can and encouraged. Any implementation is likely to use some
    sort of lookup table and precomposed glyphs as its main mechanism,
    which can as easily eliminate possibilities as display them.
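
    A rough sketch of the kind of lookup table meant here, with every name
    hypothetical (PRECOMPOSED, render_plan, the glyph name) and the table
    entry picked purely for illustration, with no claim about whether that
    character is encoded. Only sequences listed in the table get a single
    glyph; anything else stays a visible run of IDC and components:

```python
# Hypothetical table: IDS string -> glyph name in a custom font.
# Anything not listed here is never composed into a single glyph.
PRECOMPOSED = {
    "\u2FF0\u53E3\u6728": "custom_glyph_0001",  # illustrative entry only
}


def render_plan(ids):
    """Decide how to display one IDS.

    Returns ("glyph", name) when the table has a precomposed glyph,
    otherwise ("sequence", [chars]) so the IDC and its components
    remain visible as separate characters.
    """
    glyph = PRECOMPOSED.get(ids)
    if glyph is not None:
        return ("glyph", glyph)
    return ("sequence", list(ids))
```

    Leaving a sequence out of the table is all it takes to stop it from
    being shown as one glyph, which is the "eliminate" half of the point.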

    IMHO any dedicated CJK spoofer already has enough material in existing
    encoded precomposed characters.

    Yours sincerely
    John Knightley

    > =====
    > John H. Jenkins


    This archive was generated by hypermail 2.1.5 : Tue Oct 30 2007 - 04:49:10 CST