RE: (base as a combining char)

From: Addison Phillips [wM] (aphillips@webmethods.com)
Date: Sat Nov 27 2004 - 13:41:31 CST

    Dear Flarn,

    I'm not sure exactly what you're trying to do: your question is a bit sketchy.

    Most Unicode characters are "base" characters, in that they do not combine with the character preceding them in the character stream based on their character properties. Some characters are combining marks. Combining marks generally are not used (and thus not rendered) by themselves. They exist to modify base characters.
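    As a rough illustration of that distinction, here is a minimal sketch, assuming Python and its standard unicodedata module (my choice of tool, not something your question implies). The helper name is_combining_mark is made up for this example. It treats any character whose General_Category starts with "M" as a combining mark and everything else as a base character, which is a simplification.

```python
import unicodedata

def is_combining_mark(ch: str) -> bool:
    """Hypothetical helper: True if ch is in General_Category Mn, Mc, or Me."""
    return unicodedata.category(ch).startswith("M")

# A few sample characters: two letters, two combining marks, one precomposed letter.
samples = ["a", "u", "\u0308", "\u0301", "\u00fc"]

for ch in samples:
    kind = "combining mark" if is_combining_mark(ch) else "base character"
    # unicodedata.combining() returns the canonical combining class (0 for base characters).
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {kind}, "
          f"ccc={unicodedata.combining(ch)}")
```

    The point is only that "base" versus "combining" is a property recorded in the Unicode Character Database and looked up per character; it is not something an application assigns on its own.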

    But combining marks are not the only way that a sequence of characters can form a "grapheme" (a visual unit of text). A ligature, for example, is a sequence of base characters rendered as a single visual unit. Some languages treat a sequence of base characters as a single letter. For example, Dutch sometimes treats the sequence "ij" as a single letter (it turns out that Unicode also encodes 'ij' as single characters, U+0132 and U+0133, but these exist for compatibility with older non-Unicode character sets). Software must be modified or tailored to provide behavior consistent with the specific language and context.
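    To make the "ij" case concrete, here is a small sketch, again assuming Python's standard unicodedata module. It shows that the single character U+0133 carries a compatibility mapping to the two base characters "i" and "j": compatibility normalization (NFKC) replaces it with the plain sequence, while canonical normalization (NFC) leaves it alone.

```python
import unicodedata

ij_char = "\u0133"   # LATIN SMALL LIGATURE IJ, the single compatibility character
ij_seq = "ij"        # the ordinary sequence of two base characters

# The decomposition field is tagged <compat>, marking a compatibility mapping.
print(unicodedata.name(ij_char))
print(unicodedata.decomposition(ij_char))

# NFKC applies compatibility mappings; NFC does not.
print(unicodedata.normalize("NFKC", ij_char) == ij_seq)
print(unicodedata.normalize("NFC", ij_char) == ij_char)
```

    Note that normalization does nothing to make the plain sequence "ij" behave as one letter in Dutch text. That part is the language-specific tailoring mentioned above, and it has to be supplied by the software.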

    For example, when you see the "fi" ligature in English, you naturally expect that the two letters "f" and "i" should be treated as separate letters--you should be able to place the cursor between them and delete one of them, for example. In another language you might find a letter that consists of two base characters where you should NOT be able to do that. Similarly, when one has the letter 'u' followed by a combining dieresis mark (U+0308), one expects the pair of logical characters to behave as a single character--ü (which is also encoded as a precomposed character U+00FC).
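    The 'u' plus U+0308 case can be sketched as follows, assuming Python's standard unicodedata module. The function simple_clusters is a hypothetical helper written for this message: it attaches each character with a non-zero combining class to the character before it. That is a deliberate simplification and not the full grapheme cluster algorithm of UAX #29.

```python
import unicodedata

decomposed = "u\u0308"   # 'u' followed by COMBINING DIAERESIS
precomposed = "\u00fc"   # the precomposed character

# The two spellings are canonically equivalent: NFC composes, NFD decomposes.
print(unicodedata.normalize("NFC", decomposed) == precomposed)
print(unicodedata.normalize("NFD", precomposed) == decomposed)

def simple_clusters(text: str) -> list[str]:
    """Hypothetical helper: group each base character with the marks that follow it."""
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch    # combining mark: attach to the previous cluster
        else:
            clusters.append(ch)   # base character: start a new cluster
    return clusters

# Four code points, but three units for cursor movement and deletion.
word = "fu\u0308r"
print(len(word), len(simple_clusters(word)))
```

    An editor built along these lines would step over 'u' and its diaeresis as one unit, while still allowing the cursor between the "f" and "i" of an "fi" ligature, since both of those are base characters.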

    So: what application do you have for two base characters treated as a combining mark? If you can tell us, list members might be able to comment with precision on your request.

    Best Regards,

    Addison

    Addison P. Phillips
    Director, Globalization Architecture
    http://www.webMethods.com

    Chair, W3C Internationalization Working Group
    http://www.w3.org/International

    Internationalization is an architecture.
    It is not a feature.

    > -----Original Message-----
    > From: unicode-bounce@unicode.org
    > [mailto:unicode-bounce@unicode.org]On Behalf Of Flarn
    > Sent: November 26, 2004 8:49
    > To: unicode@unicode.org
    > Subject:
    >
    >
    > I know that there are some combining characters, and a lot of base
    > characters. But, is there any way to use a base character as a
    > combining character? Please help me!
    >
    > - Michael Norton (a.k.a. Flarn)
    > E-mail address: flarn2003@megapipe.net
    >



    This archive was generated by hypermail 2.1.5 : Sat Nov 27 2004 - 13:44:37 CST