Re: Getting A Newb Started

From: Kenneth Whistler
Date: Mon Jul 07 2008 - 19:52:59 CDT


    > >This isn't as much of an advantage as it sounds, since in most Unicode
    > >processes you need to be prepared to deal with multiple characters at
    > >once anyway.
    > I don't get the point. Whether you're dealing with one character or
    > many, life is simpler if they're all the same size.

    I think the point that John was making is that if you are
    constructing APIs, whether public APIs or internal ones, most
    of the time you are better off defining them as a string interface
    rather than a character interface.

    Even if you "think" you are just dealing with a "character" it
    is often the case that what is of interest may actually be
    a combining character sequence or a grapheme cluster or
    a collation contraction element or some other significant
    sequence of code points.
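
    As a minimal sketch of why the unit of interest is often a sequence,
    here is a Python illustration. The helper name same_text is made up
    for this example, and NFC comparison is just one way a string
    interface might treat the two spellings of "é" as the same text:

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings for canonical equivalence via NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00E9"   # é as a single code point
decomposed = "e\u0301"   # e followed by COMBINING ACUTE ACCENT

# The decomposed form is two code points, so it cannot be passed to an
# interface that accepts exactly one "character".
print(len(precomposed), len(decomposed))
print(same_text(precomposed, decomposed))
```

    A function that takes a single code point has no way to receive the
    decomposed form at all, while the string version handles both without
    the caller needing to know which one it has.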

    And if you already have a UTF-16 string API in place, the API doesn't
    care if it is getting a single-code-unit BMP character or a
    two-code-unit supplementary-plane character.

    Of course the code underneath, if it is actually parsing code points
    from the string, needs to know the distinction and behave
    correctly. But it is often the case that complex code can
    be written much more cleanly if it is just passing string
    pointers (or objects) up and down the stack, rather than
    prematurely parsing characters and trying to pass individual
    characters as parameters. For Unicode this is particularly
    important, because there are so many complex conditions where
    the behavior of a character depends on its string context.
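
    To make the division of labor concrete, here is a rough Python sketch
    of the low-level step, assuming the UTF-16 text is held as a list of
    integer code units. The names utf16_units and code_points are
    hypothetical, and passing unpaired surrogates through unchanged is
    only one possible policy:

```python
import struct


def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


def code_points(units):
    """Yield code points decoded from a sequence of UTF-16 code units."""
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        # A high (lead) surrogate followed by a low (trail) surrogate
        # encodes one supplementary code point in two code units.
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # A BMP character, or an unpaired surrogate passed through as-is.
            yield unit
            i += 1


# "A" (BMP) followed by U+10400 (supplementary, two code units).
units = utf16_units("A\U00010400")
print([hex(u) for u in units])
print([hex(cp) for cp in code_points(units)])
```

    Only code_points needs to know about surrogates. Everything above it
    can keep passing the whole unit sequence around without caring how
    many units any one character occupies.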

    Yes, if you do *everything* in UTF-32, the same arguments
    for string APIs would apply without having to do surrogate
    detection at the point of parsing code point boundaries,
    but there are a number of good reasons why people choose
    to (or have to) process text in UTF-16, as well.
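
    A small Python sketch of the UTF-32 case, with utf32_units as a
    made-up helper name: each code unit is a whole code point, so no
    surrogate check is needed, yet a combining sequence still spans more
    than one unit.

```python
def utf32_units(text):
    """Return the UTF-32 code units of text as a list of ints."""
    raw = text.encode("utf-32-le")
    return [int.from_bytes(raw[i:i + 4], "little")
            for i in range(0, len(raw), 4)]


# A supplementary character is a single UTF-32 code unit.
print([hex(u) for u in utf32_units("\U00010400")])

# "e" + COMBINING ACUTE ACCENT is still two units, so an interface that
# takes one unit at a time would still split this sequence.
print([hex(u) for u in utf32_units("e\u0301")])
```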

