From: William J Poser (wjposer@ldc.upenn.edu)
Date: Mon Jul 07 2008 - 20:17:00 CDT
>Yes, if you do *everything* in UTF-32, the same arguments
>for string APIs would apply without having to do surrogate
>detection at the point of parsing code point boundaries,
>but there are a number of good reasons why people choose
>to (or have to) process text in UTF-16, as well.

For most purposes I do do everything in UTF-32. I read UTF-8,
convert it to UTF-32, work on the UTF-32, and convert it to
UTF-8 again on output. In a UTF-16 world that may not be the
best approach, but in my overwhelmingly Unix world, the input
I see is ASCII, UTF-8, or some parochial encoding. I don't think
that I have ever encountered UTF-16 in the wild, though I have
created it for testing purposes. Your mileage may vary.
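The pipeline described above can be sketched as follows. This is a minimal illustration, not Bill's actual code; the function names and the placeholder "work" step (upper-casing ASCII letters) are invented for the example.

```python
# Sketch of the workflow: read UTF-8 bytes, widen to UTF-32 code
# points, operate on fixed-width units, serialize back to UTF-8.

def process_codepoints(codepoints):
    # Placeholder for the "work on the UTF-32" step: upper-case
    # ASCII letters, leave everything else alone.
    return [cp - 32 if ord('a') <= cp <= ord('z') else cp
            for cp in codepoints]

def utf8_roundtrip(data: bytes) -> bytes:
    text = data.decode('utf-8')                  # UTF-8 in
    codepoints = [ord(ch) for ch in text]        # one int per code point
    codepoints = process_codepoints(codepoints)  # fixed-width processing
    return ''.join(map(chr, codepoints)).encode('utf-8')  # UTF-8 out

print(utf8_roundtrip('héllo'.encode('utf-8')))  # b'H\xc3\xa9LLO'
```

Because every element of the list is a full code point, no surrogate detection is needed at any step; that is the advantage the quoted text refers to.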

(The weirdest parochial encoding that I have encountered was
one used by an Indian word processor whose native encoding I
reverse-engineered. It was a stateful encoding in which the same
codepoint could represent different characters depending on whether
it was expecting a consonant or a vowel.)
Bill
This archive was generated by hypermail 2.1.5 : Mon Jul 07 2008 - 20:18:56 CDT