Re: Handling UTF-8

From: Gaute B Strokkenes (gs234@cam.ac.uk)
Date: Thu Mar 01 2001 - 12:51:40 EST


On Thu, 1 Mar 2001, trond.trosterud@hum.uit.no wrote:

> Apropos UTF-8:
>
> While waiting for software (Mac or Unix) that makes me able to
> handle UTF-8 (input, sort, wc, such things), I try to put up UTF-8
> web pages myself. I look at the algorithm of p. 47 in the Unicode
> Book, and convert any UxHHHH sequence to UTF-8 by breaking the hexes
> down to binaries.
>
> Now, I have a distinct feeling that there is a mathematical formula
> for doing this, e.g. on my hex calculator. To my irritation I cannot
> figure it out. Instead of trying to reinvent the wheel I go to the
> list. Can anyone help me: how do I calculate a UTF-8 value from a
> UCS value (except by the bit-counting paper and pencil way)?
>
> The fall-back would be a table for UCS>UTF-8, starting on Ux0080.

You're putting yourself through a lot of unnecessary pain. I'd
suggest that you look into the POSIX program iconv (if your Unix
supports it); otherwise you might want to have a look at GNU recode.
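
If you do want the arithmetic itself, here is a minimal sketch in
Python (my choice of language, not something from the Unicode Book).
The function name utf8_bytes is made up for illustration, and it
assumes the range U+0000..U+10FFFF with surrogates excluded. In hex
calculator terms: for U+0080..U+07FF the two bytes are
C0 + (cp div 40) and 80 + (cp mod 40); for U+0800..U+FFFF the three
bytes are E0 + (cp div 1000), 80 + ((cp div 40) mod 40) and
80 + (cp mod 40), all numbers in hex.

```python
def utf8_bytes(cp):
    """Return the UTF-8 encoding of one code point as a list of byte values.

    Illustrative helper (hypothetical name). Assumes code points are
    limited to U+0000..U+10FFFF and that surrogates are not encodable.
    """
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not an encodable code point: %r" % (cp,))
    if cp < 0x80:
        # One byte: 0xxxxxxx
        return [cp]
    if cp < 0x800:
        # Two bytes: 110xxxxx 10xxxxxx
        return [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    if cp < 0x10000:
        # Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return [0xE0 | (cp >> 12),
                0x80 | ((cp >> 6) & 0x3F),
                0x80 | (cp & 0x3F)]
    # Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return [0xF0 | (cp >> 18),
            0x80 | ((cp >> 12) & 0x3F),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F)]


def utf8_hex(cp):
    """Format the encoded bytes as space-separated upper-case hex pairs."""
    return " ".join("%02X" % b for b in utf8_bytes(cp))
```

For example, utf8_hex(0x00E5) gives "C3 A5" and utf8_hex(0x20AC)
gives "E2 82 AC". Shifting right by 6 is the same as dividing by hex
40, and masking with 3F is the same as taking the remainder, which is
why the calculator version above works.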

-- 
Big Gaute                               http://www.srcf.ucam.org/~gs234/
MY income is ALL disposable!


