RE: Question...

From: Addison Phillips (AddisonP@simultrans.com)
Date: Thu May 06 1999 - 19:17:16 EDT


Hi Joon,

By "NULL represent a character", do you mean the NULL character itself
(0x0000)? Or do you mean a null byte that makes up half of a character
(e.g. the 0x00 byte in 0x0032 or 0x3200)?

If the former, the NULL character is there for compatibility with programs
written in C, which use NULL as a string terminator.
If the latter, the null byte is half of the character and cannot (well,
should not) be eliminated without changing the encoding.
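
To make the second case concrete, here is a minimal Python sketch. The
sample string and the c_strlen helper are illustrative assumptions, not
part of any library; the helper only mimics how a C string routine stops
at the first zero byte.

```python
def c_strlen(buf: bytes) -> int:
    """Mimic C strlen(): count bytes up to the first 0x00 byte."""
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx

# DIGIT TWO (U+0032) followed by a CJK ideograph (U+4E00).
text = "2\u4e00"

# Two-byte code units, in both byte orders.
be = text.encode("utf-16-be")  # bytes 00 32 4E 00
le = text.encode("utf-16-le")  # bytes 32 00 00 4E

# Every zero byte here is half of a real character, not a terminator,
# so a byte-oriented C routine would cut the data short.
print(be.hex(), c_strlen(be))
print(le.hex(), c_strlen(le))
```

Neither buffer contains the NULL character U+0000, yet both contain zero
bytes, which is why stripping them would corrupt the text.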

If you wish to send data that contains no nulls (other than string
terminators), consider converting your data to UTF-8 or UTF-7. These
encodings were created with file-system and mail-system safety in mind
and have the advantage of containing no null bytes. UTF-8 is commonly
used for compatibility purposes such as this.
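
As a rough illustration of the point above, the following Python sketch
uses the language's built-in codecs to compare the encodings. The sample
string and variable names are assumptions made for the example.

```python
text = "Unicode \u65e5\u672c\u8a9e"  # "Unicode" plus three CJK characters

utf16 = text.encode("utf-16-le")
utf8 = text.encode("utf-8")
utf7 = text.encode("utf-7")

# The two-byte encoding is full of zero bytes; UTF-8 and UTF-7 have none
# as long as the text itself does not contain U+0000.
has_null_utf16 = b"\x00" in utf16
has_null_utf8 = b"\x00" in utf8
has_null_utf7 = b"\x00" in utf7

# With no embedded zero bytes, one 0x00 can safely mark the end of the
# string on the wire, and the receiver can split on it.
wire = utf8 + b"\x00"
decoded = wire[:wire.index(b"\x00")].decode("utf-8")

print(has_null_utf16, has_null_utf8, has_null_utf7, decoded == text)
```

The receiver has to know which encoding was sent, so both ends of the
connection would need to agree on it in advance.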

The downside of using these encodings is that, while they "compress" English
Unicode data, your Asian characters will average MORE THAN 2 bytes per
character (typically 3 bytes each in UTF-8).
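
The size trade-off can be checked with a short Python sketch. The sample
strings and the byte_sizes helper are hypothetical and exist only for this
example.

```python
samples = {
    "english": "Hello",
    "japanese": "\u65e5\u672c\u8a9e",  # three CJK characters
}

def byte_sizes(text: str) -> dict:
    """Return the character count and the encoded size in bytes."""
    return {
        "chars": len(text),
        "ucs2": len(text.encode("utf-16-le")),  # two bytes per character here
        "utf8": len(text.encode("utf-8")),
    }

sizes = {name: byte_sizes(text) for name, text in samples.items()}
for name, info in sizes.items():
    print(name, info)
```

For these samples the English text halves in size under UTF-8, while the
Japanese text grows from 2 to 3 bytes per character.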

The encoding functions for UTF-7 and UTF-8 are out on the web, and both are
small.
The definition and encoding for UCS-2 are on Unicode's web site (well,
conversion tables are... the definition is long and you should buy the
book). You can find some sample code that might help on our FTP site
(ftp://ftp.simultrans.com/anonymous). You're not going to write your own
character converter, though, are you? I thought you guys were a Windows
shop?
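
For a sense of how small the UTF-8 encoding function mentioned above can
be, here is a minimal Python sketch for a single 16-bit code point. The
name utf8_encode_bmp is hypothetical, and the sketch deliberately rejects
surrogates and anything above 0xFFFF; in practice an existing library or
platform converter is the safer choice.

```python
def utf8_encode_bmp(cp: int) -> bytes:
    """Encode one 16-bit code point (0x0000-0xFFFF) as UTF-8 bytes."""
    if not 0 <= cp <= 0xFFFF:
        raise ValueError("code point outside the 16-bit range")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("surrogate values are not characters")
    if cp < 0x80:
        # 7 bits: one byte, identical to ASCII.
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xE0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])

print(utf8_encode_bmp(0x0032).hex())
print(utf8_encode_bmp(0x4E00).hex())
```

Every byte of a multi-byte sequence has its high bit set, so a zero byte
appears in the output only when the input is U+0000 itself.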

Addison
        __________________________________________

        Addison Phillips
        Director, Globalization Services
        SimulTrans, L.L.C.
        2606 Bayshore Parkway
        Mountain View, California 94043 USA

        +1 650-526-4652 (direct telephone)
        +1 650-969-9959 (facsimile)
        AddisonP@simultrans.com (Internet email)
        http://www.simultrans.com (website)

        "22 languages. One release date."
        __________________________________________

-----Original Message-----
From: Magda Danish (Unicode) [mailto:v-magdad@microsoft.com]
Sent: Thursday, May 6, 1999 15:52
To: Unicode List
Subject: FW: Question...

> -----Original Message-----
> From: Joon Kang [SMTP:ykang@verity.com]
> Sent: Thursday, April 22, 1999 2:32 PM
> To: info@unicode.org
> Subject: Question...
>
> Hello,
>
> My name is Joon Kang, and I work at Verity, Inc. as an engineer.
> I have two questions on Unicode and would very much appreciate it if you
> could cure my curiosity.
> I believe it might be very simple for you.
>
> 1. Is there any specific reason to have NULL in Unicode to represent a
> character?
> I found many in Asian codepoints.
> I am going to write a program that transfers Unicode data through a TCP/IP
> connection.
> But due to this NULL, I realized that the data transmission can be
> broken.
> Is there any solution on that?
>
> 2. Where can I find a clear definition & encoding scheme for the various
> Unicode variants for FREE?
> UCS2, UTF16, UTF8, UCS4 ...
>
> Thank you very much!
> -Joon Kang
> Verity, Inc.
> 892 Ross Drive,
> Sunnyvale, CA 94089
> ykang@verity.com <mailto:ykang@verity.com>
> (408) 542-2323
>
>


