RE: discovering code points with embedded nulls

From: Rick Cameron (Rick.Cameron@crystaldecisions.com)
Date: Wed Feb 05 2003 - 13:25:19 EST

Next message: Marco Cimarosti: "RE: discovering code points with embedded nulls"

Previous message: jameskass@att.net: "VS vs. P14 (was Re: Indic Devanagari Query)"
Maybe in reply to: Erik.Ostermueller@alltel.com: "discovering code points with embedded nulls"
Next in thread: Marco Cimarosti: "RE: discovering code points with embedded nulls"

Are you sure the API doesn't support Unicode _characters_ with embedded
NULs? Or does it fail to support Unicode _strings_ with embedded NULs?

If it really is the former, then UTF-8 is safe: no character's UTF-8
encoding (except, of course, that of U+0000) includes a NUL byte. In UTF-16,
the affected characters are those of the form U+00xx (that is, all the ASCII
and Latin-1 characters) or U+xx00 (a great miscellany of characters), plus
any supplementary character whose surrogate code units have that form.
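
To make that concrete, here is a minimal Python sketch that checks a handful
of characters. The helper name contains_nul and the choice of sample
characters are my own, purely for illustration; "utf-16-be" is used so that
no byte order mark is prepended.

```python
def contains_nul(ch: str, encoding: str) -> bool:
    """Return True if the encoded form of ch contains a 0x00 byte."""
    return 0 in ch.encode(encoding)


# Illustrative sample: ASCII, Latin-1, two U+01xx letters, a CJK ideograph,
# and one supplementary character (encoded as a surrogate pair in UTF-16).
samples = ["A", "\u00e9", "\u0100", "\u0141", "\u4e00", "\U00010000"]

for ch in samples:
    print(
        f"U+{ord(ch):04X}",
        "utf-8:", contains_nul(ch, "utf-8"),
        "utf-16:", contains_nul(ch, "utf-16-be"),
    )
```

Of these samples, only U+0141 is free of NUL bytes in UTF-16; none of them
contains a NUL byte in UTF-8.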

It's hard to believe that an API that accepts UTF-16 would not handle ASCII
and Latin-1 characters! So I think the restriction must be about embedded
U+0000 characters in strings.

If so, that's much less onerous - it's pretty weird to embed U+0000 in the
middle of a string, despite the fact that many Win32 API functions require
this!

- rick

-----Original Message-----
From: Erik.Ostermueller@alltel.com [mailto:Erik.Ostermueller@alltel.com]
Sent: Wednesday, 5 February 2003 8:43
To: unicode@unicode.org
Subject: discovering code points with embedded nulls

Hello, all.

I'm dealing with an API that claims it doesn't support Unicode characters
with embedded nulls. I'm trying to figure out how much of a liability this
is.

What is my best plan of attack for discovering precisely which code points
have embedded nulls given a particular encoding? I didn't find an answer in
the mailing list archive, and I've googled for quite a while with no luck.

I'll want to do this for a few different versions of Unicode and a few
different encodings. What if I write a program using some of the data files
available at unicode.org? Am I crazy (I'm new at this stuff) or am I getting
warm? Perhaps this data file:
http://www.unicode.org/Public/UNIDATA/UnicodeData.txt ?

Algorithm:
INPUT: Name of Unicode code point file
INPUT: Name of encoding (perhaps UTF-8)

Read a code point from the file.
Expand the code point to its encoded form in the given encoding.
Test all constituent bytes for 0x00.
Go to the next code point in the file.
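
The algorithm above could be sketched in Python roughly as follows. This is
only a sketch under some assumptions: the function name code_points_with_nul
is made up for illustration, the input is any iterable of lines in the
semicolon-separated UnicodeData.txt layout (code point in hex as the first
field), and the sample lines are abbreviated to their first three fields.
Note that UnicodeData.txt gives only the first and last entry of large
ranges such as the CJK ideographs, so a complete scan would have to expand
those ranges (or simply loop over every code point instead of reading the
file).

```python
def code_points_with_nul(lines, encoding):
    """Yield code points whose encoded form contains a 0x00 byte.

    lines: iterable of UnicodeData.txt-style lines ("0041;LATIN ...;Lu;...").
    encoding: a Python codec name such as "utf-8" or "utf-16-le".
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The first semicolon-separated field is the code point in hex.
        cp = int(line.split(";", 1)[0], 16)
        # Surrogate code points cannot be encoded on their own; skip them.
        if 0xD800 <= cp <= 0xDFFF:
            continue
        if 0 in chr(cp).encode(encoding):
            yield cp


# Abbreviated sample lines standing in for the real data file.
sample = [
    "0000;<control>;Cc",
    "0041;LATIN CAPITAL LETTER A;Lu",
    "0100;LATIN CAPITAL LETTER A WITH MACRON;Lu",
    "0141;LATIN CAPITAL LETTER L WITH STROKE;Lu",
]

for enc in ("utf-8", "utf-16-le"):
    hits = [f"U+{cp:04X}" for cp in code_points_with_nul(sample, enc)]
    print(enc, hits)
# utf-8 ['U+0000']
# utf-16-le ['U+0000', 'U+0041', 'U+0100']
```

To run it against the real file, pass an open file object for a local copy
of UnicodeData.txt in place of the sample list.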

Thanks in advance for any help,

--Erik O.

This archive was generated by hypermail 2.1.5 : Wed Feb 05 2003 - 14:05:37 EST