Re: Question about \uxxxx etc. for 21-bit code points - need advice

From: Frank da Cruz ([email protected])
Date: Tue May 23 2000 - 15:00:58 EDT

Next message: [email protected]: "RE: Question about \uxxxx etc. for 21-bit code points - need advice"
Previous message: Paul Dempsey: "RE: Question about \uxxxx etc. for 21-bit code points - need advice"
Maybe in reply to: Markus Scherer: "Question about \uxxxx etc. for 21-bit code points - need advice"
Next in thread: Marco Cimarosti: "Re: Question about \uxxxx etc. for 21-bit code points - need advice"

> we (ICU) are trying to figure out how best to specify non-BMP (21-bit) code
> points with escape sequences or similar in strings.
>
> Problem:
> The C language has \ooo with octal digits for bytes of whatever encoding,
> and modern compilers also know \xhh with hexadecimal digits (with variable
> numbers of digits). Java introduced \uhhhh with (always 4) hexadecimal
> digits for Unicode code units.
>
> But how does one write a non-BMP code point in this fashion?
>
> I am trying to list some suggestions, make a proposal, and ask you for what
> you are doing or other people/standards/organizations/languages are planning
> to do.
>
Making up new x's for "\x" is not the best way, since, as long as our
programming languages are based on ASCII (a whole different topic), we'll
quickly run out of x's, especially when we are overloading the x by trying to
make it convey two pieces of info: the encoding that follows, and its length.

In the Kermit language, we use:

\x{yyy...}

where 'x' says what it is (e.g. decimal, hex, octal, whatever), and the
braces delimit the operand, thus allowing it to be any length. This is also
handy for disambiguating expressions like:

\o0123456

Is that "\o012" followed by "3456" or "\o0123" followed by "456"? In:

\o{012}3456

it's clear.
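
As a rough sketch of how a reader of such strings might handle the brace-delimited form, here is a small Python example. It is only an illustration: the function name expand_escapes and the letter-to-radix table (x = hex, o = octal, d = decimal) are assumptions made for this example, not Kermit's actual implementation or its full set of escape letters.

```python
import re

# Hypothetical letter-to-radix table, chosen for illustration only.
RADIX = {"x": 16, "o": 8, "d": 10}

# A backslash, one radix letter, then digits enclosed in braces.
# The braces mark where the operand ends, so it can be any length.
_ESCAPE = re.compile(r"\\([xod])\{([0-9A-Fa-f]+)\}")

MAX_CODE_POINT = 0x10FFFF  # upper limit of the 21-bit code space


def expand_escapes(text):
    """Replace each \\x{...}, \\o{...} or \\d{...} with the character it names."""

    def _sub(match):
        kind, digits = match.group(1), match.group(2)
        # int() raises ValueError if a digit is not valid in this radix,
        # e.g. "8" inside an octal operand.
        value = int(digits, RADIX[kind])
        if value > MAX_CODE_POINT:
            raise ValueError("code point out of range: " + match.group(0))
        return chr(value)

    return _ESCAPE.sub(_sub, text)
```

With this sketch, expand_escapes(r"\o{012}3456") yields a linefeed followed by the literal text "3456", and expand_escapes(r"\x{10400}") yields the single non-BMP code point U+10400, with no new escape letter needed for the longer operand.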

- Frank


This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:21:03 EDT