Re: UTF-c

From: William_J_G Overington (
Date: Sat Feb 26 2011 - 04:59:17 CST


    Philippe Verdy <> wrote:
    > 6 bits : 11.yyxxxx
    > Encodes U+00C0..U+00FF (by default):
    > yyxxxx = Unicode scalar value - BASE
    > BASE should necessarily be a multiple of 16 (policy of ISO/IEC 10646-1 for block allocations).
    > BASE must then be able to store up to 15 bits if arbitrary positions in the UCS are possible.
    > BASE is then constrained to 0x80 .. 0x10FFF0 (by steps of 16).
    > Same as ISO-8859-1 only if BASE=0xC0.
    > (BASE may be different from 0xC0 if a switch code has been explicitly used in the stream.)
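    A minimal sketch of the decoding rule quoted above, assuming a single BASE and treating the low six bits of the byte as the offset (the function name and defaults are illustrative, not part of the proposal):

```python
# Sketch of decoding a lone 11.yyxxxx byte under the quoted scheme.
# Assumption: scalar value = BASE + low six bits of the byte.

def decode_6bit(byte: int, base: int = 0xC0) -> int:
    """Decode a byte of the form 11yyxxxx to a Unicode scalar value."""
    assert byte & 0xC0 == 0xC0       # byte must start with bits 11
    assert base % 16 == 0            # BASE is a multiple of 16
    assert 0x80 <= base <= 0x10FFF0  # constrained range from the quote
    return base + (byte & 0x3F)      # scalar = BASE + yyxxxx

# With the default BASE=0xC0, bytes 0xC0..0xFF map straight to
# U+00C0..U+00FF, matching ISO-8859-1 in that range.
```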
    When a byte starting 11 is used in isolation, why is it represented as 11.yyxxxx please?
    Is it because there are four possible values of BASE, namely BASE[0], BASE[1], BASE[2] and BASE[3]?
    If BASE has a non-negative value less than 0x80, could that value of BASE be used to signal access to a decoding tree, so that the most common code points in the text beyond the range U+0000..U+007F could be represented using a single byte starting with 11? The contents of the decoding tree could be dynamically altered using switching codes.
    If the idea of four values for BASE, namely BASE[0], BASE[1], BASE[2] and BASE[3], is used, then access to a decoding tree would be possible simultaneously with one-byte access to a contiguous block of other Unicode characters, if so desired; though if BASE[0]..BASE[3] are used, the range of possible values of BASE would need to be 17 bits.
    For example, at some particular time in some particular application of the format, BASE[0] might have a value of 0x00 and BASE[1] might have a value of 0x100.
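    A sketch of this suggested variant, under the assumption that the yy bits select one of the four bases and the xxxx bits give a 0..15 offset within the selected block (the function and the last two base values are hypothetical):

```python
# Sketch of the suggested BASE[0..3] variant (an assumption, not the
# proposal as specified): yy selects the base, xxxx is the offset.

def decode_multi(byte: int, bases: list[int]) -> int:
    """Decode a byte of the form 11yyxxxx using four bases."""
    assert byte & 0xC0 == 0xC0  # byte must start with bits 11
    yy = (byte >> 4) & 0x3      # 2-bit base selector
    xxxx = byte & 0xF           # 4-bit offset within the block
    return bases[yy] + xxxx

# Using the example values above (BASE[0]=0x00, BASE[1]=0x100;
# the other two are arbitrary illustrations):
bases = [0x00, 0x100, 0x2010, 0x3040]
```

Under this reading, each byte gives one-byte access to sixteen characters per base, so four 16-character blocks at once.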
    William Overington
    26 February 2011

    This archive was generated by hypermail 2.1.5 : Sat Feb 26 2011 - 05:02:58 CST