**From:** William_J_G Overington (*wjgo_10009@btinternet.com*)

**Date:** Sat Feb 26 2011 - 04:59:17 CST

**Previous message:** William_J_G Overington: "Re: UTF-c"

**Maybe in reply to:** Thomas Cropley: "UTF-c"

**Next in thread:** Philippe Verdy: "Re: UTF-c"

**Reply:** Philippe Verdy: "Re: UTF-c"

Philippe Verdy <verdy_p@wanadoo.fr> wrote:

> | 6 bits : 11.yyxxxx
>
> | Encodes U+00C0..U+00FF (by default) :
>
> | yyxxxxx = Unicode scalar value - BASE
>
> | BASE should necessarily be a multiple of 16 (policy of ISO/IEC 10646-1 for block allocations).
>
> | BASE must then be able to store up to 15 bits if arbitrary positions in the UCS are possible
>
> | BASE is then constrained to 0x80 .. 0x10FFF0 (by step of 16).
>
> | Same as ISO-8859-1 only if BASE=0xC0
>
> | (BASE may be different from 0xC0 if a switch code has been explicitly used in the stream)
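A minimal Python sketch of the single-BASE reading of the quoted rule, assuming the offset is the six low bits of the 11.yyxxxx byte (the quoted `yyxxxxx` shows seven letters, which looks like a slip) and that the decoded value must be a Unicode scalar value. The function name `decode_single_base` is illustrative only. It also checks the quoted claim that the default BASE=0xC0 reproduces ISO-8859-1 for bytes 0xC0..0xFF.

```python
def decode_single_base(byte: int, base: int = 0xC0) -> int:
    """Decode one 11.yyxxxx byte as BASE + (low six bits).

    Illustrative sketch only: the name and the checks are assumptions.
    """
    if byte & 0xC0 != 0xC0:
        raise ValueError("not a byte of the form 11xxxxxx")
    if base % 16 != 0 or not 0x80 <= base <= 0x10FFF0:
        raise ValueError("BASE must be a multiple of 16 in 0x80..0x10FFF0")
    code_point = base + (byte & 0x3F)  # six-bit offset added to BASE
    # BASE near the top of the range plus a six-bit offset can overshoot
    # U+10FFFF, and some BASE values would land on surrogates.
    if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError(f"U+{code_point:04X} is not a Unicode scalar value")
    return code_point


# With the default BASE, bytes 0xC0..0xFF decode exactly as ISO-8859-1.
same_as_latin1 = all(
    chr(decode_single_base(b)) == bytes([b]).decode("latin-1")
    for b in range(0xC0, 0x100)
)
print(same_as_latin1)
```

With BASE=0x400 the same 64 byte values would reach U+0400..U+043F in the Cyrillic block instead. The scalar-value check matters because the quoted upper limit 0x10FFF0 plus a six-bit offset can exceed U+10FFFF.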

When a byte starting with 11 is used in isolation, why is it represented as 11.yyxxxx, please?

Is it because there are four possible values of BASE, namely BASE[0], BASE[1], BASE[2] and BASE[3]?

If BASE has a non-negative value less than 0x80, could that value of BASE be used to signal access to a decoding tree, so that the most common code points in the text from outside the range U+0000 to U+007F could be represented using a single byte starting with 11? The contents of the decoding tree could be dynamically altered using switching codes.
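One possible reading of the decoding-tree proposal, sketched in Python under loudly hypothetical assumptions: a BASE below 0x80 switches the decoder into table mode, the "tree" is simplified to a flat 64-entry table indexed by the six low bits, and a method `set_entry` stands in for whatever switching code would edit it. None of these names are part of the format as discussed in the thread.

```python
class TableModeDecoder:
    """Hypothetical decoder: BASE < 0x80 means 'look the byte up in a table'.

    Code points U+0000..U+007F already have one-byte forms, so a BASE in
    that range is free to act as a signal instead of a real offset.
    """

    def __init__(self, base: int = 0xC0):
        self.base = base
        # Flat stand-in for the decoding tree: one slot per six-bit value,
        # None meaning the slot is currently unassigned.
        self.table = [None] * 64

    def set_entry(self, index: int, code_point: int) -> None:
        """Stand-in for a hypothetical switching code that reassigns a slot."""
        if not 0 <= index < 64:
            raise ValueError("index must fit in six bits")
        self.table[index] = code_point

    def decode(self, byte: int) -> int:
        if byte & 0xC0 != 0xC0:
            raise ValueError("not a byte of the form 11xxxxxx")
        offset = byte & 0x3F
        if self.base < 0x80:
            code_point = self.table[offset]
            if code_point is None:
                raise ValueError(f"table slot {offset} is unassigned")
            return code_point
        return self.base + offset  # ordinary BASE + offset behaviour


decoder = TableModeDecoder(base=0x00)
decoder.set_entry(0, 0x20AC)  # hypothetical: put EURO SIGN in slot 0
print(f"U+{decoder.decode(0xC0):04X}")
```

Here a text heavy in euro signs could map U+20AC to slot 0 so that the byte 0xC0 decodes to it, while a BASE of 0x80 or above keeps the plain offset behaviour.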

If the idea of four values for BASE (BASE[0], BASE[1], BASE[2] and BASE[3]) is used, then access to a decoding tree would be possible simultaneously with one-byte access to a contiguous block of other Unicode characters, if so desired, though each BASE would then need 17 bits to cover its range of possible values.
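The 17-bit figure can be checked with a little arithmetic. Because BASE is a multiple of 16, only BASE/16 needs storing; the largest quoted value, 0x10FFF0, gives 0x10FFF, which needs 17 bits (15 bits would only reach 0x7FFF0). The helper `bits_to_store_base` is an illustrative sketch, not part of the format.

```python
def bits_to_store_base(max_base: int = 0x10FFF0, step: int = 16) -> int:
    """Bits needed to store any BASE in 0..max_base that is a multiple of step.

    Assumes step is a power of two, so the low bits of BASE are always zero
    and only BASE // step has to be kept.
    """
    return (max_base // step).bit_length()


print(bits_to_store_base())      # bits per BASE register
print(4 * bits_to_store_base())  # bits for BASE[0]..BASE[3] together
```

With four BASE registers that is 4 × 17 = 68 bits of decoder state.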

For example, at some particular time in some particular application of the format, BASE[0] might have a value of 0x00 and BASE[1] might have a value of 0x100.
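Combining the two ideas, a Python sketch of a four-BASE decoder, assuming yy selects BASE[yy], xxxx is a 4-bit offset into a 16-character window, and a BASE below 0x80 means "look up xxxx in that slot's table". BASE[0]=0x00 and BASE[1]=0x100 follow the example above; BASE[2]=0xE0, BASE[3]=0xF0, the class name, and the table entry for U+2019 are arbitrary choices for illustration.

```python
class FourBaseDecoder:
    """Hypothetical decoder for 11.yyxxxx bytes with four BASE registers.

    yy selects BASE[yy]; xxxx is the offset within that register's window.
    A BASE below 0x80 is read as "use this slot's lookup table" instead.
    """

    def __init__(self, bases):
        if len(bases) != 4:
            raise ValueError("exactly four BASE values are needed")
        self.bases = list(bases)
        # One 16-entry table per slot, indexed by xxxx; None = unassigned.
        self.tables = [[None] * 16 for _ in range(4)]

    def set_entry(self, slot, index, code_point):
        """Stand-in for a hypothetical switching code that edits a table entry."""
        self.tables[slot][index] = code_point

    def decode(self, byte):
        if byte & 0xC0 != 0xC0:
            raise ValueError("not a byte of the form 11xxxxxx")
        yy = (byte >> 4) & 0x3  # which BASE register
        xxxx = byte & 0x0F      # offset within that register's window
        base = self.bases[yy]
        if base < 0x80:
            code_point = self.tables[yy][xxxx]
            if code_point is None:
                raise ValueError(f"slot {yy}, entry {xxxx} is unassigned")
            return code_point
        return base + xxxx


# BASE[0] and BASE[1] as in the example; BASE[2] and BASE[3] are arbitrary.
decoder = FourBaseDecoder([0x00, 0x100, 0xE0, 0xF0])
decoder.set_entry(0, 2, 0x2019)  # hypothetical: RIGHT SINGLE QUOTATION MARK
for b in (0xC2, 0xD3, 0xE9, 0xF6):
    print(hex(b), "->", f"U+{decoder.decode(b):04X}")
```

With these settings 0xD3 decodes to U+0103, 0xE9 and 0xF6 happen to land on U+00E9 and U+00F6 as in ISO-8859-1, and 0xC2 comes from the table. Each contiguous window shrinks from 64 to 16 code points, which is the price of having four of them.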

William Overington

26 February 2011


*This archive was generated by hypermail 2.1.5: Sat Feb 26 2011 - 05:02:58 CST*