From: William_J_G Overington (email@example.com)
Date: Sat Feb 26 2011 - 04:59:17 CST
Philippe Verdy <firstname.lastname@example.org> wrote:
> | 6 bits : 11.yyxxxx
> Encodes U+00C0..U+00FF (by default) :
> yyxxxx = Unicode scalar value - BASE
> BASE should necessarily be a multiple of 16 (policy of ISO/IEC 10646-1 for block allocations).
> BASE must then be able to store up to 15 bits if arbitrary positions in the UCS are possible
> BASE is then constrained to 0x80 .. 0x10FFF0 (by steps of 16)
> Same as ISO-8859-1 only if BASE=0xC0
> (BASE may be different from 0xC0 if a switch code has been explicitly used in the stream)
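As I understand the quoted scheme, a byte of the form 11yyxxxx carries a six-bit offset from BASE, and with the default BASE=0xC0 it reproduces ISO-8859-1 for 0xC0..0xFF. A minimal sketch in Python (the function name is mine, purely illustrative):

```python
BASE = 0xC0  # default; a switch code in the stream could select a different BASE

def decode_single(byte: int) -> str:
    """Decode one isolated 11yyxxxx byte: the low six bits offset from BASE."""
    assert byte & 0xC0 == 0xC0, "expects a byte whose top two bits are 11"
    return chr(BASE + (byte & 0x3F))

# With BASE = 0xC0 this matches ISO-8859-1:
print(decode_single(0xE9))  # U+00E9, 'é'
```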
When a byte starting with 11 is used in isolation, why is it represented as 11.yyxxxx, please?
Is it because there are four possible values of BASE, namely BASE0, BASE1, BASE2 and BASE3?
If BASE has a non-negative value less than 0x80, could that value of BASE be used to signal access to a decoding tree, so that the most common codepoints in the text from beyond the range U+0000 to U+007F could be represented using a single byte starting with 11? The contents of the decoding tree could be dynamically altered using switching codes.
If the idea of four values for BASE, namely BASE0, BASE1, BASE2 and BASE3, is used, then access to a decoding tree would be possible simultaneously with one-byte access to a contiguous block of other Unicode characters if so desired, though if BASE0, BASE1, BASE2 and BASE3 are used, the range of possible values of each BASE would need to be 17 bits.
For example, at some particular time in some particular application of the format, BASE0 might have a value of 0x00 and BASE1 might have a value of 0x100.
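The four-BASE variant I am suggesting could be sketched as follows, assuming (my assumption, not part of the quoted proposal) that the yy bits select one of four BASE registers and that a BASE below 0x80 signals the decoding tree; names and the example BASE values are illustrative only:

```python
# Illustrative BASE registers: BASE0 = 0x00 would signal the decoding tree,
# BASE1 = 0x100 per the example above; BASE2 and BASE3 are arbitrary choices.
BASES = [0x00, 0x100, 0xC0, 0x2000]

def decode_four_base(byte: int) -> str:
    """Decode one 11yyxxxx byte: yy picks a BASE, xxxx is a 4-bit offset."""
    assert byte & 0xC0 == 0xC0, "expects a byte whose top two bits are 11"
    yy = (byte >> 4) & 0x3
    xxxx = byte & 0xF
    base = BASES[yy]
    if base < 0x80:
        # In this variant, a BASE below 0x80 would mean "look up xxxx in a
        # dynamically switchable decoding tree" (not sketched here).
        raise NotImplementedError("decoding-tree access not sketched")
    return chr(base + xxxx)

print(decode_four_base(0xD3))  # yy=01 selects 0x100, offset 3: U+0103
```

Note that with only xxxx as the offset, each selected BASE addresses a run of just 16 codepoints, which is why the switching codes matter.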
26 February 2011
This archive was generated by hypermail 2.1.5 : Sat Feb 26 2011 - 05:02:58 CST