From: Theodore H. Smith (delete@elfdata.com)
Date: Wed Jun 14 2006 - 16:26:36 CDT
On 14 Jun 2006, at 18:39, Richard Wordingham wrote:
>
>> The re-ordering pass is not using my "multiple-replaceall"
>> algorithm. It does use the canonical combining classes. A
>> multi-pass approach, while possible... I wouldn't do it; it would
>> take too long.
>
> And this was the basis of the claim that you couldn't just treat
> characters as 'bags of bytes'!
But I do treat them as a "bag of bytes" :)
Even my combining character reorder does that.
I'll paste part of my code below. (There are two other necessary
functions, which I won't bother you with.)
Take this line:
StartComb = dict.SearchObj( Data, nil, PrevCombEnd, ElfData.kEnd, FoundLen, FoundObj )
It searches some UTF-8 data, starting at PrevCombEnd, for the earliest
occurrence of any "key" in the dictionary. The length of the found
item (a character, in this case) is stored in FoundLen.
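To make the behaviour of that call concrete, here is a rough Python stand-in. It is not the ElfData plugin's API: search_obj is a hypothetical name, and where the REALbasic call uses 1-based offsets and 0 for "not found", this sketch uses Python's 0-based offsets and -1. It scans every key naively, chosen for clarity rather than speed.

```python
# Hypothetical stand-in for ElfDataDictionary.SearchObj, working on raw bytes.
def search_obj(table, data, start):
    """Return (pos, length, value) for the earliest occurrence of any key of
    `table` in `data` at or after byte offset `start`, or (-1, 0, None)."""
    best = (-1, 0, None)
    for key, value in table.items():
        pos = data.find(key, start)          # plain byte-string search
        if pos != -1 and (best[0] == -1 or pos < best[0]):
            best = (pos, len(key), value)
    return best

# Keys are UTF-8 characters; values are 1-byte strings (see the post).
table = {"\u0301".encode("utf-8"): bytes([230]),
         "\u0323".encode("utf-8"): bytes([220])}
data = "xa\u0323\u0301".encode("utf-8")
print(search_obj(table, data, 0))            # (2, 2, b'\xdc')
```

Because each key here is one complete UTF-8 character, two different keys can never match at the same offset, so the tie-breaking rule does not matter.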
The key's "value" in the dictionary is returned in FoundObj.
This "object" is actually just a 1-byte string, and that byte is
the canonical combining class! Hence the line
"CurrByte = ElfData( FoundObj ).ByteVal".
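A dictionary of that shape can be sketched in Python with the standard unicodedata module. Each key is the UTF-8 encoding of a character whose canonical combining class is non-zero, and each value is a 1-byte string holding that class (classes always fit in one byte), so reading the value's only byte plays the role of ByteVal. build_ccc_table is a hypothetical name, and the class values come from whichever Unicode version the Python build ships with.

```python
import unicodedata

# Hypothetical helper: builds a table shaped like the one described above.
def build_ccc_table(max_cp=0x110000):
    """Map the UTF-8 bytes of each code point below max_cp that has a
    non-zero canonical combining class to a 1-byte value holding that class."""
    table = {}
    for cp in range(max_cp):
        if 0xD800 <= cp <= 0xDFFF:       # surrogates cannot be encoded as UTF-8
            continue
        ccc = unicodedata.combining(chr(cp))
        if ccc:                          # class 0 (starters) is left out
            table[chr(cp).encode("utf-8")] = bytes([ccc])
    return table

table = build_ccc_table()
value = table["\u0301".encode("utf-8")]  # U+0301 COMBINING ACUTE ACCENT
print(value[0])                          # the equivalent of ByteVal: 230
```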
Anyhow, all this code is byte-oriented, yet it cannot corrupt or miss
any characters, or produce false hits.
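Well, in theory, it *could*, if one UTF-8 character's byte sequence
could occur inside another's. But UTF-8 rules this out: ASCII bytes
(0x00-0x7F) never occur inside a multi-byte character, continuation
bytes (0x80-0xBF) never start a character, and the first byte fixes
the sequence length. So no character's encoding can appear inside any
other's, and for valid UTF-8 this kind of false match never occurs.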
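The property this relies on can be checked mechanically. The sketch below (helper names are hypothetical) confirms that every encodable code point is one non-continuation byte followed only by continuation bytes, then searches a sample string for any character's encoding that turns up off a character boundary.

```python
def is_continuation(b):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80-0xBF);
    # ASCII bytes and lead bytes never do.
    return 0x80 <= b <= 0xBF

def well_formed_shape(cp):
    """True if chr(cp) encodes as one non-continuation byte followed only by
    continuation bytes, which rules out matches starting inside a character."""
    enc = chr(cp).encode("utf-8")
    return not is_continuation(enc[0]) and all(is_continuation(b) for b in enc[1:])

def misaligned_hits(text):
    """Byte offsets where the UTF-8 encoding of some character of `text` is
    found in the encoded text without starting on a character boundary."""
    data = text.encode("utf-8")
    starts, pos = set(), 0
    for ch in text:                      # record every real character start
        starts.add(pos)
        pos += len(ch.encode("utf-8"))
    hits = []
    for ch in set(text):
        key = ch.encode("utf-8")
        i = data.find(key)
        while i != -1:                   # every byte-level match of this key
            if i not in starts:
                hits.append(i)
            i = data.find(key, i + 1)
    return hits

every_cp_ok = all(well_formed_shape(cp) for cp in range(0x110000)
                  if not 0xD800 <= cp <= 0xDFFF)
print(every_cp_ok, misaligned_hits("A\u00e9\u20ac\U0001F600a\u0301\u0323"))
# True []
```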
The whole thing is safe, despite treating everything as raw UTF-8
bytes and doing no code point decoding or character boundary checking!
The character boundaries are implied by the dictionary keys
themselves. If a key (which is a UTF-8 character) is found, and that
key is 3 bytes long, then the character boundary falls after the 3rd
byte. It's all just treated as strings of bytes.
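Those key lengths are not arbitrary: in UTF-8 the first byte of a character already says how many bytes it spans, which is why a matched 3-byte key necessarily ends on a boundary. The hypothetical split_utf8 below recovers the boundaries from lead bytes alone. It assumes well-formed input and only rejects bytes that cannot start a character.

```python
def utf8_seq_len(lead):
    """Length in bytes of the UTF-8 sequence that starts with byte `lead`."""
    if lead < 0x80:
        return 1                         # ASCII
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError("not a valid UTF-8 lead byte: 0x%02X" % lead)

def split_utf8(data):
    """Split well-formed UTF-8 bytes into per-character byte strings."""
    out, i = [], 0
    while i < len(data):
        n = utf8_seq_len(data[i])        # the lead byte fixes the length
        out.append(data[i:i + n])
        i += n
    return out

print(split_utf8("a\u20ac\u0301".encode("utf-8")))
# [b'a', b'\xe2\x82\xac', b'\xcc\x81']
```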
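Private Function ReorderSub(Dict as ElfDataDictionary, Data as ElfData, Start as integer, fs as FastString) As integer
  // Canonically reorders the run of combining characters that starts at
  // byte offset Start in Data, appending to fs only if the run is out of
  // order. Returns the byte offset just past the run.
  dim CurrByte, LastByte, StartComb, PrevCombEnd, FoundLen as Integer
  dim Chars() as ElfData   // each combining character's UTF-8 bytes
  dim Scores() as Integer  // the matching canonical combining classes
  dim UnOrdered as Boolean
  dim FoundObj as object
  
  PrevCombEnd = Start
  do
    // Find the earliest dictionary key (a combining character) at or after PrevCombEnd.
    StartComb = dict.SearchObj( Data, nil, PrevCombEnd, ElfData.kEnd, FoundLen, FoundObj )
    // Stop when nothing is found or the match is not directly adjacent:
    // the run of combining characters has ended.
    if StartComb = 0 or StartComb <> PrevCombEnd then
      exit
    end if
    // The dictionary value is a 1-byte string holding the combining class.
    CurrByte = ElfData( FoundObj ).ByteVal
    if CurrByte < LastByte then
      UnOrdered = true  // a lower class follows a higher one
    end if
    LastByte = CurrByte
    chars.Append data.mid( StartComb, FoundLen )
    Scores.Append CurrByte
    PrevCombEnd = StartComb + FoundLen
  loop
  
  // Already in canonical order: nothing needs rewriting.
  if UnOrdered = false then
    Return PrevCombEnd
  end if
  
  // ReorderArrays is one of the helpers not shown; presumably it sorts Chars
  // by Scores. Canonical ordering requires a stable sort, so characters with
  // equal classes keep their relative order.
  ReorderArrays Chars, Scores
  // Copy the bytes of Data that fs does not yet hold, up to Start...
  fs.AppendSectElfData Data, fs.Length + 1, Start - ( fs.length + 1 )
  // ...then the reordered combining characters.
  for CurrByte = 0 to UBound( Chars )
    fs.AppendElfData Chars( CurrByte )
  next
  // fs now mirrors Data up to the end of the run, so this should equal PrevCombEnd.
  Return fs.Length + 1
End Function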
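For comparison, here is the same run-by-run reordering sketched in Python. It is not a translation of the plugin code: the CCC table, match_at, reorder_run and canonical_reorder are all hypothetical. It keeps the byte-oriented approach (combining characters are recognised only by looking their UTF-8 bytes up in a table, so a class-0 starter simply ends a run) and relies on Python's sorted(), which is stable. It performs no decomposition, so it only agrees with NFD on input that is already decomposed.

```python
import unicodedata

# Hypothetical table: UTF-8 bytes of each non-zero-class character -> class.
CCC = {chr(cp).encode("utf-8"): unicodedata.combining(chr(cp))
       for cp in range(0x110000)
       if not 0xD800 <= cp <= 0xDFFF and unicodedata.combining(chr(cp))}

def match_at(data, pos):
    """Return the table key occurring exactly at byte offset pos, or None."""
    for n in (2, 3, 4):                  # no combining mark is 1-byte ASCII
        key = data[pos:pos + n]
        if key in CCC:
            return key
    return None

def reorder_run(data, start):
    """Collect the combining characters directly at/after `start`, sort them
    stably by class, and return (reordered_bytes, end_offset)."""
    marks, pos = [], start
    while True:
        key = match_at(data, pos)
        if key is None:                  # a starter or end of data ends the run
            break
        marks.append(key)
        pos += len(key)
    marks = sorted(marks, key=lambda k: CCC[k])   # stable: ties keep order
    return b"".join(marks), pos

def canonical_reorder(data):
    """Apply reorder_run to every run of combining characters in `data`."""
    out, pos = [], 0
    while pos < len(data):
        run, end = reorder_run(data, pos)
        if end > pos:
            out.append(run)
            pos = end
        else:                            # not a mark: copy one byte unchanged
            out.append(data[pos:pos + 1])
            pos += 1
    return b"".join(out)

s = "a\u0301\u0323"                      # acute (230) before dot below (220)
print(canonical_reorder(s.encode("utf-8")).decode("utf-8")
      == unicodedata.normalize("NFD", s))   # True
```

Copying non-mark bytes one at a time is safe for the reason given in the post: a key can only match where a character starts, so match_at never fires in the middle of a multi-byte character.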
-- http://elfdata.com/plugin/
This archive was generated by hypermail 2.1.5 : Wed Jun 14 2006 - 16:33:32 CDT