From: Dmitry Turin (unicode20@narod.ru)
Date: Mon Oct 01 2007 - 13:27:42 CST
Philippe,
>> PV> case-insensitive searches, the
>> PV> algorithms are extremely simple and fast in their implementation
>> These algorithms are unnecessary in general.
PV> Unnecessary ?!?!?
Yes. They are redundant wherever it is possible to do without them.
PV> you need to consider the huge cost of the conversion
Take care:
the cost is not the price of modifying already written software,
the cost is the price of implementing a redundant algorithm in future software.
PV> what is the interest of
PV> making such change, except locally within your own local applications? If
PV> transform texts locally in your system
Once again: transform, transform, transform ...
PV> Concrete implementations
PV> already exist that don't need your proposed "controls".
Did I ever say that it's impossible to do it another way!?
There are many ways to reach the goal, but these ways have different
properties (characteristics).
---look into
PV> think about Base64 representation of binary data
Where do you see a problem?
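One possible reading of the Base64 remark, offered as a sketch rather than as PV's actual argument: Base64 uses upper-case and lower-case letters as distinct digits, so case-folding an encoded string changes the bytes it decodes to. The variable names below are illustrative only.

```python
import base64

original = b"A"

# Base64 text for the single byte 0x41 is "QQ==".
encoded = base64.b64encode(original).decode("ascii")

# Case-folding the encoded text yields a different, still decodable string.
folded = encoded.lower()

print(encoded, base64.b64decode(encoded))
print(folded, base64.b64decode(folded))

# The folded text no longer decodes to the original byte.
print(base64.b64decode(folded) == original)
```

Any scheme that stores text in lower case only would therefore have to leave such payloads untouched, or mark every capital in them.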
PV> in other protocols like Email and networking protocols
(1.1) Network protocols don't use Unicode II
(in which these two new symbols would be).
(1.2) By that future time, all network protocols will have been folded into
one: into XML, used for this purpose. That is obvious.
(2) Today: all network protocols (as far as I know) understand strings
written only in lower-case letters (if I'm wrong, correct me).
Thus the two new symbols have no influence on protocols.
PV> if there are some prior "symbol" or control somewhere at an unknown distance
+
PV> Think about those algorithms that try to extract substrings, including text
PV> parsers used for linguistic analysis
+
PV> extract substrings
Treat these two signs as _printable_ lower-case letters.
A parser __must__ not distinguish between printable symbols and these two
unprintable (control) symbols.
PV> One note: you have forgotten tricameral letters that are ligatures of two
PV> letters (that may have their own bicameral behaviour, but that, when used in
PV> combination in the ligated form, create a tricameral scheme with lowercase,
PV> uppercase and titlecase forms...)
+
PV> tricameral
PV> like the ligatures "dz" or "DZ" or "Dz"
+
PV> Turkish/Azeri "I" letters with or without upper dot
+
PV> conversion between the two sets [small letters and capital letters]
PV> is not trivial and not safe in all cases.
+
PV> Case conversion is not a lossless process
(1) What are the syntax rules for when each of the forms should be used?
(2) Could you point to graphical images of these three forms
for several tricameral letters?
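As a partial answer to question (2), the three forms of one tricameral letter can be listed with Python's built-in case mappings, which follow the Unicode Character Database. This is only a sketch: it uses the "dz" digraph from PV's example (U+01F3), and the second half illustrates the "not lossless" remark with two well-known cases.

```python
import unicodedata

# The small "dz" digraph and its upper-case and title-case counterparts.
small = "\u01f3"
forms = {
    "lower": small.lower(),
    "upper": small.upper(),
    "title": small.title(),
}
for kind, ch in forms.items():
    print(kind, f"U+{ord(ch):04X}", unicodedata.name(ch))

# Case conversion is not lossless: German sharp s does not survive a round trip.
sharp_s = "\u00df"
round_trip = sharp_s.upper().lower()
print(repr(sharp_s), "->", repr(round_trip))

# Capital I with dot above lowers to "i" plus a combining dot (two code points).
dotted_capital_i = "\u0130"
lowered = dotted_capital_i.lower()
print([f"U+{ord(c):04X}" for c in lowered])
```

The title-case form is the one with a capital D followed by a small z, used when only the first letter of a word is capitalized.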
---back question
PV> in Dutch for the "ij" or "IJ" ligature which is distinct from
PV> the two separate letters "i/I" and "j/J"
PV> ... when capitalizing it: this Dutch ligature itself is bicameral
I.e. this is not the interesting case mentioned above?
PV> Think about numeric parsers
Please give examples of "numeric parsers".
PV> effect of layout rendering: what is the scope of application of your "#" control?
PV> If such scope is unambiguous, then
PV> the only safe choice would be to make this scope limited to only the next
PV> character, so that you'll need to always write "#o#n#u" and not "#onu".
Please rephrase, I don't understand.
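The scope question in the quoted text can be made concrete with a small sketch. It assumes, purely for illustration, that "#" is a control meaning "capitalize what follows"; the two decoder functions are hypothetical and are not part of any proposal or standard.

```python
def decode_per_char(text, mark="#"):
    """Scope = the next character only."""
    out = []
    upper_next = False
    for ch in text:
        if ch == mark:
            upper_next = True
        elif upper_next:
            out.append(ch.upper())
            upper_next = False
        else:
            out.append(ch)
    return "".join(out)


def decode_per_word(text, mark="#"):
    """Scope = every letter up to the next non-letter."""
    out = []
    in_scope = False
    for ch in text:
        if ch == mark:
            in_scope = True
        elif in_scope and ch.isalpha():
            out.append(ch.upper())
        else:
            in_scope = False
            out.append(ch)
    return "".join(out)


print(decode_per_char("#onu"), decode_per_char("#o#n#u"))
print(decode_per_word("#onu"))

# A substring cut out of "#onu" loses the control that preceded it.
print(decode_per_word("#onu"[2:]))
```

With per-character scope, "ONU" has to be written "#o#n#u". With per-word scope, "#onu" is enough, but a substring such as "nu" no longer carries the control that governed it, which appears to be the concern about a control "at an unknown distance".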
PV> nothing in Unicode prevents you to do that
+
PV> The assigned
PV> standard code points will be the same, even if your local encoding represent
PV> those code points in a decomposed way for capitals.
I doubt it: free space in the encoding table is necessary.
Dmitry Turin
Unicode2 (2.1.1) http://unicode2.chat.ru
HTML6 (6.4.2) http://html60.chat.ru
SQL5 (5.4.0) http://sql50.chat.ru
Computer2 (2.0.3) http://computer20.chat.ru