Re[6]: marks (2 new symbols)

From: Dmitry Turin
Date: Mon Oct 01 2007 - 13:27:42 CST



    >> PV> case-insensitive searches, the
    >> PV> algorithms are extremely simple and fast in their implementation
    >> These algorithms are unnecessary in general.
    PV> Unnecessary ?!?!?
    Yes. They are redundant wherever it is possible to do without them.
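For reference, the kind of simple case-insensitive search PV mentions can be sketched with Python's `str.casefold()`, which applies Unicode full case folding. The helper name `contains_ignoring_case` is hypothetical and comes from neither message; this only illustrates the kind of algorithm under discussion.

```python
def contains_ignoring_case(haystack: str, needle: str) -> bool:
    """Case-insensitive substring test using Unicode full case folding."""
    return needle.casefold() in haystack.casefold()

# casefold() maps "ß" to "ss", so this matches; plain lower() would not.
print(contains_ignoring_case("Die STRASSE ist lang", "straße"))  # True
print("straße".lower() in "Die STRASSE ist lang".lower())       # False
```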

    PV> you need to consider the huge cost of the conversion
    Take note:
    the cost is not the price of modifying software that is already written;
    the cost is the price of implementing a redundant algorithm in future software.

    PV> what is the interest of
    PV> making such change, except locally within your own local applications? If
    PV> transform texts locally in your system
    Once again: transform, transform, transform ...

    PV> Concrete implementations
    PV> already exist that don't need your proposed "controls".
    Did I ever say that it's impossible to do it another way?!
    There are many ways to reach the goal, but these ways have different
    properties (characteristics).

    ---look into

    PV> think about Base64 representation of binary data
    Where do you see a problem?
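PV's Base64 remark can be made concrete with Python's standard `base64` module: the Base64 alphabet assigns different 6-bit values to upper-case and lower-case letters ('S' is 18, 's' is 44), so any transformation that rewrites letter case inside such a payload changes the decoded bytes. This is an illustration of the quoted remark only, not a claim about how the proposed symbols would actually be applied.

```python
import base64

payload = base64.b64encode(b"Hello").decode("ascii")
print(payload)  # SGVsbG8=

# 'S' and 's' are different Base64 digits, so lower-casing the text
# still decodes without error but yields different bytes.
lowered = payload.lower()
print(base64.b64decode(lowered) == b"Hello")  # False
```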

    PV> in other protocols like Email and networking protocols
    (1.1) Network protocols don't use Unicode II
    (in which these two new symbols would exist).
    (1.2) By that future time, all network protocols will be merged into
    one: XML, used for this purpose. That's obvious.
    (2) Today, all network protocols (as far as I know) understand strings
    written only in lower-case letters (if I'm wrong, correct me).
    Thus the two new symbols have no influence on protocols.

    PV> if there are some prior "symbol" or control somewhere at an unknown distance
    PV> Think about those algorithms that try to extract substrings, including text
    PV> parsers used for linguistic analysis
    PV> extract substrings
    Treat these two signs as _printable_ lower-case letters.
    A parser __must__ not distinguish between printable letters and these two
    non-printable (control) symbols.

    PV> One note: you have forgotten tricameral letters that are ligatures of two
    PV> letters (that may have their own bicameral behaviour, but that, when used in
    PV> combination in the ligated form, create a tricameral scheme with lowercase,
    PV> uppercase and titlecase forms...)
    PV> tricameral
    PV> like the ligatures "dz" or "DZ" or "Dz"
    PV> Turkish/Azeri "I" letters with or without upper dot
    PV> conversion between the two sets [small letters and capital letters]
    PV> is not trivial and not safe in all cases.
    PV> Case conversion is not a lossless process
    (1) What are the syntax rules that determine when each of the forms should be used?
    (2) Could you point to graphical images of these three forms
    for several tricameral letters?
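The three forms PV describes can be listed with Python's `unicodedata` module: U+01F1, U+01F2 and U+01F3 are the upper-case, titlecase and lower-case forms of the "DZ" digraph. The same sketch shows two of the lossy conversions he mentions, the Turkish dotless ı and dotted İ. It does not answer question (1) about usage rules; it only shows the code points and what Python's default (not Turkish-specific) case mappings do with them.

```python
import unicodedata

# U+01F1..U+01F3: one code point per case form of the "DZ" digraph.
for ch in ("\u01f1", "\u01f2", "\u01f3"):
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")

small_dz = "\u01f3"
print(small_dz.upper() == "\u01f1")  # True: upper-case form
print(small_dz.title() == "\u01f2")  # True: titlecase form

# Lossy round trips with the default mappings:
print("\u0131".upper().lower())  # i  (dotless ı -> I -> ordinary dotted i)
print(len("\u0130".lower()))     # 2  (dotted İ -> i + U+0307 COMBINING DOT ABOVE)
```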

    ---counter-question

    PV> in Dutch for the "ij" or "IJ" ligature which is distinct from
    PV> the two separate letters "i/I" and "j/J"
    PV> ... when capitalizing it: this Dutch ligature itself is bicameral
    I.e., this is not the interesting case mentioned above?
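To illustrate PV's Dutch example: Unicode encodes the ligature as a bicameral pair (U+0132 and U+0133) with no separate titlecase form, but when "ij" is written as two separate letters, a generic title-casing routine such as Python's `str.title()` capitalizes only the "i", whereas Dutch capitalizes both ("IJsselmeer"). The word "ijsselmeer" is just sample input.

```python
import unicodedata

lig_small, lig_capital = "\u0133", "\u0132"  # LATIN SMALL / CAPITAL LIGATURE IJ
print(lig_small.upper() == lig_capital)  # True
print(lig_small.title() == lig_capital)  # True: no distinct titlecase form
print(unicodedata.normalize("NFKC", lig_capital))  # IJ (compatibility decomposition)

# With two separate letters, generic title-casing gives "Ij", not Dutch "IJ".
print("ijsselmeer".title())  # Ijsselmeer
```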

    PV> Think about numeric parsers
    Please give examples of "numeric parsers".
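PV does not name specific parsers. Assuming he means routines that turn text into numbers, Python's built-in `int()` and `float()` are everyday examples: they accept hexadecimal digits, exponent markers and special values in either letter case, so under a scheme where a capital is written as a control symbol plus a lower-case letter, such parsers would meet that control symbol in their input.

```python
# Hex digits, exponent markers and special values are accepted in either case.
print(int("FF", 16), int("ff", 16))  # 255 255
print(float("1E3"), float("1e3"))    # 1000.0 1000.0
print(float("INF") == float("inf"))  # True
```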

    PV> effect of layout rendering: what is the scope of application of your "#" control?
    PV> If such scope is unambiguous, then
    PV> the only safe choice would be to make this scope limited to only the next
    PV> character, so that you'll need to always write "#o#n#u" and not "#onu".
    Please rephrase; I don't understand.
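As far as the quoted text allows, PV's point seems to be this: if "#" applies only to the next character, then "ONU" has to be written "#o#n#u", and "#onu" can only mean "Onu". A minimal sketch of that per-character scope follows. "#" is only a stand-in for one of the proposed control symbols, the names `MARK`, `encode` and `decode` are hypothetical, and the sketch assumes the text contains no literal "#".

```python
MARK = "#"  # hypothetical stand-in for the proposed capital-letter control

def encode(text: str) -> str:
    """Write each capital as MARK + its lower-case form (scope: one character)."""
    out = []
    for ch in text:
        low = ch.lower()
        # Only letters whose lower-case form maps back to them exactly.
        if ch != low and low.upper() == ch:
            out.append(MARK + low)
        else:
            out.append(ch)
    return "".join(out)

def decode(encoded: str) -> str:
    """Apply each MARK to exactly the next character (assumes no literal '#')."""
    out = []
    capitalize_next = False
    for ch in encoded:
        if ch == MARK:
            capitalize_next = True
            continue
        out.append(ch.upper() if capitalize_next else ch)
        capitalize_next = False
    return "".join(out)

print(encode("ONU"))     # #o#n#u
print(decode("#onu"))    # Onu
```

Letters whose lower-case form does not map back to them exactly (for example the titlecase ǲ or Turkish İ) are left unchanged by this sketch, because under its assumption they cannot be written as marker plus lower-case letter.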

    PV> nothing in Unicode prevents you to do that
    PV> The assigned
    PV> standard code points will be the same, even if your local encoding represent
    PV> those code points in a decomposed way for capitals.
    I doubt it: free space in the encoding table is necessary.

    Dmitry Turin
    Unicode2 (2.1.1)
    HTML6 (6.4.2)
    SQL5 (5.4.0)
    Computer2 (2.0.3)

    This archive was generated by hypermail 2.1.5 : Mon Oct 01 2007 - 23:57:52 CST