Re: marks

From: Otto Stolz (Otto.Stolz@uni-konstanz.de)
Date: Fri Sep 28 2007 - 10:17:27 CDT

    Hello Дмитры Турин,

    you wrote:
    > (2) My proposal not only economize mark-place in table of encoding
    > (what is important itself),

    Given the almost negligible share of uppercase letters among the
    assigned Unicode code points, the saving of code positions is
    quite unimportant in itself. Note that only five† scripts
    feature case at all, and all of them are alphabets, hence
    comparatively small. Have a look at the roadmaps, at
    <http://www.unicode.org/roadmaps/>, to develop a feeling
    for the proportions.

    > but also simplifies comparison of various variants of spelling
    > (all letters are lower-case, first letter is upper-case, all
    > letters are upper-case), because comparison is reduced to
    > comparison in one variant of spelling (all letters are lower-case).

    This is plainly wrong. For a case-insensitive comparison, for
    example, your proposal requires removal of your “marks”, whilst
    the Unicode way requires case folding. Both are comparably cheap
    operations on contemporary computers.
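
    As a minimal sketch of that comparison: the snippet below normalizes
    three spellings of one word, once by deleting case marks and once by
    case folding. The mark «¿» is the capitalize-initial mark used later
    in this message; the all-caps mark «¡» and the function names are
    assumptions made purely for illustration.

```python
# Hypothetical case marks for the mark-based scheme.
CAP_INITIAL = "\u00bf"  # «¿», capitalize-initial mark (as used in this message)
CAP_ALL = "\u00a1"      # «¡», all-caps mark (assumed, not taken from the proposal)


def strip_marks(text: str) -> str:
    """Normalize mark-based text by deleting the case marks (one pass per mark)."""
    return text.replace(CAP_INITIAL, "").replace(CAP_ALL, "")


def fold(text: str) -> str:
    """Normalize ordinary Unicode text by case folding (one pass)."""
    return text.casefold()


# The same word in three spellings, in each scheme.
marked = ["unicode", CAP_INITIAL + "unicode", CAP_ALL + "unicode"]
plain = ["unicode", "Unicode", "UNICODE"]

marked_keys = {strip_marks(w) for w in marked}
plain_keys = {fold(w) for w in plain}

# Both schemes reduce the three spellings to a single comparison key.
print(marked_keys, plain_keys)
```

    Either way, the comparison needs one linear pass over the text
    before the strings can be compared.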

    > Eternity (unlimited time) is before us !
    > You [i. e. Philippe Verdy] are seggesting to carry
    > gasket through future time !

    Unicode is here to stay for quite a while. More than
    16 years of development by countless contributors have been
    invested in it, and it is deeply entrenched in an overwhelming
    number of software products and IT standards. So, if you want
    to replace it with anything new, you would have to
    - prove that your suggestion is indeed superior, and sufficiently
       so to justify the expense of the change-over,
    - specify your suggestion thoroughly,
    - solve, for your suggested encoding, all those problems that
       have been solved for Unicode in those years (browse through
       the Unicode Standard <http://www.unicode.org/versions/Unicode5.0.0/>,
       the Character Databases <http://www.unicode.org/ucd/> and
       <http://www.unicode.org/charts/unihan.html> and the Technical
       Reports and Standards <http://www.unicode.org/reports/index.html>
       to get an impression of the sheer amount of this work),
    - demonstrate how the cost of adapting all existing text-processing
       software to your scheme can be afforded by the vendors. Note
       that any new encoding scheme will not render the existing
       software less complex; rather, the software will become more
       complex, as it will have to cope with both legacy and new
       data. Hence, there will be no savings (in terms of reduced
       maintenance costs) that could compensate for the development
       of the new code,
    - and, above all, you would have to convince everybody that
       the effort would be worthwhile and that they should join your plan.

    Believe me, computer users are quite a conservative lot:
    they want their data to be readable, editable, and processable,
    for decades, if not for centuries.

    Above all, your proposal will not work at all, as the
    details of case mapping vary with the language.

    > Give me _concrete_ examples of word/phrase,
    > which you don't know how to write within my proposal,
    > and i will send you _concrete_ answers.

    Take, as an example, «İzmir» (in Turkish spelling), or «Izmir»
    (in German spelling), respectively, the name of a Turkish town.
    In your scheme, both of these spellings would be «¿izmir»,
    where «¿» is your capitalize-initial mark. So, how could you
    ever hope to render that word according to the user’s
    expectations? Please do not point to higher-level protocols,
    such as language tagging, because this discussion pertains
    to encoding plain text.
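
    The ambiguity can be sketched in a few lines. The function name and
    the language argument below are assumptions for illustration only;
    the point is that the same marked string needs information beyond
    the plain text to pick the right capital letter.

```python
def capitalize_initial(word: str, lang: str) -> str:
    """Uppercase the first letter of `word`.

    Hypothetical helper: applies the Turkish dotted-i rule when lang == "tr",
    and the default, language-neutral mapping otherwise.
    """
    if not word:
        return word
    first, rest = word[0], word[1:]
    if lang == "tr" and first == "i":
        # U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
        return "\u0130" + rest
    return first.upper() + rest


stored = "izmir"  # what remains once the capitalize-initial mark is interpreted

turkish = capitalize_initial(stored, "tr")
german = capitalize_initial(stored, "de")

# One stored form, two different expected renderings.
print(turkish, german)
```

    In plain text there is nothing that supplies the second argument,
    which is exactly the problem.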

    You have not understood Philippe’s remark:
    > NO capital at the first letter (for example with prefixes)

    Example: the Dutch city «’s-Gravenhage» (The Hague). However,
    you have already given examples of this sort,
    so there is no need to respond to this particular one.

    You have also written:
    > "Widespread error is equating of designation of a letters (_coding_) and
    > their graphic images (_font_). It’s absolutely different things".

    That error is definitely not widespread among the addressees of your remark;
    rather, they are used to the notions of “character” vs. “glyph”.
    However, most of them will agree that a capital A, a small a, a capital Αλφα,
    a small αλφα, a capital Аз, and a small аз are six different letters.

    But this has nothing to do with the encoding of those letters.
    It was a deliberate decision, based on a history of about 30 years of
    character encoding (before Unicode, as we know it), to assign six different
    code positions to those six characters, and not three or even only one.
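
    For illustration, the six letters and their code points can be
    listed with Python's standard unicodedata module (a small sketch;
    the variable names are arbitrary):

```python
import unicodedata

# Capital and small A in Latin, Greek (alpha), and Cyrillic (az).
letters = [
    "\u0041",  # Latin capital A
    "\u0061",  # Latin small a
    "\u0391",  # Greek capital alpha
    "\u03b1",  # Greek small alpha
    "\u0410",  # Cyrillic capital A
    "\u0430",  # Cyrillic small a
]

# Print each code point with its official character name.
for ch in letters:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}")

code_points = [ord(ch) for ch in letters]
```

    Six letters, six distinct code points, even though several of the
    glyphs look alike.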

    Philippe had written:
    > there are many other reasons why your solution is even more complicate
    to which you have answered:
    > List it, please. In way #1 ... , #2 ... , #3 ..., etc

    Some of them have been brought up in this discussion;
    among them a new one in this very contribution (cf. «İzmir», above).

    But, as I have explained above, it is your duty to demonstrate
    the feasibility of your proposal, and to demonstrate it convincingly,
    rather than the duty of this list’s subscribers to point out every
    single flaw in it.

    Best wishes,
       Otto Stolz

    ------------

    † Armenian, Cyrillic, (Georgian), Greek, Latin; Georgian does
       not have a fully developed case system,
       cf. <http://www.unicode.org/versions/Unicode5.0.0/ch07.pdf>.


