Not about Phoenician as requested

From: Peter Kirk
Date: Mon Nov 08 2004 - 18:47:07 CST

    On 08/11/2004 20:06, Edward H. Trager wrote:

    >While the Unicode code space is by definition mathematically finite, still it is
    >for all practical intents and purposes a very large code space that should be
    >able to incorporate the "legitimate needs" of scholars, researchers, historians,
    >among others. Regardless of whether one agrees completely or not about the encoding
    >of Phoenician in Unicode, I --perhaps naively I admit-- fail to see how it does
    >any more harm than the encoding of that HUGE number of "CJK Unified Ideographs
    >Extension B" which, as far as I can tell (given my lack of scholarship in this area),
    >is of more use to esoteric scholars
    >than it is to ordinary speakers and writers of Chinese, Japanese, or Korean.
    >It is no worse than the encoding of a large number of Arabic ligatures --a clear
    >case of encoding glyphs, not characters-- that occurred in Unicode to support legacy
    >systems that had already been defined for Arabic at the time when Unicode came around.
    >Thankfully a similar thing did not happen for, say, Syriac. It is no worse than
    >the encoding of Hangul syllables.
    >I don't closely follow what additional planes of Unicode are being designated
    >for, but perhaps there should be a plane set aside for the encoding of historical
    >"script nodes" that would be useful to scholars, but not as useful to others.
    >Then again, perhaps I'm too naive in this area to know what I'm talking about ... ;-)

    Thank you for your mostly helpful comments.

    But I would like to address your argument that it does no harm to add
    additional characters which people can use or not use as they please. I
    would like to disagree, as a general principle. The aim of Unicode
    standardisation is surely to define a single and unambiguous
    representation of text. That requires that there be a single code point
    for each character, or perhaps a set of canonically equivalent
    representations. Where for historical reasons there are alternative
    representations, e.g. Arabic presentation forms, use of them is clearly
    (though sometimes not clearly enough) deprecated, and anyway they
    usually have canonical decompositions. But if we get into the position
    where there is more than one (not canonically equivalent) way of
    representing the same text, we are moving quickly away from
    standardisation. There may be good reasons for some departures, but the
    impact of these will be minimised by mechanisms like compatibility
    decompositions and folding together for collation. But the suggestion of
    encoding alternative representations for variant forms of scripts for
    use alongside the original ones is likely to lead rapidly to chaos.
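
    To make the distinction between canonical equivalence and compatibility
    folding concrete, here is a minimal sketch using Python's standard
    unicodedata module. The helper name same_text and the sample strings are
    chosen for illustration only. Note that in the Unicode Character Database
    the Arabic presentation forms carry compatibility decompositions, so in
    this sketch the ligature folds to its plain letters under NFKC but not
    under NFC.

```python
import unicodedata

# Two canonically equivalent spellings of "é":
# a precomposed code point vs. a base letter plus a combining accent.
precomposed = "\u00e9"
decomposed = "e\u0301"

# An Arabic presentation form vs. the plain letters it stands for.
ligature = "\ufefb"      # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
plain = "\u0644\u0627"   # ARABIC LETTER LAM + ARABIC LETTER ALEF


def same_text(a, b, form):
    """Compare two strings after applying the given normalization form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Canonical equivalence: both spellings normalize to the same string.
canonical_match = same_text(precomposed, decomposed, "NFC")

# The ligature is not canonically equivalent to its letters...
ligature_nfc_match = same_text(ligature, plain, "NFC")

# ...but it does fold to them under compatibility normalization.
ligature_nfkc_match = same_text(ligature, plain, "NFKC")

print(canonical_match, ligature_nfc_match, ligature_nfkc_match)
```

    A representation with no decomposition of either kind would compare as
    different text under every normalization form, which is the situation
    the paragraph above warns against.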

    Imagine for example if Fraktur were defined as a "historical script
    node" on your scheme, for use by scholars only. The result would be that
    some scholars would encode texts with the special Fraktur characters,
    but others as well as the general public would encode them, as currently,
    as glyph variants of Latin script. The result would quickly be chaos.
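
    The practical effect can be sketched with the Mathematical Fraktur
    letters that Unicode already has. They were encoded for mathematical
    notation, not as a Fraktur script, so they only stand in here for the
    hypothetical "script node"; the helper fraktur and the two-item corpus
    are likewise assumptions made for illustration. A plain search finds
    only the Latin-encoded copy of the word, and both copies are found only
    if the software first folds the text with compatibility normalization.

```python
import unicodedata


def fraktur(word):
    """Spell an ASCII word with Mathematical Fraktur letters.

    Illustration only: capital C, H, I, R and Z are not in this block,
    so the helper works only for words that avoid those capitals.
    """
    letters = []
    for ch in word:
        case = "CAPITAL" if ch.isupper() else "SMALL"
        name = f"MATHEMATICAL FRAKTUR {case} {ch.upper()}"
        letters.append(unicodedata.lookup(name))
    return "".join(letters)


# The same word as two groups of users might encode it.
corpus = ["Buch", fraktur("Buch")]

# A naive search sees two unrelated strings.
plain_hits = [text for text in corpus if "Buch" in text]

# Folding with NFKC before searching reunites them.
folded_hits = [
    text for text in corpus
    if "Buch" in unicodedata.normalize("NFKC", text)
]

print(len(plain_hits), len(folded_hits))
```

    The folding works here only because these particular letters have
    compatibility decompositions to Latin; separately encoded characters
    without such mappings would stay split.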

    ... (omitted by request)

    Peter Kirk