RE: VOWEL, CONSONANT, ...: allow recognition of shorter names?

From: Philippe Verdy
Date: Fri Apr 11 2008 - 21:16:41 CDT

    Michael Everson wrote:
    > At 11:35 -0700 2008-04-11, Kenneth Whistler wrote:
    > >National Bodies are (justifiably, I think) concerned and worried about
    > >algorithmic constraints on their ability to name things, particularly
    > >when the constraints get complicated to the point that they can't
    > >remember all the details or envision being able to check manually for
    > >uniqueness.
    > Yep.

    There are certainly applications that would benefit from simplified
    character names.

    These names could be simplified by an automatic process that drops the words
    that are not needed to keep character names unique within a given version.
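
    As a rough illustration of such a process, here is a minimal Python sketch.
    The function name simplify_names and the idea of passing an explicit,
    ordered list of droppable words are my own assumptions, not anything
    defined by the standard. A word is dropped from every name only if all the
    shortened names remain distinct.

```python
def simplify_names(names, droppable):
    """Drop each word in `droppable` from all names, but only when the
    resulting set of shortened names is still unique (hypothetical helper)."""
    current = {name: name.split() for name in names}

    for word in droppable:
        trial = {}
        for name, words in current.items():
            reduced = [w for w in words if w != word]
            # Never reduce a name to the empty string.
            trial[name] = reduced if reduced else words
        # Accept the removal only if no two shortened names collide.
        shortened = {" ".join(words) for words in trial.values()}
        if len(shortened) == len(trial):
            current = trial

    return {name: " ".join(words) for name, words in current.items()}


result = simplify_names(
    ["PLUS SIGN", "EQUALS SIGN", "COMBINING ACUTE ACCENT", "ACUTE ACCENT"],
    ["SIGN", "ACCENT", "COMBINING"],
)
# "SIGN" and "ACCENT" can be dropped everywhere; "COMBINING" cannot,
# because both accent names would then shorten to "ACUTE".
print(result)
```

    The outcome depends on the order of the word list, which is one reason the
    list itself would have to be kept fixed between versions.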

    Now suppose a new character is added and, with the same list of removed
    words, the character names are no longer unique. How will the automatic
    "word remover" tell them apart? One solution is to use the age of
    characters, i.e. the property specifying the Unicode version in which they
    were introduced: words would still be removed (and implied) from the older
    characters, while the newer character would keep the qualifying word.
    Other candidate words for suppression in simplified names include "symbol",
    "with", "accent", "mark", "sign", "vocalic" (but beware of "R", "RR", "L"
    and "LL" in Indic scripts, which may need the distinction between the
    combining vocalic sign and an alternate base consonant)...
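
    The age-based tie-break could be sketched as follows. This is only one
    possible reading of the idea: simplify_by_age is a hypothetical helper, the
    ages are supplied by the caller as (major, minor) tuples, and the ages in
    the example are the ones I believe DerivedAge.txt lists (please verify
    before relying on them). Characters are processed oldest first, so a
    simplified name given to an older character is never affected by later
    additions.

```python
def simplify_by_age(chars, droppable):
    """chars: iterable of (name, age) pairs, age as a (major, minor) tuple.
    Older characters get first claim on the shorter names (a sketch)."""
    taken = set()   # simplified names already assigned to older characters
    result = {}
    # Oldest first; ties broken by name so the outcome is deterministic.
    for name, _age in sorted(chars, key=lambda c: (c[1], c[0])):
        words = name.split()
        for word in droppable:
            reduced = [w for w in words if w != word]
            # Drop the word only if something is left and no older
            # character already owns the shorter name.
            if reduced and reduced != words and " ".join(reduced) not in taken:
                words = reduced
        short = " ".join(words)
        if short in taken:
            # The full name itself equals an older simplified name;
            # this sketch has no answer for that case.
            raise ValueError(f"cannot disambiguate {name!r}")
        taken.add(short)
        result[name] = short
    return result


chars = [
    ("PLUS SIGN", (1, 1)),
    ("WARNING SIGN", (4, 0)),
    ("NO ENTRY", (5, 2)),
    ("NO ENTRY SIGN", (6, 0)),
]
print(simplify_by_age(chars, ["SIGN"]))
# NO ENTRY SIGN keeps its last word, because the older NO ENTRY
# already owns the shorter name.
```

    Note the limitation made explicit by the ValueError: if an older "X SIGN"
    has been shortened to "X" and a new character is later named plain "X", the
    newer one has no word left to keep, so some extra qualifier would have to
    be invented for it.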

    Anyway, the default names assigned to characters should remain stable, even
    if they are misleading (due to historical errors), so extending the
    stability rules would not be useful (they are already a constraint that may
    make it difficult to assign standard names to new characters).

    Conversely, the removal of all separators (spaces and hyphens) in the
    existing stability rules seems overkill, when simply replacing every
    sequence of separators with a single space (which can in turn be replaced
    by underscores or capitalization for identifiers in programming languages)
    would have been enough to keep the words separable.
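
    To make the difference concrete, here is a small Python sketch contrasting
    the two normalizations. All four function names are mine, and treating
    space, hyphen and underscore as the separator set is an assumption made for
    the example, not the exact rule of the standard.

```python
import re

# Assumed separator set for this sketch: space, hyphen, underscore.
_SEPARATORS = re.compile(r"[ \-_]+")


def normalize_keep_words(name):
    """Collapse each run of separators to one space; words stay separable."""
    return _SEPARATORS.sub(" ", name).strip().upper()


def normalize_strip_all(name):
    """Remove every separator, as a contrast to the function above."""
    return _SEPARATORS.sub("", name).upper()


def to_identifier(name):
    """Underscore form, usable as an identifier in many languages."""
    return normalize_keep_words(name).replace(" ", "_")


def to_camel_case(name):
    """Capitalization form: word boundaries survive as case changes."""
    return "".join(w.capitalize() for w in normalize_keep_words(name).split(" "))


print(normalize_keep_words("latin  small-letter_a"))  # LATIN SMALL LETTER A
print(to_identifier("latin small letter a"))          # LATIN_SMALL_LETTER_A
print(to_camel_case("latin small letter a"))          # LatinSmallLetterA

# With words kept separable, these two Hangul names remain distinct;
# with all separators stripped, they collapse to the same string.
print(normalize_keep_words("HANGUL JUNGSEONG O-E"))   # HANGUL JUNGSEONG O E
print(normalize_keep_words("HANGUL JUNGSEONG OE"))    # HANGUL JUNGSEONG OE
print(normalize_strip_all("HANGUL JUNGSEONG O-E")
      == normalize_strip_all("HANGUL JUNGSEONG OE"))  # True
```

    One caveat: because this sketch treats every hyphen as a separator, it
    would conflate TIBETAN LETTER -A with TIBETAN LETTER A, so a real
    implementation would need to preserve hyphens that are not between two
    letters.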

    But nothing prevents an application from using alternate lists of character
    names (notably localized or transliterated lists, or the original
    character names in the script for which the character is defined). For
    example, the Hangul O-E exception in the (standard) default names comes from
    a restriction to Basic Latin letters only, even though the "OE" name
    would probably be better represented with the Latin "open" O letter or a
    diacritic for languages where it is meaningful. And nothing prevents the
    same application from maintaining its own stability rules on this simplified
    list (here too the age/version property of characters will be useful, as
    this property is guaranteed to be stable).
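
    An application-side list with its own stability rule might look like the
    sketch below. The AlternateNames class and its methods are hypothetical;
    only unicodedata.name is a real call (from the Python standard library),
    used here as the fallback to the standard default name.

```python
import unicodedata


class AlternateNames:
    """Application-level name list with its own stability rule (a sketch)."""

    def __init__(self):
        self._names = {}   # character -> alternate name

    def publish(self, mapping):
        """Add alternate names; a published name may never change or be reused."""
        used = set(self._names.values())
        # Validate the whole batch first so a failure changes nothing.
        for ch, alt in mapping.items():
            old = self._names.get(ch)
            if old is not None:
                if old != alt:
                    raise ValueError(f"U+{ord(ch):04X} is frozen as {old!r}")
            elif alt in used:
                raise ValueError(f"{alt!r} is already assigned")
            used.add(alt)
        self._names.update(mapping)

    def name_of(self, ch):
        """Alternate name if one exists, else the standard default name."""
        return self._names.get(ch) or unicodedata.name(ch, "")


names = AlternateNames()
names.publish({"+": "PLUS"})
print(names.name_of("+"))   # PLUS
print(names.name_of("="))   # EQUALS SIGN (falls back to the default name)
```

    Localized or transliterated lists would simply be further instances of the
    same table, each frozen independently of the standard names.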
