Re: Obsolete characters

From: Mark E. Shoulson (mark@kli.org)
Date: Thu Jan 15 2009 - 14:57:02 CST

    Mark Davis wrote:

    > According to the information I have (extracting from UAX31 and UTS39
    > plus some heuristics on Unicode subheaders), the following are
    > archaic/obsolete characters (that is, not in customary modern use).
    > There are undoubtedly errors, so I'd appreciate any feedback on any
    > of these that are incorrect, or any others missing that you know of.
    > (Note: I have a separate question out about some of these that are IPA
    > characters.)
    I'm not sure what you're using as a criterion for "obsolescence," nor
    where you're getting your data for it; it seems there are several
    different notions of "obsolete" competing for space here.
    >
    > [:blk=Syriac:] [:blk=Ogham:] [:blk=Runic:]
    > [:blk=Hangul_Compatibility_Jamo:]
    > [:blk=Halfwidth_And_Fullwidth_Forms:] [:blk=Old_Italic:]
    > [:blk=Gothic:] [:blk=Deseret:] [:blk=Byzantine_Musical_Symbols:]
    > [:blk=Tagalog:] [:blk=Hanunoo:] [:blk=Buhid:] [:blk=Tagbanwa:]
    > [:blk=Linear_B_Syllabary:] [:blk=Linear_B_Ideograms:]
    > [:blk=Aegean_Numbers:] [:blk=Ugaritic:] [:blk=Shavian:]
    > [:blk=Osmanya:] [:blk=Cypriot_Syllabary:]
    > [:blk=Ancient_Greek_Musical_Notation:] [:blk=Ancient_Greek_Numbers:]
    > [:blk=Buginese:] [:blk=Coptic:] [:blk=Glagolitic:] [:blk=Kharoshthi:]
    > [:blk=Old_Persian:] [:blk=Syloti_Nagri:] [:blk=Phags_Pa:]
    > [:blk=Phoenician:] [:blk=Cuneiform:]
    > [:blk=Cuneiform_Numbers_And_Punctuation:] [:blk=Sundanese:]
    > [:blk=Rejang:] [:blk=Ancient_Symbols:] [:blk=Phaistos_Disc:]
    > [:blk=Lycian:] [:blk=Carian:] [:blk=Lydian:]
    Just looking here, most of these are obsolete in the sense of not being
    in *common* use by whatever their community once was (I wonder just how
    "common" fluency was with Linear B or the Phaistos Disc script, even at
    the time and place when they were used). But most if not all of these
    are still in use by their respective scholarly communities.

    On the other side, you list the Hangul Jamo. AFAIK, Hangul is still
    very much in use. What's obsolete, apparently, is the individual jamo
    way of encoding them. That's a different type of obsoleteness. To
    people using the scripts described above, we would say "those scripts
    aren't in common use, but they're in Unicode for you to use." To people
    using Hangul, we would say "Sure, use Hangul, but *don't* use these
    blocks, they're only there for compatibility."
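
    To make the distinction concrete, here is a minimal sketch in Python
    (my choice of language; nothing in the original data uses it). It asks
    the standard unicodedata module whether a character carries a
    compatibility decomposition. Treating that as a rough proxy for
    "obsolete as an encoding" is an assumption on my part, and the helper
    name compat_tag is made up for illustration.

```python
import unicodedata
from typing import Optional


def compat_tag(ch: str) -> Optional[str]:
    """Return the compatibility tag of ch (e.g. 'compat', 'font', 'wide'),
    or None if ch has no compatibility decomposition.

    Assumption: a tagged decomposition is used here as a stand-in for
    "superseded as an encoding"; it says nothing about whether the
    script itself is still in use.
    """
    decomp = unicodedata.decomposition(ch)
    if decomp.startswith("<"):
        # Tagged decompositions look like "<compat> 1100".
        return decomp[1:decomp.index(">")]
    return None


if __name__ == "__main__":
    for ch in ("\u3131", "\U00010000", "\u0132", "\ufb21"):
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), compat_tag(ch))
```

    A compatibility jamo such as U+3131 reports a tag, while a Linear B
    syllable such as U+10000 reports none, which is roughly the split I
    mean.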

    There are probably people still out there using slightly outdated
    flavors of IPA; don't write off their characters too quickly.
    > |U+03D8 <http://unicode.org/cldr/utility/character.jsp?a=03D8>| ( Ϙ )
    > GREEK LETTER ARCHAIC KOPPA
    > |U+03D9 <http://unicode.org/cldr/utility/character.jsp?a=03D9>| ( ϙ )
    > GREEK SMALL LETTER ARCHAIC KOPPA
    > |U+03DA <http://unicode.org/cldr/utility/character.jsp?a=03DA>| ( Ϛ )
    > GREEK LETTER STIGMA
    > |U+03DB <http://unicode.org/cldr/utility/character.jsp?a=03DB>| ( ϛ )
    > GREEK SMALL LETTER STIGMA
    > |U+03DC <http://unicode.org/cldr/utility/character.jsp?a=03DC>| ( Ϝ )
    > GREEK LETTER DIGAMMA
    > |U+03DD <http://unicode.org/cldr/utility/character.jsp?a=03DD>| ( ϝ )
    > GREEK SMALL LETTER DIGAMMA
    > |U+03DE <http://unicode.org/cldr/utility/character.jsp?a=03DE>| ( Ϟ )
    > GREEK LETTER KOPPA
    > |U+03DF <http://unicode.org/cldr/utility/character.jsp?a=03DF>| ( ϟ )
    > GREEK SMALL LETTER KOPPA
    > |U+03E0 <http://unicode.org/cldr/utility/character.jsp?a=03E0>| ( Ϡ )
    > GREEK LETTER SAMPI
    > |U+03E1 <http://unicode.org/cldr/utility/character.jsp?a=03E1>| ( ϡ )
    > GREEK SMALL LETTER SAMPI
    Aren't (some of) these still in common use in Greece for representing
    numbers?
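
    For anyone unfamiliar with that usage, here is a small Python sketch
    of the alphabetic numeral system as I understand it: stigma stands
    for 6, koppa for 90 and sampi for 900. The function name, the
    restriction to 1..999, the lowercase letterforms and the use of
    U+0374 as the trailing numeral sign are all my assumptions for
    illustration, not a statement of how any particular Greek text is
    typeset.

```python
# Letter values for the Greek alphabetic numeral system (1..999 only).
# The three archaic letters are written as escapes so they stand out.
UNITS = ["", "α", "β", "γ", "δ", "ε", "\u03db", "ζ", "η", "θ"]      # U+03DB stigma = 6
TENS = ["", "ι", "κ", "λ", "μ", "ν", "ξ", "ο", "π", "\u03df"]       # U+03DF koppa = 90
HUNDREDS = ["", "ρ", "σ", "τ", "υ", "φ", "χ", "ψ", "ω", "\u03e1"]   # U+03E1 sampi = 900
KERAIA = "\u0374"  # GREEK NUMERAL SIGN, appended after the letters


def to_greek_numeral(n: int) -> str:
    """Render 1 <= n <= 999 as Greek alphabetic numerals (illustrative)."""
    if not 1 <= n <= 999:
        raise ValueError("this sketch only handles 1..999")
    hundreds, rest = divmod(n, 100)
    tens, units = divmod(rest, 10)
    return HUNDREDS[hundreds] + TENS[tens] + UNITS[units] + KERAIA


if __name__ == "__main__":
    for n in (6, 96, 900):
        print(n, to_greek_numeral(n))
```

    So any number with a 6, a 90 or a 900 in it needs one of the letters
    on the list above.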
    > |U+05A2 <http://unicode.org/cldr/utility/character.jsp?a=05A2>| ( ◌֢ )
    > HEBREW ACCENT ATNAH HAFUKH
    Atnah Hafukh is no more and no less obsolete than all the rest of the
    cantillations/accents. No new texts are being written that use it, but
    it's still in use for the texts that have it. It's actually *less*
    obsolete since it was rediscovered as having always been there and
    merely conflated with YERAH BEN YOMO for some centuries.
    > |U+05C5 <http://unicode.org/cldr/utility/character.jsp?a=05C5>| ( ◌ׅ )
    > HEBREW MARK LOWER DOT
    > |U+05C6 <http://unicode.org/cldr/utility/character.jsp?a=05C6>|
    > ( ׆ ) HEBREW PUNCTUATION NUN HAFUKHA
    > |U+05C7 <http://unicode.org/cldr/utility/character.jsp?a=05C7>| ( ◌ׇ )
    > HEBREW POINT QAMATS QATAN
    QAMATS QATAN is a recent invention; it's just coming into use, not
    drifting out of use.
    > |U+00B5 <http://unicode.org/cldr/utility/character.jsp?a=00B5>| ( µ )
    > MICRO SIGN
    This looks like it's more the other kind of obsolete. Are you saying it
    should be replaced by U+03BC (μ)? Because certainly the symbol, in both
    meanings, is in current usage.
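
    For what it's worth, the Unicode data already maps MICRO SIGN to
    U+03BC under compatibility normalization. A minimal Python sketch,
    assuming the standard unicodedata module (the function name
    fold_compat is hypothetical), shows the replacement happening:

```python
import unicodedata

MICRO_SIGN = "\u00b5"        # MICRO SIGN
GREEK_SMALL_MU = "\u03bc"    # GREEK SMALL LETTER MU


def fold_compat(text: str) -> str:
    """Apply NFKC, which replaces compatibility characters with their
    preferred equivalents (illustrative wrapper, not a recommendation)."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    sample = "5 " + MICRO_SIGN + "m"
    folded = fold_compat(sample)
    # Report which code point ends up in the middle of the string.
    print(unicodedata.name(sample[2]), "->", unicodedata.name(folded[2]))
```

    The glyph a reader sees is unchanged; only the code point differs,
    which is why I'd call this the "encoding" kind of obsolete.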
    >
    > |U+0132 <http://unicode.org/cldr/utility/character.jsp?a=0132>| ( Ĳ )
    > LATIN CAPITAL LIGATURE IJ
    > |U+0133 <http://unicode.org/cldr/utility/character.jsp?a=0133>| ( ĳ )
    > LATIN SMALL LIGATURE IJ
    > |U+013F <http://unicode.org/cldr/utility/character.jsp?a=013F>| ( Ŀ )
    > LATIN CAPITAL LETTER L WITH MIDDLE DOT
    > |U+0140 <http://unicode.org/cldr/utility/character.jsp?a=0140>| ( ŀ )
    > LATIN SMALL LETTER L WITH MIDDLE DOT
    These, too, and most of the rest, are in use, but maybe not in this
    form. Obsolete like the jamos, not like Linear B.
    > |U+FB20 <http://unicode.org/cldr/utility/character.jsp?a=FB20>|
    > ( ﬠ ) HEBREW LETTER ALTERNATIVE AYIN
    > |U+FB21 <http://unicode.org/cldr/utility/character.jsp?a=FB21>|
    > ( ﬡ ) HEBREW LETTER WIDE ALEF
    > |U+FB22 <http://unicode.org/cldr/utility/character.jsp?a=FB22>|
    > ( ﬢ ) HEBREW LETTER WIDE DALET
    > |U+FB23 <http://unicode.org/cldr/utility/character.jsp?a=FB23>|
    > ( ﬣ ) HEBREW LETTER WIDE HE
    > |U+FB24 <http://unicode.org/cldr/utility/character.jsp?a=FB24>|
    > ( ﬤ ) HEBREW LETTER WIDE KAF
    > |U+FB25 <http://unicode.org/cldr/utility/character.jsp?a=FB25>|
    > ( ﬥ ) HEBREW LETTER WIDE LAMED
    > |U+FB26 <http://unicode.org/cldr/utility/character.jsp?a=FB26>|
    > ( ﬦ ) HEBREW LETTER WIDE FINAL MEM
    > |U+FB27 <http://unicode.org/cldr/utility/character.jsp?a=FB27>|
    > ( ﬧ ) HEBREW LETTER WIDE RESH
    > |U+FB28 <http://unicode.org/cldr/utility/character.jsp?a=FB28>|
    > ( ﬨ ) HEBREW LETTER WIDE TAV
    (Hebrew is what I'm familiar with, ok?) These aren't in current use as
    distinct characters, but are they obsolete, or is it just that they
    shouldn't have been encoded separately in the first place? We don't
    have a codepoint for the LAMED with the broken head (and I'm not
    saying we need one, either).
    > |U+FB29 <http://unicode.org/cldr/utility/character.jsp?a=FB29>| ( ﬩ )
    > HEBREW LETTER ALTERNATIVE PLUS SIGN
    > |U+FB4F <http://unicode.org/cldr/utility/character.jsp?a=FB4F>|
    > ( ﭏ ) HEBREW LIGATURE ALEF LAMED
    I understand that the ALEF LAMED ligature is still used in some
    Judeo-Arabic languages.

    ~mark


