Re: Obsolete characters

From: Mark Davis
Date: Fri Jan 16 2009 - 14:08:19 CST

    Good suggestions. "not in common ordinary use" is way too long for a menu,
    but "Uncommon" would probably do the trick.


    On Fri, Jan 16, 2009 at 11:53, Asmus Freytag wrote:

    > On 1/15/2009 9:07 PM, Mark Davis wrote:
    >> Good points. There are two purposes, really.
    > I'll address each of them in turn, but that'll destroy the autonumbering...
    >> 1. I have a UTC action to update UTR#39, which provides for sets
    >> of characters that people may want to exclude from identifiers.
    >> It has an 'archaic' category, and I need to update the contents.
    > The Latin micro sign does not belong on an "obsolete" list. In an
    > identifier context you need to handle it by mapping it to Greek micro, but
    > you have to be realistic that many keyboards will support one and not the
    > other.
    >> 1. Independently, in doing a character picker, we found it
    >> useful to put the archaic/obsolete characters in separate
    >> sections. This is work we are looking at at Google, but we're
    >> also making the data available so that others could use/tweak if
    >> they wish.
    > For a character picker, as you explained elsewhere, the task is not a
    > partition, but potentially several overlapping sets, each geared to specific
    > orthographies or notations.
    > For IPA, you suggested, again elsewhere, that you might split the
    > "official" from the "unofficial" set. Given that the official set has
    > changed, I suggest that you use different names for these sets: "core" and
    > "extended". The point that you want to make it easier for people to find
    > frequently used characters by removing the distractions is well taken, but
    > it's important to do it in a way that suggests nothing about a *preference*.
    > Similar approaches are useful whenever you have a "basic" and an "extended"
    > repertoire.
    > For mathematical notation purposes, you might look at the data tables with
    > UTR#25 to give you an idea how to structure input and what to cover.
    > For punctuation and symbols you might look around, there's been some work
    > done on arranging symbols by shape (dots, dot patterns, stars, circles,
    > crosses, lines, angles, curves, etc.) or by symmetry (rotational, vertical,
    > horizontal, both, etc.). There was a site "" or so, that used
    > that scheme to document a large number of symbols. (But, as on that site,
    > once you locate a symbol, you need explanation about its context and
    > meaning).
    >> Note that there may have been some confusion from my message. By
    >> "obsolete" or "archaic",
    > Best avoid such terms - even for #39 I suggest that you rename the category
    > to "not in common ordinary use" or something. Remember that any list you
    > create *will* be taken out of context by somebody. (That's happened to
    > practically all the lists you've generated for Unicode, so this one's not
    > going to be an exception). Having the category named in a way that clearly
    > relates to the criteria for classification is a good method to mitigate that
    > problem.
    > A./
    >> we don't mean that the character itself is deprecated or that people
    >> shouldn't use it; what we mean is that it isn't customarily used in modern
    >> languages in typical publications (corner newspapers, magazines, etc.). For
    >> example, you wouldn't expect to see words written in Cuneiform in the NY
    >> Times. Of course, they may occur in technical journals, especially those
    >> dealing with archaic languages, or have occasional decorative use.
    >> Mark
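
The point quoted above about handling the micro sign in identifiers by mapping it to the Greek letter can be sketched as follows. This is only an illustration, assuming NFKC normalization is the mapping applied (the thread does not say which mechanism an implementation would use); the helper names `fold_identifier` and `same_identifier` are hypothetical.

```python
import unicodedata

MICRO_SIGN = "\u00b5"  # U+00B5 MICRO SIGN (the Latin-1 character)
GREEK_MU = "\u03bc"    # U+03BC GREEK SMALL LETTER MU


def fold_identifier(name: str) -> str:
    """Hypothetical helper: fold compatibility variants with NFKC.

    Assumption: identifiers are compared after NFKC normalization,
    which maps U+00B5 to U+03BC.
    """
    return unicodedata.normalize("NFKC", name)


def same_identifier(a: str, b: str) -> bool:
    """Treat two spellings as one identifier if they fold to the same string."""
    return fold_identifier(a) == fold_identifier(b)


if __name__ == "__main__":
    # A user whose keyboard offers only one of the two characters
    # still ends up with the same identifier after folding.
    print(same_identifier(MICRO_SIGN + "s", GREEK_MU + "s"))
```

Folding rather than excluding the character keeps both keyboard variants usable, which matches the remark that many keyboards will support one and not the other.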

    This archive was generated by hypermail 2.1.5 : Fri Jan 16 2009 - 14:09:18 CST