Re: Obsolete characters

From: Asmus Freytag (
Date: Fri Jan 16 2009 - 13:53:45 CST


    On 1/15/2009 9:07 PM, Mark Davis wrote:
    > Good points. There are two purposes, really.
    I'll address each of them in turn, but that'll destroy the autonumbering...
    > 1. I have a UTC action to update UTR#39, which provides for sets
    > of characters that people may want to exclude from identifiers.
    > It has an 'archaic' category, and I need to update the contents.
    The Latin-1 micro sign (U+00B5) does not belong on an "obsolete" list.
    In an identifier context, you need to handle it by mapping it to Greek
    mu (U+03BC), but you have to be realistic: many keyboards will support
    one and not the other.
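One way to do that mapping is compatibility normalization. Here is a minimal sketch in Python, assuming identifiers are compared after NFKC normalization plus case folding. That is an assumption: a given identifier system may specify a different mapping, and `fold_identifier` is a hypothetical helper. Under it, both keyboard variants of "micro" end up as GREEK SMALL LETTER MU, so neither needs to be excluded.

```python
import unicodedata

MICRO_SIGN = "\u00b5"      # U+00B5 MICRO SIGN (from Latin-1)
GREEK_SMALL_MU = "\u03bc"  # U+03BC GREEK SMALL LETTER MU


def fold_identifier(s: str) -> str:
    """Map an identifier to a comparison key (hypothetical helper).

    NFKC replaces compatibility characters such as MICRO SIGN with their
    preferred equivalents; casefold() then removes case distinctions.
    This approximates, but is not identical to, the NFKC_Casefold mapping.
    """
    return unicodedata.normalize("NFKC", s).casefold()


for ch in (MICRO_SIGN, GREEK_SMALL_MU):
    key = fold_identifier(ch)
    print(f"U+{ord(ch):04X} -> U+{ord(key):04X} {unicodedata.name(key)}")
```

Because the fold is applied at comparison time, a user can type either character and still match an identifier that was originally written with the other.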
    > 1. Independently, in doing a character picker
    > (, we found it
    > useful to put the archaic/obsolete characters in separate
    > sections. This is work we are looking at, at Google, but we're
    > also making the data available so that others could use/tweak it if
    > they wish.
    For a character picker, as you explained elsewhere, the task is not to
    build a partition, but potentially several overlapping sets, each geared
    to specific orthographies or notations.
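The difference can be shown with a minimal sketch. It assumes a hypothetical picker table keyed by orthography; the set names, their contents, and the `sets_containing` and `is_partition` helpers are illustrative only. German and Swedish both use ä and ö, so those characters legitimately appear in two sets, which a partition could not express.

```python
# Hypothetical picker data: each named set serves one orthography.
# The sets may overlap; nothing forces each character into one bucket.
PICKER_SETS = {
    "German": {"ä", "ö", "ü", "ß"},
    "Swedish": {"å", "ä", "ö"},
}


def sets_containing(ch):
    """Return the names of all picker sets that list ch, sorted."""
    return sorted(name for name, chars in PICKER_SETS.items() if ch in chars)


def is_partition(sets):
    """True only if no character appears in more than one set."""
    seen = set()
    for chars in sets.values():
        if seen & chars:
            return False
        seen |= chars
    return True


print(sets_containing("ä"))       # ['German', 'Swedish']
print(sets_containing("ß"))       # ['German']
print(is_partition(PICKER_SETS))  # False
```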

    For IPA, you suggested, again elsewhere, that you might split the
    "official" from the "unofficial" set. Given that the official set has
    changed, I suggest that you use different names for these sets: "core"
    and "extended". The point that you want to make it easier for people to
    find frequently used characters by removing the distractions is well
    taken, but it's important to do it in a way that suggests nothing about
    a *preference*. Similar approaches are useful whenever you have a
    "basic" and an "extended" repertoire.

    For mathematical notation purposes, you might look at the data tables
    accompanying UTR#25 to get an idea of how to structure input and what
    to cover.

    For punctuation and symbols, you might look around: there has been some
    work done on arranging symbols by shape (dots, dot patterns, stars,
    circles, crosses, lines, angles, curves, etc.) or by symmetry
    (rotational, vertical, horizontal, both, etc.). There was a site
    "" or so that used that scheme to document a large number of
    symbols. (But, as on that site, once you locate a symbol, you need an
    explanation of its context and meaning.)

    > Note that there may have been some confusion from my message. By
    > "obsolete" or "archaic",
    Best avoid such terms; even for #39, I suggest that you rename the
    category to "not in common ordinary use" or something similar. Remember
    that any list you create *will* be taken out of context by somebody.
    (That has happened to practically all the lists you've generated for
    Unicode, so this one's not going to be an exception.) Naming the
    category in a way that clearly relates to the criteria for
    classification is a good way to mitigate that problem.

    > we don't mean that the character itself is deprecated or that people
    > shouldn't use it; what we mean is that it isn't customarily used in
    > modern languages in typical publications (corner newspapers,
    > magazines, etc.). For example, you wouldn't expect to see words
    > written in Cuneiform in the NY Times. Of course, they may occur in
    > technical journals, especially those dealing with archaic languages,
    > or have occasional decorative use.
    > Mark
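A criterion like that describes typical modern publications, not the characters themselves, and a data table can carry it directly in the label of its category. Here is a minimal sketch, assuming a hypothetical range table. Only the Cuneiform block (U+12000..U+123FF) is listed. The table, its label, and the `usage_category` helper are illustrative, not actual UTR#39 data.

```python
# Hypothetical table of code point ranges whose characters are not
# customarily used in modern languages in typical publications.
# The category label states the criterion, not a judgment on the
# characters; only one block is listed here for illustration.
NOT_IN_COMMON_ORDINARY_USE = [
    (0x12000, 0x123FF, "Cuneiform"),
]


def usage_category(ch: str) -> str:
    """Classify one character by the (hypothetical) range table."""
    cp = ord(ch)
    for first, last, _block in NOT_IN_COMMON_ORDINARY_USE:
        if first <= cp <= last:
            return "not in common ordinary use"
    return "no restriction listed"


print(usage_category("\U00012000"))  # not in common ordinary use
print(usage_category("A"))           # no restriction listed
```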

    This archive was generated by hypermail 2.1.5 : Fri Jan 16 2009 - 13:55:46 CST