RE: Ol Chiki character name typo?

From: Peter Constable (petercon@microsoft.com)
Date: Tue Nov 27 2007 - 15:30:09 CST

    > From: David Starner [mailto:prosfilaes@gmail.com]

    > On Nov 27, 2007 10:14 AM, Peter Constable <petercon@microsoft.com>
    > wrote:
    > > relevant ISO documents have been available during those two years
    > (e.g. go to the SC2 doc register at
    > http://lucia.itscj.ipsj.or.jp/itscj/servlets/ScmDoc10?Com_Id=02 and
    > look for N3909).
    >
    > Wow. Yes, Douglas Adams's comments on what publicly available means are
    > certainly applicable here. Let's start with a webpage for ISO 10646,
    > ISO/IEC JTC 1/SC 2, that's stored at
    > http://lucia.itscj.ipsj.or.jp/itscj/servlets/ScmDoc10?Com_Id=02.
    > What's lucia, itscj, ipsj, ScmDoc10 or Com_ID=02?

    What does it matter? There are links to the SC2 site from the Unicode site, including a link from the character pipeline page, and it's easy enough to search on something like "iso sc2" or "iso wg2" or "iso 10646" to find the site.

    > Looking at the page, most email list archives are easier to browse.
    > The names and subjects of documents are hidden in tiny text...

    What's at stake here is not how easy SC2 makes it for an arbitrary person to find arbitrary content in their docs. Rather, this discussion is about whether there is an appropriate process to avoid things like a misspelled word in a character name getting discovered too late for anyone to do anything about it.

    > > Part of that working arrangement has always entailed that
    > > there is a stage in the encoding of a character when the
    > > character names and code positions are locked down. For
    > > Ol Chiki, that was reached some time ago. The process in place
    > > assumes that errors in character names have been identified and
    > > corrected by reviewers before that point.
    >
    > That's broken by design. You can't make major changes up to the last
    > second, like complete changes of the character names, and expect all
    > the minor details, like the spelling, to be correct. One also might
    > hope that someone might make a list of words used in the character
    > names, where fthora and fhtora would jump out at one.

    Major changes such as complete changes of character names are *not* made up to the last second. That's the whole point. The character name in question has been stable for over two years in each record maintained by either UTC or WG2; and the year-long review period when these particular characters should have been reviewed ended a year ago. This should have been fully baked and out the ISO door several months ago, except that for some reason the preparation of the final yes/no ballot at the JTC1 level was delayed. That ballot is happening now, though it's still just a final yes/no -- the period for technical changes has passed.

    When someone proposes to encode a new script such as Ol Chiki, chances are that at that point they are the party engaged in the process that's most familiar with the contents of their proposal, including the character names. And when they engage in the process, it's expected that they champion their proposal through the process. That includes tracking the relevant documents -- so maybe Joe Blow doesn't know that JTC1/SC2/N3209 is located on a particular site and contains names for Ol Chiki characters that need review, but the person who submitted the proposal probably does and should be looking to make sure what gets added to the drafts doesn't have errors.

    Of course, note that in this case, there was at no point any error in the drafts: they always included the names that were proposed. The bug was in the original design spec, so to speak.

    With enough additional resources, certainly more work could be done to review spellings. Is anyone who is unhappy with the current process volunteering? (Nobody is getting paid to check these spellings.) Someone volunteering to do the work could certainly create and review word-frequency lists during the review period; just start participating in the work of your national body or in the work of UTC and take it on as part of your contribution. I certainly expect that if someone were doing that and encountered results such as

    LETTER 5067
    LETTR 1

    they'd probably draw the right conclusion; but the word list for character names has around 5000 words occurring only once, and over 1100 more that occur only two or three times. If you see

    ADAK 1
    ADDAK 1

    or

    AI 1
    AIN 30
    AINN 1
    AINU 1
    ANN 1

    or

    ALF 1
    ALFA 2

    or

    ARAEA 2
    ARAEAE 1
    ARAEA-EO 1
    ARAEA-I 1
    ARAEA-U 1

    how are you to judge if there's an error? Are you going to investigate hundreds or thousands of such cases out of more than 7000 words?
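
    For anyone inclined to try, here is a rough sketch of the kind of check a volunteer might run. It is only an illustration, not an existing tool: the function names are hypothetical, and it assumes the character names are already available as a plain list of strings. It builds the word-frequency list and then flags rare words that sit within one edit (an insertion, deletion, substitution, or swap of two adjacent letters) of some other word in the list.

```python
from collections import Counter


def word_frequencies(names):
    """Count how often each word occurs across a list of character names."""
    counts = Counter()
    for name in names:
        counts.update(name.split())
    return counts


def within_one_edit(a, b):
    """True if a and b differ by exactly one insertion, deletion,
    substitution, or transposition of two adjacent letters."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equal-length) word
    # Find the first position where the two words disagree.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) < len(b):
        # One extra letter in b: the remainders must match once it is skipped.
        return a[i:] == b[i + 1:]
    if a[i + 1:] == b[i + 1:]:
        return True  # single substitution
    # Adjacent transposition, e.g. FTHORA vs. FHTORA.
    return (i + 1 < len(a) and a[i] == b[i + 1] and a[i + 1] == b[i]
            and a[i + 2:] == b[i + 2:])


def suspicious_pairs(counts, max_count=1):
    """List (rare word, its count, similar word, its count) for every word
    occurring at most max_count times that is one edit away from another word.
    This compares all pairs, so it is quadratic in the number of words."""
    words = sorted(counts)
    pairs = []
    for rare in words:
        if counts[rare] > max_count:
            continue
        for other in words:
            if other != rare and within_one_edit(rare, other):
                pairs.append((rare, counts[rare], other, counts[other]))
    return pairs
```

    Note that a check like this would report LETTR against LETTER, but it would just as readily report ADAK against ADDAK and AI against AIN. It narrows down what to look at; it does not tell you which of the flagged pairs is actually an error.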

    Again, any volunteers?

    Peter


