Re: Pau Cin Hau scripts proposal : confusive N3865 and better older N3781

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Tue Jul 20 2010 - 22:31:13 CDT

    > Message of 21/07/10 04:11
    > From: "Kenneth Whistler" <kenw@sybase.com>
    > To: verdy_p@wanadoo.fr
    > Cc: unicode@unicode.org, kenw@sybase.com
    > Subject: Re: Pau Cin Hau scripts proposal : confusive N3865 and better older N3781
    >
    >
    > Philippe Verdy said:
    >
    > > A side note about this preliminary proposal for allocating blocks in
    > > the SMP for the two Pau Cin Hau scripts (including one for the large
    > > "logographic" script, with 1050 signs):
    > >
    > > http://std.dkuug.dk/JTC1/SC2/WG2/docs/n3865.pdf
    > >
    > > (authored by Anshuman Pandey, in MIT)
    > >
    > > If the non-logographic Pau Cin Hau script (currently counting 57 signs
    > > in this preliminary report that does not give its sources and does not
    > > give examples)
    >
    > Those are given in the earlier N3781, which itself is cited in
    > this short document. N3865 also indicates that "A formal proposal
    > for the Pau Cin Hau Syllabary will be submitted shortly." That is
    > the notice that a revision of N3781 will be forthcoming, with
    > more details. N3781 was clearly labelled "Preliminary", and the
    > author has worked extensively on it since February.
    >
    > N3865 *only* specifies the sizes of the anticipated repertoires
    > to encode, as guidance to the Roadmap Committee for UTC and WG2.

    That's exactly what I understood, but this was already anticipated in
    the earlier document, which gave the same estimate for the
    repertoires of the two scripts. This small PDF was probably
    composed too quickly. That does not diminish the merit of the work
    already performed in N3781 and N3784 (for the tone/length marks used
    in the small script, and possibly in the large script as well).

    Nor does it offer clear guidance for the large script. Is it really
    logographic? I have serious doubts, if only because of its relatively
    small size, even though it may look logographic in its presentation.
    In fact, it may already contain what was later formalized by the
    smaller script derived from it, which systematized the notation of
    syllables with distinctive letters built from some common traits.

    The earlier document, however, contained an interesting remark: the
    smaller alphabetic script is still in modern use, and it could fit in
    the BMP. N3865 still does not answer this question: BMP or SMP for
    the smaller script (which easily fits in two columns)?

    But the BMP Roadmap is now almost fully allocated, and the religious
    community using the small script is very small (and lives in a
    country where communication is not easy). Given that the ISO 10646
    "implementation levels" are about to be abandoned (because everyone
    now implements only "level 3", which requires full support of the
    supplementary planes), it may not be so important to fit the small
    alphabetic script in the BMP (even if it's clear that the large
    script will go to the SMP).

    But I still think that the wording in N3865 only creates confusion
    about the nature of the two scripts described, which should be
    treated separately and need not be encoded at the same time. Further
    research is needed on how the large script really works, because I
    have not found any example of it so far.

    Some interesting reading, showing actual examples not found in N3781
    for the small alphabetic script (and romanizations of composed
    syllables):

    http://www.scribd.com/doc/3852585/Pau-Cin-Hau-Lai

    The small script is probably one of the most regular and systematic
    ever found. This means that its implementation should be very easy.

    And sorry about my wording when I said that "long s" is "deprecated":
    it is of course not deprecated within Unicode (and it is used in many
    documents when they need to show the distinction or to reproduce
    medieval texts), but it really is deprecated in the orthographies of
    modern languages, which no longer attach any importance to the
    distinction between "small s" and "long s". The Latin script as it is
    used now (as well as the Greek script) does not clearly differentiate
    all the coda consonants from the initial consonants that many
    languages need; the distinction was already confusing and extremely
    irregular, even within the work of the same authors or the same
    typists, when these forms were frequently used in medieval texts.


