From: Doug Ewell (firstname.lastname@example.org)
Date: Sun Apr 11 2010 - 11:19:21 CDT
"Janusz S. Bień" <jsbien at mimuw dot edu dot pl> wrote:
> The next stage will be to assign PUA code point to them, primarily for
> the purpose to encode the texts systematically for inclusion in the
> search engine
> At this stage my question is purely technical: what is the best form
> to prepare and maintain such a specification?
By definition, a PUA specification will not be reviewed or approved by
the Unicode or 10646 technical committees. It is a private
specification. You can encode text in it, teach your search engine to
recognize it, and distribute it to other interested parties ("private"
does not mean "secret"), but if you want any of these characters to be
formally encoded in Unicode/10646, you should follow Tex's link and
prepare a "real" Unicode/10646 proposal form. Characters do not enjoy
any special advantage in consideration for formal encoding merely by
having been listed in a PUA spec.
If these characters are only used in books *about* proposed new
orthographies, not books written *in* the orthographies, then a PUA
solution seems especially appropriate.
If you do want PUA assignments, it might be most appropriate to propose
these for inclusion in MUFI. They are medieval Latin glyphs rather than a
completely different invented script; an invented script would belong in
the ConScript registry. However, you may wish to use the ConScript model in
writing your proposal, since it provides structure to describe many of
the important encoding and display issues.
Presenting the General Property values of these characters in a format
similar to UnicodeData.txt is probably a good idea (although you can
augment this with prose as well). You can describe the combining
sequences using the NamedSequences format, which is a better choice than
encoding them as precomposed characters.
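As a rough illustration of the two formats just mentioned, here is a minimal Python sketch that reads one UnicodeData.txt-style line (15 semicolon-separated fields) and one NamedSequences-style line (name, then a space-separated code point sequence). The code point U+E000, both character names, and the helper names are hypothetical placeholders made up for this example; they are not taken from any actual PUA specification, MUFI, or ConScript.

```python
from dataclasses import dataclass

# Private Use Area in the Basic Multilingual Plane.
PUA_START, PUA_END = 0xE000, 0xF8FF


@dataclass
class CharRecord:
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str


def parse_unicodedata_line(line):
    """Parse one line in UnicodeData.txt layout (15 fields separated by ';')."""
    fields = line.strip().split(";")
    if len(fields) != 15:
        raise ValueError("expected 15 fields, got %d" % len(fields))
    return CharRecord(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
    )


def parse_named_sequence(line):
    """Parse one line in NamedSequences.txt layout: NAME;XXXX XXXX ..."""
    name, sequence = line.strip().split(";")
    return name, [int(cp, 16) for cp in sequence.split()]


def is_pua(code_point):
    return PUA_START <= code_point <= PUA_END


# Hypothetical entries, for illustration only.
SPEC_LINE = "E000;LATIN SMALL LETTER HYPOTHETICAL EXAMPLE;Ll;0;L;;;;;N;;;;;"
SEQ_LINE = "LATIN SMALL LETTER HYPOTHETICAL EXAMPLE WITH ACUTE;E000 0301"

record = parse_unicodedata_line(SPEC_LINE)
seq_name, seq = parse_named_sequence(SEQ_LINE)

print(record)
print(seq_name, ["U+%04X" % cp for cp in seq])
```

Here the accented form is listed as a named sequence (base letter followed by U+0301 COMBINING ACUTE ACCENT) instead of receiving its own precomposed PUA code point, which is the choice recommended above.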
The XeTeX approximations don't seem to shed much light on any issues
involving these letters.
--
Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
RFC 5645, 4645, UTN #14 | ietf-languages @ http://is.gd/2kf0s