From: Asmus Freytag (email@example.com)
Date: Fri Nov 19 2010 - 02:22:04 CST
On 11/18/2010 11:15 PM, Peter Constable wrote:
> If you'd like a precedent, here's one:
Yes, I think discussion of precedents is important - it leads to the
formulation of encoding principles that can then (hopefully) result in
more consistency in future encoding efforts.
Let me add the caveat that I fully understand that character encoding
doesn't work by applying cook-book style recipes, and that principles
are better phrased as criteria for weighing a decision rather than as
With these caveats, then:
> IPA is a widely-used system of transcription based primarily on the Latin script. In comparison to the Janalif orthography in question, there is far more existing data. Also, whereas that Janalif orthography is no longer in active use--hence there are not new texts to be represented (there are at best only new citations of existing texts), IPA is as a writing system in active use with new texts being created daily; thus, the body of digitized data for IPA is growing much more than is data in the Janalif orthography. And while IPA is primarily based on Latin script, not all of its characters are Latin characters: bilabial and interdental fricative phonemes are represented using Greek letters beta and theta.
IPA has other characteristics in both its usage and its encoding that
you need to consider to make the comparison valid.
First, IPA requires specialized fonts because it relies on glyphic
distinctions that fonts not designed for IPA use will not guarantee.
(Latin a with and without hook, and g with hook vs. the two-story g, are
just two examples). It's also a notational system that requires specific
training in its use, and it is caseless, in contrast to ordinary Latin script.
While several orthographies have been based on IPA, my understanding is
that some of them saw the encoding of additional characters to make them
work as orthographies.
Finally, IPA, like other phonetic notations, uses distinctions between
letter forms on the character level that would almost always be
relegated to styling in ordinary text.
Because of these special aspects of IPA, I would class it in its own
category of writing systems, which makes it less useful as a precedent
against which to evaluate general Latin-based orthographies.
> Given a precedent of a widely-used Latin writing system for which it is considered adequate to have characters of central importance represented using letters from a different script, Greek, it would seem reasonable if someone made the case that it's adequate to represent an historic Latin orthography using Cyrillic soft sign.
I think the question can and should be asked, what is adequate for a
historic orthography. (I don't know anything about the particulars of
Janalif, beyond what I read here, so for now, I accept your
categorization of it as if it were fact).
The precedent for historic orthographies is a bit uneven in Unicode.
Some scripts have extensive collections of characters (even duplicates or
near duplicates) to cover historic usage. Other historic orthographies
cannot be fully represented without markup. And some are now better
supported than at the beginning because the encoding has plugged certain
gaps.
A helpful precedent in this case would be that of another minority or
historic orthography, or historic minority orthography for which the use
of Greek or Cyrillic characters with Latin was deemed acceptable. I
don't think Janalif is totally unique (although the others may not be
dead). I'm thinking of the Latin OU that was encoded based on a Greek
ligature, and the perennial question of the Kurdish Q and W (Latin
borrowings into Cyrillic - I believe these are now 051A and 051C).
Again, these may be for living orthographies.
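For readers who want to check the code points mentioned above, here is a
minimal sketch using Python's standard unicodedata module. The choice of
U+0222 for the Latin OU, and the use of the first word of each character
name as a rough stand-in for its script, are assumptions made for
illustration only; the names are not the same thing as the Unicode Script
property.

```python
import unicodedata

# Code points discussed in this thread. U+0222 for "Latin OU" is an
# assumption; the message itself only gives 051A and 051C explicitly.
BORROWINGS = {
    "Latin OU (based on a Greek ligature)": [0x0222],
    "Kurdish Q and W encoded in Cyrillic": [0x051A, 0x051C],
    "IPA beta and theta (Greek letters)": [0x03B2, 0x03B8],
}


def script_prefix(cp):
    """Return the first word of the character name (a rough script hint)."""
    return unicodedata.name(chr(cp)).split()[0]


for label, cps in BORROWINGS.items():
    for cp in cps:
        # Print the code point, its formal name, and why it is listed here.
        print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}  [{label}]")
```

The output shows which block each borrowed letter ended up in: the OU is
named as a Latin letter, the Kurdish Q and W as Cyrillic letters, while IPA
beta and theta remain the Greek characters themselves.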
/Against this backdrop, it would help if WG2 (and UTC) could point
to agreed upon criteria that spell out what circumstances should
favor, and what circumstances should disfavor, formal encoding of
borrowed characters, in the LGC script family or in the general case./
That's the main point I'm trying to make here. I think it is not enough
to somehow arrive at a decision for one orthography, but it is necessary
for the encoding committees to grab hold of the reasoning behind that
decision and work out how to apply consistent reasoning like that in
This may still feel a little bit unsatisfactory for those whose proposal
is thus becoming the test-case to settle a body of encoding principles,
but to that I say, there's been ample precedent for doing it that way in
Unicode and 10646.
So let me ask these questions:
A. What are the encoding principles that follow from the disposition
of the Janalif proposal?
B. What precedents are these based on, and what precedents are
consciously established by this decision?