From: "Christopher John Fynn" <cfynn@gmx.net>
Date: 2003-08-16 23:36:45 -0700
To: <tibex@unicode.org>, "Rick McGowan" <rick@unicode.org>,  "Andrew C. West" <andrewcwest@alumni.princeton.edu>
Subject: Re: [tibex] Re: hPhags-pa Proposal
Cc: "David Germano" <dfg9w@virginia.edu>
Reply-To: "Christopher John Fynn" <cfynn@gmx.net>


----- Original Message -----
From: "Rick McGowan" <rick@unicode.org>
To: <tibex@unicode.org>
Sent: Wednesday, July 09, 2003 10:43 PM
Subject: [tibex] Re: hPhags-pa Proposal


> Does anyone have any further comments on the 'Phags-pa proposal?
>
> http://uk.geocities.com/BabelStone1357/hPhags-pa/N2352.html

Yes - some comments and questions...

1.
In which order are 'Phags-pa characters to be entered (and
stored)? If characters are entered / stored in the order they
would normally be written, some vowel characters would occur
prior to the consonant (or combination) that they affect and
others would be entered after the consonant(s).
I think any impact on both collation and the complexity of
rendering needs to be carefully considered before
properties/weights are assigned to these characters. Has this
been done? Also, any effects that Unicode "normalisation"
processes will have on character ordering need to be
considered carefully.
Since it seems difficult or impossible to get character
properties changed once assigned, they need to be right in the
first place - even if it means a delay in the proposal going
forward.

Possible collation problems for Tibetan, Mongolian and maybe
Uighur should probably be considered before character properties
are finalised.

2.
Is there a real need to encode:-

a.
ABA4 (FA) as a separate character rather than using HA + WA? -
It may be difficult for someone entering texts to disambiguate
the two, and therefore if FA is encoded as a separate character
you could end up with "FA" being entered as either FA _or_ as
HA + WA (which complicates string matching, particularly if FA
has no official decomposition to HA + WA). Do any combinations
actually occur which would be ambiguous if you used the
characters HA + WA instead of a character FA? If there are no
such instances, then there is probably no need for a separate
character FA. If FA is encoded, there probably needs to be a
strong note that it is *only* to be used for Chinese and Old
Uighur FA and never for Tibetan HWA.

b.
ABA9 (SUPERFIXED LETTER RA)? If AB9C (LETTER RA) always
transforms into ABA9 when immediately followed by another
consonant, then ABA9 is simply a context-dependent glyph variant
of AB9C and should not be encoded as a separate character.

c.
ABA6 (SUBJOINED LETTER WA), ABA7 (SUBJOINED LETTER YA) and ABA8
(SUBJOINED LETTER RA)? Or are these characters really only
context-dependent glyph variants of AB97, AB9B and AB9A?

3.
Is there a need to encode a 'Phags-pa character equivalent to
Tibetan U+0F0B (TSHEG)? Tibetan text written in 'Phags-pa script
without some sort of TSHEG might be hard to disambiguate. TSHEG
seems to be represented in 'Phags-pa by a space - but a
character with different properties from a normal space may be
required here. Effectively, a kind of thin space character (or
ZWNJ) also seems to be used where a new stack would start within
a Tibetan tsheg-bar; without such an intervening character, the
apparent shaping rules for this script would seem to be that
the glyphs for the consonants join with each other.


4.
Are the characters proposed at ABAA through ABAE all vowels? If
so, shouldn't the names of these characters reflect this as
they do in the equivalent Tibetan characters? e.g. I'd suggest
U+ABAA PHAGS-PA VOWEL I rather than U+ABAA PHAGS-PA LETTER I.
And shouldn't U+ABAF PHAGS-PA LETTER CANDRABINDU be U+ABAF
PHAGS-PA SIGN CANDRABINDU?

5.
Is there effectively a 'Phags-pa character equivalent to Tibetan
character 0F71 rather than 0F60[/0FB0] (i.e. functioning as a
vowel rather than a consonant)? If only one character is encoded,
it should be noted that it could be equivalent to either Tibetan
0F60[/0FB0] or 0F71 depending on context.

6.
 > "the Tibetan use of the Phags-pa script seems now to have
virtually died out"

From what I've seen, modern Tibetan usage seems to be largely
decorative - but saying it has "virtually died out" is probably
a little too strong. Many Tibetans are familiar with the script
and it continues to be used occasionally for decorative
purposes.
I have seen 'Phags-pa script used to write text in murals and
other decorations of a number of recently constructed Tibetan
monasteries in India and Nepal. It is also found on the title
pages of some modern xylographs (e.g. the edition of Mi la'i
rnam mgur published at Apo Rinpoche's monastery in Manali, where
it is used to write "MAHA MUDRA" within the sides of the title
page border) and machine-printed texts. It is also used in some
contemporary Tibetan seals. The 'Phags-pa script was probably
only ever used for lengthy texts for a short period - ever since
then its use has probably been largely decorative and for
seals. Since this kind of use continues, it is probably as
"alive" as it has been at any time since it stopped being used
for writing long texts in Mongolian.

BTW There is a whole chapter (#16) on this script entitled "hor
yig gsar ba mdzad pa'i mdzad pa'i skor mdo tsam brjod pa"
[p134-146] in the book "bod yig 'bri tshul mthong ba skun smon"
by dpa'-ris sangs-rgyas published by the Minorities Publishing
House in China in 1997 (ISBN 7-02628-6).

7.
Has anyone run this proposal past the Chinese and Mongolian
national bodies represented on WG2? They probably have contacts
with experts in their countries who should look at this, and I
think it is prudent to get them involved as early as possible -
otherwise these national bodies are likely to request time for
consulting their experts before the proposal goes forward.

8.
There may be people at places like the British Library, the LOC
and libraries in China and Mongolia who are much more expert on
this script and have access to many more examples than anyone on
this list, the UTC or WG2. In the absence of real expertise and
with only a few examples, I'm apprehensive that there is a
danger that assumptions may be made about the script which later
turn out to be wrong and difficult to correct. With little-used
scripts like 'Phags-pa, I'd be happier if experts on the script
were more actively sought out and their comments solicited -
rather than relying on them to somehow otherwise hear about
proposals and submit their comments.

9.
The 'Phags-pa script is clearly based on Tibetan script and it
is sometimes used to write Tibetan, so round-trip mapping
between 'Phags-pa and Tibetan characters seems desirable - I
think this should be looked at and dealt with thoroughly within
the proposal. Also, is there a need for round-trip mapping
between 'Phags-pa and Mongolian and/or 'Phags-pa and CJK
characters? If so, this should be considered now, since changes
to the proposed encoding may be needed to make this feasible.

10.
In the notes column, shouldn't the 'Phags-pa consonants map to
both the equivalent Tibetan headline consonants (0F40-0F68) and
the equivalent Tibetan subjoined consonants (0F90-0FB8)? Which
Tibetan letter a 'Phags-pa letter actually mapped to would have
to depend on context.

11.
If this proposal is accepted, will any additions have to be made
to the notes for the corresponding characters in the Tibetan
block of TUS?

12.
The individual named 'Phags-pa Lama in the proposal is more
properly referred to as 'Phags-pa bLo-gros rGyal-mtshan
[1235-1280] (or 'gro-mgon 'phags-pa blo-gros rgyal-mtshan) -
there are several other important lamas with the name 'Phags-pa.
A short biographical sketch [in Tibetan] of 'gro-mgon 'phags-pa
blo-gros rgyal-mtshan can be found on pages 351 to 353 of
gangs-can mkhas-grub rim-byon ming-mdzod (a Tibetan biographical
dictionary) [ISBN 5421-0200-1].

13.
When this script gets encoded, there may need to be a note in the
standard stressing that PHAGS-PA LETTER QA and PHAGS-PA LETTER GGA
are *not* variants of PHAGS-PA LETTER KHA and PHAGS-PA LETTER GA,
as I can easily see people making this mistake when trying to enter
Tibetan words in this script.

- Chris


From: "Andrew C. West" <andrewcwest@alumni.princeton.edu>
Date: 2003-08-19 06:07:06 -0700
To: tibex@unicode.org
Subject: [tibex] Re: hPhags-pa Proposal
Cc: dfg9w@virginia.edu
X-Sent: 19 Aug 2003 10:07:03 GMT
X-Sent-From: andrewcwest@alumni.princeton.edu
Sender: tibex-bounce@unicode.org

Dear Chris,

Thank you for your list of queries and comments. It is a pleasure to be able to
discuss the 'Phags-pa script with someone as knowledgeable as yourself. Whilst
it is true that some of your questions could have been answered by reading the
latest version of the proposal, I am always happy to clarify what I have
written in the proposal. Incidentally, a PDF of the final version of the
proposal is available from me on request, as the on-line HTML version is
sometimes difficult to access.

I append my responses to your questions below. I trust that you will find all of
my responses to be satisfactory, and that they will alleviate some of the
concerns about the proposal that you have expressed.

Kind Regards,

Andrew

> 1.
> In which order are 'Phags-pa characters to be entered (and
> stored)? If characters are entered / stored in the order they
> would normally be written, some vowel characters would occur
> prior to the consonant (or combination) that they affect and
> others would be entered after the consonant(s).

No. In the 'Phags-pa script all letters except the candrabindu are written in
pronunciation order. That is to say vowels always come after the consonant that
they modify (e.g. "gi" is written with the letter I *beneath* the letter GA, not
above as in Tibetan). Thus for the 'Phags-pa script visual order equals logical
order (except for words with a candrabindu, which I will touch upon below).

This means that the encoding model for the 'Phags-pa script is extremely
simple : "each letter of a syllable unit is encoded in visual order from top to
bottom" (Proposal Section 6).

The problematic letter is the candrabindu, which is always written as the first
letter of a 'Phags-pa syllable cluster, even though it is always logically the
last letter. Thus OM is written as the 'Phags-pa letter O preceded by the
candrabindu. In the first draft of my proposal I advocated encoding strictly in
logical order, so that OM would be represented in memory as <O, CANDRABINDU>,
but rendered as CANDRABINDU followed by O. However, Rick McGowan and Ken
Whistler suggested that this disparity between logical order in memory and
rendering order on screen would make dynamic rendering, cursor movement, word
selection and other such operations difficult and confusing, especially for
syllables longer than a simple OM (and there are some quite long 'Phags-pa
syllables with a candrabindu). They therefore suggested that I change the
proposed encoding model so that all letters, including the candrabindu, are
treated as normal spacing letters and encoded in visual order (i.e.
<CANDRABINDU, O> for OM), which I did. This change certainly makes text
processing of the 'Phags-pa script much simpler. However, I admit to still
having qualms about this change to the encoding model (for example, how does it
affect collation ?), and would welcome feedback from others more knowledgeable
than me in this area. Nevertheless, I'm not too bothered by this issue, and am
happy to go along with whatever the UTC in its wisdom decides is the best way to
deal with the candrabindu. I might note that the person most qualified to answer
questions on the practicality of encoding the candrabindu last but rendering it
first is Paul Nelson, who I understand is a member of the UTC. So hopefully when
the UTC discuss my 'Phags-pa proposal they will be able to come to a decision as
to the suitability or otherwise of the method of dealing with the candrabindu
advocated in the proposal.
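To make the two storage models above concrete, here is a minimal sketch. It assumes a syllable is a tuple of letter names (the strings are hypothetical stand-ins for the proposed code points), and ALPHABET is a purely hypothetical sort order, not one taken from the proposal. It shows the candrabindu moving between logical and visual order, and why a naive sort on visual-order storage groups all candrabindu syllables together, which is the collation question raised above.

```python
# Sketch only: letter names stand in for the proposed 'Phags-pa code points,
# and ALPHABET is a hypothetical sort order, not one taken from the proposal.
CANDRABINDU = "CANDRABINDU"
ALPHABET = ("KA", "GA", "MA", "O", CANDRABINDU)

def logical_to_visual(syllable):
    """Move a syllable-final candrabindu to the front (visual/stored order)."""
    if syllable and syllable[-1] == CANDRABINDU:
        return (CANDRABINDU,) + syllable[:-1]
    return syllable

def visual_to_logical(syllable):
    """Move a syllable-initial candrabindu back to the end (logical order)."""
    if syllable and syllable[0] == CANDRABINDU:
        return syllable[1:] + (CANDRABINDU,)
    return syllable

def visual_key(syllable):
    # Key built directly from the stored (visual-order) sequence.
    return tuple(ALPHABET.index(letter) for letter in syllable)

def logical_key(syllable):
    # Key built after restoring logical order first.
    return visual_key(visual_to_logical(syllable))

om = logical_to_visual(("O", CANDRABINDU))
print(om)  # ('CANDRABINDU', 'O')

words = [("O",), (CANDRABINDU, "O"), ("GA", "O"), (CANDRABINDU, "GA", "O")]
print(sorted(words, key=visual_key))
# [('GA', 'O'), ('O',), ('CANDRABINDU', 'GA', 'O'), ('CANDRABINDU', 'O')]
print(sorted(words, key=logical_key))
# [('GA', 'O'), ('CANDRABINDU', 'GA', 'O'), ('O',), ('CANDRABINDU', 'O')]
```

Under the visual-order model, a collation tailoring would need something like logical_key if syllables with a candrabindu are meant to sort next to their bare counterparts. This is one possible approach, not a decision of the proposal.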

> I think any impact on both collation and the complexity of
> rendering needs to be carefully considered before
> properties/weights are assigned to these characters. Has this
> been done? Also, any effects that Unicode "normalisation"
> processes will have on character ordering need to be
> considered carefully.

Due to the nature of the 'Phags-pa script (vowels are independent letters, and
so do not need weighting), these issues are not relevant other than with regard
to the candrabindu, which is discussed above. As the candrabindu sits above a
string of independent letters rather than modifying a stack comprising a single
base consonant and optional dependent subjoined consonants and/or dependent
vowel signs (as is the case with Tibetan), it is very difficult to reorder it
from one end of a syllable cluster to the other. Therefore, it has been decided
to encode the candrabindu as a normal spacing letter, encoded in visual order,
with the result that Normalization does not apply to the 'Phags-pa script.

> Since it seems difficult or impossible to get character
> properties changed once assigned, they need to be right in the
> first place - even if it means a delay in the proposal going
> forward.

I certainly agree on the necessity of getting these right in the first place.
However, with the possible exception of the candrabindu, the character
properties of 'Phags-pa letters are very straightforward. I am confident that
the UTC is able to make an informed decision as to the validity of the proposed
character properties given in Table 1 of the proposal.

> Possible collation problems for Tibetan, Mongolian and maybe
> Uighur should probably be considered before character properties
> are finalised.

The only collation problems I can envisage are to do with the candrabindu.

> 2.
> Is there a real need to encode:-
> 
> a.
> ABA4 (FA) as a separate character rather than using HA + WA? -
> It may be difficult for someone entering texts to disambiguate
> the two, and therefore if FA is encoded as a separate character
> you could end up with "FA" being entered as either FA _or_ as
> HA + WA (which complicates string matching, particularly if FA
> has no official decomposition to HA + WA). Do any combinations
> actually occur which would be ambiguous if you used the
> characters HA + WA instead of a character FA? If there are no
> such instances, then there is probably no need for a separate
> character FA. If FA is encoded, there probably needs to be a
> strong note that it is *only* to be used for Chinese and Old
> Uighur FA and never for Tibetan HWA.

This is a very good question, and one that I was hoping someone would ask.

There is some disagreement between authorities on the transcription of the
letter FA in the proposal. Nicholas Poppe, in his influential "The Mongolian
Monuments in HP'AGS-PA Script" (1957), transcribes it as "f", whereas Professor
Junast, who is the leading authority on the 'Phags-pa script in China,
transcribes it as "hu" (with an inverted breve below the "u"), which is
identical to his transcription of the letter HWA. This is extremely unfortunate,
as "hw" and "f" do occur in the same positions in Chinese 'Phags-pa texts (e.g.
"hua" [flower] and "fa" [raft]).

There are two reasons why I strongly believe that FA should be encoded
separately :

A. The earliest descriptions of the 'Phags-pa script (e.g. "Fashu Kao" [1334]
and "Shushi Huiyao" [1376]) list forty-one letters, one of which is the letter
FA (see Illustration 1 of the proposal). This indicates that the earliest user
community considered the letter FA to be a distinct letter in its own right.

B. Although the letter FA superficially resembles the letter HA with a subjoined
letter WA (wa-zur), in Yuan dynasty Chinese 'Phags-pa texts such as "Baijiaxing
Mengguwen" [The 'Phags-pa version of the "Hundred Chinese Surnames"] and "Menggu
Ziyun" [Rhyming dictionary of Chinese] the letter FA and the compound letter HWA
are clearly differentiated : in the letter FA the upper part of the letter
resembling a letter HA with no tail kink joins smoothly onto the lower part of
the letter resembling a subjoined letter WA (as shown in Example 11 "fang" of
Table 3 in the proposal); whereas in the letter HWA there is a kink in the tail
of the letter HA before it joins onto the subjoined letter WA (as shown in
Example 4 "hwa" of Table 3 in the proposal). Scanned images of the relevant
letters showing their differences are provided on my "Description of the
'Phags-pa Script" page on my BabelStone1357 web site. In short, the 'Phags-pa
letter FA derives from HA plus wa-zur, but is distinct from the 'Phags-pa letter
combination HA plus subjoined-WA.

It may indeed be difficult for someone who does not know the language of the
text that they are transcribing to disambiguate HWA and FA. It has proven
notoriously difficult for some people to disambiguate U+017F [LATIN SMALL LETTER
LONG S] and U+0066 [LATIN SMALL LETTER F], but that does not mean that we should
unify the two letters. Unicode encodes characters; it is up to the user to be
able to use the characters appropriately. In the case of Chinese 'Phags-pa texts
where HWA and FA may both occur, the context (and the fact that most Chinese
'Phags-pa texts are bi-script) makes clear what letter is meant even where the
actual glyphs used may be indistinguishable (of course, for a computer font the
glyphs for FA and HWA+Subjoined-WA should be clearly distinguished).

Rather than worrying that unqualified people may wrongly transcribe a text, we
should be more concerned that Chinese words spelled and pronounced differently
(e.g. "hwan" and "fan") are in fact encoded differently. It would certainly
complicate string matching and collation if we decided to represent these two
'Phags-pa syllables with exactly the same Unicode characters. The 14th century
'Phags-pa rhyming dictionary of Chinese "Menggu Ziyun" certainly treats "hwan"
and "fan" as different spellings, and it would be odd if Unicode did not.
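The string-matching point can be illustrated with a short sketch. Letter names are hypothetical stand-ins for the proposed code points, and the spellings assume an inherent vowel "a" with no explicit vowel letter; unify_fa is a hypothetical helper that models the alternative of writing FA with the same letters as HWA.

```python
# Sketch only: letter names stand in for the proposed code points, and the
# spellings assume an inherent vowel "a" with no explicit vowel letter.
FAN = ("FA", "NA")                   # Chinese "fan"
HWAN = ("HA", "SUBJOINED-WA", "NA")  # Chinese "hwan"

def unify_fa(syllable):
    """Hypothetical: model spelling FA with the same letters as HWA."""
    out = []
    for letter in syllable:
        if letter == "FA":
            out.extend(("HA", "SUBJOINED-WA"))
        else:
            out.append(letter)
    return tuple(out)

print(FAN == HWAN)                      # False: distinct spellings stay distinct
print(unify_fa(FAN) == unify_fa(HWAN))  # True: unification makes them collide
```

With FA encoded separately, an exact match keeps "fan" and "hwan" apart, as the "Menggu Ziyun" spellings require; after unification, no matching or collation process could tell them apart.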

> b.
> ABA9 (SUPERFIXED LETTER RA)? If AB9C (LETTER RA) always
> transforms into ABA9 when immediately followed by another
> consonant, then ABA9 is simply a context-dependent glyph variant
> of AB9C and should not be encoded as a separate character.

> c.
> ABA6 (SUBJOINED LETTER WA), ABA7 (SUBJOINED LETTER YA) and ABA8
> (SUBJOINED LETTER RA)? Or are these characters really only
> context-dependent glyph variants of AB97, AB9B and AB9A?

I discuss the reasons for separately encoding Subjoined RA, YA and WA, as well
as Superfixed Letter RA, in some detail in Section 6 of the proposal.

It is very late at night, and I have been answering your questions in reverse
order, so at this stage I will simply quote the relevant text, and if you have
any further queries on the matter, please feel free to ask.

<quote>
It is proposed to encode subjoined forms of the letters WA, YA and RA, and a
superfixed form of the letter RA, in addition to (and separately from) the
ordinary letters WA, YA and RA. The reason why these positional forms of the
letters WA, YA and RA must be encoded separately is that without an explicit
vowel "a" it would be impossible to distinguish, and hence correctly render,
normal and subjoined/superfixed forms of the letters in a syllable with an
inherent "a" vowel. For example, the Phags-pa spelling of the Chinese word hai
"sea" is hay <HA, YA>, whereas the Phags-pa spelling of the Chinese word xià
"summer" is hya <HA, SUBJOINED-YA>. With no explicit vowel, the only way to tell
whether the second letter in each Phags-pa syllable is the normal form of the
letter YA or the graphically distinct subjoined form of the letter YA is to
encode the two forms of the letter YA separately. The same applies for the
normal and graphically distinct subjoined forms of the letters WA and RA.
Likewise, it is necessary to separately encode the graphically distinct
superfixed form of the letter RA that is found before the letters KA, GA, NGA,
JA, TA, DA, NA, BA, MA, TSA and DZA when writing Tibetan (before the letter NYA
only, the normal form of the letter RA is used), as otherwise it would be
impossible to distinguish, and hence correctly render, Tibetan words written in
the Phags-pa script such as rnga "drum" and rang "self". The important thing
here is to provide a mechanism for determining which graphic form of the letter
RA to render, not necessarily to distinguish which is the base consonant. Thus
it is not necessary to separately encode superfixed forms of the letters LA and
SA that are also used in writing Tibetan, as the normal and superfixed forms of
the letters LA and SA are identical. In fact, in the case of words with a
superfixed letter LA or SA, the base consonant is indicated in Phags-pa spelling
by suffixing the letter -A when there is no explicit vowel (e.g. sam for Chinese
"three", but sm-a for Sanskrit "sma").
</quote>
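The argument in the quoted passage can be checked mechanically with a small sketch. Letter names are hypothetical stand-ins for the proposed code points, collapse is a hypothetical helper modelling an encoding with only the ordinary letters, and the spellings of hay/hya and rang/rnga follow the quoted text, assuming an inherent "a".

```python
# Sketch only: names stand in for the proposed code points; spellings follow
# the quoted text, assuming an inherent "a" and no explicit vowel letter.
POSITIONAL_TO_NORMAL = {
    "SUBJOINED-WA": "WA",
    "SUBJOINED-YA": "YA",
    "SUBJOINED-RA": "RA",
    "SUPERFIXED-RA": "RA",
}

def collapse(syllable):
    """Hypothetical: model an encoding with only the ordinary letters."""
    return tuple(POSITIONAL_TO_NORMAL.get(letter, letter) for letter in syllable)

pairs = [
    (("HA", "YA"), ("HA", "SUBJOINED-YA")),     # hay "sea" vs hya "summer"
    (("RA", "NGA"), ("SUPERFIXED-RA", "NGA")),  # rang "self" vs rnga "drum"
]

for plain, positional in pairs:
    # Distinct when positional forms are encoded; identical once collapsed,
    # so a renderer could no longer tell which glyph form to draw.
    print(plain != positional, collapse(plain) == collapse(positional))
```

Each pair prints "True True": the separately encoded forms keep the spellings distinct, while collapsing them leaves a renderer with no basis for choosing the normal or positional glyph.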

> 3.
> Is there a need to encode a 'Phags-pa character equivalent to
> Tibetan U+0F0B (TSHEG)? Tibetan text written in 'Phags-pa script
> without some sort of TSHEG might be hard to disambiguate. TSHEG
> seems to be represented in 'Phags-pa by a space - but a
> character with different properties from a normal space may be
> required here. Effectively, a kind of thin space character (or
> ZWNJ) also seems to be used where a new stack would start within
> a Tibetan tsheg-bar; without such an intervening character, the
> apparent shaping rules for this script would seem to be that
> the glyphs for the consonants join with each other.

There is no 'Phags-pa equivalent of the tsheg mark [U+0F0B]. For all languages
written using the 'Phags-pa script, including Tibetan, the letters of a syllable
unit are ligated together, but there is whitespace between syllable units (as
discussed in Section 5 of the proposal). Thus for Tibetan 'Phags-pa text, a
tsheg-bar unit corresponds to a ligatured cluster of 'Phags-pa letters. As
whitespace demarcates the boundaries of these syllable units the tsheg is not
required, and not found. As line breaks occur between 'Phags-pa syllable
clusters within a polysyllabic word in Mongolian, Sanskrit, Tibetan, etc., the
space is best represented as a normal space (although one would be free to use a
non-breaking space if one wanted to inhibit natural line breaks).

I don't quite follow the second half of your question. Letters naturally ligate
together in the same way as they do in Mongolian, Ogham or Arabic. There is no
need to apply any special control character to produce this ligation. If you are
referring to complex tsheg-bars such as "padme" which comprise more than one
consonant-vowel stack, then the answer is that written in 'Phags-pa script the
tsheg-bar would be broken up into two syllable units (i.e. "pad me" in 'Phags-pa
script).
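The space rule described above can be modelled in a few lines. This is a simplified sketch, not the real line-breaking algorithm (that is defined by UAX #14). Latin transliterations such as "pad me" stand in for 'Phags-pa syllable units, and syllable_units and break_opportunities are hypothetical helper names.

```python
import re

# Sketch only: Latin transliterations stand in for 'Phags-pa syllable units.
# Real line breaking follows UAX #14; this models only the space rule above.
SPACE = " "
NBSP = "\u00a0"

def syllable_units(text):
    """Split text into syllable units at ordinary or no-break spaces."""
    return [unit for unit in re.split("[ \u00a0]+", text) if unit]

def break_opportunities(text):
    """Offsets where a line may break: just after an ordinary space."""
    return [i + 1 for i, ch in enumerate(text) if ch == SPACE]

print(syllable_units("pad me"))                  # ['pad', 'me']
print(break_opportunities("pad me"))             # [4]
print(break_opportunities("pad" + NBSP + "me"))  # []
```

An ordinary space both delimits the syllable units and allows a break between them; a no-break space still delimits the units but suppresses the break, matching the option mentioned above.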

> 4.
> Are the characters proposed at ABAA through ABAE all vowels? If
> so, shouldn't the names of these characters reflect this as
> they do in the equivalent Tibetan characters? e.g. I'd suggest
> U+ABAA PHAGS-PA VOWEL I rather than U+ABAA PHAGS-PA LETTER I.

As confirmed in Section B.5.a of the proposal, the proposed names are in
accordance with Annex L ("Character Naming Guidelines") of ISO/IEC
10646-1:2000. Rule 6 of Annex L states :

<quote>
The names are constructed from an appropriate set of the
applicable terms of the following grid and ordered in the
sequence of this grid. Exceptions are specified in Rule 11.
The words WITH and AND may be included for additional
clarity when needed.
1 Script      5 Attribute
2 Case        6 Designation
3 Type        7 Mark(s)
4 Language    8 Qualifier

Examples of such terms:
Script        Latin, Cyrillic, Arabic
Case          capital, small
Type          letter, ligature, digit
Language      Ukrainian
Attribute     final, sharp, subscript, vulgar
Designation   customary name, name of letter
Mark          acute, ogonek, ring above, diaeresis
Qualifier     sign, symbol
</quote>

Thus "letter" is an appropriate "type"; "vowel" is not. Hence we have "MONGOLIAN
LETTER I" [U+1822] etc.

The Tibetan vowels have names such as "TIBETAN VOWEL SIGN I" as they are not
independent letters. However, the 'Phags-pa vowels are independent letters which
may occur in isolation from a base consonant, and so the term "letter" is
appropriate for them.

> And shouldn't U+ABAF PHAGS-PA LETTER CANDRABINDU be U+ABAF
> PHAGS-PA SIGN CANDRABINDU?

As discussed in answer to Question 1 above, the encoding model treats the
candrabindu as a normal, spacing letter, not a sign.

> 5.
> Is there effectively a 'Phags-pa character equivalent to Tibetan
> character 0F71 rather than 0F60[/0FB0] (i.e. functioning as a
> vowel rather than a consonant)? If only one character is encoded,
> it should be noted that it could be equivalent to either Tibetan
> 0F60[/0FB0] or 0F71 depending on context.

The 'Phags-pa letter -A functions as both a consonant and a vowel lengthener
depending upon context.

This fact could certainly be noted in the relevant chapter of the Unicode
Standard when it is written (if the authors of the Standard believe that it is
appropriate).

Please note that the proposal does not go into the details of how the 'Phags-pa
script is used to spell Chinese, Mongolian, Tibetan, Sanskrit or Uighur, as I
considered that to be somewhat outside the scope of an encoding proposal. Some
examples of 'Phags-pa words in these languages are given in Section 5 of the
proposal, which gives some idea of how 'Phags-pa words are spelled (cf. Table 3
Example 9 "-an" which shows the letter -A used as a consonant; and Examples 14
"q-an" and 19 "'-a kad ddha ya" which show the letter -A used as a vowel
lengthener). Examples of the usage of every 'Phags-pa letter in writing Chinese
and Mongolian are given in my "Description of the 'Phags-pa Script" page on my
BabelStone1357 web site.

> 6.
> > "the Tibetan use of the Phags-pa script seems now to have
> > virtually died out"

> From what I've seen, modern Tibetan usage seems to be largely
> decorative - but saying it has "virtually died out" is probably
> a little too strong. Many Tibetans are familiar with the script
> and it continues to be used occasionally for decorative
> purposes.
> I have seen 'Phags-pa script used to write text in murals and
> other decorations of a number of recently constructed Tibetan
> monasteries in India and Nepal. It is also found on the title
> pages of some modern xylographs (e.g. the edition of Mi la'i
> rnam mgur published at Apo Rinpoche's monastery in Manali, where
> it is used to write "MAHA MUDRA" within the sides of the title
> page border) and machine-printed texts. It is also used in some
> contemporary Tibetan seals. The 'Phags-pa script was probably
> only ever used for lengthy texts for a short period - ever since
> then its use has probably been largely decorative and for
> seals. Since this kind of use continues, it is probably as
> "alive" as it has been at any time since it stopped being used
> for writing long texts in Mongolian.

I'm pleased to hear that the 'Phags-pa script is still alive in Tibet. As
mentioned in the proposal, I am aware that it has been used as a decorative
script for architectural inscriptions, book titles and seals, but I have not
personally seen any examples that date from later than the early
twentieth century. As you are able to confirm its continued use as a decorative
script for these purposes, I will gladly moderate the phrase "virtually died
out".

> BTW There is a whole chapter (#16) on this script entitled "hor
> yig gsar ba mdzad pa'i mdzad pa'i skor mdo tsam brjod pa"
> [p134-146] in the book "bod yig 'bri tshul mthong ba skun smon"
> by dpa'-ris sangs-rgyas published by the Minorities Publishing
> House in China in 1997 (ISBN 7-02628-6).

Thanks for the reference - I will keep an eye open for the book.

> 7.
> Has anyone run this proposal past the Chinese and Mongolian
> national bodies represented on WG2? They probably have contacts
> with experts in their countries who should look at this, and I
> think it is prudent to get them involved as early as possible -
> otherwise these national bodies are likely to request time for
> consulting their experts before the proposal goes forward.

Yes. See answer to Query 8.

> 8.
> There may be people at places like the British Library, the LOC
> and libraries in China and Mongolia who are much more expert on
> this script and have access to many more examples than anyone on
> this list, the UTC or WG2. In the absence of real expertise and
> with only a few examples, I'm apprehensive that there is a
> danger that assumptions may be made about the script which later
> turn out to be wrong and difficult to correct. With little-used
> scripts like 'Phags-pa, I'd be happier if experts on the script
> were more actively sought out and their comments solicited -
> rather than relying on them to somehow otherwise hear about
> proposals and submit their comments.

Without wishing to appear immodest, I think that I have as much expertise in the
'Phags-pa script as almost anyone else in the world, and certainly there are few
other "real experts" that have the same degree of exposure to all three of the
major languages that 'Phags-pa is used to represent (Chinese, Mongolian and
Tibetan) and the same level of understanding of the principles of Unicode that I
have.

The reason why I have written this proposal is that I am actively engaged in
academic research into the 'Phags-pa script, and need to have it encoded in
order to facilitate the publication of the results of my research. Unlike some,
I am not driven by a dilettantish interest in scripts that makes me want to
write proposals for scripts about which I know very little, in the way that
some people collect stamps. As with other Unicode proposals that I am working
on, my personal need to use unencoded characters has driven me to write a
proposal myself rather than wait forever for someone with "real expertise" to
do so on my behalf.

The proposal is based on months and months of intensive research utilising
resources at SOAS and the British Library (for example I have personally
examined the only extant manuscript copy of the 14th century 'Phags-pa Chinese
rhyming dictionary "Menggu Ziyun" rather than relying on facsimile reprints; and
have meticulously gone through every issue of the Chinese journals "Wen Wu"
[Cultural Relics] and "Minzu Yuwen" [Journal of the Languages and Scripts of
Minority Nationalities] in order to find as many examples of 'Phags-pa script
usage as possible, and to read all articles that have been written on the script
by experts such as Professor Junast). In short, my proposal is not merely an
abstract from the relevant pages of the Ladybird Book of World Scripts (as you
may think was the case from some of the Unicode proposals I've looked at).

I might add that I have been in contact with people in the British Library (Dr.
Susan Whitfield, head of the International Dunhuang Project), as well as experts
in Japan (Dr. Dai Matsui) and the PRC (Professor Quejingzhabu of the University
of Inner Mongolia, author of the authoritative work on Mongolian encoding
"Mengguwen Bianma", and erstwhile WG2 delegate). Indeed, I have recently been
engaged in detailed correspondence with Professor Quejingzhabu about the
proposal, and he has stated :

<quote>
I would also like to inform you and your colleagues that when those of us in
positions of importance in the relevant organizations, both within the Inner
Mongolia Autonomous Region and at the national level, heard about your "Proposal
to Encode the Phags-pa Script" we regarded it with great importance. I have
already suggested to the appropriate organizations that a working group be set
up expressly to examine your proposal, and to put forward the *official
Chinese position* on the proposal. I think that the proposal will gain
official approval.
</quote>

As to the perceived lack of examples in the proposal, out of the dozens of
monumental inscriptions, printed texts, manuscript documents, seals, coins etc.
that I have access to, I carefully decided to include five textual examples that
together illustrated use of all of the proposed characters (plus two other
illustrations that show the 'Phags-pa letters used in two important early
sources). Any more examples would have been unnecessary, and would have
detracted from the point that each of the five provided examples was making (in
each case I explain exactly why the particular example has been included -
they're not just there because that's all I've got). As I have stated in a
previous email, the extensive 'Phags-pa pages on my BabelStone1357 web site
provide many more examples, and also give links to a wide range of 'Phags-pa
artifacts that are available on the web. It would have been easy enough for me
to add many more examples of 'Phags-pa usage to the proposal, but what purpose
would that have served other than to make an already lengthy proposal even
longer ?

> 9.
> The 'Phags-pa script is clearly based on Tibetan script and it
> is sometimes used to write Tibetan and so round trip mapping
> between 'Phags-pa and Tibetan characters seems desirable - I
> think this should be looked at and dealt with thoroughly within
> the proposal. Also is there a need for round trip mapping
> between 'Phags-pa and Mongolian and/or  'Phags-pa and CJK
> characters? If so, this should be considered now since changes
> to the proposed encoding may be needed to make this feasible.

A. Tibetan
At a simplistic level, there is round-trip mapping between 'Phags-pa characters
(consonants, vowels and punctuation marks) and Tibetan for a 'Phags-pa Tibetan
text. However, due to the different textual layout (all letters corresponding to
a Tibetan tsheg-bar are written sequentially as pronounced in a single vertical
"syllable unit" [with the exception of the candrabindu]), there are some
differences between 'Phags-pa spelling of Tibetan and Tibetan spelling of
Tibetan. For example, in the case of Tibetan words with a superfixed letter LA
or SA, the base consonant is indicated in 'Phags-pa spelling by suffixing the
letter -A when there is no explicit vowel (e.g. "sam" for Chinese "three", but
"sm-a" for Sanskrit "sma"). Round-trip mapping between syllables such as Tibetan
"sma" and 'Phags-pa "sm-a" would be difficult.

Also, comparing the Tibetan text with the 'Phags-pa text of the same Buddhist
text on the famous multi-script inscriptions at Juyongguan north-west of Beijing
(which I have done as a matter of course), it is clear that some Sanskrit words
are simply spelled differently in the Tibetan text compared with the 'Phags-pa
text. The Tibetan reversed letter SHA [U+0F65], for example, always corresponds
to an ordinary 'Phags-pa letter SHA, whether in isolation or as part of the
compound letter KSHA. It would be impossible without linguistic knowledge (which
most mapping tables don't have) to know whether 'Phags-pa letter SHA maps to
U+0F64 or U+0F65.
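To make the round-trip point concrete, here is a minimal Python sketch, assuming a hypothetical Tibetan-to-'Phags-pa table that covers only the two Tibetan code points just discussed. The 'Phags-pa side is written as name labels rather than code points, since the proposed code points are not final. Any target reached from more than one source cannot be mapped back without extra (linguistic) information.

```python
import unicodedata

# Hypothetical Tibetan -> 'Phags-pa table; 'Phags-pa letters are name labels
# only, because the proposed code points are not final.
TIBETAN_TO_PHAGSPA = {
    "\u0F64": "PHAGS-PA LETTER SHA",  # TIBETAN LETTER SHA
    "\u0F65": "PHAGS-PA LETTER SHA",  # TIBETAN LETTER SSA (the "reversed" SHA)
}


def invert(mapping):
    """Group source characters by the target they map to."""
    inverse = {}
    for source, target in mapping.items():
        inverse.setdefault(target, []).append(source)
    return inverse


def ambiguous_targets(mapping):
    """Return targets reached from more than one source: these cannot round-trip."""
    return {t: s for t, s in invert(mapping).items() if len(s) > 1}


for target, sources in ambiguous_targets(TIBETAN_TO_PHAGSPA).items():
    print(target, "<-", [unicodedata.name(s) for s in sources])
# prints: PHAGS-PA LETTER SHA <- ['TIBETAN LETTER SHA', 'TIBETAN LETTER SSA']
```

A mapping table can flag such collisions, but it cannot resolve them; that still takes knowledge of the word being written.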

B. Mongolian
Mongolian is much more problematic. As can be seen from the examples of
Mongolian 'Phags-pa words given in Section 5 of the proposal, many Mongolian
words are spelled differently in the Uighur-derived Mongolian script compared
with their spelling in the 'Phags-pa script, and there is not necessarily a
one-to-one correspondence. For example, the classical Mongolian "k" corresponds
to the 'Phags-pa letter KHA in all native Mongolian words with the single
exception of the common Mongolian word "yeke" (meaning "great, big") which is
spelled with the 'Phags-pa letter KA. (On the other hand the 'Phags-pa letter KA
in non-native Mongolian words generally corresponds to classical Mongolian "g").
Another major obstacle to round-trip mapping is that 'Phags-pa has two E letters,
both of which correspond to the same letter in the Uighur-derived Mongolian
script. There are even greater complications if we look at how 'Phags-pa
and Uighur-derived Mongolian script deal with null initials ('Phags-pa letters A
and -A), but I will not go into that here. In short you would need a dictionary
to achieve round-trip mapping between 'Phags-pa script and Unicode Mongolian.
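As a rough illustration of why a dictionary is needed, the following Python sketch models the two cases just described. All names are hypothetical labels. The choice between KHA and KA for classical "k" depends on the word, and classical "e" has two possible 'Phags-pa counterparts. Only "yeke" is listed as an exception because it is the only one cited here. A real converter would need a full lexicon, and would also have to handle the reverse direction (e.g. KA corresponding to classical "g" in non-native words), which is not modelled.

```python
# Hypothetical, deliberately incomplete table: classical Mongolian letter ->
# the 'Phags-pa letters it can correspond to (name labels only).
CANDIDATES = {
    "k": ["PHAGS-PA LETTER KHA", "PHAGS-PA LETTER KA"],
    "e": ["PHAGS-PA LETTER E", "PHAGS-PA LETTER EE"],  # hypothetical names for the two Es
}

# Native words in which "k" is written with KA rather than KHA.
# "yeke" ("great, big") is the only exception cited in the text.
K_WRITTEN_AS_KA = {"yeke"}


def phagspa_for_native_k(word):
    """Choose the 'Phags-pa letter for classical Mongolian "k" in a native word."""
    if word in K_WRITTEN_AS_KA:
        return "PHAGS-PA LETTER KA"
    return "PHAGS-PA LETTER KHA"


def letters_needing_lexicon(candidates):
    """Letters whose 'Phags-pa equivalent cannot be chosen letter by letter."""
    return sorted(letter for letter, options in candidates.items() if len(options) > 1)


print(phagspa_for_native_k("yeke"))          # PHAGS-PA LETTER KA
print(letters_needing_lexicon(CANDIDATES))   # ['e', 'k']
```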

C. Chinese
You can't round-trip map between CJK ideographs and words spelled in the
'Phags-pa script because there is a many-to-many relationship between the two.
You can't even round-trip map between 'Phags-pa and pinyin as Chinese 'Phags-pa
texts represent an earlier form of the Mandarin language than that represented
by pinyin.

In summary, I do not believe that full round-trip mapping between 'Phags-pa and
Tibetan and/or Mongolian is either achievable or particularly desirable. If you
were to twist the encoding of 'Phags-pa to more closely fit Unicode Tibetan
(even if that were possible), then you would simply make Mongolian and/or
Chinese 'Phags-pa texts unencodable !

> 10.
> In the notes column shouldn't the 'Phags-pa consonants map to
> both the equivalent Tibetan headline consonants (0F40-0F68) and
> the equivalent Tibetan subjoined consonants (0F90-0FB8)? Which
> Tibetan letter a 'Phags-pa letter actually mapped to would have
> to be dependent on context.

True. But in isolation the 'Phags-pa letters map to the Tibetan base consonants.

These notes are intended to be the sort of notes given in the Unicode code
charts, and are provided for information only. The notes preceded by an arrow
[U+2192] are "cross references", which indicate "a related character of
interest, but without indicating the nature of the relation" (TUS 4.0 ch.16).
(BTW, in the PDF of the final version of the proposal the arrow is unfortunately
invisible.)

If the UTC thinks that cross-referencing the 'Phags-pa consonants to both the
base form and the subjoined form of the corresponding Tibetan consonants is
useful, then I've got no objection to that.

> 11.
> If this proposal is accepted will any additions have to be made
> to the notes for the corresponding characters in the Tibetan
> block of  TUS?

No. Why should there be ?

Runic can be used to write Old English. Do we need a note on U+00E6 [LATIN SMALL
LETTER AE] stating that it corresponds to U+16AB [RUNIC LETTER AESC] ?

Mongolian can be written in Cyrillic script. Do we need notes about Mongolian
usage in the section of the Unicode Standard dealing with Cyrillic ?

> 12.
> The individual named 'Phags-pa Lama in the proposal is more
> properly referred to as 'Phags-pa bLo-gros rGyal-mtshan
> [1235-1280] (or 'gro-mgon 'phags-pa blo-gros rgyal-mtshan) -
> there are several other important lamas with the name 'Phags-pa.
> A short biographical sketch [in Tibetan] of 'gro-mgon 'phags-pa
> blo-gros rgyal-mtshan can be found on pages 351 to 353 of
> gangs-can mkhas-grub rim-byon ming-mdzod (a Tibetan biographical
> dictionary) [ISBN 5421-0200-1].

It's an encoding proposal, not a history lesson !

I give a short biography of the 'Phags-pa Lama (with his proper name in both
Tibetan and Mongolian script) in my "Overview of the 'Phags-pa Script" page on
my BabelStone1357 web site.

> 13.
> When this script gets encoded there may need to be a note in the standard
> stressing that PHAGS-PA LETTER QA and  PHAGS-PA LETTER GGA are
> *not* variants of  PHAGS-PA LETTER KHA and PHAGS-PA LETTER GA as
> I can easily see people making this mistake when trying to enter
> Tibetan words in this script.

Why ?

There are many scripts which have similar letters, and Unicode does not insert
similar warnings. What about the Unicode Runic block, which includes a mixture of
runic letters used in different futharks (i.e. different writing systems)? There
are quite a few examples of runes that have very similar forms, but have
different phonetic values, and are used in different futharks. Certainly anyone
who does not know which runic system each runic letter is used in and what
phonetic value it represents could easily use the wrong rune. Does Unicode warn
us about this ? No. Could someone inadvertently type PHAGS-PA LETTER QA for
PHAGS-PA LETTER KHA ? Yes, but only if they did not know the 'Phags-pa script
(mind you, if they knew Tibetan they would realise which letter corresponded to
Tibetan KA and GA from its position in the code charts). Unicode encodes
characters, users use them - and one has to assume a certain level of competence
in the script on the part of the user. For example, I know nothing about the
Arabic script; but if I wanted to enter some Arabic text in Unicode should I
look to the Unicode Standard for advice on how to write Arabic, or should I take
an evening course in Arabic at my local college ?

From: "Andrew C. West" <andrewcwest@alumni.princeton.edu>
Date: 2003-08-19 10:12:56 -0700
To: tibex@unicode.org
Subject: [tibex] Re: hPhags-pa Proposal
X-Sent: 19 Aug 2003 14:12:49 GMT
X-Sent-From: andrewcwest@alumni.princeton.edu
Sender: tibex-bounce@unicode.org

On Tue, 19 Aug 2003 01:55:11 +0100, "Christopher John Fynn" wrote:

> I agree, it need not be one-to-one - but if e.g. PHAGS-PA LETTER
> FA is encoded it would probably have to be represented in
> Tibetan by U+0F67 + U+0FAD (or maybe U+0F67 + U+0FA5 which is
> the way "FA" is currently written in Tibetan documents published
> in China). This would make round trip mapping between PHAGS-PA
> and Tibetan difficult.

PHAGS-PA LETTER FA is not used to write Tibetan, and the 'Phags-pa script is not
used to write modern Tibetan. There really is no mapping problem to consider
here. If for some bizarre reason you did want to convert modern Tibetan text
containing either <0F67, 0FAD> or <0F67, 0FA5> to 'Phags-pa, then you would
probably map both sequences to PHAGS-PA LETTER FA. As you state, you could not
then roundtrip back to the original encoding. But then, as I have been at pains
to emphasise in my response to your original questions, there is no convenient
one-to-one mapping between 'Phags-pa and Chinese/Mongolian/Tibetan anyway. Many
languages can be written in more than one script, and in many cases there is no
one-to-one relationship, and hence roundtrip mapping is impossible. This really
is not an encoding issue.

> I think a TSHEG like space character to mark word boundaries in
> Tibetan and another kind of thin space or zwnj like character to
> indicate the equiv. of separation of stacks within a word will be
> needed. Although already encoded characters might be used for
> these, it may be cleaner to add two new characters in this block
> specifically for these purposes.

You cannot encode a character that does not exist simply for compatibility with
another script !

As I have stated in my previous email, syllable division of Tibetan 'Phags-pa
text is not necessarily the same as in Tibetan text. 'Phags-pa syllable units
are separated by breaking whitespace - there really is absolutely no reason to
encode this whitespace with any other character than U+0020.

As far as I am aware there is no "separation of stacks within a word" when
written in the 'Phags-pa script, so there is no need for a ZWNJ or other control
character within a syllable unit. If you have seen an example of Tibetan
'Phags-pa text where you think such an approach is justified, please show it to
me.

> Also I think the queries I
> raised about  the characters proposed  for ABA4, ABA6, ABA7,
> ABA8, ABA9 need to be resolved.

The reasons for encoding these characters are already given in the proposal.

ABA4 (FA) should be encoded for the reasons given in my previous email (i.e. FA
is graphically distinct from H plus Subjoined-WA).

ABA6..ABA9 (Subjoined and Superfixed letters) must be encoded for the reasons
given in the proposal that I quoted in my previous email. Namely, in words with
only an inherent vowel it would be impossible to differentiate a base consonant
from a graphically-distinct modifier consonant unless the subjoined/superfixed
forms are encoded separately.

Cf. Chinese "hay" [sea] and "hya" [summer]. If there was only one letter YA,
then both words would be encoded <HA, YA>, yet the two words are not only
pronounced differently, but are written differently.

Cf. Tibetan "rang" [self] and "rnga" [drum]. If there was only one letter RA,
then both words would be encoded <RA, NGA>, yet the two words are not only
pronounced differently, but are written differently.
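The argument can be checked mechanically. This Python sketch uses the four words above, writing letters as name labels. "SUBJOINED YA" and "SUPERFIXED RA" are hypothetical short names standing in for the separately proposed modifier letters. It simulates an encoding without those letters by folding each one into its base letter, and shows that each pair then collides.

```python
# Letter sequences as name labels; the modifier-letter names are
# hypothetical stand-ins for the proposed subjoined/superfixed characters.
WORDS = {
    "hay (Chinese, sea)": ("HA", "YA"),
    "hya (Chinese, summer)": ("HA", "SUBJOINED YA"),
    "rang (Tibetan, self)": ("RA", "NGA"),
    "rnga (Tibetan, drum)": ("SUPERFIXED RA", "NGA"),
}

# What an encoding without separate modifier letters would have to do.
FOLD = {"SUBJOINED YA": "YA", "SUPERFIXED RA": "RA"}


def fold(sequence):
    """Replace each modifier letter by its base letter."""
    return tuple(FOLD.get(letter, letter) for letter in sequence)


def collisions(words):
    """Group words whose folded encodings become identical."""
    groups = {}
    for word, sequence in words.items():
        groups.setdefault(fold(sequence), []).append(word)
    return {seq: ws for seq, ws in groups.items() if len(ws) > 1}


for sequence, clashing in collisions(WORDS).items():
    print(sequence, clashing)
```

With the modifier letters kept distinct, all four sequences in `WORDS` are different, which is the point of encoding them separately.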

The only other solution would be to encode an Implicit Vowel ... but I really do
not think that any of us would want to follow that path !

> Aside from these issues it is pretty clear how Tibetan is
> written in this script. However  I think it is also essential to
> proactively get people who are familiar with Mongolian, Uighur,
> Chinese & so on written in this script to look over the
> proposal. There may be specific issues or conventions with this
> script and  those languages which we can't spot.

I am familiar with both Mongolian and Chinese.

Dr. Dai Matsui (who has seen the proposal and made no adverse comments) has
researched 'Phags-pa seals on Uighur documents and is familiar with Old Uighur.
I might add that 'Phags-pa Uighur texts are limited to two or three words that
are found on some seals attached to documents written in the Old Uighur script
(I have of course also examined images of these seals myself in the course of
preparing the proposal). The most commonly occurring 'Phags-pa Uighur word is
given as an example in Section 5 of the proposal.

> Three  other things...
> 1. In some other examples I've seen there seem to be two kinds
> of  PHAGS-PA  HEAD MARK - One which is the glyph proposed for
> U+ABB0 and another with a single loop. The first is probably
> equivalent to Tibetan U+0F04 plus U+0F05 and the second to
> U+0F04 (or maybe the first is equiv. to U+0F04 plus U+0F05 plus
> U+0F05 and the second to U+0F04 plus U+0F05).   It could be
> argued that these are two variants of the same character as they
> perform the same function - but maybe it is safer to encode them
> as two separate characters.

I have actively looked for examples of head marks in Tibetan 'Phags-pa texts,
but so far the only ones I have found are the double-looped variety. Cf. Example
4 in my proposal; and the seal of the 13th Dalai Lama. N.B. in the latter
example the double-looped head mark corresponds to a single U+0F04 in the
Tibetan text ... so how would you roundtrip map that if there were two 'Phags-pa
head marks ?

If you could provide an image of an example of a single-looped 'Phags-pa Head
Mark, then I agree that it may be worthwhile encoding it separately (although as
you say, they could be considered to be simply glyph variants).

However, for Mongolian only a single head mark U+1800 [MONGOLIAN BIRGA] is
encoded, and the four other forms of the birga (including single-looped,
double-looped and triple-looped forms equivalent to <0F04>, <0F04, 0F05> and
<0F04, 0F05, 0F05> respectively) are intended to be represented as Standardized
Variants (although at present these are not yet officially defined). For
compatibility, the Mongolian experts may prefer not to encode two separate
'Phags-pa head marks.

For those who are wondering why we cannot simply define two 'Phags-pa letters
corresponding to U+0F04 and U+0F05, and represent single-looped, double-looped
and triple-looped forms of the head mark by combining these two characters as
appropriate, the problem is that as 'Phags-pa is a vertical script the
hypothetical equivalents to U+0F04 and U+0F05 would ligate vertically, whereas
they need to be ligated horizontally within the vertical line of text.

> Especially since we have both U+ABB1
> PHAGS-PA MARK SHAD and U+ABB2 PHAGS-PA MARK DOUBLE SHAD  when
> U+ABB2 could have been represented by U+ABB1 plus U+ABB1 (I'm
> *not* arguing that it should be.).

I was just waiting for someone to raise the Double Shad issue. I believe that
the double shad should be encoded separately for two reasons :

A. For compatibility with U+0965 [DEVANAGARI DOUBLE DANDA] and U+0F0E [TIBETAN
MARK NYIS SHAD].

B. Because the user community seems to view it as a separate character. Cf.
Example 4 in my proposal, where the 'Phags-pa Double Shad corresponds to Tibetan
shad marks on the same line, not two shad marks on separate lines as would be
the case if it was conceived as merely two shad marks in succession.

>  2.  Do isolated vowels (not attached to consonants) ever occur?
> If not, shouldn't the vowels be combining characters as in the
> Tibetan script block?

Yes, in the 'Phags-pa script vowels may occur in isolation, as discussed in
Section 5 of the proposal (see Table 3 Example 1 for the isolate letter "U" that
represents Chinese Wu); although when writing Tibetan in 'Phags-pa script an
initial vowel is attached to PHAGS-PA LETTER A (in Mongolian and Chinese this is
not the case).

> 3. Finally, where there are multiple variants of  a character
> within a single style of 'Phags-pa script are we only going to
> allow for one variant within a font or should we expand the
> proposal to include specific <character> + <variant selector
> character> pairs to indicate these?

No. These are simple glyph variants, and should not be encoded using variation
selectors or otherwise.

Table 2 of the proposal is informative only, and merely shows some of the glyph
forms of 'Phags-pa letters in different script styles.
 
> Generally I think the proposal Andrew put together is great and
> he's obviously put a tremendous amount of research and work into
> it. Sorry I'm late in the day with all these comments &
> questions on it - but I was on holiday in India with my family
> for the past six or seven weeks.

Thank you. I hope that my responses to your questions are making you appreciate
that I have put a lot of thought into the proposal, and have attempted to cover
all the bases.

Regards,

Andrew