From: "Christopher John Fynn"
Date: 2003-08-16 23:36:45 -0700
To: "Rick McGowan", "Andrew C. West"
Subject: Re: [tibex] Re: hPhags-pa Proposal
Cc: "David Germano"
Reply-To: "Christopher John Fynn"

----- Original Message -----
From: "Rick McGowan"
Sent: Wednesday, July 09, 2003 10:43 PM
Subject: [tibex] Re: hPhags-pa Proposal

> Does anyone have any further comments on the 'Phags-pa proposal?
>
> http://uk.geocities.com/BabelStone1357/hPhags-pa/N2352.html

Yes - some comments and questions...

1.
In which order are 'Phags-pa characters to be entered (and stored)? If characters are entered / stored in the order they would normally be written, some vowel characters would occur prior to the consonant (or combination) which they affect and others would be entered after the consonant(s).

I think any impact on both collation and on complexity of rendering needs to be carefully considered before properties/weights are assigned to these characters. Has this been done? Also, any effects Unicode "normalisation" processes will have on character ordering need to be considered carefully.

Since it seems difficult or impossible to get character properties changed once assigned, they need to be right in the first place - even if it means a delay in the proposal going forward.

Possible collation problems for Tibetan, Mongolian and maybe Uighur should probably be considered before character properties are finalised.

2.
Is there a real need to encode:-

a.
ABA4 (FA) as a separate character rather than using HA + WA? - It may be difficult for someone entering texts to disambiguate the two, and therefore if FA is encoded as a separate character you could end up with "FA" being entered as either FA _or_ as HA + WA (which complicates string matching, particularly if FA has no official decomposition to HA + WA). Do any combinations actually occur which would be ambiguous if you used the characters HA+WA instead of a character FA?
If there are no such instances then there is probably no need for a separate character FA. If FA is encoded, there probably needs to be a strong note that it is *only* to be used for Chinese and Old Uighur FA and never for Tibetan HWA.

b.
ABA9 (SUPERFIXED LETTER RA)? If AB9C (LETTER RA) always transforms into ABA9 when immediately followed by another consonant, then ABA9 is simply a context-dependent glyph variant of AB9C and should not be encoded as a separate character.

c.
ABA6 (SUBJOINED LETTER WA), ABA7 (SUBJOINED LETTER YA) and ABA8 (SUBJOINED LETTER RA)? Or are these characters really only context-dependent glyph variants of AB97, AB9B and AB9A?

3.
Is there a need to encode a 'Phags-pa character equivalent to Tibetan U+0F0B? Tibetan text written in 'Phags-pa script without some sort of TSHEG might be hard to disambiguate. TSHEG seems to be represented in 'Phags-pa by a space - but a character with different properties than a normal space may be required here. Effectively a kind of thin space character (or ZWNJ) also seems to be used where a new stack would start within a Tibetan tsheg-bar; without such an intervening character, the apparent shaping rules for this script would seem to be that glyphs for consonants join with each other.

4.
Are the characters proposed at ABAA through ABAE all vowels? If so, shouldn't the names of these characters reflect this, as they do in the equivalent Tibetan characters? e.g. I'd suggest U+ABAA PHAGS-PA VOWEL I rather than U+ABAA PHAGS-PA LETTER I.

And shouldn't U+ABAF PHAGS-PA LETTER CANDRABINDU be U+ABAF PHAGS-PA SIGN CANDRABINDU?

5.
Is there effectively a 'Phags-pa character equivalent to Tibetan character 0F71 rather than 0F60[/0FB0] (i.e. functioning as a vowel rather than a consonant)? If only one character is encoded, it should be noted that it could be equivalent to either Tibetan 0F60[/0FB0] or 0F71 depending on context.

6.
> "the Tibetan use of the Phags-pa script seems now to have virtually died out"

From what I've seen, modern Tibetan usage seems to be largely decorative - but saying it has "virtually died out" is probably a little too strong. Many Tibetans are familiar with the script and it continues to be used occasionally for decorative purposes.

I have seen 'Phags-pa script used to write text in murals and other decorations of a number of recently constructed Tibetan monasteries in India and Nepal. It is also found on the title pages of some modern xylographs (e.g. the edition of Mi la'i rnam mgur published at Apo Rinpoche's monastery in Manali, where it is used to write "MAHA MUDRA" within the sides of the title page border) and machine-printed texts. It is also used in some contemporary Tibetan seals. The 'Phags-pa script was probably only ever used for lengthy texts for a short period - ever since then its use has probably been largely decorative and for seals. Since this kind of use continues, it is probably as "alive" as it has been at any time since it stopped being used for writing long texts in Mongolian.

BTW There is a whole chapter (#16) on this script entitled "hor yig gsar ba mdzad pa'i mdzad pa'i skor mdo tsam brjod pa" [p134-146] in the book "bod yig 'bri tshul mthong ba skun smon" by dpa'-ris sangs-rgyas, published by the Minorities Publishing House in China in 1997 (ISBN 7-02628-6).

7.
Has anyone run this proposal past the Chinese and Mongolian national bodies represented on WG2? They probably have contacts with experts in their countries who should look at this, and I think it is prudent to get them involved as early as possible - otherwise these national bodies are likely to request time for consulting their experts before the proposal goes forward.

8.
There may be people at places like the British Library, the LOC and libraries in China and Mongolia who are much more expert on this script and have access to many more examples than anyone on this list, the UTC or WG2. In the absence of real expertise, and with only a few examples, I'm apprehensive that there is a danger assumptions may be made about the script which later turn out to be wrong and difficult to correct. With little-used scripts like 'Phags-pa I'd be happier if experts on the script were more actively sought out and their comments solicited - rather than relying on them to somehow otherwise hear about proposals and submit their comments.

9.
The 'Phags-pa script is clearly based on Tibetan script and it is sometimes used to write Tibetan, so round-trip mapping between 'Phags-pa and Tibetan characters seems desirable - I think this should be looked at and dealt with thoroughly within the proposal. Also, is there a need for round-trip mapping between 'Phags-pa and Mongolian and/or 'Phags-pa and CJK characters? If so, this should be considered now, since changes to the proposed encoding may be needed to make this feasible.

10.
In the notes column, shouldn't the 'Phags-pa consonants map to both the equivalent Tibetan headline consonants (0F40-0F68) and the equivalent Tibetan subjoined consonants (0F90-0FB8)? Which Tibetan letter a 'Phags-pa letter actually mapped to would have to be dependent on context.

11.
If this proposal is accepted, will any additions have to be made to the notes for the corresponding characters in the Tibetan block of TUS?

12.
The individual named 'Phags-pa Lama in the proposal is more properly referred to as 'Phags-pa bLo-gros rGyal-mtshan [1235-1280] (or 'gro-mgon 'phags-pa blo-gros rgyal-mtshan) - there are several other important lamas with the name 'Phags-pa.
A short biographical sketch [in Tibetan] of 'gro-mgon 'phags-pa blo-gros rgyal-mtshan can be found on pages 351 to 353 of gangs-can mkhas-grub rim-byon ming-mdzod (a Tibetan biographical dictionary) [ISBN 5421-0200-1].

13.
When this script gets encoded, there may need to be a note in the standard stressing that PHAGS-PA LETTER QA and PHAGS-PA LETTER GGA are *not* variants of PHAGS-PA LETTER KHA and PHAGS-PA LETTER GA, as I can easily see people making this mistake when trying to enter Tibetan words in this script.

- Chris

From: "Andrew C. West"
Date: 2003-08-19 06:07:06 -0700
To: tibex@unicode.org
Subject: [tibex] Re: hPhags-pa Proposal
Cc: dfg9w@virginia.edu
X-Sent: 19 Aug 2003 10:07:03 GMT
X-Sent-From: andrewcwest@alumni.princeton.edu
Sender: tibex-bounce@unicode.org

Dear Chris,

Thank you for your list of queries and comments. It is a pleasure to be able to discuss the 'Phags-pa script with someone as knowledgeable as yourself. Whilst it is true that some of your questions could have been answered by reading the latest version of the proposal, I am always happy to clarify what I have written in the proposal. Incidentally, a PDF version of the final version of the proposal is available from me on request, as the on-line HTML version is sometimes difficult to access.

I append my responses to your questions below. I trust that you will find all of my responses to be satisfactory, and that they will alleviate some of the concerns about the proposal that you have expressed.

Kind Regards,

Andrew

> 1.
> In which order are 'Phags-pa characters to be entered (and
> stored)? If characters are entered / stored in the order they
> would normally be written, some vowel characters would occur
> prior to the consonant (or combination) which they affect and
> others would be entered after the consonant(s).

No. In the 'Phags-pa script all letters except the candrabindu are written in pronunciation order.
That is to say, vowels always come after the consonant that they modify (e.g. "gi" is written with the letter I *beneath* the letter GA, not above as in Tibetan). Thus for the 'Phags-pa script visual order equals logical order (except for words with a candrabindu, which I will touch upon below). This means that the encoding model for the 'Phags-pa script is extremely simple: "each letter of a syllable unit is encoded in visual order from top to bottom" (Proposal Section 6).

The problematic letter is the candrabindu, which is always written as the first letter of a 'Phags-pa syllable cluster, even though it is always logically the last letter. Thus OM is written as the 'Phags-pa letter O preceded by the candrabindu. In the first draft of my proposal I advocated encoding strictly in logical order, so that OM would be represented in memory as <O, CANDRABINDU>, but rendered as CANDRABINDU followed by O. However, Rick McGowan and Ken Whistler suggested that this disunity between logical order in memory and rendering order on screen would make dynamic rendering, cursor movement, word selection and other such operations difficult and confusing, especially for syllables longer than a simple OM (and there are some quite long 'Phags-pa syllables with a candrabindu). They therefore suggested that I change the proposed encoding model so that all letters, including the candrabindu, are treated as normal spacing letters and encoded in visual order (i.e. <CANDRABINDU, O> for OM), which I did.

This change certainly makes text processing of the 'Phags-pa script much simpler. However, I admit to still having qualms about this change to the encoding model (for example, how does it affect collation?), and would welcome feedback from others more knowledgeable than me in this area. Nevertheless, I'm not too bothered by this issue, and am happy to go along with whatever the UTC in its wisdom decides is the best way to deal with the candrabindu.
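A minimal Python sketch of the two models may make the trade-off concrete. Letters are represented by plain name strings such as "O" and "CANDRABINDU"; these are illustrative placeholders, not the proposed code points. Under the first-draft model, storage keeps the candrabindu last, so every display-related operation has to move it to the front (and back again); under the adopted model, storage order already is display order.

```python
# Sketch only: letter names are placeholder strings, not proposed code points.
CANDRABINDU = "CANDRABINDU"


def logical_to_visual(syllable):
    """First-draft model: the candrabindu is stored last but drawn first."""
    if syllable and syllable[-1] == CANDRABINDU:
        return [CANDRABINDU] + syllable[:-1]
    return list(syllable)


def visual_to_logical(syllable):
    """Inverse step, needed to map a screen position back to memory."""
    if syllable and syllable[0] == CANDRABINDU:
        return syllable[1:] + [CANDRABINDU]
    return list(syllable)


def render_adopted(syllable):
    """Adopted model: stored in visual order, so rendering needs no reordering."""
    return list(syllable)


om_logical = ["O", CANDRABINDU]
om_visual = logical_to_visual(om_logical)
print(om_visual)                      # ['CANDRABINDU', 'O']
print(visual_to_logical(om_visual))   # ['O', 'CANDRABINDU']
print(render_adopted(om_visual))      # ['CANDRABINDU', 'O']
```

In the first-draft model, every cursor movement or selection inside a syllable with a candrabindu would pass through this extra translation; the adopted model removes it.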
I might note that the person most qualified to answer questions on the practicality of encoding the candrabindu last but rendering it first is Paul Nelson, who I understand is a member of the UTC. So hopefully when the UTC discuss my 'Phags-pa proposal they will be able to come to a decision as to the suitability or otherwise of the method of dealing with the candrabindu advocated in the proposal.

> I think any impact on both collation and on complexity of
> rendering needs to be carefully considered before
> properties/weights are assigned
> to these characters. Has this been done? Also, any effects
> Unicode "normalisation" processes will have on character
> ordering need to be considered carefully.

Due to the nature of the 'Phags-pa script (vowels are independent letters, and so do not need weighting), these issues are not relevant other than with regard to the candrabindu, which is discussed above. As the candrabindu sits above a string of independent letters, rather than modifying a stack comprising a single base consonant and optional dependent subjoined consonants and/or dependent vowel signs (as is the case with Tibetan), it is very difficult to reorder it from one end of a syllable cluster to the other. Therefore, it has been decided to encode the candrabindu as a normal spacing letter, encoded in visual order, with the result that Normalization has no effect on the 'Phags-pa script.

> Since it seems difficult or impossible to get character
> properties changed once assigned, they need to be right in the
> first place - even if it means a delay in the proposal going
> forward.

I certainly agree upon the necessity of getting these right in the first place. However, with the possible exception of the candrabindu, character properties of 'Phags-pa letters are very straightforward. I am confident that the UTC is able to make an informed decision as to the validity of the proposed character properties given in Table 1 of the proposal.
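Why encoding the candrabindu as a spacing letter puts it out of reach of normalization can be checked with Python's standard unicodedata module. Canonical reordering only moves characters with a non-zero canonical combining class; spacing letters have class 0 and stay where they are, whereas Tibetan vowel signs carry non-zero classes and can be swapped. Since no 'Phags-pa characters were encoded at the time, Mongolian letters stand in for them here, on the assumption that 'Phags-pa letters would likewise receive class 0.

```python
import unicodedata


def combining_classes(text):
    """Canonical combining class of each character in text."""
    return [unicodedata.combining(ch) for ch in text]


# Tibetan KA + VOWEL SIGN I (class 130) + VOWEL SIGN AA (class 129):
# canonical ordering sorts adjacent marks by class, so NFD swaps them.
tibetan = "\u0f40\u0f72\u0f71"
print(combining_classes(tibetan))                                     # [0, 130, 129]
print(unicodedata.normalize("NFD", tibetan) == "\u0f40\u0f71\u0f72")  # True

# MONGOLIAN LETTER I + MONGOLIAN LETTER A: spacing letters of class 0,
# standing in for 'Phags-pa letters. Normalization leaves them untouched.
mongolian = "\u1822\u1820"
print(combining_classes(mongolian))                          # [0, 0]
print(unicodedata.normalize("NFD", mongolian) == mongolian)  # True
```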
> Possible collation problems for Tibetan, Mongolian and maybe
> Uighur should probably be considered before character properties
> are finalised.

The only collation problems I can envisage are to do with the candrabindu.

> 2.
> Is there a real need to encode:-
>
> a.
> ABA4 (FA) as a separate character rather than using HA + WA? -
> It may be difficult for someone entering texts to disambiguate
> the two, and therefore if FA is encoded as a separate character you
> could end up with "FA" being entered as either FA _or_ as HA +
> WA (which complicates string matching, particularly if FA has no
> official decomposition to HA + WA). Do any combinations
> actually occur which would be ambiguous if you used the
> characters HA+WA instead of a character FA? If there are no
> such instances then there is probably no need for a separate
> character FA. If FA is encoded, there probably needs to be a
> strong note that it is *only* to be used for Chinese and Old
> Uighur FA and never for Tibetan HWA.

This is a very good question, and one that I was hoping someone would ask.

There is some disagreement between authorities on the transcription of the letter FA in the proposal. Nicholas Poppe, in his influential "The Mongolian Monuments in HP'AGS-PA Script" (1957), transcribes it as "f", whereas Professor Junast, who is the leading authority on the 'Phags-pa script in China, transcribes it as "hu" (with an inverted breve below the "u"). This is identical to his transcription of the letter HWA, which is extremely unfortunate, as "hw" and "f" do occur in the same positions in Chinese 'Phags-pa texts (e.g. "hua" [flower] and "fa" [raft]).

There are two reasons why I strongly believe that FA should be encoded separately:

A. The earliest descriptions of the 'Phags-pa script (e.g. "Fashu Kao" [1334] and "Shushi Huiyao" [1376]) list forty-one letters, one of which is the letter FA (see Illustration 1 of the proposal).
This indicates that the earliest user community considered the letter FA to be a distinct letter in its own right.

B. Although the letter FA superficially resembles the letter HA with a subjoined letter WA (wa-zur), in Yuan dynasty Chinese 'Phags-pa texts such as "Baijiaxing Mengguwen" [The 'Phags-pa version of the "Hundred Chinese Surnames"] and "Menggu Ziyun" [Rhyming dictionary of Chinese] the letter FA and the compound letter HWA are clearly differentiated: in the letter FA the upper part of the letter, resembling a letter HA with no tail kink, joins smoothly onto the lower part of the letter, resembling a subjoined letter WA (as shown in Example 11 "fang" of Table 3 in the proposal); whereas in the letter HWA there is a kink in the tail of the letter HA before it joins onto the subjoined letter WA (as shown in Example 4 "hwa" of Table 3 in the proposal). Scanned images of the relevant letters showing their differences are provided on my "Description of the 'Phags-pa Script" page on my BabelStone1357 web site.

In short, the 'Phags-pa letter FA derives from HA plus wa-zur, but is distinct from the 'Phags-pa letter combination HA plus subjoined WA.

It may indeed be difficult for someone who does not know the language of the text that they are transcribing to disambiguate HWA and FA. It has proven notoriously difficult for some people to disambiguate U+017F [LATIN SMALL LETTER LONG S] and U+0066 [LATIN SMALL LETTER F], but that does not mean that we should unify the two letters. Unicode encodes characters; it is up to the user to be able to use the characters appropriately. In the case of Chinese 'Phags-pa texts where HWA and FA may both occur, the context (and the fact that most Chinese 'Phags-pa texts are bi-script) makes clear what letter is meant, even where the actual glyphs used may be indistinguishable (of course, for a computer font the glyphs for FA and for HA + subjoined WA should be clearly distinguished).
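The string-matching consequence of unifying FA with HA plus subjoined WA can be sketched in a few lines of Python. The letter names are placeholder strings, and the spellings of Chinese "hwan" and "fan" (HA + subjoined WA + NA versus FA + NA, with the inherent vowel unwritten) are assumptions made for illustration. The hypothetical unify_fa function models an encoding without a separate FA.

```python
# Placeholder letter names; not the proposed code points.
HA, SUBJ_WA, FA, NA = "HA", "SUBJOINED WA", "FA", "NA"

# Assumed 'Phags-pa spellings of Chinese "hwan" and "fan" (inherent a unwritten).
HWAN = (HA, SUBJ_WA, NA)
FAN = (FA, NA)


def unify_fa(syllable):
    """Hypothetical fold: rewrite FA as HA + SUBJOINED WA."""
    out = []
    for letter in syllable:
        if letter == FA:
            out.extend([HA, SUBJ_WA])
        else:
            out.append(letter)
    return tuple(out)


# With FA encoded separately, the two syllables remain distinct keys.
print(HWAN == FAN)                      # False
# Under unification they collapse into one sequence, so a search or
# collation key can no longer tell "hwan" from "fan".
print(unify_fa(HWAN) == unify_fa(FAN))  # True
```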
Rather than worrying about the fact that unqualified people may wrongly transcribe a text, we should rather be concerned that Chinese words spelled and pronounced differently (e.g. "hwan" and "fan") are in fact encoded differently. It would certainly complicate string matching and collation if we decided to represent these two 'Phags-pa syllables with exactly the same Unicode characters. The 14th century 'Phags-pa rhyming dictionary of Chinese "Menggu Ziyun" certainly treats "hwan" and "fan" as different spellings, and it would be odd if Unicode did not.

> b.
> ABA9 (SUPERFIXED LETTER RA)? If AB9C (LETTER RA) always
> transforms into ABA9 when immediately followed by another
> consonant, then ABA9 is simply a context-dependent glyph variant
> of AB9C and should not be encoded as a separate character.
> c.
> ABA6 (SUBJOINED LETTER WA), ABA7 (SUBJOINED LETTER YA) and ABA8
> (SUBJOINED LETTER RA)? Or are these characters really only
> context-dependent glyph variants of AB97, AB9B and AB9A?

I discuss the reasons for separately encoding subjoined RA, YA and WA, as well as superfixed RA, in some detail in Section 6 of the proposal. It is very late at night, and I have been answering your questions in reverse order, so at this stage I will simply quote the relevant text; if you have any further queries on the matter, please feel free to ask.

It is proposed to encode subjoined forms of the letters WA, YA and RA, and a superfixed form of the letter RA, in addition to (and separately from) the ordinary letters WA, YA and RA. The reason why these positional forms of the letters WA, YA and RA must be encoded separately is that without an explicit vowel "a" it would be impossible to distinguish, and hence correctly render, normal and subjoined/superfixed forms of the letters in a syllable with an inherent "a" vowel. For example, the Phags-pa spelling of the Chinese word hai "sea" is hay, whereas the Phags-pa spelling of the Chinese word xià "summer" is hya.
With no explicit vowel, the only way to tell whether the second letter in each Phags-pa syllable is the normal form of the letter YA or the graphically distinct subjoined form of the letter YA is to encode the two forms of the letter YA separately. The same applies for the normal and graphically distinct subjoined forms of the letters WA and RA. Likewise, it is necessary to separately encode the graphically distinct superfixed form of the letter RA that is found before the letters KA, GA, NGA, JA, TA, DA, NA, BA, MA, TSA and DZA when writing Tibetan (before the letter NYA only, the normal form of the letter RA is used), as otherwise it would be impossible to distinguish, and hence correctly render, Tibetan words written in the Phags-pa script such as rnga "drum" and rang "self". The important thing here is to provide a mechanism for determining which graphic form of the letter RA to render, not necessarily to distinguish which is the base consonant. Thus it is not necessary to separately encode superfixed forms of the letters LA and SA that are also used in writing Tibetan, as the normal and superfixed forms of the letters LA and SA are identical. In fact, in the case of words with a superfixed letter LA or SA, the base consonant is indicated in Phags-pa spelling by suffixing the letter -A when there is no explicit vowel (e.g. sam for Chinese "three", but sm-a for Sanskrit "sma").

> 3.
> Is there a need to encode a 'Phags-pa character equivalent to
> Tibetan U+0F0B? Tibetan text written in 'Phags-pa script
> without some sort of TSHEG might be hard to disambiguate. TSHEG
> seems to be represented in 'Phags-pa by a space - but a
> character with different properties than a normal space may be
> required here.
> Effectively a kind of thin space character (or
> ZWNJ) also seems to be used where a new stack would start within
> a Tibetan tsheg-bar; without such an intervening character, the
> apparent shaping rules for this script would seem to be that
> glyphs for consonants join with each other.

There is no 'Phags-pa equivalent of the tsheg mark [U+0F0B]. For all languages written using the 'Phags-pa script, including Tibetan, the letters of a syllable unit are ligated together, but there is whitespace between syllable units (as discussed in Section 5 of the proposal). Thus for Tibetan 'Phags-pa text, a tsheg-bar unit corresponds to a ligatured cluster of 'Phags-pa letters. As whitespace demarcates the boundaries of these syllable units, the tsheg is not required, and is not found. As line breaks occur between 'Phags-pa syllable clusters within a polysyllabic word in Mongolian, Sanskrit, Tibetan, etc., the space is best represented as a normal space (although one would be free to use a non-breaking space if one wanted to inhibit natural line breaks).

I don't quite follow the second half of your question. Letters naturally ligate together in the same way as they do in Mongolian, Ogham or Arabic. There is no need to apply any special control character to produce this ligation. If you are referring to complex tsheg-bars such as "padme", which comprise more than one consonant-vowel stack, then the answer is that, when written in 'Phags-pa script, the tsheg-bar would be broken up into two syllable units (i.e. "pad me" in 'Phags-pa script).

> 4.
> Are the characters proposed at ABAA through ABAE all vowels? If
> so, shouldn't the names of these characters reflect this, as
> they do in the equivalent Tibetan characters? e.g. I'd suggest
> U+ABAA PHAGS-PA VOWEL I rather than U+ABAA PHAGS-PA LETTER I.

As confirmed in Section B.5.a of the proposal, the proposed names are in accordance with Annex L ("Character Naming Guidelines") of ISO/IEC 10646-1:2000.
Rule 6 of Annex L states:

The names are constructed from an appropriate set of the applicable terms of the following grid and ordered in the sequence of this grid. Exceptions are specified in Rule 11. The words WITH and AND may be included for additional clarity when needed.

1 Script      5 Attribute
2 Case        6 Designation
3 Type        7 Mark(s)
4 Language    8 Qualifier

Examples of such terms:

Script        Latin, Cyrillic, Arabic
Case          capital, small
Type          letter, ligature, digit
Language      Ukrainian
Attribute     final, sharp, subscript, vulgar
Designation   customary name, name of letter
Mark          acute, ogonek, ring above, diaeresis
Qualifier     sign, symbol

Thus "letter" is an appropriate "type"; "vowel" is not. Hence we have "MONGOLIAN LETTER I" [U+1822] etc. The Tibetan vowels have names such as "TIBETAN VOWEL SIGN I", as they are not independent letters. However, the 'Phags-pa vowels are independent letters which may occur in isolation from a base consonant, and so the term "letter" is appropriate for them.

> And shouldn't U+ABAF PHAGS-PA LETTER CANDRABINDU be U+ABAF
> PHAGS-PA SIGN CANDRABINDU?

As discussed in answer to Question 1 above, the encoding model treats the candrabindu as a normal, spacing letter, not a sign.

> 5.
> Is there effectively a 'Phags-pa character equivalent to Tibetan
> character 0F71 rather than 0F60[/0FB0] (i.e. functioning as a
> vowel rather than a consonant)? If only one character is encoded,
> it should be noted that it could be equivalent to either Tibetan
> 0F60[/0FB0] or 0F71 depending on context.

The 'Phags-pa letter -A functions as both a consonant and a vowel lengthener depending upon context. This fact could certainly be noted in the relevant chapter of the Unicode Standard when it is written (if the authors of the Standard believe that it is appropriate).
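A converter to Tibetan would therefore have to choose between U+0F60 TIBETAN LETTER -A and U+0F71 TIBETAN VOWEL SIGN AA by looking at context. The Python sketch below uses a deliberately simplified rule that is an assumption, not something the proposal states: a syllable-initial -A is treated as the consonant (as in Table 3's "-an"), and any other -A as the vowel lengthener (as in Table 3's "q-an"). It ignores the subjoined form U+0FB0 and the "sm-a" convention for marking the base consonant, both of which a real converter would have to handle. Letter names on the 'Phags-pa side are placeholder strings.

```python
import unicodedata

TIBETAN_LETTER_A_CHUNG = "\u0f60"  # TIBETAN LETTER -A
TIBETAN_VOWEL_SIGN_AA = "\u0f71"   # TIBETAN VOWEL SIGN AA


def map_small_a(syllable, index):
    """Map the 'Phags-pa letter -A at syllable[index] to a Tibetan character.

    Assumed, simplified rule: syllable-initial -A acts as a consonant,
    any other -A as a vowel lengthener. Letters are placeholder strings.
    """
    if syllable[index] != "-A":
        raise ValueError("expected the letter -A")
    return TIBETAN_LETTER_A_CHUNG if index == 0 else TIBETAN_VOWEL_SIGN_AA


an = ["-A", "NA"]         # "-an": -A as a consonant
qan = ["QA", "-A", "NA"]  # "q-an": -A as a vowel lengthener
print(unicodedata.name(map_small_a(an, 0)))   # TIBETAN LETTER -A
print(unicodedata.name(map_small_a(qan, 1)))  # TIBETAN VOWEL SIGN AA
```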
Please note that the proposal does not go into the details of how the 'Phags-pa script is used to spell Chinese, Mongolian, Tibetan, Sanskrit or Uighur, as I considered that to be somewhat outside the scope of an encoding proposal. Some examples of 'Phags-pa words in these languages are given in Section 5 of the proposal, which gives some idea of how 'Phags-pa words are spelled (cf. Table 3 Example 9 "-an", which shows the letter -A used as a consonant; and Examples 14 "q-an" and 19 "'-a kad ddha ya", which show the letter -A used as a vowel lengthener). Examples of the usage of every 'Phags-pa letter in writing Chinese and Mongolian are given in my "Description of the 'Phags-pa Script" page on my BabelStone1357 web site.

> 6.
> > "the Tibetan use of the Phags-pa script seems now to have
> > virtually died out"
> From what I've seen, modern Tibetan usage seems to be largely
> decorative - but saying it has "virtually died out" is probably
> a little too strong. Many Tibetans are familiar with the script
> and it continues to be used occasionally for decorative
> purposes.
> I have seen 'Phags-pa script used to write text in murals and
> other decorations of a number of recently constructed Tibetan
> monasteries in India and Nepal. It is also found on the title
> pages of some modern xylographs (e.g. the edition of Mi la'i
> rnam mgur published at Apo Rinpoche's monastery in Manali, where
> it is used to write "MAHA MUDRA" within the sides of the title
> page border) and machine-printed texts. It is also used in some
> contemporary Tibetan seals. The 'Phags-pa script was probably
> only ever used for lengthy texts for a short period - ever since
> then its use has probably been largely decorative and for
> seals. Since this kind of use continues, it is probably as
> "alive" as it has been at any time since it stopped being used
> for writing long texts in Mongolian.

I'm pleased to hear that the 'Phags-pa script is still alive in Tibet.
As mentioned in the proposal, I am aware that it has been used as a decorative script for architectural inscriptions, book titles and seals, but I have not personally seen any examples that date from later than the early twentieth century. As you are able to confirm its continued use as a decorative script for these purposes, I will gladly moderate the phrase "virtually died out".

> BTW There is a whole chapter (#16) on this script entitled "hor
> yig gsar ba mdzad pa'i mdzad pa'i skor mdo tsam brjod pa"
> [p134-146] in the book "bod yig 'bri tshul mthong ba skun smon"
> by dpa'-ris sangs-rgyas, published by the Minorities Publishing
> House in China in 1997 (ISBN 7-02628-6).

Thanks for the reference - I will keep an eye open for the book.

> 7.
> Has anyone run this proposal past the Chinese and Mongolian
> national bodies represented on WG2? They probably have contacts
> with experts in their countries who should look at this, and I
> think it is prudent to get them involved as early as possible -
> otherwise these national bodies are likely to request time for
> consulting their experts before the proposal goes forward.

Yes. See answer to Query 8.

> 8.
> There may be people at places like the British Library, the LOC
> and libraries in China and Mongolia who are much more expert on
> this script and have access to many more examples than anyone on
> this list, the UTC or WG2. In the absence of real expertise, and
> with only a few examples, I'm apprehensive that there is a
> danger assumptions may be made about the script which later turn
> out to be wrong and difficult to correct. With little-used
> scripts like 'Phags-pa I'd be happier if experts on the script
> were more actively sought out and their comments solicited -
> rather than relying on them to somehow otherwise hear about
> proposals and submit their comments.
Without wishing to appear immodest, I think that I have as much expertise in the 'Phags-pa script as almost anyone else in the world, and certainly there are few other "real experts" that have the same degree of exposure to all three of the major languages that 'Phags-pa is used to represent (Chinese, Mongolian and Tibetan) and the same level of understanding of the principles of Unicode that I have.

The reason why I have written this proposal is that I am actively engaged in academic research into the 'Phags-pa script, and need to have it encoded in order to facilitate the publication of the results of my research. Unlike some people, I am not motivated by a dilettantish interest in scripts, writing proposals for scripts about which I know very little in the way that some people collect stamps. As with other Unicode proposals that I am working on, my personal need to use unencoded characters has driven me to write a proposal myself rather than wait forever for someone with "real expertise" to do so on my behalf.

The proposal is based on months and months of intensive research utilising resources at SOAS and the British Library (for example, I have personally examined the only extant manuscript copy of the 14th century 'Phags-pa Chinese rhyming dictionary "Menggu Ziyun" rather than relying on facsimile reprints, and have meticulously gone through every issue of the Chinese journals "Wen Wu" [Cultural Relics] and "Minzu Yuwen" [Journal of the Languages and Scripts of Minority Nationalities] in order to find as many examples of 'Phags-pa script usage as possible, and to read all articles that have been written on the script by experts such as Professor Junast). In short, my proposal is not merely an abstract from the relevant pages of the Ladybird Book of World Scripts (as you may think was the case from some of the Unicode proposals I've looked at).

I might add that I have been in contact with people in the British Library (Dr.
Susan Whitfield, head of the International Dunhuang Project), as well as experts in Japan (Dr. Dai Matsui) and the PRC (Professor Quejingzhabu of the University of Inner Mongolia, author of the authoritative work on Mongolian encoding "Mengguwen Bianma", and erstwhile WG2 delegate). Indeed, I have recently been engaged in detailed correspondence about the proposal with Professor Quejingzhabu, who has stated:

I would also like to inform you and your colleagues that when those of us in positions of importance in the relevant organizations, both within the Inner Mongolia Autonomous Region and at the national level, heard about your "Proposal to Encode the Phags-pa Script" we regarded it with great importance. I have already suggested to the appropriate organizations that a working group be set up expressly to examine your proposal, and to put forward the official Chinese position on the proposal. I think that the proposal will gain official approval.

As to the perceived lack of examples in the proposal: out of the dozens of monumental inscriptions, printed texts, manuscript documents, seals, coins etc. that I have access to, I carefully decided to include five textual examples that together illustrate use of all of the proposed characters (plus two other illustrations that show the 'Phags-pa letters used in two important early sources). Any more examples would have been unnecessary, and would have detracted from the point that each of the five provided examples was making (in each case I explain exactly why the particular example has been included - they're not just there because that's all I've got). As I have stated in a previous email, the extensive 'Phags-pa pages on my BabelStone1357 web site provide many more examples, and also give links to a wide range of 'Phags-pa artifacts that are available on the web.
It would have been easy enough for me to have added many more examples of 'Phags-pa usage to the proposal, but then what purpose would this have served other than to make an already lengthy proposal even longer ?

> 9.
> The 'Phags-pa script is clearly based on Tibetan script and it
> is sometimes used to write Tibetan and so round trip mapping
> between 'Phags-pa and Tibetan characters seems desirable - I
> think this should be looked at and dealt with thoroughly within
> the proposal. Also is there a need for round trip mapping
> between 'Phags-pa and Mongolian and/or 'Phags-pa and CJK
> characters? If so, this should be considered now since changes
> to the proposed encoding may be needed to make this feasible.

A. Tibetan

At a simplistic level, round-trip mapping is possible between 'Phags-pa characters (consonants, vowels and punctuation marks) and Tibetan for a 'Phags-pa Tibetan text. However, due to the different textual layout (all letters corresponding to a Tibetan tsheg-bar are written sequentially as pronounced in a single vertical "syllable unit" [with the exception of the candrabindu]), there are some differences between the 'Phags-pa spelling of Tibetan and the Tibetan spelling of Tibetan. For example, in the case of Tibetan words with a superfixed letter LA or SA, the base consonant is indicated in 'Phags-pa spelling by suffixing the letter -A when there is no explicit vowel (e.g. "sam" for Chinese "three", but "sm-a" for Sanskrit "sma"). Round-trip mapping between syllables such as Tibetan "sma" and 'Phags-pa "sm-a" would be difficult. Also, comparing the Tibetan text with the 'Phags-pa text of the same Buddhist text on the famous multi-script inscriptions at Juyongguan north-west of Beijing (which I have done as a matter of course) makes it clear that some Sanskrit words are simply spelled differently in the Tibetan text compared with the 'Phags-pa text.
The Tibetan reversed letter SHA [U+0F65], for example, always corresponds to an ordinary 'Phags-pa letter SHA, whether in isolation or as part of the compound letter KSHA. It would be impossible without linguistic knowledge (which most mapping tables don't have) to know whether 'Phags-pa letter SHA maps to U+0F64 or U+0F65.

B. Mongolian

Mongolian is much more problematic. As can be seen from the examples of Mongolian 'Phags-pa words given in Section 5 of the proposal, many Mongolian words are spelled differently in the Uighur-derived Mongolian script compared with their spelling in the 'Phags-pa script, and there is not necessarily a one-to-one correspondence. For example, the classical Mongolian "k" corresponds to the 'Phags-pa letter KHA in all native Mongolian words with the single exception of the common Mongolian word "yeke" (meaning "great, big"), which is spelled with the 'Phags-pa letter KA. (On the other hand, the 'Phags-pa letter KA in non-native Mongolian words generally corresponds to classical Mongolian "g".) Another major obstacle to round-trip mapping is that 'Phags-pa has two letter Es, both of which correspond to the same letter in the Uighur-derived Mongolian script. There are even greater complications if we look at how 'Phags-pa and the Uighur-derived Mongolian script deal with null initials ('Phags-pa letters A and -A), but I will not go into that here. In short, you would need a dictionary to achieve round-trip mapping between 'Phags-pa script and Unicode Mongolian.

C. Chinese

You can't round-trip map between CJK ideographs and words spelled in the 'Phags-pa script because there is a many-to-many relationship between the two. You can't even round-trip map between 'Phags-pa and pinyin, as Chinese 'Phags-pa texts represent an earlier form of the Mandarin language than that represented by pinyin.

In summary, I do not believe that full round-trip mapping between 'Phags-pa and Tibetan and/or Mongolian is either achievable or particularly desirable.
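To make the SHA point concrete: a conversion table can only round-trip if it is injective, i.e. if no two source characters share a target. The Python sketch below is a hypothetical illustration, not anything defined in the proposal. It labels 'Phags-pa letters by name rather than by code point (the code points under discussion are provisional), and the three-entry table is an assumption chosen to mirror the SHA case.

```python
# Hypothetical Tibetan -> 'Phags-pa table (illustration only).
# 'Phags-pa letters are given as name strings because the
# proposed code points are provisional.
TIBETAN_TO_PHAGSPA = {
    "\u0F40": "PHAGS-PA LETTER KA",   # TIBETAN LETTER KA
    "\u0F64": "PHAGS-PA LETTER SHA",  # TIBETAN LETTER SHA
    "\u0F65": "PHAGS-PA LETTER SHA",  # TIBETAN LETTER REVERSED SHA
}


def collisions(table):
    """Return each target that more than one source character maps to."""
    seen = {}
    for source, target in table.items():
        seen.setdefault(target, []).append(source)
    return {t: s for t, s in seen.items() if len(s) > 1}


def is_round_trippable(table):
    """A mapping can only be inverted losslessly if it is injective."""
    return not collisions(table)


def invert(table):
    """Naive inverse: a later entry silently overwrites an earlier one."""
    return {target: source for source, target in table.items()}


if __name__ == "__main__":
    print(is_round_trippable(TIBETAN_TO_PHAGSPA))  # False
    back = invert(TIBETAN_TO_PHAGSPA)
    # U+0F64 is lost: SHA now always converts back to REVERSED SHA.
    print(hex(ord(back["PHAGS-PA LETTER SHA"])))   # 0xf65
```

The same check would flag the Mongolian case, where 'Phags-pa's two letter Es would share a single target letter in the Uighur-derived script.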
If you were to twist the encoding of 'Phags-pa to more closely fit Unicode Tibetan (even if that were possible), then you would simply make Mongolian and/or Chinese 'Phags-pa texts unencodable !

> 10.
> In the notes column shouldn't the 'Phags-pa consonants map to
> both the equivalent Tibetan headline consonants (0F40-0F68) and
> the equivalent Tibetan subjoined consonants (0F90-0FB8)? Which
> Tibetan letter a 'Phags-pa letter actually mapped to would have
> to be dependant on context.

True. But in isolation the 'Phags-pa letters map to the Tibetan base consonants. These notes are intended to be the sort of notes given in the Unicode code charts, and are provided for information only. The notes preceded by an arrow [U+2192] are "cross references", which indicate "a related character of interest, but without indicating the nature of the relation" (TUS 4.0 ch.16). (BTW, in the PDF of the final version of the proposal the arrow is unfortunately invisible.) If the UTC thinks that cross-referencing the 'Phags-pa consonants to both the base form and the subjoined form of the corresponding Tibetan consonants is useful, then I've got no objection to that.

> 11.
> If this proposal is accepted will any additions have to be made
> to the notes for the corresponding characters in the Tibetan
> block of TUS?

No. Why should there be ? Runic can be used to write Old English. Do we need a note on U+00E6 [LATIN SMALL LETTER AE] stating that it corresponds to U+16AB [RUNIC LETTER AESC] ? Mongolian can be written in Cyrillic script. Do we need notes about Mongolian usage in the section of the Unicode Standard dealing with Cyrillic ?

> 12.
> The individual named 'Phags-pa Lama in the proposal is more
> properly referred to as 'Phags-pa bLo-gros rGyal-mtshan
> [1235-1280] (or 'gro- mgon 'phags-pa blo-gros rgyal-mtshan) -
> there are several other important lamas with the name 'Phags-pa.
> A short biographical sketch [in Tibetan] of 'gro-mgon 'phags-pa
> blo-gros rgyal-mtshan can be found on pages 351 to 353 of
> gangs-can mkhas-grub rim-byon ming-mdzod (a Tibetan biographical
> dictionary) [ISBN 5421-0200-1].

It's an encoding proposal, not a history lesson ! I give a short biography of the 'Phags-pa Lama (with his proper name in both Tibetan and Mongolian script) in my "Overview of the 'Phags-pa Script" page on my BabelStone1357 web site.

> 13.
> When this script gets encoded there may need to be a note in the standard
> stressing that PHAGS-PA LETTER QA and PHAGS-PA LETTER GGA are
> *not* variants of PHAGS-PA LETTER KHA and PHAGS-PA LETTER GA as
> I can easily see people making this mistake when trying to enter
> Tibetan words in this script.

Why ? There are many scripts which have similar letters, and Unicode does not insert similar warnings. What about the Unicode Runic block, which includes a mixture of runic letters used in different futharks (i.e. different writing systems) ? There are quite a few examples of runes that have very similar forms, but have different phonetic values, and are used in different futharks. Certainly anyone who does not know which runic system each runic letter is used in and what phonetic value it represents could easily use the wrong rune. Does Unicode warn us about this ? No.

Could someone inadvertently type PHAGS-PA LETTER QA for PHAGS-PA LETTER KHA ? Yes, but only if they did not know the 'Phags-pa script (mind you, if they knew Tibetan they would realise which letter corresponded to Tibetan KA and GA from its position in the code charts). Unicode encodes characters, users use them - and one has to assume a certain level of competence in the script on the part of the user. For example, I know nothing about the Arabic script; but if I wanted to enter some Arabic text in Unicode, should I look to the Unicode Standard for advice on how to write Arabic, or should I take an evening course in Arabic at my local college ?
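The context-dependent cross-reference discussed under item 10 can be sketched in a few lines. The Python below is an illustration only and assumes nothing beyond the layout of the Tibetan block, where each base consonant in U+0F40..U+0F68 has its subjoined form exactly 0x50 code points higher (U+0F60 and U+0FB0 being one such pair). The `position_in_stack` argument is a hypothetical stand-in for the context analysis a real converter would need; the proposal defines no such thing.

```python
import unicodedata

# In the Tibetan block, base consonants U+0F40..U+0F68 have their
# subjoined counterparts at U+0F90..U+0FB8, 0x50 code points higher.
BASE_FIRST, BASE_LAST = 0x0F40, 0x0F68
SUBJOINED_OFFSET = 0x50


def tibetan_form(base_char, position_in_stack):
    """Return the base or subjoined Tibetan form of a consonant.

    `position_in_stack` is a hypothetical context flag: 0 for the top
    of a stack, any larger value for a letter written beneath it.
    """
    cp = ord(base_char)
    # Reject anything outside the range, plus the unassigned U+0F48.
    if not BASE_FIRST <= cp <= BASE_LAST or unicodedata.category(base_char) != "Lo":
        raise ValueError(f"U+{cp:04X} is not a Tibetan base consonant")
    if position_in_stack == 0:
        return base_char
    return chr(cp + SUBJOINED_OFFSET)


if __name__ == "__main__":
    ya = "\u0F61"  # TIBETAN LETTER YA
    for pos in (0, 1):
        form = tibetan_form(ya, pos)
        print(f"U+{ord(form):04X} {unicodedata.name(form)}")
    # U+0F61 TIBETAN LETTER YA
    # U+0FB1 TIBETAN SUBJOINED LETTER YA
```

The hard part, which this sketch deliberately leaves out, is deciding `position_in_stack` from a 'Phags-pa syllable unit; as the reply to item 9 explains, the 'Phags-pa spelling of Tibetan does not always mirror Tibetan stacking.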
From: "Andrew C. West"
Date: 2003-08-19 10:12:56 -0700
To: tibex@unicode.org
Subject: [tibex] Re: hPhags-pa Proposal
X-Sent: 19 Aug 2003 14:12:49 GMT
X-Sent-From: andrewcwest@alumni.princeton.edu
Sender: tibex-bounce@unicode.org

On Tue, 19 Aug 2003 01:55:11 +0100, "Christopher John Fynn" wrote:

> I agree, it need not be one-to-one - but if e.g. PHAGS-PA LETTER
> FA is encoded it would probably have to be represented in
> Tibetan by U+0F67 + U+0FAD (or maybe U+0F67 + U+0FA5 which is
> the way "FA" is currently written in Tibetan documents published
> in China). This would make round trip mapping between PHAGS-PA
> and Tibetan difficult.

PHAGS-PA LETTER FA is not used to write Tibetan, and the 'Phags-pa script is not used to write modern Tibetan. There really is no mapping problem to consider here. If for some bizarre reason you did want to convert modern Tibetan text containing either <0F67, 0FAD> or <0F67, 0FA5> into 'Phags-pa, then you would probably map both sequences to PHAGS-PA LETTER FA. As you state, you could not then round-trip back to the original encoding. But then, as I have been at pains to emphasise in my response to your original questions, there is no convenient one-to-one mapping between 'Phags-pa and Chinese/Mongolian/Tibetan anyway. Many languages can be written in more than one script, and in many cases there is no one-to-one relationship, and hence round-trip mapping is impossible. This really is not an encoding issue.

> I think a TSHEG like space character to mark word boundaries in
> Tibetan and another kind of thin space or zwnj like character to
> indicate the equiv. of separation of stacks within a word will
> be needed. Although already encoded characters might be used for
> these, it may be cleaner to add two new characters in this block
> specifically for these purposes.

You cannot encode a character that does not exist simply for compatibility with another script !
As I have stated in my previous email, syllable division of Tibetan 'Phags-pa text is not necessarily the same as in Tibetan text. 'Phags-pa syllable units are separated by breaking whitespace - there really is absolutely no reason to encode this whitespace with any character other than U+0020. As far as I am aware there is no "separation of stacks within a word" when written in the 'Phags-pa script, so there is no need for a ZWNJ or other control character within a syllable unit. If you have seen an example of Tibetan 'Phags-pa text where you think such an approach is justified, please show it to me.

> Also I think the queries I
> raised about the characters proposed for ABA4, ABA6, ABA7,
> ABA8, ABA9 need to be resolved.

The reasons for encoding these characters are already given in the proposal. ABA4 (FA) should be encoded for the reasons given in my previous email (i.e. FA is graphically distinct from HA plus Subjoined-WA). ABA6..ABA9 (Subjoined and Superfixed letters) must be encoded for the reasons given in the proposal that I quoted in my previous email. Namely, in words with only an inherent vowel it would be impossible to differentiate a base consonant from a graphically distinct modifier consonant unless the subjoined/superfixed forms are encoded separately.

Cf. Chinese "hay" [sea] and "hya" [summer]. If there were only one letter YA, then both words would be encoded as the same sequence of characters, yet the two words are not only pronounced differently, but are written differently.

Cf. Tibetan "rang" [self] and "rnga" [drum]. If there were only one letter RA, then both words would be encoded as the same sequence of characters, yet the two words are not only pronounced differently, but are written differently.

The only other solution would be to encode an Implicit Vowel ... but I really do not think that any of us would want to follow that path !

> Aside from these issues it is pretty clear how Tibetan is
> written in this script.
> However I think it is also essential to
> proactively get people who are familiar with Mongolian, Uighur,
> Chinese & so on written in this script to look over the
> proposal. There may be specific issues or conventions with this
> script and those languages which we can't spot.

I am familiar with both Mongolian and Chinese. Dr. Dai Matsui (who has seen the proposal and made no adverse comments) has researched 'Phags-pa seals on Uighur documents and is familiar with Old Uighur. I might add that 'Phags-pa Uighur texts are limited to two or three words that are found on some seals attached to documents written in the Old Uighur script (I have of course also examined images of these seals myself in the course of preparing the proposal). The most commonly occurring 'Phags-pa Uighur word is given as an example in Section 5 of the proposal.

> Three other things...
> 1. In some other examples I've seen there seem to be two kinds
> of PHAGS-PA HEAD MARK - One which is the glyph proposed for
> U+ABB0 and another with a single loop. The first is probably
> equivalent to Tibetan U+0F04 plus U+0F05 and the second to
> U+0F04 (or maybe the first is equiv. to U+0F04 plus U+0F05 plus
> U+0F05 and the second to U+0F04 plus U+0F05). It could be
> argued that these are two variants of the same character as they
> perform the same function - but maybe it is safer to encode them
> as two separate characters.

I have actively looked for examples of head marks in Tibetan 'Phags-pa texts, but so far the only ones I have found are the double-looped variety. Cf. Example 4 in my proposal, and the seal of the 13th Dalai Lama. N.B. In the latter example the double-looped head mark corresponds to a single U+0F04 in the Tibetan text ... so how would you round-trip map that if there were two 'Phags-pa head marks ?
If you could provide an image of an example of a single-looped 'Phags-pa Head Mark, then I agree that it may be worthwhile encoding it separately (although, as you say, they could be considered to be simply glyph variants). However, for Mongolian only a single head mark U+1800 [MONGOLIAN BIRGA] is encoded, and the four other forms of the birga (including single-looped, double-looped and triple-looped forms equivalent to <0F04>, <0F04, 0F05> and <0F04, 0F05, 0F05> respectively) are intended to be represented as Standardized Variants (although at present these are not yet officially defined). For compatibility the Mongolian experts may prefer not to encode two separate 'Phags-pa head marks.

For those who are wondering why we cannot simply define two 'Phags-pa letters corresponding to U+0F04 and U+0F05, and represent single-looped, double-looped and triple-looped forms of the head mark by combining these two characters as appropriate, the problem is that as 'Phags-pa is a vertical script, the hypothetical equivalents to U+0F04 and U+0F05 would ligate vertically, whereas they need to be ligated horizontally within the vertical line of text.

> Especially since we have both U+ABB1
> PHAGS-PA MARK SHAD and U+ABB2 PHAGS-PA MARK DOUBLE SHAD when
> U+ABB2 could have been represented by U+ABB1 plus U+ABB1 (I'm
> *not* arguing that it should be.).

I was just waiting for someone to raise the Double Shad issue. I believe that the double shad should be encoded separately for two reasons :

A. For compatibility with U+0965 [DEVANAGARI DOUBLE DANDA] and U+0F0E [TIBETAN MARK NYIS SHAD].

B. Because the user community seems to view it as a separate character. Cf. Example 4 in my proposal, where the 'Phags-pa Double Shad corresponds to Tibetan shad marks on the same line, not two shad marks on separate lines as would be the case if it were conceived as merely two shad marks in succession.

> 2. Do isolated vowels (not attached to consonants) ever occur?
> If not, shouldn't the vowels be combining characters as in the
> Tibetan script block?

Yes, in the 'Phags-pa script vowels may occur in isolation, as discussed in Section 5 of the proposal (see Table 3 Example 1 for the isolate letter "U" that represents Chinese Wu), although when writing Tibetan in 'Phags-pa script an initial vowel is attached to PHAGS-PA LETTER A (in Mongolian and Chinese this is not the case).

> 3. Finally, where there are multiple variants of a character
> within a single style of 'Phags-pa script are we only going to
> allow for one variant within a font or should we expand the
> proposal to include specific + character> pairs to indicate these?

No. These are simple glyph variants, and should not be encoded using variation selectors or otherwise. Table 2 of the proposal is informative only, and merely shows some of the glyph forms of 'Phags-pa letters in different script styles.

> Generally I think the proposal Andrew put together is great and
> he's obviously put a tremendous amount of research and work into
> it. Sorry I'm late in the day with all these comments &
> questions on it - but I was on holiday in India with my family
> for the past six or seven weeks.

Thank you. I hope that my responses to your questions help you appreciate that I have put a lot of thought into the proposal, and have attempted to cover all the bases.

Regards,

Andrew