From peterkirk@qaya.org Sat Nov 1 07:09:55 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 07:09:59 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1C9tG11487 for ; Sat, 1 Nov 2003 07:09:55 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AFuZd-00077j-Gz; Sat, 01 Nov 2003 12:09:45 +0000 Message-ID: <3FA3A288.5030501@qaya.org> Date: Sat, 01 Nov 2003 04:09:44 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Philippe Verdy CC: hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> In-Reply-To: <0b6901c3a01d$bf178030$2101a8c0@asimov> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 613 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew NB copying all of the message as received to the Hebrew list, with my comments added, as it was sent there but has not appeared. 
On 31/10/2003 18:13, Philippe Verdy wrote:

>>http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html section
>
>This report confuses me even more now. Or maybe there are composition
>errors.
>I am concerned by the character names used in the report, which are
>hyperlinked to characters with other names.
>
>It seems that all the anchor names in the HTML document have been mixed up
>during an editing process. Can you confirm it? I think that this is an
>unintended editorial error. For example:
>- the "CGJ" anchor name is set to the Unicode definition of "bet",
>- the "dagesh" anchor name is set to the Unicode definition of "CGJ",

Some, but not all, of the anchors have been misplaced by a line or two.
Sorry. Please follow the text and ignore the exact anchor positions.

>One note: I am not a Hebrew expert and I don't know this language or any
>language using the script, but I am trying to figure out the rules for
>composing and interpreting it correctly. As the Unicode reference is
>completely opaque when describing the script, there are also a lot of
>issues for this language, and almost all the documents I find are
>sometimes contradictory or contain errors, it's quite a hard path to find
>the correct answers for the Hebrew composition model.
>
>OK, this report is quite interesting, as it shows additional typographical
>features of the Hebrew script that I was not aware of, such as:
>
>2.1) the vowel point holam, normally rendered above-left, which in some
>cases gets shifted when rendering to above-right of the next alef, if that
>alef is silent with no vowel point of its own (this silent alef taking the
>role of a vowel holder for the previous consonant, as if it were a virama
>in an Indic script).
However such typographical effects are quite minor, and >the first image in the report could be read without difficulty if the first >alef (on the right) did not have this shifted holam point, and the dot >rendered and its default position above-left the preceding zayin letter on >its right. > > True enough. In most scripts text with misplaced dots, accents etc can be read successfully. But it does not look correct. As a general principle, Unicode needs to support somehow the distinctions which typographers want to make rather than prescribing to them which distinctions they should make. >This does not break the model, unless there's an intent to correctly render >the shifted holam above alef, which normally combines with the previous >letter. In that case, a control could be inserted between the previous >combining sequence which contains the holam point, and the next grapheme >cluster starting by alef (unpronounced here), or a variant of alef could be >encoded (using a variant selector on this ALEF, to mean it is silent and >accepts a shifted holam from the previous grapheme cluster, this solution >being ignorable by renderers and collators unable to shift this holam to the >left next cluster). This would produce a sequence like: >,,,. The report also suggests that this is >working as a ligature, which can be controled by ZWJ or ZWNJ, where: >- HOLAM,ZWJ,ALEF forces the presence of the ligature to make a silent alef, >i.e. a shifted holam >- HOLAM,ZWNJ,ALEF force the absence of this ligature, i.e. the holam >displayed normally above-left the previous base letter. > >However it requires a renderer capable of making a language-based decision >(instead of a script-based decision) for the default placement of holam >coded before alef. ... > I don't think this is a language-based issue. Certainly the same rules are used for biblical Aramaic. The basic rule depends on the presence of any other combining mark with the alef - if there is one the alef is not silent. 
The alef is also not silent when followed by vav shruqa or holam male. These are rules which can be implemented in fonts. The placement needs to be specified in the font and not in the underlying text, and that is why it is wrong to require ZWJ to indicate that a ligature is required. >... My opinion is that the default will almost always be the >absence of the ligature, ignoring the special case of silent alef, and that >only the need to code differently the shifted HOLAM (with ZWJ) will be used >in practice, for correct rendering of the Hebrew language in a engine made >to be language neutral. > > I consider more valuable the opinions of those who know the script and languages in question. >If the other default is made, it would require that the renderer considers >the other posible diacritics coded after the ALEF to determine if this ALEF >is silent (this difficult result would be valid for Hebrew, but not >necessarily good for other languages using that script or transliterated to >it). ... > Can you, or anyone, tell me of any language written in Hebrew script where this result is not good? Or are you just speculating? We need some hard evidence if we are to recommend that every existing Hebrew document must be edited to insert ZWJ between holam and silent alef, so that the font is able to make the ligature as required. Even if there are minority languages in which the ligature must be disabled, this can be dealt with in the same way as the f,i ligature which is disabled in Turkish. >... In that second option, the two sequences with ZWNJ and ZWJ would be >necessary to forbid the application of this Hebrew language rule for the >placement of this HOLAM point on ALEF). > > >2.2) The same principe could be used with a final patah if it is "furtive", >i.e. logically associated with the previous syllable to its right, and >rendered below-right instead of below. 
However I see here that it is encoded >after the final ayin, het, or he-with-dagesh, and does not follow the >logical encoding model of hebrew syllables; this causes a problem if this >special rendering is necessary for the Hebrew language, but not for other >languages using the hebrew script, unless it is specially encoded to be >rendered and interpreted correctly. > > I am not sure what problem you are trying to solve by proposing a VS use here. Are you talking about prohibiting a shift of patah under a final consonant in languages other than Hebrew? This is again the Turkish f,i problem, disabling a ligature in a relative minority language. One solution might be to add ZWJ after the patah, to indicate that it should be rendered as if followed by another base character. > >May be a variant selector for this patah may be indicated in the encoded >text to force its position to become below-right instead of below in the >normal case (the variant selector would not be needed and implied for the >Hebrew language, using its placement rule on final letters of words). This >solution would be coherent with current renderers and the guideline to still >encode this furtive patah like a standard patah, even if its rendering >causes problems (the variant selector, when supported would correctly render >the furtive patah as a glyphic variant of the normal patah, in accordance >with Unicode guidelines for variant selectors usage). Of course it would >first require that Unicode accepts and publishes the sequence as >valid, to position correctly the patah glyph (this solution would also have >the merit of being able to collate it correctly before the final letter >instead of after it). The result would be for example: . >However there's a problem in Unicode related to the fact that combining >marks cannot have variants (variant selectors are not expected to occur in >the middle of a combining sequence). 
>
>Another way to do it would be to create variants for the final base letter
>AYIN, HET, or HE, to produce for example the sequences ,
>, . But this still requires allocating a
>variant selector for these letters, to modify the way they manage the
>default placement of patah (i.e. overriding the default below placement of a
>further diacritic like patah, to place it below-right). That is not clean,
>and it certainly has some issues.
>
>Without the variant, there is no clean solution, unless this is also
>considered as a special-case ligature of the final patah with the previous
>letter. But this would require placing a ZWJ before the final base letter,
>even though there's no graphical interaction with the preceding base letter.
>
>So this is a case where a distinct PATAH character may be needed and given a
>separate code point, unless Unicode relaxes the constraint on variant
>selectors to allow them to be coded on combining characters (but here comes
>the problem of normalization, as VS characters have combining class 0).
>
>Another solution would be to encode a Hebrew-specific variant selector with
>the same combining class as patah (class 17), which would have the
>properties of a control character, with no associated glyph, and would be
>used before that PATAH combining mark. If I call this format
>control "FURTIVE", then sample encodings would be , or
>, or . I have not analyzed all
>the issues that may result from using CGJ instead for this FURTIVE control
>(which breaks the combining sequence with its combining class 0, and is
>better fitted to create normalization overrides).

You are proposing a complex solution to a non-problem. Anyway, most fonts
do not shift furtive patah. Fonts may do so if they wish, but their
designers need to be aware that in some hypothetical minority language
cases the result is not quite ideal.
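For readers following the combining-class argument, here is a minimal
sketch in Python using the standard unicodedata module. It looks up the
canonical combining classes of the marks mentioned in this exchange and
shows how a class-0 character such as CGJ stops canonical reordering. The
labels in MARKS and the sample sequences are illustrative assumptions only;
nothing here is a proposed encoding for furtive patah.

```python
import unicodedata

# Code points mentioned in the thread; the short labels are informal.
MARKS = {
    "PATAH": "\u05B7",
    "HOLAM": "\u05B9",
    "DAGESH": "\u05BC",
    "SHIN DOT": "\u05C1",
    "SIN DOT": "\u05C2",
    "CGJ": "\u034F",   # COMBINING GRAPHEME JOINER
    "VS1": "\uFE00",   # VARIATION SELECTOR-1
}

def combining_classes(marks=MARKS):
    """Return the canonical combining class of each labelled character."""
    return {label: unicodedata.combining(ch) for label, ch in marks.items()}

HE = "\u05D4"

# HE with dagesh (mappiq) then patah, in a typical typing order.
plain = HE + MARKS["DAGESH"] + MARKS["PATAH"]
# The same marks with CGJ (class 0) between them.
blocked = HE + MARKS["DAGESH"] + MARKS["CGJ"] + MARKS["PATAH"]

for label, ccc in combining_classes().items():
    print(f"{label:9} ccc={ccc}")

# Canonical ordering sorts adjacent marks by class, so patah (17) moves
# before dagesh (21) unless a class-0 character separates them.
print(unicodedata.normalize("NFD", plain) == HE + MARKS["PATAH"] + MARKS["DAGESH"])
print(unicodedata.normalize("NFD", blocked) == blocked)
```

The second comparison is the reason a class-0 control (CGJ or a variation
selector) placed inside a combining sequence interacts with normalization:
marks on either side of it are never reordered across it.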
>2.3) For holam above vav, as the default position of holam is above-left,
>there's a problem in representing the glyphic distinction of "holam male"
>(i.e. VAV with a HOLAM above-right). The report suggests this is similar to
>case 2.1, but only because the base letter (here "vav", alias "waw") is also
>unpronounced. The report suggests that a ZWJ control be coded between VAV
>and HOLAM, but I think that a variant selector for VAV would be more
>appropriate, and would work here given the Unicode constraints on VS
>characters.

This is another alternative which might be considered in addition to the
six already described in the text and the appendix. If new characters are
to be defined, defining a VS is less simple than my alternatives 3 and 4.

>
>2.4) I was not aware that holam and shin dots could have distinct glyphs, as
>they both are dots in the samples I have seen. Also I was not aware that a
>HOLAM on the preceding base letter on the right (normally above-left of that
>letter) could be merged with a SHIN-DOT on the current base letter (normally
>above-right of this letter). Samples I had seen displayed both dots in their
>respective positions above the appropriate letter. It is quite a unique
>feature of the Hebrew script to have diacritics above a letter floating
>around either in the separation space between base letters or even with the
>surrounding base letters. This is really a special form of ligature
>(convention A4) not used in the second convention B4 which is used in all
>samples I had seen before, and that ligates together only the diacritics of
>two consecutive letters. However the suggestion to remove (not encode) the
>holam seems quite excessive and dangerous (and inconsistent with the better
>practice of using a holam glyph for the merged dots, instead of the glyph
>for shin dot).

Agreed.

>2.5) Holam with sin dot is exactly similar, but on the other side: the holam
>above-left of a letter merges with the sin dot above-right of the next letter.
> >The rest seems clear to me, and do not cause interpretation problems. If >only the editorial errors are corrected, in this HTML document, I think it >is certainly much more useful than the PDF on sil.org proposed by Microsoft, >Sun and other, and which does not solve any practical problem. > > > Thank you, Philippe. The document is still a draft. It doesn't need to be a formal proposal as the only new characters proposed are rather separate issues which are best proposed in separate and more simple proposals. Also it doesn't deal with most accent issues. I was intending to update it to include material on accents. But I am not sure if it worth doing so, as consensus has been reached on most of these issues (see for example http://www.qsm.co.il/Hebrew/Hebrew%20Issues.htm, although some disagreement remains over meteg), and separate proposals for accents are being prepared. Peter -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Sat Nov 1 12:31:41 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 12:31:41 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1HVfG32296 for ; Sat, 1 Nov 2003 12:31:41 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AFzb4-0002Ti-7m; Sat, 01 Nov 2003 17:31:34 +0000 Message-ID: <3FA3EDF6.2040501@qaya.org> Date: Sat, 01 Nov 2003 09:31:34 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Philippe Verdy CC: hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> 
<0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> In-Reply-To: <0c8301c3a098$87739d00$2101a8c0@asimov> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 614 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew Thank you, Philippe. Again I am forwarding what you wrote to the Hebrew list as you intended. It really would help if you could persist in trying to join the Hebrew list. Many of the issues you mention have already been discussed there at length. On 01/11/2003 08:52, Philippe Verdy wrote: >From: "Peter Kirk" > > > >>>... My opinion is that the default will almost always be the >>>absence of the ligature, ignoring the special case of silent alef, and >>> >>> >that > > >>>only the need to code differently the shifted HOLAM (with ZWJ) will be >>> >>> >used > > >>>in practice, for correct rendering of the Hebrew language in a engine >>> >>> >made > > >>>to be language neutral. >>> >>> >>> >>> >>I consider more valuable the opinions of those who know the script and >>languages in question. >> >> > >My opinion reflects only what I have been able to find myself among various >sources, but of course experts may have better solutions than me, and may >have documented somewhere these rules. 
> >In practice, such documentation (in English or French or Spanish or Chinese >or a language which has a larger audience) is still missing and is needed >even for non Hebrew experts that will need to encode, use, render or >interpret a text in this script. Given the lack of information on this >subject in the Unicode reference, this does not help implementers to handle >Hebrew text correctly, and they can't be blamed for that. > > I agree. My document (http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html) was intended to help implementers, as well as to give the background for any changes that might be needed to Unicode. John Hudson has also drafted a useful guide which will (I understand) be issued soon with the SBL Hebrew font. >For Unicode and for these users, there's a real need to make a comprehensive >character model for Hebrew, and that's what I'm trying to do from >informations I collect in various places. Before this discussion, I had no >idea how to handle Hebrew text correctly. In addition, the various places >discussing Hebrew are using sometimes different languages to describe >characters. > > I agree about the need for a comprehensive character model. It is unfortunate that this was not done before, before character names and combining classes were frozen. I'm not quite sure what sort of document that would be in practice. But I hope we are getting near to collecting the material required for this. >One example is the many common alias names used by Hebrew experts or users >(certainly coming from local cultural differences depending on the country >where Hebrew is spoken or written, and probably from several branches of >tradition in the jude diaspora). > > This is a significant problem which we were discussing on the list a couple of weeks ago. The problem is that the Hebrew accents have so many different names, and sometimes the same name is given to different characters. 
We agreed to add some aliases to the code charts, but we never agreed on what aliases to include. Apart from the various Jewish traditions, there is a rather different naming tradition among western, mostly Christian scholars, and some subtle differences in rendering. >Another case is the term "holam male", which you think is obvious for you, >but refers to a Hebrew script specificity which is not even named in the >Hebrew chapter of the Unicode reference. > > Actually this term was not known to me before I joined this list in July, although I had seen the English equivalent "full holem". But it is explained in section 2.3 of my document. However, I do assume there a certain basic understanding of Hebrew script. >>>If the other default is made, it would require that the renderer >>> >>> >considers > > >>>the other posible diacritics coded after the ALEF to determine if this >>> >>> >ALEF > > >>>is silent (this difficult result would be valid for Hebrew, but not >>>necessarily good for other languages using that script or transliterated >>> >>> >to > > >>>it). ... >>> >>> >>> >>Can you, or anyone, tell me of any language written in Hebrew script >>where this result is not good? Or are you just speculating? >> >> > >This is a speculation may be, but I'm not the author of it, as various >documents speaking about the Hebrew script say that what is valid when >transcripting the Hebrew language to that script, may not be valid when >transcripting other languages (a common concept shared by all scripts and >needed throughout the world due to worldwide internationalization of >exchanges). > > In general this is true, but I have never seen a suggestion that this particular shift of holam on to alef is specific to Hebrew. 
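As an illustration only, the rule described earlier in this thread for
shifting holam on to a silent alef (the alef carries no combining mark of
its own and is not followed by vav shruqa or holam male) can be sketched in
Python as below. The function names are hypothetical, and the test is a
simplification made for this example: a real font would express this as
glyph lookups, and real text has further cases not covered here.

```python
import unicodedata

ALEF, VAV = "\u05D0", "\u05D5"
HOLAM, DAGESH = "\u05B9", "\u05BC"

def clusters(text):
    """Split text into (start_index, base, marks) tuples."""
    out = []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Mn" and out:
            start, base, marks = out[-1]
            out[-1] = (start, base, marks + ch)
        else:
            out.append((i, ch, ""))
    return out

def silent_alefs_after_holam(text):
    """Return indices of alefs that would take a shifted holam.

    Simplified rule: the previous cluster carries holam, the alef has no
    marks, and the next cluster is not vav with dagesh or vav with holam.
    """
    cl = clusters(text)
    hits = []
    for k in range(1, len(cl)):
        start, base, marks = cl[k]
        if base != ALEF or marks or HOLAM not in cl[k - 1][2]:
            continue
        if k + 1 < len(cl):
            _, next_base, next_marks = cl[k + 1]
            if next_base == VAV and (DAGESH in next_marks or HOLAM in next_marks):
                continue
        hits.append(start)
    return hits

zot = "\u05D6\u05B9\u05D0\u05EA"            # zayin, holam, alef, tav
bo_u = "\u05D1\u05BC\u05B9\u05D0\u05D5\u05BC"  # bet, dagesh, holam, alef, vav, dagesh
ro_eh = "\u05E8\u05B9\u05D0\u05B6\u05D4"     # resh, holam, alef, segol, he

print(silent_alefs_after_holam(zot))
print(silent_alefs_after_holam(bo_u))
print(silent_alefs_after_holam(ro_eh))
```

Nothing in this check depends on the language of the text, only on the
neighbouring characters, which is why it can live in the font rather than
in the encoded text.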
>Unless there is somewhere a ISO model or admitted national model for the >transcription to Hebrew of words or names normally written in other scripts >(similar to what exists for example between from Hindi or Chinese or Russian >to Latin), there will always be problems to handle minority or rare scripts >in applications or data exchanges. > > Jony may be able to let us know if there is any Israeli standard for transliteration into Hebrew. There are accepted rules e.g. those outlined in the last paragraph of http://www.weizmann.ac.il/home/comartin/ivrit/ansi.html. And there are specific methods used for more or less phonetic transcription, but I have no idea whether there is even a de facto standard. > > >>We need some >>hard evidence if we are to recommend that every existing Hebrew document >>must be edited to insert ZWJ between holam and silent alef, so that the >>font is able to make the ligature as required. Even if there are >>minority languages in which the ligature must be disabled, this can be >>dealt with in the same way as the f,i ligature which is disabled in >> >> >Turkish. > > >>>... In that second option, the two sequences with ZWNJ and ZWJ would be >>>necessary to forbid the application of this Hebrew language rule for the >>>placement of this HOLAM point on ALEF). >>> >>> >>>2.2) The same principe could be used with a final patah if it is >>> >>> >"furtive", > > >>>i.e. logically associated with the previous syllable to its right, and >>>rendered below-right instead of below. However I see here that it is >>> >>> >encoded > > >>>after the final ayin, het, or he-with-dagesh, and does not follow the >>>logical encoding model of hebrew syllables; this causes a problem if this >>>special rendering is necessary for the Hebrew language, but not for other >>>languages using the hebrew script, unless it is specially encoded to be >>>rendered and interpreted correctly. 
>>> >>> >>> >>> >>I am not sure what problem you are trying to solve by proposing a VS use >>here. Are you talking about prohibiting a shift of patah under a final >>consonant in languages other than Hebrew? This is again the Turkish f,i >>problem, disabling a ligature in a relative minority language. One >>solution might be to add ZWJ after the patah, to indicate that it should >>be rendered as if followed by another base character. >> >>You are proposing a complex solution to a non-problem. Anyway most fonts >>do not shift furtive patah. If fonts wish to do so, they may but need to >>be aware that in some hypothetical minority language cases the result is >>not quite ideal. >> >> > >That's why I think that furtive patah is a typographic effect that does not >need to be encoded explicitly, and you're probably right, if ZWJ or ZWNJ can >be used to alter the unpredictable default rendering (which may or may not >render the shifted patah under the same conditions). > > > >>>2.3) For holam above vav, as the default position of holam is above-left, >>>there's a problem to represent the glyphic distinction of "holam male" >>> >>> >(i.e. > > >>>VAV with a HOLAM above-right). The report suggests this is similar to >>> >>> >case > > >>>2.1, but only to because the base letter (here "vav", alias "waw") is >>> >>> >also > > >>>unpronounced. The report suggests that a ZWJ control be coded between VAV >>>and HOLAM, but I think that a variant selector for VAV would be more >>>appropriate, and would work here given the Unicode constraints on VS >>>characters. >>> >>> >>> >>> >>This is another alternative which might be considered in addition to the >>six already described in the text and the appendix. If new characters >>are to be defined, defining a VS is less simple than my alternatives 3 >>and 4. 
>>
>>>However the suggestion to remove (not encode) the holam seems quite
>>>excessive and dangerous (and inconsistent with the better practice of
>>>using a holam glyph for the merged dots, instead of the glyph for shin
>>>dot).
>>
>>Agreed.
>>
>>>2.5) Holam with sin dot is exactly similar, but on the other side: the
>>>holam above-left of a letter merges with the sin dot above-right of the
>>>next letter.
>>>
>>>The rest seems clear to me, and does not cause interpretation problems.
>>>If only the editorial errors are corrected in this HTML document, I think
>>>it is certainly much more useful than the PDF on sil.org proposed by
>>>Microsoft, Sun and others, which does not solve any practical problem.
>>
>>Thank you, Philippe. The document is still a draft. It doesn't need to
>>be a formal proposal as the only new characters proposed are rather
>>separate issues which are best proposed in separate and more simple
>>proposals. Also it doesn't deal with most accent issues. I was intending
>>to update it to include material on accents. But I am not sure if it is
>>worth doing so, as consensus has been reached on most of these issues
>>(see for example http://www.qsm.co.il/Hebrew/Hebrew%20Issues.htm,
>>although some disagreement remains over meteg), and separate proposals
>>for accents are being prepared.
>
>Note that I read this document as a non-expert in Hebrew. It is still
>valuable to consult non-experts to see if some implied (not
>explicitly documented) cases are missing, or if the description, intended
>for non-experts, can be understood and implemented correctly by them. Even
>the Unicode Technical Committee and the ISO10646 Working Group admit that
>they cannot work with enough expertise and they are working with
>subcommittees headed by liaison Rapporteurs.
> >A lot is expected from these subgroups of experts, but the most important >thing is documentation that can be used to complement or enhance the Unicode >and ISO/IEC 10646 standards. > > One of the past problems has been the failure of Unicode etc to get enough input from enough Hebrew scholars and experts, other than those whose knowledge is mainly of the (typographically) simplified subset which is modern Hebrew. Part of the problem was that experts in ancient languages are not easily convinced of the importance of Unicode for their work - or they only become so when they discover that computers won't do what they want. But also some of the experts who were interested found their earlier contributions being ignored. There are now more contacts with the scholarly community which can be used if necessary. >For now, a first reader of the Unicode reference may legitimately think that >combining classes standardized in Unicode are creating a normative layout >property for characters. When one reads more precisely the standard, it is >wrong, even in the Latin script (look for example about the debates with >cedillas and commas below or above, or even the simple case of the dot above >the letter i in Turkish). > >Some of these issues are correctly documented in Unicode (like the dot above >i in the Altaic alphabet of Turkish and Azeri), but Hebrew is really not >usable directly when just reading the Unicode standard. On the opposite, a >much better effort has been made for Arabic where joining types have been >documented and standardized. > >Your discussions with me are quite fruitful. At least I have a better >knowledge of how the Hebrew script works, and what errors I (or someone else >in the same situation as me) could do with incorrect interpretations of >sometimes contradictory documents and opinions, and the lack of standard to >decide which solution to adopt. 
>
>I do think that a simple table (like the one I did in Excel) in the Hebrew
>Unicode reference would greatly help in understanding the issues. Also,
>common aliases found for Hebrew abstract characters should be listed or
>explained with recommended "best practices", as well as encoding examples. ...

I agree. As the level of detail required is probably too much for the main
text of the standard, some kind of technical report is required.

>... And it's
>significant that even Microsoft, which has the money to pay a lot of
>experts, has not so far been able to create a working model for Hebrew in
>Unicode, needing to use some private use characters to render correctly some
>cases that were easy to create in the legacy Windows codepage or the Hebrew
>version of ISO-8859.

I know that Microsoft has worked hard on trying to create a working model,
including by working with (though not paying!) semi-experts like myself. I
don't think the PUA is used in its latest version of the model, except
perhaps for not yet encoded characters like the inverted nun - but these
were never supported by ISO 8859. But Microsoft has so far failed to
support rendering of canonically ordered text, instead expecting text to be
presented in a specific non-canonical order. I must say I wonder if anyone
has yet successfully implemented the full Unicode Hebrew model, including
proper rendering of combining sequences in any canonically equivalent
order. Microsoft has claimed that this cannot be done efficiently. While I
and many others are sceptical about this claim, such scepticism sounds
rather hollow coming from those who have not actually implemented the
Unicode Hebrew model.
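To make "canonically equivalent order" concrete, here is a small sketch in
Python with the standard unicodedata module. The typed sequence is only an
assumed example of an input order (consonant, shin dot, dagesh, vowel), and
the helper names are hypothetical. It says nothing about how any rendering
engine actually handles either order.

```python
import unicodedata

def to_canonical_order(text):
    """Return text with combining marks in canonical order (NFD)."""
    return unicodedata.normalize("NFD", text)

def canonically_equivalent(a, b):
    """True if a and b are canonically equivalent Unicode strings."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

# shin, shin dot (ccc 24), dagesh (ccc 21), qamats (ccc 18)
typed = "\u05E9\u05C1\u05BC\u05B8"
canonical = to_canonical_order(typed)

# Print the mark order before and after canonical reordering.
for label, s in (("typed", typed), ("canonical", canonical)):
    names = [unicodedata.name(ch).replace("HEBREW ", "") for ch in s]
    print(f"{label:9} {names}")

print(canonically_equivalent(typed, canonical))
```

A process that normalises the text may hand the second order to the
renderer, so an implementation of the full model would need to produce the
same display for both strings.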
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From tiro@tiro.com Sat Nov 1 14:18:41 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 14:19:06 -0500 (EST) Received: from priv-edtnes51.telusplanet.net (defout.telus.net [199.185.220.240]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1JIKG24370 for ; Sat, 1 Nov 2003 14:18:41 -0500 Received: from Sophia.tiro.com ([66.183.177.193]) by priv-edtnes51.telusplanet.net (InterMail vM.6.00.05.00 201-2115-109-20030812) with ESMTP id <20031101191744.BMNR21756.priv-edtnes51.telusplanet.net@Sophia.tiro.com>; Sat, 1 Nov 2003 12:17:44 -0700 Message-Id: <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> X-Sender: tiro@pop3.portal.ca X-Mailer: QUALCOMM Windows Eudora Version 5.2.1 Date: Sat, 01 Nov 2003 11:17:38 -0800 To: Peter Kirk From: John Hudson Subject: [hebrew] Re: Hebrew composition model, with cantillation marks Cc: Philippe Verdy , hebrew@unicode.org, Peter Constable In-Reply-To: <3FA3EDF6.2040501@qaya.org> References: <0c8301c3a098$87739d00$2101a8c0@asimov> <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 615 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew At 09:31 AM 11/1/2003, Peter Kirk wrote: >>... 
And it's >>significant that even Microsoft, which has the money to pay a lot of >>experts, has not been able for now to create a working model for Hebrew in >>Unicode, needing to use some private use characters to render correctly some >>cases, that were easy to create in the legacy Windows codepage or the Hebrew >>version of ISO-8859. >I know that Microsoft has worked hard on trying to create a working model, >including by working with (though not paying!) semi-experts like myself. I >don't think the PUA is used in its latest version of the model, except >perhaps for not yet encoded characters like the inverted nun - but these >were never supported by ISO 8859. For the record, the approach used in the SBL Hebrew OpenType font, which works with Unicode, does not use *any* PUA characters for Hebrew. I'm really not sure to what Philippe is referring. At the moment, we support inverted nun with a glyph processing hack = /invertednun/. This isn't pretty, and I'm looking forward to the day when the nun hafucha is encoded. > But Microsoft has so far failed to support rendering of canonically > ordered text, instead expecting text to be presented in a specific > non-canonical order. I must say I wonder if anyone has yet successfully > implemented the full Unicode Hebrew model including proper rendering of > combining sequences in any canonically equivalent order. Microsoft has > claimed that this cannot be done efficiently. While I and many others are > sceptical about this claim, such scepticism sounds rather hollow coming > from those who have not actually implemented the Unicode Hebrew model. I've looked at this closely from the glyph processing end, and it simply isn't possible. I can either render text that is ordered as scholars input it, or I can render it as normalisation orders it (at least, I think I might be able to: there are a couple of things that could be very difficult), but I cannot render it both ways. 
Since most applications do not automatically normalise text, it is more important to us now to be able to support the logical and linguistic input order favoured by scholars. Asking users to try to input text in normalisation order is not an option, since it is so illogical; no one would use the fonts if they had to do that: they would stick with their various non-standard 8-bit solutions. So the ability to render Unicode canonically ordered Hebrew correctly relies on the normalised character string being reordered to 'display order'. The question is: is this something that Uniscribe should do, or is it something that an application should do before calling Uniscribe? John Hudson Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com I sometimes think that good readers are as singular, and as awesome, as great authors themselves. - JL Borges From cowan@mercury.ccil.org Sat Nov 1 14:24:23 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 14:24:23 -0500 (EST) Received: from mercury.ccil.org (mercury.ccil.org [192.190.237.100]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1JOMG25131 for ; Sat, 1 Nov 2003 14:24:22 -0500 Received: from cowan by mercury.ccil.org with local (Exim 3.35 #1 (Debian)) id 1AG1M9-0002bD-00; Sat, 01 Nov 2003 14:24:17 -0500 Date: Sat, 1 Nov 2003 14:24:17 -0500 To: John Hudson Cc: Peter Kirk , Philippe Verdy , hebrew@unicode.org, Peter Constable Subject: [hebrew] Re: Hebrew composition model, with cantillation marks Message-ID: <20031101192417.GF21632@mercury.ccil.org> References: <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: 
<5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> User-Agent: Mutt/1.3.28i From: John Cowan X-archive-position: 616 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: cowan@mercury.ccil.org Precedence: bulk X-list: hebrew John Hudson scripsit: > The question is: is this something that Uniscribe should do, or > is it something that an application should do before calling Uniscribe? Since the point of Uniscribe seems to be to do complex script processing, then I would say Uniscribe should do it. Graphite at least would be able to do this in the font (I'm not sure about AAT), since after all Graphite can render English as Pig Latin! -- John Cowan jcowan@reutershealth.com www.ccil.org/~cowan www.reutershealth.com I must confess that I have very little notion of what [s. 4 of the British Trade Marks Act, 1938] is intended to convey, and particularly the sentence of 253 words, as I make them, which constitutes sub-section 1. I doubt if the entire statute book could be successfully searched for a sentence of equal length which is of more fuliginous obscurity. 
--MacKinnon LJ, 1940 From tiro@tiro.com Sat Nov 1 15:04:34 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 15:04:34 -0500 (EST) Received: from priv-edtnes11-hme0.telusplanet.net (outbound03.telus.net [199.185.220.222]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1K4XG30093 for ; Sat, 1 Nov 2003 15:04:33 -0500 Received: from Sophia.tiro.com ([66.183.177.193]) by priv-edtnes11-hme0.telusplanet.net (InterMail vM.6.00.05.00 201-2115-109-20030812) with ESMTP id <20031101200427.OIOC4138.priv-edtnes11-hme0.telusplanet.net@Sophia.tiro.com>; Sat, 1 Nov 2003 13:04:27 -0700 Message-Id: <5.2.1.1.1.20031101120019.02cd5518@pop3.portal.ca> X-Sender: tiro@pop3.portal.ca X-Mailer: QUALCOMM Windows Eudora Version 5.2.1 Date: Sat, 01 Nov 2003 12:04:22 -0800 To: John Cowan From: John Hudson Subject: [hebrew] Re: Hebrew composition model, with cantillation marks Cc: Peter Kirk , Philippe Verdy , hebrew@unicode.org, Peter Constable In-Reply-To: <20031101192417.GF21632@mercury.ccil.org> References: <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 617 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew At 11:24 AM 11/1/2003, John Cowan wrote: >John Hudson scripsit: > > > The question is: is this something that Uniscribe should do, or > > is it something that an application should do before calling Uniscribe? 
> >Since the point of Uniscribe seems to be to do complex script processing, >then I would say Uniscribe should do it. Graphite at least would be >able to do this in the font (I'm not sure about AAT), since after all >Graphite can render English as Pig Latin! The point with Graphite and AAT is that you would *have* to do it in the font, which is one of the reasons there are so few Graphite and AAT fonts out there: font developers are not that masochistic. In OpenType, there is a cleaner distinction between character processing and glyph processing, and the assumption is that character processing is something that takes place outside the font. This inevitably leads to some debate regarding where, exactly, it takes place outside the font, but it sure makes font development a lot easier. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com I sometimes think that good readers are as singular, and as awesome, as great authors themselves. - JL Borges From tiro@tiro.com Sat Nov 1 17:15:20 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 17:15:20 -0500 (EST) Received: from priv-edtnes51.telusplanet.net (defout.telus.net [199.185.220.240]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1MFJG23659 for ; Sat, 1 Nov 2003 17:15:19 -0500 Received: from Sophia.tiro.com ([66.183.177.193]) by priv-edtnes51.telusplanet.net (InterMail vM.6.00.05.00 201-2115-109-20030812) with ESMTP id <20031101221513.EJNT21756.priv-edtnes51.telusplanet.net@Sophia.tiro.com>; Sat, 1 Nov 2003 15:15:13 -0700 Message-Id: <5.2.1.1.1.20031101141235.02f0f6d8@pop3.portal.ca> X-Sender: tiro@pop3.portal.ca X-Mailer: QUALCOMM Windows Eudora Version 5.2.1 Date: Sat, 01 Nov 2003 14:15:08 -0800 To: "Philippe Verdy" From: John Hudson Subject: [hebrew] Re: Hebrew composition model, with cantillation marks Cc: "Peter Kirk" , In-Reply-To: <003e01c3a0c1$e8db8a20$2101a8c0@asimov> References: <0c8301c3a098$87739d00$2101a8c0@asimov> <038a01c39d86$44d90e40$2101a8c0@asimov> 
<03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 618 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew At 01:48 PM 11/1/2003, Philippe Verdy wrote: >(So, for example, this excludes to use to create >a inverted nun as it looks like an abuse of the defined usage for CGJ; and >it seems much cleaner to ask for a variant selector, or for a separate >encoding of the inverted nun as a separate character with its own >properties) Yes, the abuse is freely acknowledged: this is a hack to be able to display the nun hafucha in existing applications. The nun hafucha should be separately encoded, and I'm pretty confident that it will be once the proposal is submitted. JH Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com I sometimes think that good readers are as singular, and as awesome, as great authors themselves. - JL Borges From everson@evertype.com Sat Nov 1 17:50:19 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 17:50:19 -0500 (EST) Received: from [67.31.0.92] (dialup-67.31.0.92.Dial1.NewYork1.Level3.net [67.31.0.92]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1MoIG30771 for ; Sat, 1 Nov 2003 17:50:18 -0500 Mime-Version: 1.0 X-Sender: evr001@mail.dna.ie Message-Id: In-Reply-To: <200311010112.hA11CMF24215@unicode.org> References: <200311010112.hA11CMF24215@unicode.org> Date: Sat, 1 Nov 2003 10:07:14 -0500 To: hebrew@unicode.org From: Michael Everson Subject: [hebrew] Re: Unicode Hebrew proposal: nomenclature.. 
Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-archive-position: 619 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew I was wondering if it was really necessary for Elaine to write an 80-page proposal. How many characters are being proposed? The TLG proposals were not user-friendly in format. I hope Elaine will look at some of my proposals for formatting. -- Michael Everson * * Everson Typography * * http://www.evertype.com From peterkirk@qaya.org Sat Nov 1 18:35:14 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 18:35:14 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA1NZDG05614 for ; Sat, 1 Nov 2003 18:35:14 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AG5Gq-0008J0-QJ; Sat, 01 Nov 2003 23:35:05 +0000 Message-ID: <3FA4432A.5020808@qaya.org> Date: Sat, 01 Nov 2003 15:35:06 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Philippe Verdy CC: hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <0c8301c3a098$87739d00$2101a8c0@asimov> <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> <003e01c3a0c1$e8db8a20$2101a8c0@asimov> In-Reply-To: <003e01c3a0c1$e8db8a20$2101a8c0@asimov> Content-Type: text/plain; charset=us-ascii; format=flowed 
Content-Transfer-Encoding: 7bit X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 620 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew Again keeping all of Philippe's text as he attempts to join the Hebrew list - so far none of his postings have appeared there... On 01/11/2003 13:48, Philippe Verdy wrote: >From: "John Hudson" > > >>So the ability to render Unicode canonically ordered Hebrew correctly >>relies on the normalised character string being reordered to 'display >>order'. The question is: is this something that Uniscribe should do, or is >>it something that an application should do before calling Uniscribe? >> >> > >That's a good question, and it is architectural, as it would affect the >compatibility of fonts or applications designed to work on this >architecture. > >But as Unicode stipulates that any application should be able to render and >use any canonically equivalent form of any string (including the normalized >form) to produce identical results, this means that all systems should work >with strings encoded either in canonical normalized order, or in any >canonical equivalent with precomposed or decomposed equivalents. > >So the question is whever normalization will actually break the semantic of >text so that its logical order cannot be infered back. This would be >impossible if strings were not encoded with additional controls that limit >the kind of reordering which may occur. 
> >So the question becomes first: where MUST I insert a control so that the >intended logical order can be preserved even after any canonical >normalization. This seems simple to determine, and we can deduce where a >combining or non combining control with class 0 can be used to preserve the >semantic across all normalizations. > >The solution should be thought without forgetting the case of precombined >hebrew characters (like SHIN LETTER WITH SHIN DOT, which is decomposable and >excluded from recomposition): the existence of these encoded characters >limit the places where a control can be safely inserted. > > If these characters are not actually used and are treated as deprecated, or for internal use of rendering engines only, and since they are never generated by normalisation, I think we can safely ignore their possible effect on where a control might be inserted. >And then the second question is: which control should we use ? To preserve >the canonical equivalences between strings intended to have identical >semantics and rendering independantly of the set of glyphs used in fonts, we >need to specify this control character exactly, and dangerous assumptions >(e.g. about the final rendering of holam with shin dot as a single shifted >holam with a removed shin dot) should not be used. > >Currently Unicode offers CGJ, ZWJ and ZWNJ to control the semantic of the >encoded text, but the rules about CGJ usage are not clear for now, and it >seems prudent not to use it between characters whose combining class order >is not a problem, because Unicode now states that CGJ should be used to >control the reordering of characters through normalization by splitting the >combining sequences in which such canonical reordering is allowed in >normalizations. 
(So, for example, this excludes to use to create >a inverted nun as it looks like an abuse of the defined usage for CGJ; and >it seems much cleaner to ask for a variant selector, or for a separate >encoding of the inverted nun as a separate character with its own >properties) > > Personally, I don't see a serious problem in the presence in a text of redundant CGJs. They do of course waste storage space. But they are ignored in collation etc, unless the default is deliberately overridden, and in rendering. So what is the problem with having an extra CGJ within a sequence of combining characters which is already in canonical order? >Without this convention on the correct use of these controls, there will be >several encoding alternatives that won't be canonically equivalent (per >Unicode definition), and it will be needed to define additional folding >rules in renderers. > > The only required folding rule is one already in place, that CGJ is default ignorable. >My opinion is that a CGJ should be used always between each vowel group >added to the same base letter or NBSP, so that we have a guarantee that >independant vowel groups won't be mixed (In that case this CGJ plays the >role of the "missing consonnant letter"). I'm not sure that we need a CGJ >between the base consonnant group and the first vowel group, as the semantic >is preserved by normalization, and as the logical reordering is still >possible from a normalized form (I mean that the same normalization >algorithm can be used with the corrected combining classes, to produce a >logical reordering of the grapheme clusters in the string). > > So, what you are saying here is that you don't need a CGJ in the sequence . If that CGJ is present, this sequence is normalised and it is also more or less logical - dagesh logically comes after shin/sin dot but these pairs can easily be made into collation contractions to ensure correct collation; and this sequence is also fairly easy to render. 
But if the CGJ is absent, normalisation turns this string into , which is much less logical and causes considerable problems both for rendering (if the rendering engine does not reorder at the character level) and for collation (a very large number of contractions need to be defined). Omitting the CGJ results in no loss of information or distinctions, but it does make implementation more difficult. This discussion is not dealing with right meteg which is a different issue. >This rule with CGJ only for the case of multiple vowel groups seems simple >to implement and produces the appropriate effect of keeping the characters >identity. ... > Agreed. In a case like , the CGJ is necessary to ensure that the vowels and accents remain properly ordered. But the CGJ is unnecessary when one of the vowels is holam, unless there are also two low accents or two high accents, as there is then no typographical interaction. >... This seems simple, but there remains a few questions for possibly >extremely rare cases: > >1) Can there be cases of "missing vowel points" in vowel groups, i.e. a >letter with multiple cantillation marks on only one (or zero) vowel >(possibly on top of a missing consonnant coded with CGJ)? ... > It is reasonably common to have two accents with one vowel on a base character. These are not missing consonant issues (when there is a missing consonant there are always two vowel points) but cases of dual cantillation, in the Ten Commandments and a few other places, also some unusual cases in which two accents fall on the same base character. As I mentioned before, the only problem sequences are a very few in which meteg should appear to the left of a low centred accent. Since both and (non-canonical) occur, the latter must be protected from reordering by inserting CGJ before the meteg, or some similar mechanism. >... 
For now we don't >have it, so we may need another control for the case of multiple >cantillation marks whose relative order is not preserved by the >normalization (because not all marks with the same positioning constraints >are given the same combining class). For now this problem only occurs with >accent TSINOR which has a correct combining class 228, but distinct from the >other accents which are also combining above-left and assigned the class >230. So the Unicode normalization will always reorder accent TSINOR(=zarqa) >before any of the other accents positioned above-left: >SEGOL(=segolta), PACHTA(=qadma), PAZER(=pazer gadol), TELISHA QETANA >(=tarsa). Is this an issue for interpretation? Must we be able to make the >distinction between the logical orders and >. If so, should we specify that only the order > be encoded with a CGJ in the middle to >preserve the reversed order even after normalization? > > Pazer should not be in this list. There are no cases in the Hebrew Bible text that I have of any combination of "ACCENT ZINOR" (I deliberately do not correct the transliteration to TSINOR as this is in fact NOT the accent tsinor but the accent tsinorit, but the name error cannot be corrected) with the other above left accents. Accents are very rarely used in non-biblical texts, but of course I can give no guarantee about all other languages using Hebrew script. >2) Can there be cases of multiple consonnant modifiers whose relative order >is important for semantics? This case can only occur, for now, between rafe >and varika, which share the same interaction position above the letter and >are assigned distinct combining classes 23 and 26 (and whose relative order >is not kept by normalization). From what I have read, this case won't happen >in actual texts, as rafe and varika are mutually exclusive in their usage, >one being a variant of the other with similar meaning. 
It may happen if they >are coded using multiple Qere marking to accomodate a text to several >cultures, but I don't think that such reordering of and > would create interpretation problems for actual readers (the >Unicode normalization only keeps the order , but cannot >represent the order which could require a CGJ in the middle. > > > Varika is never used in Hebrew. It is used in Judeo-Spanish, I understand. You will need to check this point with those who know Judeo-Spanish or those who proposed this as a separate Unicode character. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From tiro@tiro.com Sat Nov 1 20:39:57 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 01 Nov 2003 20:39:57 -0500 (EST) Received: from priv-edtnes46.telusplanet.net (defout.telus.net [199.185.220.240]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA21dvG12983 for ; Sat, 1 Nov 2003 20:39:57 -0500 Received: from Sophia.tiro.com ([66.183.177.193]) by priv-edtnes46.telusplanet.net (InterMail vM.6.00.05.00 201-2115-109-20030812) with ESMTP id <20031102013949.IFIY27778.priv-edtnes46.telusplanet.net@Sophia.tiro.com>; Sat, 1 Nov 2003 18:39:49 -0700 Message-Id: <5.2.1.1.1.20031101172744.02cc0a18@pop3.portal.ca> X-Sender: tiro@pop3.portal.ca X-Mailer: QUALCOMM Windows Eudora Version 5.2.1 Date: Sat, 01 Nov 2003 17:39:44 -0800 To: "Philippe Verdy" From: John Hudson Subject: [hebrew] Re: Hebrew composition model, with cantillation marks Cc: In-Reply-To: <009801c3a0d6$47fc04d0$2101a8c0@asimov> References: <0c8301c3a098$87739d00$2101a8c0@asimov> <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> Mime-Version: 1.0 
Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 621 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew At 04:14 PM 11/1/2003, Philippe Verdy wrote: > > So the ability to render Unicode canonically ordered Hebrew correctly > > relies on the normalised character string being reordered to 'display > > order'. The question is: is this something that Uniscribe should do, or is > > it something that an application should do before calling Uniscribe? > >Reordering characters in logical order seems a task that is complex to write >in fonts, but not impossible. Actually, reordering *characters* in fonts is impossible: fonts are collections of glyphs. So perhaps what you are suggesting is that glyphs can be reordered to simulate character reordering. This depends on the font technology being used and, of course, on the complexity of the reordering necessary. AAT and Graphite can handle very complex glyph re-ordering, since they were designed with the assumption that all rendering would take place in the font. OpenType was designed with the assumption that generic character processing would take place outside the font, and the OpenType GSUB and GPOS lookup architecture is *very* ill-suited to glyph re-ordering. Frankly, I don't think much reordering can be done inside an OpenType font -- there is, for example, no recognition of states, e.g. word initial or word final, within OpenType, and no direct way of substituting yx -> xy --, and even if it could, this would be an incredibly inefficient way of doing what can be done much faster and with less processing requirement at the character level. Glyph lookup processing is very processing intensive, and in a very complex font the impact on rendering speed is clearly visible onscreen. ...
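As a rough illustration of what reordering at the character level could look like, here is a small Python sketch. It is not Uniscribe's behaviour or anyone's published algorithm. The ranking in display_rank is a hypothetical 'display order' (shin/sin dot, then dagesh or rafe, then vowels, then meteg, then accents), and both function names are invented for the example. The text is normalised first, so every canonically equivalent input ends up in the same order before it would be handed to glyph processing.

```python
import unicodedata


def display_rank(ch):
    """Hypothetical rank of a mark in 'display order' (an assumption)."""
    ccc = unicodedata.combining(ch)
    if ccc in (24, 25):   # shin dot, sin dot
        return 0
    if ccc in (21, 23):   # dagesh, rafe
        return 1
    if 10 <= ccc <= 20:   # vowel points
        return 2
    if ccc == 22:         # meteg
        return 3
    return 4              # accents and any other mark


def to_display_order(text):
    """Normalise, then stably re-sort each run of combining marks."""
    out, run = [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            # A class-0 character (base letter, CGJ, ...) ends the run.
            out.extend(sorted(run, key=display_rank))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=display_rank))
    return "".join(out)
```

Because sorted() is stable, marks with the same rank keep their normalised order, and a CGJ still acts as a barrier since it has combining class 0.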
>This is a place where font marking would use some version tags, maintained >in a registry of known algorithms (maintained by OpenType.org, or a more >neutral organization not headed by Agfa-Monotype ?). OpenType.org is just a marketing effort by AgfaMonotype, it has nothing to do with development of the OpenType format, which is jointly owned by Microsoft and Adobe. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com I sometimes think that good readers are as singular, and as awesome, as great authors themselves. - JL Borges From elaine_keown@yahoo.com Sun Nov 2 00:35:55 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 00:35:55 -0500 (EST) Received: from web80708.mail.yahoo.com (web80708.mail.yahoo.com [66.163.170.65]) by unicode.org (8.11.6/8.11.6) with SMTP id hA25ZsG24014 for ; Sun, 2 Nov 2003 00:35:54 -0500 Message-ID: <20031102053552.13433.qmail@web80708.mail.yahoo.com> Received: from [66.76.151.170] by web80708.mail.yahoo.com via HTTP; Sat, 01 Nov 2003 21:35:52 PST Date: Sat, 1 Nov 2003 21:35:52 -0800 (PST) From: Elaine Keown Subject: [hebrew] Re: Hebrew composition model, with cantillation marks To: Peter Kirk , Philippe Verdy Cc: hebrew@unicode.org In-Reply-To: <3FA4432A.5020808@qaya.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 622 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: elaine_keown@yahoo.com Precedence: bulk X-list: hebrew Elaine in Texas Hi, > Varika is never used in Hebrew. It is used in > Judeo-Spanish, I > understand. You will need to check this point with I think it's used in another pointing system for Hebrew--Palestinian, I believe. Elaine __________________________________ Do you Yahoo!? 
Exclusive Video Premiere - Britney Spears http://launch.yahoo.com/promos/britneyspears/ From elaine_keown@yahoo.com Sun Nov 2 01:15:35 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 01:15:35 -0500 (EST) Received: from web80706.mail.yahoo.com (web80706.mail.yahoo.com [66.163.170.63]) by unicode.org (8.11.6/8.11.6) with SMTP id hA26FYG30653 for ; Sun, 2 Nov 2003 01:15:34 -0500 Message-ID: <20031102061534.73672.qmail@web80706.mail.yahoo.com> Received: from [66.76.151.170] by web80706.mail.yahoo.com via HTTP; Sat, 01 Nov 2003 22:15:34 PST Date: Sat, 1 Nov 2003 22:15:34 -0800 (PST) From: Elaine Keown Subject: [hebrew] Re: Unicode Hebrew proposal: nomenclature.. To: Michael Everson , hebrew@unicode.org Cc: kenw@sybase.com In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 623 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: elaine_keown@yahoo.com Precedence: bulk X-list: hebrew Elaine Keown still in Texas Hi, > 80-page proposal. How many characters are being > proposed? Lots--In 2001 I had drawn them in groups of 4-8, but I never counted them until I started naming them in October.....there's now an embarrassing number... > The TLG proposals were not user-friendly in format. > I hope Elaine will look at some of my proposals for > formatting. Where are your proposals for formatting? Online? I assumed that Rick McGowan was telling me that the TLG had achieved some kind of ideal, a nirvana for writing proposals. Are there other dissenting opinions out there vis a vis the perfection of the TLG proposal? Speak up now, before I write pages of my "Archaic HTML"... I'm writing a teeny site for review by you all (Southern for "you plural") before I write anything else--I've been working on it this weekend. Elaine __________________________________ Do you Yahoo!? 
Exclusive Video Premiere - Britney Spears http://launch.yahoo.com/promos/britneyspears/ From rick@unicode.org Sun Nov 2 01:31:23 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 01:31:28 -0500 (EST) Received: from izanami (ip-216-36-75-240.dsl.sjc.megapath.net [216.36.75.240]) by unicode.org (8.11.6/8.11.6) with SMTP id hA26VLG00594; Sun, 2 Nov 2003 01:31:22 -0500 Message-Id: <200311020631.hA26VLG00594@unicode.org> To: elaine_keown@yahoo.com Subject: [hebrew] Re: Unicode Hebrew proposal: nomenclature.. Cc: hebrew@unicode.org In-Reply-To: Date: Sat, 1 Nov 2003 22:31:14 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 625 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: hebrew > I assumed that Rick McGowan was telling me that the > TLG had achieved some kind of ideal, a nirvana for > writing proposals. Uh... gee. Did I say that? ;-) Michael may think the TLG proposals are user un-friendly, however, as examples of scholarship that answer appropriate questions of usage and frequency, they are very good. The committee found that the proposals answered their questions. Michael has written many excellent proposals also, and you would do well to look at a good number of them on his web site. I guided you to the TLG proposals specifically because they are scholarly proposals about ancient usage of a modern script in some proximity to the middle east. Very few proposals are really ideal in all ways, so I hope you don't think that any one of them is "a nirvana".
Rick From everson@evertype.com Sun Nov 2 09:58:20 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 09:58:21 -0500 (EST) Received: from [67.31.3.196] (dialup-67.31.0.253.Dial1.NewYork1.Level3.net [67.31.0.253]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA2EwID30436 for ; Sun, 2 Nov 2003 09:58:20 -0500 Mime-Version: 1.0 X-Sender: evr001@mail.dna.ie Message-Id: In-Reply-To: <20031102061534.73672.qmail@web80706.mail.yahoo.com> References: <20031102061534.73672.qmail@web80706.mail.yahoo.com> Date: Sun, 2 Nov 2003 09:55:59 -0500 To: hebrew@unicode.org From: Michael Everson Subject: [hebrew] Re: Unicode Hebrew proposal: nomenclature.. Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-archive-position: 626 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 22:15 -0800 2003-11-01, Elaine Keown wrote: > > 80-page proposal. How many characters are being > > proposed? > >Lots--In 2001 I had drawn them in groups of 4-8, but I never counted >them until I started naming them in October.....there's now an >embarrassing number... Ten? Fifty? Four hundred? I was asking. > > The TLG proposals were not user-friendly in format. > > I hope Elaine will look at some of my proposals for > > formatting. > >Where are your proposals for formatting? Online? http://www.evertype.com/formal.html >I assumed that Rick McGowan was telling me that the TLG had achieved >some kind of ideal, a nirvana for writing proposals. I found it hard to digest their format. 
-- Michael Everson * * Everson Typography * * http://www.evertype.com From mark@kli.org Sun Nov 2 10:55:15 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 10:55:15 -0500 (EST) Received: from pi.meson.org (h-66-134-26-207.NYCMNY83.covad.net [66.134.26.207]) by unicode.org (8.11.6/8.11.6) with SMTP id hA2FtED05303 for ; Sun, 2 Nov 2003 10:55:15 -0500 Received: (qmail 10607 invoked from network); 2 Nov 2003 15:55:09 -0000 Received: from dhcp1.lan.lupine.org (HELO kli.org) (@192.168.1.101) by 192.168.1.100 with SMTP; 2 Nov 2003 15:55:09 -0000 Message-ID: <3FA528DC.50701@kli.org> Date: Sun, 02 Nov 2003 10:55:08 -0500 From: "Mark E. Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en, fr MIME-Version: 1.0 To: Peter Kirk CC: Peter Constable , hebrew@unicode.org Subject: [hebrew] Re: [OT]--history...Re: Re: meteg + hataf and text processing distinctions References: <3FA2DDC5.1000400@qaya.org> In-Reply-To: <3FA2DDC5.1000400@qaya.org> X-Enigmail-Version: 0.76.3.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 627 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Peter Kirk wrote: > On 31/10/2003 13:25, Peter Constable wrote: > >> ... >> >> So, never a case of (e.g.) munah on either the right or left side of >> some vowel? >> >> > I have never seen this. Low accents other than meteg are always to the > left of vowels. There is just one regular set of exceptions. When > there are two low vowels and a low accent on the same base character, > as with Yerushala(y)im and also the exceptional word in Exodus 20:4, > the low accent is positioned between the two vowels. I would venture to say, rather, that the accent is positioned where it always goes: on the stressed syllable. 
In Yerushala(y)im, the stress is on the patah, not the hiriq. Hypothetically (and maybe not; there may be some qere/ketiv out there that instantiates this), if the second vowel were the one carrying the stress, the accent would follow it. ~mark From peterkirk@qaya.org Sun Nov 2 13:07:54 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 13:07:55 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA2I7sD30465 for ; Sun, 2 Nov 2003 13:07:54 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AGMdf-0006VC-3E; Sun, 02 Nov 2003 18:07:47 +0000 Message-ID: <3FA547F2.4060802@qaya.org> Date: Sun, 02 Nov 2003 10:07:46 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Philippe Verdy CC: hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <0c8301c3a098$87739d00$2101a8c0@asimov> <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> <003e01c3a0c1$e8db8a20$2101a8c0@asimov> <3FA4432A.5020808@qaya.org> <00bd01c3a0e2$f55fb890$2101a8c0@asimov> In-Reply-To: <00bd01c3a0e2$f55fb890$2101a8c0@asimov> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com 
X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 628 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 01/11/2003 17:45, Philippe Verdy wrote: >From: "Peter Kirk" > > >>Again keeping all of Philippe's text as he attempts to join the Hebrew >>list - so far none of his postings have appeared there... >> >> > >And I still don't know why my posts which should go there aren't appearing. >There must be something that I have missed, but I have tried several >attempts to subscribe to the Hebrew list without success. Please point me to >the page or excerpt text explaining what to do... > > > Can anyone else help Philippe here? >>On 01/11/2003 13:48, Philippe Verdy wrote: >> >> >>>The solution should be thought without forgetting the case of precombined >>>hebrew characters (like SHIN LETTER WITH SHIN DOT, which is decomposable >>> >>> >and > > >>>excluded from recomposition): the existence of these encoded characters >>>limit the places where a control can be safely inserted. >>> >>> >>> >>> >>If these characters are not actually used and are treated as deprecated, >>or for internal use of rendering engines only, and since they are never >>generated by normalisation, I think we can safely ignore their possible >>effect on where a control might be inserted. >> >> > >They are not really deprecated, as they are part of the standard mappings >from legacy charsets that encode them and which are still widely used. What >has been done is to include them in the composition exclusion list. > >But Unicode does not require that a valid text be in normalized form, just >that any text canonically equivalent to its normalized form be handled by >compliant processes exactly as this normalized form. >This means that a text using (e.g.) 
a precomposed SHIN LETTER WITH SHIN DOT >must be renderable even if additional points or marks are added after it, >and even if they don't logically match with the logical or normalized or >display order. > > > OK. But then combinations in any order of shin, shin dot and other Hebrew combining characters must be renderable in some way even if they do not meet the ordering, CGJ positioning etc requirements which we may come up with. Actually using the precomposed shin with shin dot makes the ordering more logical. It is a shame that we cannot remove these from the composition exclusion list. >>>Currently Unicode offers CGJ, ZWJ and ZWNJ to control the semantic of the >>>encoded text, but the rules about CGJ usage are not clear for now, and it >>>seems prudent not to use it between characters whose combining class >>> >>> >order > > >>>is not a problem, because Unicode now states that CGJ should be used to >>>control the reordering of characters through normalization by splitting >>> >>> >the > > >>>combining sequences in which such canonical reordering is allowed in >>>normalizations. (So, for example, this excludes to use to >>> >>> >create > > >>>a inverted nun as it looks like an abuse of the defined usage for CGJ; >>> >>> >and > > >>>it seems much cleaner to ask for a variant selector, or for a separate >>>encoding of the inverted nun as a separate character with its own >>>properties) >>> >>> >>> >>> >>Personally, I don't see a serious problem in the presence in a text of >>redundant CGJs. They do of course waste storage space. But they are >>ignored in collation etc, unless the default is deliberately overridden, >>and in rendering. So what is the problem with having an extra CGJ within >>a sequence of combining characters which is already in canonical order? 
>> >> > >It's not the presence of additional CGJs that will matter, but rather their >absence in the case where the usage policy would have required their >encoding for correct processing and rendering on any platform, including >after normalization by any intermediate process or network transmission. > > Agreed. The problem comes when CGJ is omitted (from text otherwise ordered as if it were used) and so the text is reordered when normalised. The standard already specifies that the reordered version must be rendered identically to the original version (although Uniscribe fails to do this), but not that the rendering must be equally quick and efficient. The greater problem may come with collation as it is difficult to ensure that the versions with and without CGJ are treated as identical. >That's why the number of CGJ needed to represent the alternate forms should >be minimal, and that's why the absence of the CGJ should be specified as >producing the same default result everywhere. In that case, any other extra >CGJ is to be ignored and take the default rendering or processing as if they >were not coded. > >In that case only, you're right: redundant CGJs will not cause problems (but >they will still cause the string to be handled as NOT canonically equivalent >to the reduced codes, possibly breaking applications that expect to find a >canonical equivalent, for example if a signed text needs to be transcripted >through systems using legacy codepages or charsets, as these extra CGJs >would not be taken into account in the transcription, breaking the canonical >equivalence, and thus the signature). > >That's an issue for example with XML signatures, which are based on an exact >binary reproducibility of the transferred text, with help of a required >normalized form when computing or checking the signature. > > > I understand the issue. I suspect it would be safer for such security related applications to use unpointed Hebrew only. 
But I'm not sure how widely acceptable that would be. >>>Without this convention on the correct use of these controls, there will >>> >>> >be > > >>>several encoding alternatives that won't be canonically equivalent (per >>>Unicode definition), and it will be needed to define additional folding >>>rules in renderers. >>> >>> >>The only required folding rule is one already in place, that CGJ is >>default ignorable. >> >> > >This does not apply to CGJs that are significant for semantics, only to >redundant >CGJs... The "default ignorable" property does not seem relevant here, when >some >CGJs must be kept and some not. This is what I call folding here: the >removal of >unnecessary CGJs or alternate encoding forms that may be used because of >differences of input methods. > > > My point is that for folding purposes all or most of the CGJs can be treated as semantically insignificant and so as "default ignorable". They are significant only for normalisation, where they are already not ignored by the algorithm. I suspect that none of them need to be treated as significant for folding or collation. If some are, special rules may be applied to CGJs that match particular criteria, but others will be treated as default ignorable with the desired results. >>So, what you are saying here is that you don't need a CGJ in the >>sequence . If >>that CGJ is present, this sequence is normalised and it is also more or >>less logical - dagesh logically comes after shin/sin dot but these pairs >>can easily be made into collation contractions to ensure correct >>collation; and this sequence is also fairly easy to render. But if the >>CGJ is absent, normalisation turns this string into >dagesh, (meteg,) shin/sin dot (, accent)>, which is much less logical >>and causes considerable problems both for rendering (if the rendering >>engine does not reorder at the character level) and for collation (a >>very large number of contractions need to be defined). >> >> > >Basically yes. 
These CGJs are not _required_, only _useful_ to reduce the >number of contractions or other transforms to infer the logical order. > > > >>Omitting the CGJ results in no loss of information or distinctions, but it >>does make implementation more difficult. >> >> > >Exactly. But Unicode conformance for all canonically equivalent strings is >at this price. You admit that this is not impossible to do, and I can say >that this algorithm is working on a finite serie of code points, so even in >the worst case where M operations would be necessary to handle each >character, the algorithm working on a string of length N would run in at >most O(M*N) operations, i.e. less that M*O(N), and thus it would be still >asymptotic to linear. The final question is how we can optimize the >algorithm to reduce M. > > > I'm not sure about this. If N is the number of characters in the combining sequence and P is the number of possible characters in each position, the number of individual contractions required for collation of the combining sequence is O(P**N), i.e. exponential, with P potentially being the entire Unicode character space. Well, fortunately it is not this bad; but in order to collate shin and sin dot as a unit, there is a potential need to define as contractions every combination of combining characters with a lower (but >0) combining class than sin dot followed by sin dot. Each of the preceding combining characters may in principle occur more than once which makes the total number of contractions infinite. Even if we allow each of the P such characters to occur only once, or else not at all, and assume normalisation, this implies a need for 2**P contractions. And even if we restrict ourselves to Hebrew characters P=15. Can implementations of the collation algorithm handle 2**15 contractions just for this one issue in Hebrew? >>This discussion is not dealing with right meteg which is a different >> >> >issue. > >I agree. Same thing for medial meteg. 
They will probably be better handled >if variants of meteg are encoded (probably as a separate codepoint, unless >Unicode relaxes its current rules on variant selectors, or accept to encode >Hebrew specific variant selectors for this meteg, a solution that would keep >the existing codepoint for meteg for legacy applications that don't know >these specific variant selectors). > > > A variation selector would be a good solution here. Completely separate code points would be less good as it would be less easy for software to recognise that these are essentially the same character. I still think CGJ is a good solution for right meteg, but some other mechanism is required for medial meteg. Peter Constable has been exploring this issue on the list but has had little response. >>>This rule with CGJ only for the case of multiple vowel groups seems >>> >>> >simple > > >>>to implement and produces the appropriate effect of keeping the >>> >>> >characters > > >>>identity. ... >>> >>> >>> >>Agreed. In a case like >accent2)>, the CGJ is necessary to ensure that the vowels and accents >>remain properly ordered. But the CGJ is unnecessary when one of the >>vowels is holam, unless there are also two low accents or two high >>accents, as there is then no typographical interaction. >> >> > >In other words, there's a opportunity of CGJ only between vowel groups, >where it has the role of the missing consonnant, and the CGJ may be >removed (not encoded), provided that the second vowel cannot reorder >with the previous vowel or accents. > > >>It is reasonably common to have two accents with one vowel on a base >>character. These are not missing consonant issues (when there is a >>missing consonant there are always two vowel points) but cases of dual >>cantillation, in the Ten Commandments and a few other places, also some >>unusual cases in which two accents fall on the same base character. 
As I >>mentioned before, the only problem sequences are a very few in which >>meteg should appear to the left of a low centred accent. Since both >> and (non-canonical) >meteg> occur, the latter must be protected from reordering by inserting >>CGJ before the meteg, or some similar mechanism. >> >> > >For the case of meteg coded after etnahta, it can be viewed as if there was >a missing (but not logically implied) consonnant with the normal composition >model where meteg normally occurs after the base and vowel. In that case >the CGJ is very appropriate. > > > This is not an appropriate way of understanding the situation here - although that doesn't imply that the CGJ solution is inappropriate. In the Ten Commandments the actual situation is rather that the passage can be read in two different ways (something like one reading as part of reading through the whole passage and another as a list of ten items in isolation) and so is provided with two separate sets of accents for these two different readings. One reading is indicated consistently to the left of the other reading. In just one place the two readings imply different vowels as well as different accents on the word. There is no question of missing consonants. One might consider providing in some kind of higher level protocol two alternative forms of the text, just as might be done for Qere and Ketiv, but in many circumstances the requirement will be to render the text as traditionally represented on paper. >>Pazer should not be in this list. There are no cases in the Hebrew Bible >>text that I have of any combination of "ACCENT ZINOR" (I deliberately do >>not correct the transliteration to TSINOR as this is in fact NOT the >>accent tsinor but the accent tsinorit, but the name error cannot be >>corrected) with the other above left accents. Accents are very rarely >>used in non-biblical texts, but of course I can give no guarantee about >>all other languages using Hebrew script. 
>> >> > >(if you have read my excel sheet, I gave several aliases for accent tsinor, >and I marked those names that should not be used as they collide with other >characters, so yes I had noted the problem with this name, which is not >ambiguous if prefixed by the term "accent".) > > Actually I got this wrong. The character in question is U+05AE HEBREW ACCENT ZINOR, cc=228, which corresponds to tsinor or zarqa and is not badly misnamed. The serious naming error is with U+0598 HEBREW ACCENT ZARQA, cc=230, which corresponds to tsinorit - also "to be used when Zarqa or Tsinor are placed above", but as I understand it this never happens. >I don't know if someone had studied the case of ZINOR (=tsinorit) which >does not combine in the same class as other accents with which it could >collide at the same position. What you say is that this case has still not >been >seen, but it may be discovered. If this happens, where ZINOR must be >kept logically ordered with other accents, the CGJ will be there to allow >ZINOR to occur after those accents instead of always before with the >normalized order. > > In principle, CGJ could be used to deal with this case if it is ever discovered. > > >>>2) Can there be cases of multiple consonnant modifiers whose relative >>> >>> >order > > >>>is important for semantics? This case can only occur, for now, between >>> >>> >rafe > > >>>and varika, which share the same interaction position above the letter >>> >>> >and > > >>>are assigned distinct combining classes 23 and 26 (and whose relative >>> >>> >order > > >>>is not kept by normalization). From what I have read, this case won't >>> >>> >happen > > >>>in actual texts, as rafe and varika are mutually exclusive in their >>> >>> >usage, > > >>>one being a variant of the other with similar meaning. 
It may happen if >>> >>> >they > > >>>are coded using multiple Qere marking to accomodate a text to several >>>cultures, but I don't think that such reordering of and >>> would create interpretation problems for actual readers >>> >>> >(the > > >>>Unicode normalization only keeps the order , but cannot >>>represent the order which could require a CGJ in the >>> >>> >middle. > > > >>Varika is never used in Hebrew. It is used in Judeo-Spanish, I >>understand. You will need to check this point with those who know >>Judeo-Spanish or those who proposed this as a separate Unicode character. >> >> > >I knew that for Hebrew. And I knew that Varika was added for Judeo-Spanish, >but still it is probable that Judeo-Spanish contains occurences of Hebrew, >and >that both Hebrew and Jedeo-Spanish Qere are marked in a Ketiv text. I don't >know if the order of accent marks is used to determine which accent is used >in either Qere. I have read a few articles explaining that those Qere marks >were sometimes used to show multiple pronunciations of the same word. But >as the Varika glyph is very distinct from any other accent or vowel, I don't >think it will cause a problem to recognize to which Qere text each accent >belongs. Also it's possible that Varika and Rafe have in fact exactly the >same >function in the Qere text, so even if they are mixed, they would not create >errors or interpretation, only two possible positions on the rendered form >(once again this is an assumption, as I don't know the phonetics of Hebrew >and Juedo-Spanish languages). Am I wrong? > > > > > > I can't help you further on this one. Anyone else? 
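The reordering discussed in this message can be checked with a short script. What follows is only a minimal sketch, assuming Python 3 and its standard unicodedata module (which reflects whatever Unicode version ships with the interpreter, not necessarily the version current for this thread). The helper names classes and nfc are made up for illustration. The sketch shows three things mentioned above: normalisation moves dagesh (class 21) in front of shin dot (class 24); a CGJ (class 0) placed between them blocks that reordering; and the precomposed SHIN WITH SHIN DOT decomposes and is not recomposed because it is on the composition exclusion list.

```python
import unicodedata

SHIN = "\u05E9"         # HEBREW LETTER SHIN, combining class 0
DAGESH = "\u05BC"       # HEBREW POINT DAGESH OR MAPIQ, combining class 21
SHIN_DOT = "\u05C1"     # HEBREW POINT SHIN DOT, combining class 24
CGJ = "\u034F"          # COMBINING GRAPHEME JOINER, combining class 0
PRECOMPOSED = "\uFB2A"  # HEBREW LETTER SHIN WITH SHIN DOT


def classes(text):
    """Hypothetical helper: list (code point, combining class) per character."""
    return [(f"U+{ord(c):04X}", unicodedata.combining(c)) for c in text]


def nfc(text):
    """Hypothetical helper: normalise to NFC."""
    return unicodedata.normalize("NFC", text)


# "Logical" order: shin, shin dot, dagesh. Not canonically ordered.
logical = SHIN + SHIN_DOT + DAGESH
# Same sequence with a CGJ protecting the shin dot from reordering.
protected = SHIN + SHIN_DOT + CGJ + DAGESH
# Precomposed shin with shin dot, followed by dagesh.
legacy = PRECOMPOSED + DAGESH

for label, text in [("logical", logical),
                    ("with CGJ", protected),
                    ("precomposed", legacy)]:
    print(f"{label:12} {classes(text)} -> {classes(nfc(text))}")
```

The version with CGJ survives normalisation unchanged, but it is no longer canonically equivalent to the version without it, which is exactly the concern raised above for collation and for signed text.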
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Sun Nov 2 13:08:06 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 13:11:50 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA2I86D30543; Sun, 2 Nov 2003 13:08:06 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AGMdm-0006Vk-V0; Sun, 02 Nov 2003 18:07:55 +0000 Message-ID: <3FA547FA.9020603@qaya.org> Date: Sun, 02 Nov 2003 10:07:54 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Jony Rosenne CC: "'Philippe Verdy'" , unicode@unicode.org, hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <001401c3a15f$a749c2e0$0400c80a@QSM4> In-Reply-To: <001401c3a15f$a749c2e0$0400c80a@QSM4> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 629 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 02/11/2003 08:37, Jony Rosenne wrote: >>As they will share the same combining class 220, the >>canonical ordering will preserve their relative order >> >> > >Although normalization preserves the order of combining marks of the same >class, I think no meaning should be attached to 
it, for two reasons: > >The collation algorithm ignores such differences in order > > This isn't true of the general case. There are some cases e.g. stacking diacritics in Vietnamese and IPA, where this ordering is significant and is preserved by the collation algorithm. If you are referring to a specific case, presumably in Hebrew, please remind us which one. >The naïve user has little control over them, especially when editing. > >Jony > > > > > > > > -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From tiro@tiro.com Sun Nov 2 14:21:24 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 14:21:25 -0500 (EST) Received: from priv-edtnes51.telusplanet.net (defout.telus.net [199.185.220.240]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA2JLOD13018 for ; Sun, 2 Nov 2003 14:21:24 -0500 Received: from Sophia.tiro.com ([66.183.177.193]) by priv-edtnes51.telusplanet.net (InterMail vM.6.00.05.00 201-2115-109-20030812) with ESMTP id <20031102192117.UXWH21756.priv-edtnes51.telusplanet.net@Sophia.tiro.com>; Sun, 2 Nov 2003 12:21:17 -0700 Message-Id: <5.2.1.1.1.20031102111130.0123cfb8@pop3.portal.ca> X-Sender: tiro@pop3.portal.ca X-Mailer: QUALCOMM Windows Eudora Version 5.2.1 Date: Sun, 02 Nov 2003 11:21:12 -0800 To: Michael Everson From: John Hudson Subject: [hebrew] Re: Unicode Hebrew proposal: nomenclature.. Cc: hebrew@unicode.org In-Reply-To: References: <20031102061534.73672.qmail@web80706.mail.yahoo.com> <20031102061534.73672.qmail@web80706.mail.yahoo.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii"; format=flowed X-archive-position: 630 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew At 06:55 AM 11/2/2003, Michael Everson wrote: >> > 80-page proposal. How many characters are being >> > proposed? 
>> >>Lots--In 2001 I had drawn them in groups of 4-8, but I never counted them >>until I started naming them in October.....there's now an embarrassing >>number... > >Ten? Fifty? Four hundred? I was asking. Let me presume to answer on Elaine's behalf. She has collected a large number of characters, not all of which may be suitable for proposal to Unicode (digraphs, glyph variants, etc.), but all of which she and others are interested in documenting and for which she hopes to obtain expert feedback. Such documentation will not always end up as part of a proposal to encode new characters in Unicode, but will be immensely useful for determining rendering rules and also for the kind of computational analysis that Elaine is interested in. So, sensibly, Elaine is approaching the project in two phases: the first is to get all the characters, combinations of characters and possible glyph variants into a state where people can look at them, and the second is to sort out which ones need to be proposed to Unicode. So the number of characters to be proposed is not yet known, although my own estimate is that it will be somewhere between fifty and one hundred. My own recommendation to Elaine is that this be broken down into smaller, logical sets that can be easily digested, e.g. Babylonian Vocalisation. This will likely speed the acceptance of the most obvious and least contentious characters. John Hudson Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com I sometimes think that good readers are as singular, and as awesome, as great authors themselves. 
- JL Borges From peterkirk@qaya.org Sun Nov 2 18:45:41 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 18:45:46 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA2NjaD30298 for ; Sun, 2 Nov 2003 18:45:41 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AGRuT-0002By-AM; Sun, 02 Nov 2003 23:45:29 +0000 Message-ID: <3FA5971B.4040501@qaya.org> Date: Sun, 02 Nov 2003 15:45:31 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Philippe Verdy CC: Jony Rosenne , hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <001101c3a17b$818c49d0$0400c80a@QSM4> <013001c3a194$74f80970$2101a8c0@asimov> In-Reply-To: <013001c3a194$74f80970$2101a8c0@asimov> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-MailScanner-Information: Please contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 631 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 02/11/2003 14:55, Philippe Verdy wrote: >Jony Rosenne wrote: > > >>It is not reasonable to turn the Hebrew encoding upside down and inside >> >> >out > > >>to support some specific solutions devised by some Biblical scholars, >> >> > > > >>solutions that happen to produce the desired rendering when used with some >>particular imperfect 
rendering engines and custom fonts, >> >> > >These "scholars" are not concerned by the existence of imperfect rendering >engines. ... > On the contrary, the scholars are very concerned that the only rendering engines available, at least on the platforms which they mostly use, are imperfect. They are not interested in the theoretical niceties of Unicode. They want solutions which they can use. They will quickly become frustrated with the whole Unicode process if their hard work in presenting proposals does not lead to proper implementations which do what the standard says they must do, and they will continue to use the existing legacy solutions which cause so many problems not least for interoperation with modern Hebrew. >... As long as perfect engines can be built. The need to encode >Biblical Hebrew is more complex than just rendering problems, as it is >related to the reordering of some combinations of codepoints, for which the >renderer cannot reliably decide, simply because they can't control the >different input orders that Unicode incorrectly fold into a single one >during normalization. > >Without CGJs inserted to control the order, Biblical Hebrew cannot simply be >encoded in a way that is preserved through normalization. I agree that the >proposal on SIL.org is only an attempt to simplify the rendering problem for >some cases, but it cannot be used alone, as it still does not describe how >to represent reliably multiple vowels and marks on the same base letter. > >The extensive HTML document indicated by Peter Kirk is much more useful to >understand the problems, and this effectively resolves the problems, without >needing any supplementary code points for Hebrew points >(http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html) > >And with a little more documentation in the Unicode reference, the existing >codepoints can be reused for Biblical Hebrew. 
Only a few characters will be >really needed, as they don't have any equivalent and are missing in Unicode: >- the "inverted nun" punctuation mark >- the upper and lower dots used to mark digits or emphasize some words (his >document shows that they have distinct glyphs and representation, and can be >combined like cantillation marks with other vowel points like holam or >cantillation marks or shin/sin/dots, with a slight positioning adjustment in >a "more above" or "more below" position. > > > > > > -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Sun Nov 2 18:55:47 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 18:55:47 -0500 (EST) Received: from ns3.eukhost.com (ns3.eukhost.com [64.5.60.201]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA2NtlD01341 for ; Sun, 2 Nov 2003 18:55:47 -0500 Received: from [213.162.124.237] (helo=qaya.org) by ns3.eukhost.com with asmtp (Exim 4.24) id 1AGS4I-0002Qq-RD; Sun, 02 Nov 2003 23:55:39 +0000 Message-ID: <3FA5997C.2040509@qaya.org> Date: Sun, 02 Nov 2003 15:55:40 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.5) Gecko/20030925 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Philippe Verdy CC: hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <3FA3EDF6.2040501@qaya.org> <013801c3a198$ca605e40$2101a8c0@asimov> In-Reply-To: <013801c3a198$ca605e40$2101a8c0@asimov> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 8bit X-MailScanner-Information: Please 
contact the ISP for more information X-MailScanner: Found to be clean X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - ns3.eukhost.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-archive-position: 632 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew Philippe, your important contributions are still coming only to me although you sent a copy to the Hebrew list. Please don't count on me to forward all that you try to send there. On 02/11/2003 15:26, Philippe Verdy wrote: >From: "Peter Kirk" > > > >>I know that Microsoft has worked hard on trying to create a working >>model, including by working with (though not paying!) semi-experts like >>myself. I don't think the PUA is used in its latest version of the >>model, except perhaps for not yet encoded characters like the inverted >>nun - but these were never supported by ISO 8859. But Microsoft has so >>far failed to support rendering of canonically ordered text, instead >>expecting text to be presented in a specific non-canonical order. I must >>say I wonder if anyone has yet successfully implemented the full Unicode >>Hebrew model including proper rendering of combining sequences in any >>canonically equivalent order. Microsoft has claimed that this cannot be >>done efficiently. While I and many others are sceptical about this >>claim, such scepticism sounds rather hollow coming from those who have >>not actually implemented the Unicode Hebrew model. 
>> >> > >If we just use the simple Excel table I was able to create, it gives a good >idea of what the Uniscribe layer can do, even if fonts and the basic Windows >GDI renderer cannot be changed easily: > >Nothing prevents Uniscribe from "normalizing" its input in logical order, using >alternate combining class values matching the positioning constraints. >That way, all canonically equivalent strings (including those encoded in >logical order and those in NF forms and those using the "precomposed" >characters) will be correctly ordered. For this process, CGJs must be kept, >so that the intended order needed for "multi-pointed" Hebrew will be >kept during this logical normalization. > >I do think that this is exactly what a layout engine is supposed to do, >i.e. preparing strings so that they can be rendered correctly with basic >fonts. I don't think this creates a performance issue as in any case, >the layout engine should render identically all canonically equivalent >strings, and thus will need its normalization step to simplify the >problem... I looked in the GDI+ (for Windows XP) and Uniscribe API >and there does not seem to be any problem in implementing it, including >for cursor selection (which for now is quite strange and bogus in its >implementation, notably in BiDi contexts). > > I agree with the above, which is based on what TUS 4.0 section 5.13 specifies and/or recommends. >I understand the problem of the Unicode Working Committee and its >members, with the lack of cooperation, but it's even more strange that >they did not involve voluntary contributions in an open forum to try >designing a working model in Unicode before standardizing it. The same >remark applies to Microsoft when it simply neglected to consult its >intended users about rendering issues. Isn't there a Microsoft team in >Israel? Aren't there any competent standardization bodies that have long >been working to document in some translated book the character model >of Hebrew?
I looked in the index of some famous national libraries, >and such tutorial books do exist in English, French, German and Spanish. >Probably other languages as well... Microsoft and Unicode have no >excuse: these documents have long existed, and even if competent >scholars were not available to create a computing model for them, >they should have read these books and asked experts about the >precise details they do not understand, or about possible contradictions >between books. > > To be fair to Microsoft, it has consulted about the biblical variant of the Hebrew character model, and is well aware of the issues involved. Its failure to implement the model adequately is not from ignorance. >So the question is: can a computer analyst take the time to search >and read those translated linguistic resources? Does it require long >and expensive searches in national libraries? Aren't the Library of >Congress or the Bibliothèque Nationale de France involved at least >as voluntary members of Unicode to make such searches? Aren't >there many Universities with students of languages around the world >that could be involved to make this search as a course project? > >I think that similar problems can be solved like this for Tibetan and >Old Brahmic languages, even if the national standardization bodies >do not have the resources, money, time and competence to make these >searches themselves. Waiting passively for submissions is not a good >strategy, and seeking people that could help in the projects seems >a better long-term solution for old languages: it's not up to these people >to adapt to Unicode, it's up to Unicode to try adapting its communication >with them, in terms that can be understood by them (don't speak about >combining class values with linguists: that's alien for them).
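The "logical normalization" step proposed in the quoted message above (reorder combining marks by alternate class values, but keep every CGJ as a barrier) might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the alternate class values in ALT_CLASS and the names layout_class and layout_normalize are invented for the example, and nothing here is claimed to be what Uniscribe does or what TUS 4.0 section 5.13 prescribes.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER (canonical combining class 0)

# Hypothetical layout-only classes; NOT the canonical Unicode values.
# Assumption for illustration: dagesh and shin/sin dots are positioned
# before the vowel points, so they get lower class numbers here.
ALT_CLASS = {
    "\u05BC": 1,  # HEBREW POINT DAGESH OR MAPIQ (canonical class 21)
    "\u05C1": 2,  # HEBREW POINT SHIN DOT (canonical class 24)
    "\u05C2": 2,  # HEBREW POINT SIN DOT (canonical class 25)
}

def layout_class(ch):
    """Alternate class if one is defined, else the canonical combining class."""
    return ALT_CLASS.get(ch, unicodedata.combining(ch))

def layout_normalize(text):
    """Decompose to NFD, then stable-sort each run of marks by layout class.

    Any character with canonical class 0 (base letters, and CGJ) ends the
    current run, so marks never move across a CGJ.
    """
    out, run = [], []

    def flush():
        run.sort(key=layout_class)  # list.sort is stable
        out.extend(run)
        run.clear()

    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch) == 0:
            flush()
            out.append(ch)
        else:
            run.append(ch)
    flush()
    return "".join(out)

if __name__ == "__main__":
    a = "\u05D1\u05B7\u05BC"     # bet, patah, dagesh
    b = "\u05D1\u05BC\u05B7"     # bet, dagesh, patah (canonically equivalent to a)
    c = "\u05D1\u05B7" + CGJ + "\u05BC"  # CGJ pins patah before dagesh
    print(layout_normalize(a) == layout_normalize(b))
    print(layout_normalize(c) == c)
```

Because the input is first taken to NFD, any two canonically equivalent strings reach the reordering step in the same form and so come out identical, while a string containing CGJ keeps the order its author intended.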
> > > > > > -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From mark@kli.org Sun Nov 2 20:47:37 2003 Received: with ECARTIS (v1.0.0; list hebrew); Sun, 02 Nov 2003 20:48:07 -0500 (EST) Received: from pi.meson.org (h-66-134-26-207.NYCMNY83.covad.net [66.134.26.207]) by unicode.org (8.11.6/8.11.6) with SMTP id hA31lDD00613 for ; Sun, 2 Nov 2003 20:47:37 -0500 Received: (qmail 12261 invoked from network); 3 Nov 2003 01:46:38 -0000 Received: from dhcp1.lan.lupine.org (HELO kli.org) (@192.168.1.101) by 192.168.1.100 with SMTP; 3 Nov 2003 01:46:38 -0000 Message-ID: <3FA5B2DA.5060205@kli.org> Date: Sun, 02 Nov 2003 20:43:54 -0500 From: "Mark E. Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en, fr MIME-Version: 1.0 To: Peter Kirk CC: Philippe Verdy , hebrew@unicode.org Subject: [hebrew] Re: Hebrew composition model, with cantillation marks References: <0c8301c3a098$87739d00$2101a8c0@asimov> <038a01c39d86$44d90e40$2101a8c0@asimov> <03f401c39dc7$52205810$2101a8c0@asimov> <3F9FAEDA.8080503@qaya.org> <3FA1EFE7.1050905@kli.org> <3FA24A00.3020603@qaya.org> <0ae601c39fdf$69e95a90$2101a8c0@asimov> <3FA2CBF8.8050703@qaya.org> <0b6901c3a01d$bf178030$2101a8c0@asimov> <3FA3A288.5030501@qaya.org> <0c8301c3a098$87739d00$2101a8c0@asimov> <5.2.1.1.1.20031101110610.0304eed0@pop3.portal.ca> <003e01c3a0c1$e8db8a20$2101a8c0@asimov> <3FA4432A.5020808@qaya.org> <00bd01c3a0e2$f55fb890$2101a8c0@asimov> <3FA547F2.4060802@qaya.org> In-Reply-To: <3FA547F2.4060802@qaya.org> X-Enigmail-Version: 0.76.3.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 633 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Peter Kirk wrote: > On 01/11/2003 17:45, Philippe Verdy wrote: > > I understand 
the issue. I suspect it would be safer for such security > related applications to use unpointed Hebrew only. But I'm not sure > how widely acceptable that would be. Jony can verify this, but it's likely not a big deal. It should be borne in mind that more than 90% of Modern Hebrew as used by Modern Israelis on a day to day basis (outside of prayer-books and such, and those only for the religious) is unpointed. ISO8859-8 could have had vowel-points, but didn't, and while if you asked me I'd have said that was a mistake, it remains that ISO8859-8 was/is sufficient for the overwhelming majority of the needs of the Internet's Hebrew-speaking population. Unicode wants to be sufficient for more than that, and that is as it should be, of course, but it's still true that almost everyone who will use it for Hebrew will use vowel-points maybe twice a page, and accents pretty much never (and in all honesty, even people who are concerned with accents will by and large not care if the meteg is always on the left. This is not to say that Unicode shouldn't handle the case, but we mustn't lose our perspective on it: it is a very obscure and marginal case). > Actually I got this wrong. The character in question is U+05AE HEBREW > ACCENT ZINOR, cc=228, which corresponds to tsinor or zarqa and is not > badly misnamed. The serious naming error is with U+0598 HEBREW ACCENT > ZARQA, cc=230, which corresponds to tsinorit - also "to be used when > Zarqa or Tsinor are placed above", but as I understand it this never > happens. I'd like to get this straightened out in future write-ups of Unicode: U+05AE should be used for *all* instances of tsinor and zarqa, including "auxiliary" ones (placed on stressed syllables when the stress is not final), whether or not the particular style of printing prefers top-left or top-center (of last letter). I'd almost venture to say that *most* printings I've seen place zarqa (and segol, for that matter) squarely on top of the letter. 
Top-left is the correct combining code for it, and we'll leave the exact details of how it is rendered in each font up to the font designer. U+0598 should be used *only* for the tsinnorit, which can't occur on the final letter of the word (wellllll... I'm not sure if it ever happens, but I suppose it could be that it's on the final letter of one word because that word is considered joined by maqaf to the next word, but the maqaf is left out because the tsinnorit indicates the joining enough... This sort of thing happens in the 3Books system, sorry). I think this needs to be spelled out in the annotation. ~mark From rosennej@qsm.co.il Mon Nov 3 00:38:26 2003 Received: with ECARTIS (v1.0.0; list hebrew); Mon, 03 Nov 2003 00:38:27 -0500 (EST) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.11.6/8.11.6) with ESMTP id hA35cQD31083 for ; Mon, 3 Nov 2003 00:38:26 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.9.3p2/8.9.3) with SMTP id VAA15876; Sun, 2 Nov 2003 21:38:19 -0800 (PST) (envelope-from rosennej@qsm.co.il) Received: from [212.235.72.130] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id p740LCA2 authenticated by POP; Sun, 02 Nov 2003 21:38:13 -0700 (PST) From: "Jony Rosenne" To: "'Philippe Verdy'" Cc: Subject: [hebrew] Re: Hebrew composition model, with cantillation marks Date: Mon, 3 Nov 2003 07:38:23 +0200 Message-ID: <002101c3a1cc$b3e893b0$0400c80a@QSM4> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_0022_01C3A1DD.777163B0" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.4510 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 In-Reply-To: <00c901c3a18a$9b33ae00$2101a8c0@asimov> Importance: Normal X-archive-position: 634 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: 
rosennej@qsm.co.il Precedence: bulk X-list: hebrew This is a multi-part message in MIME format. ------=_NextPart_000_0022_01C3A1DD.777163B0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit In your system, all fonts are awful. Attached is a better version of the sample plus the screenshot. Jony > -----Original Message----- > From: Philippe Verdy [mailto:verdy_p@wanadoo.fr] > Sent: Sunday, November 02, 2003 11:45 PM > To: Jony Rosenne > Cc: hebrew@unicode.org > Subject: Re: Hebrew composition model, with cantillation marks > > > Your document renders all 4 fonts with correct placement > for me (I use Office 2000 on Windows XP, with your embedded > fonts for Guttman Frnew and Ezra SIL). > > Although the fonts' look & feel varies, the diacritics are all > placed correctly, including cantillation marks which are > ordered correctly with all of them, including when they > collide for the same position below the character. > > I don't see what you mean in your message when you say: > > > With Word XP, the font Guttman Frnew seems to be correct > and Ezra SIL > > incorrect, and the standard fonts such as Times New Roman > and Tahoma > > just awful. See attachment. > > The difference may be that I use Word 2000 from Office 2000 > Professional, not Word XP. Also my Windows XP system is > installed with the Unicode support for almost all languages > supported by Windows, and all its charset converters, all its > input methods, and all its default fonts. So yes it is > prepared to support Hebrew, as well as Arabic (and my Word > installation contains the localization components for Arabic > as well), but I don't have the Word MUI for Hebrew. > > If you want to demonstrate things, you should prepare an image > screenshot of the document from the print preview, in a GIF > image for example, as your demonstration did not convince me... > > You'll see here the snapshot I get in the attached image.
> Beside the common problem of the directionality of the > English text which is instable in a Hebrew document in Word > 2000, the Hebrew text is correct... But is it really "text" ? > You only give a few space-separated characters... > ------=_NextPart_000_0022_01C3A1DD.777163B0 Content-Type: image/gif; name="order3.gif" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="order3.gif" [base64-encoded GIF attachment "order3.gif" omitted] ------=_NextPart_000_0022_01C3A1DD.777163B0 Content-Type: application/msword; name="order3.doc" Content-Transfer-Encoding: base64 Content-Disposition: attachment; filename="order3.doc"