From peterkirk@qaya.org Wed Feb 4 10:53:23 2004 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 04 Feb 2004 10:53:24 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i14FrMB19991 for ; Wed, 4 Feb 2004 10:53:23 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id D4CD6415878; Wed, 4 Feb 2004 15:53:06 +0000 (GMT) Message-ID: <4021156C.40007@qaya.org> Date: Wed, 04 Feb 2004 07:53:16 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Hebrew Computing list , Biblical Hebrew , hebrew@unicode.org Subject: [hebrew] New version of Ezra SIL font Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1123 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew New versions of the fonts Ezra SIL and Ezra SIL SR have just been released. See http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=EzraSIL_Home. Note that SIL is being very cautious in recommending this for use only with Office 2003. These fonts work well with any Unicode application supporting bidi text (in Windows, at least) for most Hebrew text, including full vowel pointing/niqud and most accents/teamim. It is only for ideal display of certain rare combinations of vowels and accents that Office 2003 is required, because the updated version of Uniscribe which comes with it is needed. 
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Wed Feb 11 19:11:37 2004 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Feb 2004 19:11:37 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1C0BaC24256 for ; Wed, 11 Feb 2004 19:11:37 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 2C713417526; Thu, 12 Feb 2004 00:11:18 +0000 (GMT) Message-ID: <402AC4B6.6030207@qaya.org> Date: Wed, 11 Feb 2004 16:11:34 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: hebrew@unicode.org Subject: [hebrew] New possibilities with ZWJ and ZWNJ Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1124 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew Last week, as I understand it, the UTC changed the rules for use of ZWJ and ZWNJ, and so they may now be used within combining character sequences (although their general category will remain Cf). See http://www.unicode.org/review/pr-27.html for more details. This affects some of the discussions which we had on this list last year about encoding of Hebrew combining sequences. The one which comes to mind first is the encoding of the various positions of meteg. 
Last year I was unhappy with the following suggested encodings, but now they are acceptable: - default positioning of meteg, to the left, or possibly medial for hataf vowels - meteg to the right - medial meteg if this is a hataf vowel (and the font supports medial meteg) - meteg to the left (even if this is not the default for hataf vowels) Can we now agree that this is the most suitable encoding? Are there any other issues which we might now be able to resolve in a different way? I know for example that there has been no real agreement about the encoding of holam male, i.e. vav with holam above right. It would now be permissible to propose the following encodings: - holam male - consonantal vav with holam - either consonantal vav with holam, or neutral for use in texts which do not attempt to differentiate this from holam male Any comments? (Note that my previous preferred proposal was or for holam male, and for vav with holam.) Other issues which might be simplified by defining sequences with ZWJ or ZWNJ in a combining character sequence include certain accent positioning issues, for example the positioning of the accent pashta which depends on its position in the word in a way which cannot easily be automated (as discussed on this list 19-20 December 2003). Thus might be used for the (less common) word medial variant of pashta, which should appear above its base character and to the right of certain other marks and ascenders, to distinguish it from the word final pashta which appears to the left of the base character and other marks and ascenders. (But rendering details vary between texts.) There are similar issues with some other accents. On the main Unicode list Ken Whistler has suggested, and Asmus Freytag has stated more strongly, that such conventions using Cf characters should be documented in the Unicode standard. I agree that they should be documented somewhere. 
So I am proposing to put together a simple proposal for such documentation, if possible embodying the consensus of this list and other Hebrew experts. The UTC will then be able to decide whether to incorporate it into the standard, or perhaps find somewhere else for it. Any comments? See http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html and http://www.qsm.co.il/Hebrew/Hebrew%20Issues.htm for some further background. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From rick@unicode.org Wed Feb 11 20:40:17 2004 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Feb 2004 20:40:18 -0500 (EST) Received: from izanami (ip-216-36-75-240.dsl.sjc.megapath.net [216.36.75.240]) by unicode.org (8.11.6/8.11.6) with SMTP id i1C1eHC09304 for ; Wed, 11 Feb 2004 20:40:17 -0500 Message-Id: <200402120140.i1C1eHC09304@unicode.org> To: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ Date: Wed, 11 Feb 2004 17:40:12 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 1125 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: hebrew > Last week, as I understand it, the UTC changed the rules for use of ZWJ > and ZWNJ, and so they may now be used within combining character > sequences (although their general category will remain Cf). See > http://www.unicode.org/review/pr-27.html for more details. It's also not quite that simple. There are a number of cases that are not well-defined. UTC basically agreed that: Base ZW[N]J NSM Base NSM ZW[N]J Base are well defined (because those are needed for Indic processing), but not where a ZW[N]J comes between two NSMs. The committee was split enough on the entire issue that we have to proceed with some caution in our thinking. 
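[Illustrative sketch, not part of the original message.] The distinction drawn above between the well-defined forms (Base ZW[N]J NSM, Base NSM ZW[N]J Base) and the undefined case of a ZW[N]J between two NSMs can be checked mechanically. The Python below is a minimal sketch under stated assumptions: the function names classify, pattern and joiner_between_marks are hypothetical, "NSM" is approximated as any character whose general category starts with M, and the Hebrew test string is chosen only to show a joiner sitting between two marks, not as a recommended encoding.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER


def classify(ch):
    """Map one character to 'J' (ZWJ/ZWNJ), 'C' (combining mark) or 'B' (anything else).

    Assumption: every character with general category M* counts as a mark.
    """
    if ch in (ZWJ, ZWNJ):
        return "J"
    if unicodedata.category(ch).startswith("M"):
        return "C"
    return "B"


def pattern(text):
    """Return the B/C/J skeleton of a string, one letter per code point."""
    return "".join(classify(ch) for ch in text)


def joiner_between_marks(text):
    """True if any ZWJ/ZWNJ sits directly between two combining marks (C J C)."""
    p = pattern(text)
    return any(
        p[i] == "J" and p[i - 1] == "C" and p[i + 1] == "C"
        for i in range(1, len(p) - 1)
    )


# Devanagari KA, VIRAMA, ZWJ, SSA: the joiner follows a mark and precedes a base.
indic = "\u0915\u094d\u200d\u0937"
# Hebrew ALEF, HATAF PATAH, ZWJ, METEG: the joiner sits between two marks.
hebrew = "\u05d0\u05b2\u200d\u05bd"

print(pattern(indic), joiner_between_marks(indic))
print(pattern(hebrew), joiner_between_marks(hebrew))
```

A check of this kind only reports where joiners occur; it says nothing about how a renderer should interpret them.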
Rick From peterkirk@qaya.org Thu Feb 12 06:52:18 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Feb 2004 06:52:19 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1CBqIk24138; Thu, 12 Feb 2004 06:52:18 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 7D04540C98A; Thu, 12 Feb 2004 11:52:04 +0000 (GMT) Message-ID: <402B68EA.5060201@qaya.org> Date: Thu, 12 Feb 2004 03:52:10 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Rick McGowan Cc: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <200402120140.i1C1eHC09304@unicode.org> In-Reply-To: <200402120140.i1C1eHC09304@unicode.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1126 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 11/02/2004 17:40, Rick McGowan wrote: >>Last week, as I understand it, the UTC changed the rules for use of ZWJ >>and ZWNJ, and so they may now be used within combining character >>sequences (although their general category will remain Cf). See >>http://www.unicode.org/review/pr-27.html for more details. >> >> > >It's also not quite that simple. There are a number of cases that are not >well-defined. UTC basically agreed that: > > Base ZW[N]J NSM > Base NSM ZW[N]J Base > >are well defined (because those are needed for Indic processing), but not >where a ZW[N]J comes between two NSMs. The committee was split enough on >the entire issue that we have to proceed with some caution in our thinking. 
> > Rick > > > > > > Rick, would it be reasonable to suggest that, if a proposal is made to the UTC for specific encodings involving ZW(N)J between two NSMs, the UTC is likely to accept that as a valid encoding? Or is there a good technical or political reason why that might not be acceptable? -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From rick@unicode.org Thu Feb 12 17:57:40 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Feb 2004 17:57:40 -0500 (EST) Received: from izanami (ip-216-36-75-240.dsl.sjc.megapath.net [216.36.75.240]) by unicode.org (8.11.6/8.11.6) with SMTP id i1CMvek04778 for ; Thu, 12 Feb 2004 17:57:40 -0500 Message-Id: <200402122257.i1CMvek04778@unicode.org> To: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ In-Reply-To: <200402120140.i1C1eHC09304@unicode.org> Date: Thu, 12 Feb 2004 14:57:34 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 1127 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: hebrew Peter Kirk asked: > ... would it be reasonable to suggest that, if a proposal is made to > the UTC for specific encodings involving ZW(N)J between two NSMs, the > UTC is likely to accept that as a valid encoding? Or is there a good > technical or political reason why that might not be acceptable? NOTE: All of the text below constitutes my own off-the-cuff opinion and should in no way be construed as any official statement whatsoever. The discussion in UTC I believe leaves some things unanswered. Please see the resolution of Public Review Issue #27. Yes, UTC determined by consensus "to allow ZWJ and ZWNJ in combining character sequences, but not to change their general category". And also decided that "the interpretation of joiner/nonjoiner between two combining marks is not yet defined". 
[Those are excerpts from my notes, not to be construed as official policy!] What this means, formally, is that presence of a ZW[N]J is no longer conceived as breaking a combining character sequence. What this means *practically* is still up in the air. The immediate intent of UTC was to formally accept that we have been explicitly documenting Khmer syllable forms that include ZW[N]J in them, but they were "defective" combining character sequences formally. This was a contradiction. See TUS 4.0 pp 281-282. Similar usage extends to other Indic cases: for years we have been using ZW[N]J to make various distinctions. This new formulation now recognizes that ZW[N]J do not result in the thing *after* them becoming necessarily a "defective" combining character sequence. In practice, people implementing such scripts have included special-casing for ZW[N]J when dealing with various textual processes operating upon Indic script data. All of the cases in Indic scripts involve the following forms: B C J B B J C B (B = base, C = Combining mark(s), J = joiner or non-joiner) UTC explicitly has *not* made any final determination about random use of ZW[N]J in any other contexts, such as: B C J C B In Indic implementations, things will just work right because people are dealing with these joiners anyway. Same for Arabic. In those scripts, there are sequences defined or discussed explicitly in the standard and in FAQ entries where ZW[N]J are used to solve specific problems, or express specific visual results. I wouldn't expect that any existing implementations of Latin, Hebrew, or other scripts would have already the notion of continuing a sequence of combining marks between which a ZW[N]J exists. (As an aside, formally speaking, "B J B" I suppose is now interpretable as a combining char sequence "B J" followed by "B", where before it was "B" followed by the format control "J" followed by "B". 
I would expect that there are some subtle implications for some dark corners of some existing implementations, but I wouldn't dare venture to guess what will or will not break as a result of this change!) So to answer your question... I think you are free to propose specific solutions to specific problems in Hebrew through use of ZW[N]J. To be effective and useful, those would all have to be essentially accepted by UTC as valid sequences, their ramifications studied, and then written into standard documentation so that the user and implementor communities have some hope of interoperable implementations. It is *not* OK to simply advocate a solution on this list and expect the world to understand or implement it. Just use some caution and good judgement, get community buy-in on solutions, and then proceed in an orderly fashion through UTC. NOTE: All of the text above constitutes my own off-the-cuff opinion and should in no way be construed as any official statement whatsoever. Cheers, Rick From peterkirk@qaya.org Thu Feb 12 19:01:16 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Feb 2004 19:01:16 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1D01Fk17504; Thu, 12 Feb 2004 19:01:16 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id C2D2A415825; Fri, 13 Feb 2004 00:00:56 +0000 (GMT) Message-ID: <402C13CA.3010308@qaya.org> Date: Thu, 12 Feb 2004 16:01:14 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Rick McGowan Cc: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <200402122257.i1CMvek04778@unicode.org> In-Reply-To: <200402122257.i1CMvek04778@unicode.org> Content-Type: text/plain; charset=us-ascii; format=flowed 
Content-Transfer-Encoding: 7bit X-archive-position: 1128 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 12/02/2004 14:57, Rick McGowan wrote: >Peter Kirk asked: > > > >>... would it be reasonable to suggest that, if a proposal is made to >>the UTC for specific encodings involving ZW(N)J between two NSMs, the >>UTC is likely to accept that as a valid encoding? Or is there a good >>technical or political reason why that might not be acceptable? >> >> > >NOTE: All of the text below constitutes my own off-the-cuff opinion and >should in no way be construed as any official statement whatsoever. > >The discussion in UTC I believe leaves some things unanswered. Please see >the resolution of Public Review Issue #27. > > > Thank you, Rick, for the detailed explanation (snipped). >... >I wouldn't expect that any existing implementations of Latin, Hebrew, or >other scripts would have already the notion of continuing a sequence of >combining marks between which a ZW[N]J exists. > > > Actually there are existing implementations of Hebrew which use ZWJ and ZWNJ between combining marks, in sequences like B C J C B. This is the recommended encoding for specifying the position of meteg relative to hataf vowels in both Ezra SIL Release 2.0 (although CGJ is mentioned as an alternative) and the SBL Hebrew beta release. I was aware of this before these fonts were released and pointed out to the developers in advance that this was formally a breach of Unicode principles. On 16th January 2004, which was actually before either of these fonts was publicly released, I posted to this list a mention of Public Review Issue #27. 
This was a proposal to permit sequences like B C J C B, as well as B J C* B and B C* J B sequences, and so I pointed out to this list that acceptance of this proposal would legitimate the encoding already proposed for use in these fonts. I noted carefully that the public review issue was not asking for feedback on the principle of making this change i.e. allowing B C J C B sequences, but only on the question of whether option A or option B should be chosen. (As such the new summary at http://www.unicode.org/review/resolved-pri.html is misleading.) As such I understood that the main substance of the proposal had already been decided, although not formally accepted. As I had no particular preferences between options A and B I made no formal submission on that issue. If the review issue had asked for feedback (as you now misleadingly suggest that it did) on the substance of the proposal, I would have indicated my strong support, as I did (informally) on this list. >... > >So to answer your question... I think you are free to propose specific >solutions to specific problems in Hebrew through use of ZW[N]J. To be >effective and useful, those would all have to be essentially accepted by >UTC as valid sequences, their ramifications studied, and then written into >standard documentation so that the user and implementor communities have >some hope of interoperable implementations. > >It is *not* OK to simply advocate a solution on this list and expect the >world to understand or implement it. Just use some caution and good >judgement, get community buy-in on solutions, and then proceed in an >orderly fashion through UTC. > > > Understood. Thank you for your help on this. 
But note that part of what I am doing is bringing to this list what "the world" (at least in this case two of the largest communities working with biblical Hebrew text, the western scholarly community and the Bible translation community) is actually implementing in the name of Unicode while not strictly following its rules. The task I seem to have taken upon myself is to bring together what the UTC decides with what "the world" is going to do whether or not the UTC formally recognises it. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Thu Feb 12 19:27:36 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Feb 2004 19:27:36 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1D0RZk21673 for ; Thu, 12 Feb 2004 19:27:36 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 3837C407D9F for ; Fri, 13 Feb 2004 00:27:16 +0000 (GMT) Message-ID: <402C19F6.2020408@qaya.org> Date: Thu, 12 Feb 2004 16:27:34 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <200402122257.i1CMvek04778@unicode.org> <402C13CA.3010308@qaya.org> In-Reply-To: <402C13CA.3010308@qaya.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1129 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 12/02/2004 16:01, Peter Kirk wrote: > ... > Understood. Thank you for your help on this. 
But note that part of > what I am doing is bringing to this list what "the world" (at least in > this case two of the largest communities working with biblical Hebrew > text, the western scholarly community and the Bible translation > community) is actually implementing in the name of Unicode while not > strictly following its rules. The task I seem to have taken upon > myself is to bring together what the UTC decides with what "the world" > is going to do whether or not the UTC formally recognises it. > Further to the above, I have just submitted the following as official feedback to the Unicode Consortium: > I am disappointed that Public Review Issue #27 has been only partially > resolved, in that "The interpretation of joiner/nonjoiner between two > combining marks is not yet defined." I strongly supported the original > proposal, according to which ZWJ or ZWNJ between two combining marks > would affect the rendering of those two marks. I did not formally > express this support because I understood the public review issue as > concerned only with the choice between options A and B (on which I had > no particular opinion), and that the main principle was not being > reviewed. > > There are specific cases where ligatures may be made between combining > marks associated with the same base character, analogous to ligatures > between base characters, and there is a need for a mechanism to > control ligation. One such example is that (in some typesetting > traditions) the Hebrew mark meteg generally combines with certain > Hebrew vowel marks, but is also sometimes written separately. Another > possible example is with IPA contour tones written above the > character; to avoid a possible proliferation of tone contours it might > be sensible to define these contours as ligatures of acute, grave and > macron. 
> > I would like to encourage the UTC to reconsider the issue which was > left "not yet defined" and to accept the principle that ZWJ and ZWNJ > may be used to control ligation between combining marks in specific > defined instances (and should be ignored when used between other > combining marks). I intend to present to the UTC a proposal for at > least one such specific instance. > > Peter Kirk -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From rick@unicode.org Thu Feb 12 19:47:33 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Feb 2004 19:47:44 -0500 (EST) Received: from izanami (ip-216-36-75-240.dsl.sjc.megapath.net [216.36.75.240]) by unicode.org (8.11.6/8.11.6) with SMTP id i1D0lDk22957; Thu, 12 Feb 2004 19:47:33 -0500 Message-Id: <200402130047.i1D0lDk22957@unicode.org> To: peterkirk@qaya.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ Cc: hebrew@unicode.org In-Reply-To: <200402122257.i1CMvek04778@unicode.org> Date: Thu, 12 Feb 2004 16:46:37 -0800 From: Rick McGowan received: by Apple.Mailer (2.95.2) X-archive-position: 1130 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: hebrew Peter, > the public review issue was not asking for feedback on the principle of > making this change i.e. allowing B C J C B sequences, but only on the > question of whether option A or option B should be chosen And guess what? UTC chose "neither of the above"! In fact they voted down explicit motions on both A and B. But they did make another decision, and directed me to close the public review issue. > (As such the new summary at http://www.unicode.org/review/resolved-pri.html > is misleading.) No, it is not misleading. It reflects the UTC decision, which was neither A nor B. > If the review issue had asked for feedback (as > you now misleadingly suggest that it did No, excuse me. 
I inserted into the resolution text from my notes of the UTC decisions. *I* am not misleading, I am reporting. The current issue, PRI #27, is closed and the resolution was what it was. And your new feedback on the issue will be posted into the UTC register along with other feedback for the meeting in June. Rick From peterkirk@qaya.org Thu Feb 12 20:39:42 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Feb 2004 20:39:42 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1D1dfk29823; Thu, 12 Feb 2004 20:39:41 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 0B82740C9A0; Fri, 13 Feb 2004 01:39:21 +0000 (GMT) Message-ID: <402C2ADC.9020706@qaya.org> Date: Thu, 12 Feb 2004 17:39:40 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Rick McGowan Cc: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <200402130047.i1D0lDk22957@unicode.org> In-Reply-To: <200402130047.i1D0lDk22957@unicode.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1131 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 12/02/2004 16:46, Rick McGowan wrote: >Peter, > > > >>the public review issue was not asking for feedback on the principle of >>making this change i.e. allowing B C J C B sequences, but only on the >>question of whether option A or option B should be chosen >> >> > >And guess what? UTC chose "neither of the above"! In fact they voted down >explicit motions on both A and B. But they did make another decision, and >directed me to close the public review issue. 
> > > Thanks for the clarification. >>(As such the new summary at http://www.unicode.org/review/resolved-pri.html >>is misleading.) >> >> > >No, it is not misleading. It reflects the UTC decision, which was neither >A nor B. > > > >>If the review issue had asked for feedback (as >>you now misleadingly suggest that it did >> >> > >No, excuse me. I inserted into the resolution text from my notes of the >UTC decisions. *I* am not misleading, I am reporting. > > My point referred to the sentences in the first paragraph "This paper describes a proposal with which to fix this problem. As a part of the proposal, a choice has to be made among two alternatives." This suggests that the review issue related to the proposal, when, as I understand it, the issue as previously presented was restricted to the choice between the two alternatives. Perhaps the UTC is misleading, although probably only itself, in thinking that this issue has been resolved when it has left undefined a significant issue, and one about which several UTC members were aware as it was posted to the main Unicode list as well as to this list on 16th January. >The current issue, PRI #27, is closed and the resolution was what it was. > >And your new feedback on the issue will be posted into the UTC register >along with other feedback for the meeting in June. > > Thank you. > Rick > > > > > > By the way, is anyone other than Rick and me reading this list? I am posting here on the understanding that those interested in Hebrew Unicode issues are still subscribed to it. Is there suddenly no one interested? 
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From mms@actcom.co.il Fri Feb 13 08:31:09 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 13 Feb 2004 08:31:09 -0500 (EST) Received: from smtp2.actcom.co.il (mail.actcom.co.il [192.114.47.15]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1DDV7k11715 for ; Fri, 13 Feb 2004 08:31:08 -0500 Received: from saba (l192-115-61-144.tcable.actcom.net.il [192.115.61.144]) by smtp2.actcom.co.il (8.12.8/8.12.8) with SMTP id i1DDV2jU020726 for ; Fri, 13 Feb 2004 15:31:04 +0200 Message-Id: <3.0.5.32.20040213145523.007e0c00@mail3.actcom.co.il> X-Sender: mms@mail3.actcom.co.il X-Mailer: QUALCOMM Windows Eudora Light Version 3.0.5 (32) Date: Fri, 13 Feb 2004 14:55:23 +0200 To: hebrew@unicode.org From: MMS Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ In-Reply-To: <402C2ADC.9020706@qaya.org> References: <200402130047.i1D0lDk22957@unicode.org> <200402130047.i1D0lDk22957@unicode.org> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-archive-position: 1132 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mms@actcom.co.il Precedence: bulk X-list: hebrew The fact that we do not post to the list does *not* mean we do not read it. We are interested in it very much, but probably know too little to be able to contribute. M. Shavit At 17:39 2004-02-12 -0800, you wrote: >On 12/02/2004 16:46, Rick McGowan wrote: > >>Peter, >> >> >> >>>the public review issue was not asking for feedback on the principle of >>>making this change i.e. allowing B C J C B sequences, but only on the >>>question of whether option A or option B should be chosen >>> >>> >> >>And guess what? UTC chose "neither of the above"! In fact they voted down >>explicit motions on both A and B. But they did make another decision, and >>directed me to close the public review issue. >> >> >> >Thanks for the clarification. 
> >>>(As such the new summary at http://www.unicode.org/review/resolved-pri.html >>>is misleading.) >>> >>> >> >>No, it is not misleading. It reflects the UTC decision, which was neither >>A nor B. >> >> >> >>>If the review issue had asked for feedback (as >>>you now misleadingly suggest that it did >>> >>> >> >>No, excuse me. I inserted into the resolution text from my notes of the >>UTC decisions. *I* am not misleading, I am reporting. >> >> > >My point referred to the sentences in the first paragraph "This paper >describes a proposal with which to fix this problem. As a part of the >proposal, a choice has to be made among two alternatives." This suggests >that the review issue related to the proposal, when, as I understand it, >the issue as previously presented was restricted to the choice between >the two alternatives. > >Perhaps the UTC is misleading, although probably only itself, in >thinking that this issue has been resolved when it has left undefined a >significant issue, and one about which several UTC members were aware as >it was posted to the main Unicode list as well as to this list on 16th >January. > >>The current issue, PRI #27, is closed and the resolution was what it was. >> >>And your new feedback on the issue will be posted into the UTC register >>along with other feedback for the meeting in June. >> >> > >Thank you. > >> Rick >> >> >> >> >> >> >By the way, is anyone other than Rick and me reading this list? I am >posting here on the understanding that those interested in Hebrew >Unicode issues are still subscribed to it. Is there suddenly no one >interested? 
> >-- >Peter Kirk >peter@qaya.org (personal) >peterkirk@qaya.org (work) >http://www.qaya.org/ > > > From ted.hopp@newslate.com Fri Feb 13 10:57:52 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 13 Feb 2004 10:57:52 -0500 (EST) Received: from smtp03.mrf.mail.rcn.net (smtp03.mrf.mail.rcn.net [207.172.4.62]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1DFvqk14356 for ; Fri, 13 Feb 2004 10:57:52 -0500 Received: from 216-164-48-205.c3-0.gth-ubr1.lnh-gth.md.cable.rcn.com ([216.164.48.205] helo=Xerxes) by smtp03.mrf.mail.rcn.net with smtp (Exim 3.35 #4) id 1ArfhI-0001kx-00; Fri, 13 Feb 2004 10:57:44 -0500 Message-ID: <007901c3f24a$1d020840$deeefea9@Xerxes> From: "Ted Hopp" To: "Peter Kirk" , References: <402AC4B6.6030207@qaya.org> Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ Date: Fri, 13 Feb 2004 10:57:42 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-archive-position: 1133 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: ted.hopp@newslate.com Precedence: bulk X-list: hebrew Peter, On Wednesday, February 11, 2004 7:11 PM, Peter Kirk wrote: > - holam male > - consonantal vav with holam > - either consonantal vav with holam, or neutral for use in > texts which do not attempt to differentiate this from holam male I don't see a need for the second (ZWNJ) variant. Having two different ways of representing consonantal vav with holam will be a source of future headaches. Ted Ted Hopp, Ph.D. ZigZag, Inc. 
ted.hopp@newSLATE.com +1-301-990-7453

newSLATE is your personal learning workspace
...on the web at http://www.newSLATE.com/

From peterkirk@qaya.org Fri Feb 13 11:23:14 2004
Message-ID: <402CF9EB.1030703@qaya.org>
Date: Fri, 13 Feb 2004 08:23:07 -0800
From: Peter Kirk
To: Ted Hopp
Cc: hebrew@unicode.org
Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
References: <402AC4B6.6030207@qaya.org> <007901c3f24a$1d020840$deeefea9@Xerxes>
In-Reply-To: <007901c3f24a$1d020840$deeefea9@Xerxes>

On 13/02/2004 07:57, Ted Hopp wrote:

>Peter,
>
>On Wednesday, February 11, 2004 7:11 PM, Peter Kirk
>wrote:
>
>>  - holam male
>>  - consonantal vav with holam
>>  - either consonantal vav with holam, or neutral for use in
>>texts which do not attempt to differentiate this from holam male
>
>I don't see a need for the second (ZWNJ) variant. Having two different ways
>of representing consonantal vav with holam will be a source of future
>headaches.
>
>Ted

Thank you, Ted. I rather agree, and would be happy to drop this as a suggested encoding if there is no real demand for it.
But note that ZWNJ is "default ignorable" which means that if a particular font has no special rendering for a sequence including it the ZWNJ should simply be ignored.

As for the coding of holam male, do you, or does anyone else, have any particular preference between my previous suggestion of (which only apparently breaches the rule that combining marks must follow their base characters) and this alternative of ? Or do we want to follow up other ideas like defining a new holam male character or a new right holam combining mark (which have the disadvantage of obscuring the fundamental identity of holam male as a combination of holam and vav)?

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

From rosennej@qsm.co.il Fri Feb 13 14:07:39 2004
From: "Jony Rosenne"
Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
Date: Fri, 13 Feb 2004 21:06:38 +0200
Message-ID: <000501c3f264$86f76f00$0401c80a@QSM4>
In-Reply-To: <402CF9EB.1030703@qaya.org>

How about this:

The Holam Male should be Holam Vav. The Vav isn't in this case the base
character, the base character is the letter which takes the Holam. The dot
is placed to the right of the Vav, or in other words to the left of the base
character. This is similar to the situation with the Hiriq Male, although
with Hiriq the dot is placed under the base character.

It is distinguished by the Vav having no vowel (including Shva as a vowel).

For example, Gadol Avoni would be:

Gimel, Qamats, Dagesh, Dalet, Vav, Holam, Lamed, Space, Ayin, Hataf Patah,
Vav, Holam, Nun, Hiriq, Yod

Avon:

Ayin, Qamats, Vav, Holam, Vav, Final Nun

It would be a font issue whether to render Holam, vowel less Vav the same as
Vav Holam or not.

Jony

> -----Original Message-----
> From: hebrew-bounce@unicode.org
> [mailto:hebrew-bounce@unicode.org] On Behalf Of Peter Kirk
> Sent: Friday, February 13, 2004 6:23 PM
> To: Ted Hopp
> Cc: hebrew@unicode.org
> Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
>
> On 13/02/2004 07:57, Ted Hopp wrote:
>
> >Peter,
> >
> >On Wednesday, February 11, 2004 7:11 PM, Peter Kirk wrote:
> >
> >>  - holam male
> >>  - consonantal vav with holam
> >>  - either consonantal vav with holam, or neutral for use
> >>in texts which do not attempt to differentiate this from holam male
> >
> >I don't see a need for the second (ZWNJ) variant. Having two different
> >ways of representing consonantal vav with holam will be a source of
> >future headaches.
> >
> >Ted
>
> Thank you, Ted. I rather agree, and would be happy to drop this as a
> suggested encoding if there is no real demand for it.
> But note that ZWNJ is "default ignorable" which means that if a
> particular font has no special rendering for a sequence including it
> the ZWNJ should simply be ignored.
>
> As for the coding of holam male, do you, or does anyone else, have any
> particular preference between my previous suggestion of
> (which only apparently breaches the rule that combining marks must
> follow their base characters) and this alternative of ZWJ, holam>?
> Or do we want to follow up other ideas like defining a new holam male
> character or a new right holam combining mark (which have the
> disadvantage of obscuring the fundamental identity of holam male as a
> combination of holam and vav)?
>
> --
> Peter Kirk
> peter@qaya.org (personal)
> peterkirk@qaya.org (work)
> http://www.qaya.org/

From ted.hopp@newslate.com Fri Feb 13 16:26:02 2004
Message-ID: <009501c3f277$f7a3e810$deeefea9@Xerxes>
From: "Ted Hopp"
References: <000501c3f264$86f76f00$0401c80a@QSM4>
Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
Date: Fri, 13 Feb 2004 16:25:56 -0500

On Friday, February 13, 2004 2:06 PM, Jony Rosenne wrote:

> How about this:
>
> The Holam Male should be Holam Vav. The Vav isn't in this case the base
> character, the base character is the letter which takes the Holam. The dot
> is placed to the right of the Vav, or in other words to the left of the base
> character. This is similar to the situation with the Hiriq Male, although
> with Hiriq the dot is placed under the base character.
>
> It is distinguished by the Vav having no vowel (including Shva as a vowel).
>
> For example, Gadol Avoni would be:
>
> Gimel, Qamats, Dagesh, Dalet, Vav, Holam, Lamed, Space, Ayin, Hataf Patah,
> Vav, Holam, Nun, Hiriq, Yod
>
> Avon:
>
> Ayin, Qamats, Vav, Holam, Vav, Final Nun
>
> It would be a font issue whether to render Holam, vowel less Vav the same as
> Vav Holam or not.

Terminology is complicated here. I think you are using "base character" in a grammatical sense, yet we need to stick to "base character" in the Unicode sense. The two are not the same.

You assert that the holam dot combines with the base character preceding the holam male vowel, but the great majority of printed material I have seen does not support this assertion if one treats "combines" as a graphical interaction. To the contrary, holam male often looks like a vav with a holam centered above it (not to the right); frequently, it is indistinguishable from vav haluma, with the holam dot slightly to the LEFT of the vav. I'm not saying that's how it should be, just that such is the reality.

Putting the holam logically before the vav (that is to say, after a preceding base character) assumes that there is a preceding base character. In normal Hebrew orthography, this is true. But non-standard cases are common: dictionaries have tables of vowels where holam male occurs isolated in a table cell; so do pronunciation guides, reading primers, etc.
At least one verb book (Shmuel Bolozky's "501 Hebrew Verbs") uses non-standard orthography for pedagogical purposes. You may not like these as examples of good Hebrew, but such usages are there and should be supported. The point is, I think it is wrong to build into Unicode a reliance on Hebrew spelling rules for the interpretation of a string of Hebrew Unicode characters. (By the way, how would you transliterate "Joe Jew" into Hebrew? Do you feel comfortable putting a holam male after a geresh using your scheme?)

I agree that the situation is somewhat similar to that of hiriq male (and tsere male; perhaps also patah-alef, etc.): single vowels grammatically, composed of multiple glyphs, with no single Unicode character. There is a critical difference, though: holam male usually (always?) involves some sort of graphical interaction between the holam dot and the vav; the others never do.

Then there is the very important issue of dynamic behavior of the text (i.e., editing). Suppose I wanted to change your gadol to gaon (gimel-qamats-alef-holam male-final nun). Change the lamed to final nun -- no problem. Then I put the cursor visually between the dalet and the holam male. Where is the cursor logically? I would expect to press backspace and type alef to replace the dalet. Does that happen? If the holam dot combines with the dalet, no! I changed the holam dot to alef and now I have gimel, qamats, dalet, alef, vav, final nun. (If you fix that, make sure the editor also does what a user would expect when trying to change gadol to gadur using a "delete following" key when visually between the dalet and the holam male.)

Perhaps one could argue that this should be fixed by using some clever alternate text representation in the editor; I would hate it if this issue were resolved in a way that guaranteed that Unicode could not be used for text editing.

Sorry to go on at such length.
I just think Unicode should not be putting a combining character ahead of the base character with which it combines graphically.

Ted

Ted Hopp, Ph.D.
ZigZag, Inc.
ted.hopp@newSLATE.com +1-301-990-7453

newSLATE is your personal learning workspace
...on the web at http://www.newSLATE.com/

From rick@unicode.org Fri Feb 13 16:55:20 2004
Message-Id: <200402132155.i1DLqRk10330@unicode.org>
To: hebrew@unicode.org
Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
Date: Fri, 13 Feb 2004 13:51:54 -0800
From: Rick McGowan

I'm not following the linguistic & graphic details here, but as regards the standard, Ted Hopp wrote...

> Sorry to go on at such length. I just think Unicode should not
> be putting a combining character ahead of the base character with
> which it combines graphically.

Here "should not" isn't a strong enough statement. *WILL NOT* is more like it. Any proposal for a combining mark coming logically before its associated base character in the text stream wouldn't hold up. It's clearly non-conformant.

But as Ted says: we need to be very particular about terminology here, so I hope those who are into the nitty-gritty will keep the definitions in Chapter 3 of the standard well in mind.
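
A minimal sketch of what that ordering rule means for the holam case, assuming Python's standard unicodedata module. The helper split_sequences is hypothetical and much simpler than the Chapter 3 definitions: it just attaches every character whose general category starts with "M" to the character before it.

```python
import unicodedata

DALET, VAV, HOLAM = "\u05D3", "\u05D5", "\u05B9"


def split_sequences(text):
    """Group each character with the combining marks that follow it.

    Simplified assumption: a character whose general category starts
    with "M" (Mn, Mc, Me) joins the preceding group; anything else
    starts a new group.
    """
    groups = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups


def describe(text):
    """Return the character names in each group, for readable output."""
    return [[unicodedata.name(ch) for ch in g] for g in split_sequences(text)]


# Holam stored after vav: the mark is grouped with the vav.
print(describe(VAV + HOLAM))

# Holam stored before vav: the mark is grouped with the preceding dalet,
# whatever the font does with the dot graphically.
print(describe(DALET + HOLAM + VAV))
```

On this simplified model a holam stored before the vav can only belong to the character that precedes it in the text stream, which is the crux of the disagreement over which letter counts as its base.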
        Rick

From petercon@microsoft.com Sat Feb 14 00:53:28 2004
Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
Date: Fri, 13 Feb 2004 21:53:08 -0800
From: "Peter Constable"

> From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On
> Behalf Of Peter Kirk

> By the way, is anyone other than Rick and me reading this list?
> I am posting here on the understanding that those interested in Hebrew
> Unicode issues are still subscribed to it. Is there suddenly no one
> interested?

Yes, but I can pay attention to only so many fires at once.

BTW, I would be very much opposed to using ZWJ to create ligatures to represent contour tone diacritics, but that's way off topic for this list.

Peter Constable

From peterkirk@qaya.org Mon Feb 16 09:59:36 2004
Message-ID: <4030DAD2.1060108@qaya.org>
Date: Mon, 16 Feb 2004 06:59:30 -0800
From: Peter Kirk
To: Rick McGowan
Cc: hebrew@unicode.org
Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ
References: <200402132155.i1DLqRk10330@unicode.org>
In-Reply-To: <200402132155.i1DLqRk10330@unicode.org>

On 13/02/2004 13:51, Rick McGowan wrote:

>I'm not following the linguistic & graphic details here, but as regards
>the standard, Ted Hopp wrote...
>
>>Sorry to go on at such length. I just think Unicode should not
>>be putting a combining character ahead of the base character with
>>which it combines graphically.
>
>Here "should not" isn't a strong enough statement. *WILL NOT* is more like
>it.
>Any proposal for a combining mark coming logically before its
>associated base character in the text stream wouldn't hold up. It's clearly
>non-conformant.
>
>...

Rick, the issue here is actually not as simple as it may seem. The question is, with which base character is the combining mark holam associated? If we are talking "logically", your word, then the base character is the preceding consonant, not the vav - or at least I can make a good case for that. If we are talking "graphically", Ted's word, then, yes, the holam does combine with the vav. But experience e.g. with Indic scripts, as well as with numbers etc in RTL scripts, should have taught us that the graphical order of characters does not always match the logical one. Where there is a mismatch between the two, I understood that Unicode normally chooses to encode the logical order rather than the graphical one.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

From davidg@macam.ac.il Mon Feb 16 13:25:27 2004
Message-ID: <016901c3f4ba$3a077ed0$ead172c0@b199323>
From: "David Grossman"
References: <200402132155.i1DLqRk10330@unicode.org> <4030DAD2.1060108@qaya.org>
Subject: [hebrew] Adjusting the location of taamim and nikkud
Date: Mon, 16 Feb 2004 20:25:05 +0200

A second field of interest (in addition to my other email sent at this time) is the (im)possibility of adjusting the location of taamim or nikkud on letters. This is important, because certain vowels or Taamim do not fall on the correct part of the letter (or they do not fall on an esthetic part). In addition, they may create a space.

A good solution would be to build in the location together with the creation of the font. A *temporary* kludge might be to allow users to shift the location of these symbols (somehow?). I don't know whether this is possible.

I certainly have no recommendations about how to resolve this issue. However, others may have suggestions about how this can be done.

David Grossman

From davidg@macam.ac.il Mon Feb 16 13:25:27 2004
Message-ID: <016601c3f4ba$39333ee0$ead172c0@b199323>
From: "David Grossman"
References: <200402132155.i1DLqRk10330@unicode.org> <4030DAD2.1060108@qaya.org>
Subject: [hebrew] Taamei Hamikra and vowels combined
Date: Mon, 16 Feb 2004 20:19:34 +0200

Hi, Peter,

As you suggested on the Hebrew Computing group, I joined this Unicode group, and I must say that I am quite impressed.

I'm particularly interested in two issues. The first is Taamim, especially when used together with nikkud. I assume that this issue was discussed before I joined. Is there a way that I can review the discussion of combined Taamim and Nikkud in the group archives, after which I may present some specific questions?

David Grossman

From peterkirk@qaya.org Mon Feb 16 15:22:56 2004
Message-ID: <4031269D.5090009@qaya.org>
Date: Mon, 16 Feb 2004 12:22:53 -0800
From: Peter Kirk
To: hebrew@unicode.org
Subject: [hebrew] [Fwd: RE: [Fwd: New possibilities with ZWJ and ZWNJ]]

Forwarded to the list at Jony's request.

Peter

-------- Original Message --------
Subject: RE: [Fwd: [hebrew] New possibilities with ZWJ and ZWNJ]
Date: Fri, 13 Feb 2004 07:25:40 +0200
From: Jony Rosenne
To: 'Peter Kirk', 'John Hudson', 'Mark E. Shoulson', 'Peter Constable', 'Ted Hopp'

I agree to using the ZWJ for the medial Meteg, but to nothing else. The medial Meteg is like a ligature of the Meteg and the Hataf. The right Meteg probably needs its own character.

Jony

> -----Original Message-----
> From: Peter Kirk [mailto:peterkirk@qaya.org]
> Sent: Friday, February 13, 2004 3:44 AM
> To: Jony Rosenne; John Hudson; Mark E. Shoulson; Peter Constable; Ted Hopp
> Subject: [Fwd: [hebrew] New possibilities with ZWJ and ZWNJ]
>
> Jony, John, Mark, Peter don't any of you have anything to say about
> this proposal?
>
> -------- Original Message --------
> Subject: [hebrew] New possibilities with ZWJ and ZWNJ
> Date: Wed, 11 Feb 2004 16:11:34 -0800
> From: Peter Kirk
> To: hebrew@unicode.org
>
> Last week, as I understand it, the UTC changed the rules for use of ZWJ
> and ZWNJ, and so they may now be used within combining character
> sequences (although their general category will remain Cf). See
> http://www.unicode.org/review/pr-27.html for more details.
>
> This affects some of the discussions which we had on this list last year
> about encoding of Hebrew combining sequences. The one which comes to
> mind first is the encoding of the various positions of meteg. Last year
> I was unhappy with the following suggested encodings, but now they are
> acceptable:
>
>  - default positioning of meteg, to the left, or possibly
>    medial for hataf vowels
>  - meteg to the right
>  - medial meteg if this is a hataf vowel (and the font supports
>    medial meteg)
>  - meteg to the left (even if this is not the default for hataf
>    vowels)
>
> Can we now agree that this is the most suitable encoding?
>
> Are there any other issues which we might now be able to resolve in a
> different way? I know for example that there has been no real agreement
> about the encoding of holam male, i.e. vav with holam above right. It
> would now be permissible to propose the following encodings:
>
>  - holam male
>  - consonantal vav with holam
>  - either consonantal vav with holam, or neutral for use in
>    texts which do not attempt to differentiate this from holam male
>
> Any comments? (Note that my previous preferred proposal was
> or for holam male, and for vav with holam.)
>
> Other issues which might be simplified by defining sequences with ZWJ or
> ZWNJ in a combining character sequence include certain accent
> positioning issues, for example the positioning of the accent pashta
> which depends on its position in the word in a way which cannot easily
> be automated (as discussed on this list 19-20 December 2003). Thus
> pashta> might be used for the (less common) word medial variant of
> pashta, which should appear above its base character and to the right of
> certain other marks and ascenders, to distinguish it from the word final
> pashta which appears to the left of the base character and other marks
> and ascenders. (But rendering details vary between texts.) There are
> similar issues with some other accents.
>
> On the main Unicode list Ken Whistler has suggested, and Asmus Freytag
> has stated more strongly, that such conventions using Cf characters
> should be documented in the Unicode standard. I agree that they should
> be documented somewhere. So I am proposing to put together a simple
> proposal for such documentation, if possible embodying the consensus of
> this list and other Hebrew experts. The UTC will then be able to decide
> whether to incorporate it into the standard, or perhaps find somewhere
> else for it.
>
> Any comments?
>
> See http://www.qaya.org/academic/hebrew/Issues-Hebrew-Unicode.html and
> http://www.qsm.co.il/Hebrew/Hebrew%20Issues.htm for some further
> background.
-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

From peterkirk@qaya.org Mon Feb 16 15:25:11 2004
Message-ID: <40312726.9090501@qaya.org>
Date: Mon, 16 Feb 2004 12:25:10 -0800
From: Peter Kirk
To: hebrew@unicode.org
Subject: [hebrew] [Fwd: Re: [Fwd: New possibilities with ZWJ and ZWNJ]]

My reply to Jony.

Peter

-------- Original Message --------
Subject: Re: [Fwd: [hebrew] New possibilities with ZWJ and ZWNJ]
Date: Fri, 13 Feb 2004 07:08:11 -0800
From: Peter Kirk
To: Jony Rosenne
CC: 'John Hudson', "'Mark E. Shoulson'", 'Peter Constable', 'Ted Hopp'
References: <001101c3f1f1$d995c770$0401c80a@QSM4>

On 12/02/2004 21:25, Jony Rosenne wrote:

>I agree to using the ZWJ for the medial Meteg, but to nothing else. The
>medial Meteg is like a ligature of the Meteg and the Hataf. The right Meteg
>probably needs its own character.
>
>Jony

Thank you, Jony. I am glad of your support re ZWJ for medial meteg. As John mentions, you are free to propose a separate right meteg character, and we can leave the decision between the two solutions to the UTC.
I prefer the CGJ solution, already approved by the UTC (see http://www.unicode.org/consortium/utc-minutes/UTC-096-200308.html item B.14.5.1 - I can't find the promised FAQ on CGJ), because right meteg is not fundamentally a different character. (The fundamental distinction which does exist, between meteg and silluq, has already been lost, and probably cannot be restored, although I would not be strongly opposed to an attempt to do so.) But my objections to a new right meteg character would not be strong ones; it is more important to me to have some agreed solution to this issue.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

From peterkirk@qaya.org Mon Feb 16 18:47:45 2004
Message-ID: <4031566D.7010406@qaya.org>
Date: Mon, 16 Feb 2004 15:46:53 -0800
From: Peter Kirk
To: David Grossman
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Taamei Hamikra and vowels combined
References: <200402132155.i1DLqRk10330@unicode.org> <4030DAD2.1060108@qaya.org> <016601c3f4ba$39333ee0$ead172c0@b199323>
In-Reply-To: <016601c3f4ba$39333ee0$ead172c0@b199323>

On 16/02/2004 10:19, David Grossman wrote:
>Hi, Peter,
>
>As you suggested on the Hebrew Computing group, I joined this Unicode group,
>and I must say that I am quite impressed.
>
>I'm particularly interested in two issues. The first is Taamim, especially
>when used together with nikkud. I assume that this issue was discussed
>before I joined. Is there a way that I can review the discussion of combined
>Taamim and Nikkud in the group archives, after which I may present some
>specific questions?
>
>David Grossman

Sadly, there isn't a public archive of this list. I'll see if I can send you an archive based on my own records, but not today I'm afraid.

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/

From peterkirk@qaya.org Mon Feb 16 18:54:08 2004
Message-ID: <4031581E.4020407@qaya.org>
Date: Mon, 16 Feb 2004 15:54:06 -0800
From: Peter Kirk
To: David Grossman
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Taamei Hamikra and vowels combined
References: <200402132155.i1DLqRk10330@unicode.org> <4030DAD2.1060108@qaya.org> <016601c3f4ba$39333ee0$ead172c0@b199323> <4031566D.7010406@qaya.org>
In-Reply-To: <4031566D.7010406@qaya.org>
X-list: hebrew On 16/02/2004 15:46, Peter Kirk wrote: > On 16/02/2004 10:19, David Grossman wrote: > >> Hi, Peter, >> >> As you suggested on the Hebrew Computing group, I joined this Unicode >> group, >> and I must say that I am quite impressed. >> >> I'm particularly interested in two issues. The first is Taamim, >> especially >> when used together with nikkud. I assume that this issue was discussed >> before I joined. Is there a way that I can review the discussion of >> combined >> Taamim and Nikkud in the group archives, after which I may present some >> specific questions? >> >> David Grossman >> >> >> >> >> >> >> > Sadly, there isn't a public archive of this list. I'll see if I can > send you an archive based on my own records, but not today I'm afraid. > I'm pleased to be able to correct myself. There IS a public archive of this list, but only a raw one (but very up-to-date), at http://www.unicode.org/~ecartis/hebrew/ The user id to use is "unicode-ml" and the password is "unicode". 
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Tue Feb 17 06:23:27 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 06:23:27 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1HBNQo03738 for ; Tue, 17 Feb 2004 06:23:26 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 59D39415C1E; Tue, 17 Feb 2004 11:23:13 +0000 (GMT) Message-ID: <4031F9A7.30008@qaya.org> Date: Tue, 17 Feb 2004 03:23:19 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Ted Hopp Cc: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <000501c3f264$86f76f00$0401c80a@QSM4> <009501c3f277$f7a3e810$deeefea9@Xerxes> In-Reply-To: <009501c3f277$f7a3e810$deeefea9@Xerxes> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1145 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 13/02/2004 13:25, Ted Hopp wrote: > ... > >Then there is the very important issue of dynamic behavior of the text >(i.e., editing). Suppose I wanted to change your gadol to gaon >(gimel-qamats-alef-holam male-final nun). Change the lamed to final nun -- >no problem. Then I put the cursor visually between the dalet and the holam >male. Where is the cursor logically? I would expect to press backspace and >type alef to replace the dalet. Does that happen? If the holam dot combines >with the dalet, no! I changed the holam dot to alef and now I have gimel, >qamats, dalet, alef, vav, final nun. 
(If you fix that, make sure the editor >also does what a user would expect when trying to change gadol to gadur >using a "delete following" key when visually between the dalet and the holam >male.) Perhaps one could argue that this should be fixed by using some >clever alternate text representation in the editor; I would hate it if this >issue were resolved in a way that guaranteed that Unicode could not be used >for text editing. > > >

Ted, I think this part of what you wrote needs further consideration. Your recent postings to the bidi list show how carefully you have thought about issues of cursor placement, deletion of characters etc. And I agree with your comments about how confusing existing bidi implementations often are (I have one which actually reorders what you see on the screen when you try to select an RTL character in an ambiguous context!) I am not sure that visual selection is the answer, but that's a different issue.

So let me ask a couple of questions, of you and of any others. Suppose you had a word on the screen consisting of alef followed by holam male, with an accent (which should probably be placed above or below the alef rather than the vav, but renderings may vary). You place the cursor at the end and press backspace a number of times. What would you all expect to see deleted, and in what order:

1) first vav, then the accent, then holam, then alef
2) first vav, then alef, holam and the accent together
3) first holam, then vav, then the accent, then alef
4) first holam and vav together, then alef and the accent together
5) anything else?

And if the cursor is at the beginning of the word and you press delete, would it be the same in reverse? And what if the cursor was positioned between the alef and the vav?

The answers to these questions should help us decide what is the best encoding for holam male.
In Unicode terms, it will help us to decide what should be considered a grapheme cluster in Hebrew, because, if I remember correctly, that is the entity which should be deleted in a single keystroke. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From davidg@macam.ac.il Tue Feb 17 06:48:45 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 06:48:45 -0500 (EST) Received: from mail.macam.ac.il (mail.mofet.macam98.ac.il [192.114.206.40]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1HBmYo05608 for ; Tue, 17 Feb 2004 06:48:44 -0500 Received: from mail.macam.ac.il (localhost.localdomain [127.0.0.1]) by mail.macam.ac.il (8.12.8/8.12.8) with ESMTP id i1HBhWlQ012619 for ; Tue, 17 Feb 2004 13:43:32 +0200 Received: from b199323 (dailin82.dailin.macam98.ac.il [192.114.209.82])by mail.macam.ac.il (8.12.8/8.12.8) with SMTP id i1HBhGVK012461for ; Tue, 17 Feb 2004 13:43:31 +0200 Message-ID: <01e001c3f54b$e0957770$52d172c0@b199323> From: "David Grossman" To: References: <200402132155.i1DLqRk10330@unicode.org> <4030DAD2.1060108@qaya.org> <016601c3f4ba$39333ee0$ead172c0@b199323> <4031566D.7010406@qaya.org> <4031581E.4020407@qaya.org> Subject: [hebrew] Re: Taamei Hamikra and vowels combined Date: Tue, 17 Feb 2004 13:14:01 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 X-imss-version: 2.0 X-imss-result: Passed X-imss-scores: Clean:66.80422 C:12 M:1 S:5 R:5 X-imss-settings: Baseline:4 C:4 M:4 S:4 R:4 (0.1000 0.1000) X-archive-position: 1146 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: davidg@macam.ac.il Precedence: bulk X-list: hebrew Thank you. I'll review it and then ask specific questions based on my research.
David Grossman ----- Original Message ----- From: "Peter Kirk" To: "David Grossman" Cc: Sent: Tuesday, February 17, 2004 1:54 AM Subject: Re: [hebrew] Re: Taamei Hamikra and vowels combined > On 16/02/2004 15:46, Peter Kirk wrote: > > > On 16/02/2004 10:19, David Grossman wrote: > > > >> Hi, Peter, > >> > >> As you suggested on the Hebrew Computing group, I joined this Unicode > >> group, > >> and I must say that I am quite impressed. > >> > >> I'm particularly interested in two issues. The first is Taamim, > >> especially > >> when used together with nikkud. I assume that this issue was discussed > >> before I joined. Is there a way that I can review the discussion of > >> combined > >> Taamim and Nikkud in the group archives, after which I may present some > >> specific questions? > >> > >> David Grossman > >> > >> > >> > >> > >> > >> > >> > > Sadly, there isn't a public archive of this list. I'll see if I can > > send you an archive based on my own records, but not today I'm afraid. > > > I'm pleased to be able to correct myself. There IS a public archive of > this list, but only a raw one (but very up-to-date), at > > http://www.unicode.org/~ecartis/hebrew/ > > The user id to use is "unicode-ml" and the password is "unicode". > > > -- > Peter Kirk > peter@qaya.org (personal) > peterkirk@qaya.org (work) > http://www.qaya.org/ From mark@kli.org Tue Feb 17 09:44:17 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 09:44:18 -0500 (EST) Received: from pi.meson.org (h-66-134-26-207.NYCMNY83.covad.net [66.134.26.207]) by unicode.org (8.11.6/8.11.6) with SMTP id i1HEiFo24071 for ; Tue, 17 Feb 2004 09:44:17 -0500 Received: (qmail 14452 invoked from network); 17 Feb 2004 14:44:07 -0000 Received: from nagas.meson.org (HELO kli.org) (1000@192.168.1.101) by pi.meson.org with SMTP; 17 Feb 2004 14:44:07 -0000 Message-ID: <403228EE.2060201@kli.org> Date: Tue, 17 Feb 2004 09:45:02 -0500 From: "Mark E. 
Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en, fr MIME-Version: 1.0 To: Peter Kirk CC: Ted Hopp , hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <000501c3f264$86f76f00$0401c80a@QSM4> <009501c3f277$f7a3e810$deeefea9@Xerxes> <4031F9A7.30008@qaya.org> In-Reply-To: <4031F9A7.30008@qaya.org> X-Hebrew-Date: 25 Shevat 5764 09:15am (horae temporales) X-Enigmail-Version: 0.76.3.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1147 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Peter Kirk wrote: > > So let me ask a couple of questions, of you and of any others. Suppose > you had a word on the screen consisting of alef followed by holam > male, with an accent (which should probably be placed above or below > the alef more than the vav, but renderings may vary). You place the > cursor at the end and press backspace a number of times. What would > you all expect to see deleted, and in what order: > > 1) first vav, then the accent, then holam, then alef > 2) first vav, then alef, holam and the accent together > 3) first holam, then vav, then the accent, then alef > 4) first holam and vav together, then alef and the accent together > 5) anything else? > > And if the cursor is at the beginning of the word and you press > delete, would it be the same in reverse? And what if the cursor was > positioned between the alef and the vav? > > The answers to these questions should help us decide what is the best > encoding for holam male. In Unicode terms, it will help us to decide > what should be considered a grapheme cluster in Hebrew, because, if I > remember correctly, that is the entity which should be deleted in a > single keystroke. 
An interesting definition, interesting in its succinctness. Worth keeping in mind even if it turns out not to be exactly correct.

I'm not a native Hebrew typist (but I imagine few people are, when considering *pointed* Hebrew); here's my expectation:

Certainly the vav-holam (holam male) goes as a single character. It's one unit, and should go as one. That said, once we're allowing backspace to remove only a diacritical mark (which we must do, and which is correct, but which is also somewhat odd with respect to ordinary typing), I can conceive of getting used to either the holam or the vav vanishing first--but then one can get used to practically anything. There's some logic to both choices: the holam is a dot on the vav, and thus the thing drawn last, while on the other hand, if I need to correct my text, chances are that I know what vowel I meant and accidentally wrote it "full" and not "deficient." Still, deleting the vav before the holam seems least intuitive of the choices.

After that, either the accent or the aleph+accent. Accents are rare beasts (for normal people typing, that is, even in pointed Hebrew), so it's reasonable to have them obey rules incompletely.

~mark

P.S. Found out about an edition of the Talmud that actually printed accents (not on every word) on the text of the Mishnah. Wow.
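
The two extremes of the backspace behaviour weighed in this exchange can be compared with a small Python sketch. It is only an illustration under stated assumptions: the word is assumed to be stored as alef, accent (tipeha is picked arbitrarily), vav, holam, which is just one of the encodings under discussion; the function names are hypothetical; and the "cluster" rule here is a simplification (one base character plus any following nonspacing marks), not the full grapheme cluster algorithm of UAX #29.

```python
import unicodedata

# Assumed storage order (one of several encodings under discussion):
# alef, accent (tipeha chosen arbitrarily), vav, holam.
ALEF, TIPEHA, VAV, HOLAM = "\u05D0", "\u0596", "\u05D5", "\u05B9"
WORD = ALEF + TIPEHA + VAV + HOLAM


def is_mark(ch):
    """True for nonspacing marks (Hebrew points and accents are category Mn)."""
    return unicodedata.category(ch) == "Mn"


def backspace_codepoint(text):
    """Hypothetical policy A: each backspace removes one code point."""
    return text[:-1]


def backspace_cluster(text):
    """Hypothetical policy B: each backspace removes the last base
    character together with every mark attached to it."""
    i = len(text)
    # Skip back over trailing marks.
    while i > 0 and is_mark(text[i - 1]):
        i -= 1
    # Then remove the base character they sit on, if there is one.
    if i > 0:
        i -= 1
    return text[:i]


def names(text):
    """Readable list of the code points in text, for inspection."""
    return [unicodedata.name(ch) for ch in text]


if __name__ == "__main__":
    print(names(backspace_codepoint(WORD)))  # holam alone is removed
    print(names(backspace_cluster(WORD)))    # vav and holam go together
```

Under policy A the holam disappears first and the vav survives, which is the case where a user might retype the holam and end up with a different sequence. Under policy B the whole vav-plus-holam unit goes in one keystroke, but so would a letter together with its accent, which is the trade-off raised for accents.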
From peterkirk@qaya.org Tue Feb 17 10:02:20 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 10:02:24 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1HF2Ko06520 for ; Tue, 17 Feb 2004 10:02:20 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 30B83406C5F; Tue, 17 Feb 2004 15:02:05 +0000 (GMT) Message-ID: <40322CF6.8070909@qaya.org> Date: Tue, 17 Feb 2004 07:02:14 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: "Mark E. Shoulson" Cc: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <000501c3f264$86f76f00$0401c80a@QSM4> <009501c3f277$f7a3e810$deeefea9@Xerxes> <4031F9A7.30008@qaya.org> <403228EE.2060201@kli.org> In-Reply-To: <403228EE.2060201@kli.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1148 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 17/02/2004 06:45, Mark E. Shoulson wrote: > Peter Kirk wrote: > >> >> ... >> >> The answers to these questions should help us decide what is the best >> encoding for holam male. In Unicode terms, it will help us to decide >> what should be considered a grapheme cluster in Hebrew, because, if I >> remember correctly, that is the entity which should be deleted in a >> single keystroke. > > > An interesting definition, interesting in its succinctness. Worth > keeping in mind even if it turns out not to be exactly correct. 
It's a summary of http://www.unicode.org/reports/tr29/tr29-5.html section 3 paragraphs 4-5; cursor movement is in some ways a better test than deletion except that the position of the cursor relative to a non-spacing character is ill-defined. > > I'm not a native Hebrew typist (but I imagine few people are, when > considering *pointed* Hebrew); here's my expectation: > > Certainly the vav-holam (holam male) goes as a single character. It's > one unit, and should go as one. That said, once we're allowing > backspace to remove only a diacritical mark (which we must do, and > which is correct, but which is also somewhat odd with respect to > ordinary typing), I can conceive of getting used to either the holam > or the vav vanishing first--but then one can get used to practically > anything. There's some logic to both choices: the holam is a dot on > the vav, as thus the thing drawn last, while on the other hand, if I > need to correct my text, chances are that I know what vowel I meant > and accidentally wrote it "full" and not "deficient." Still, deleting > the vav before the holam seems least intuitive of the choices. OK. I wonder if in fact the best thing is to delete the whole holam male, even if other vowel points are deleted separately. A problem with deleting just the holam is that a user who does that by mistake might try to type the holam in again separately - and would end up with vav haluma instead of holam male. We also need to consider the analogous case of vav shruqa which looks like vav with dagesh and is encoded the same; do we support separate deletion of dagesh, and so of shuruq? > > After that, either the accent or the aleph+accent. Accents are rare > beasts (for normal people typing that is, even in pointed Hebrew), so > it's reasonable to have them obey rules incompletely. Oddly enough accents are more often deleted than typed! 
At least, I only type accents into demonstration texts about accents; but quite often I paste a word from the Bible text into a context in which I don't want accents, and so I want to delete the accent - which is sometimes difficult. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From mark@kli.org Tue Feb 17 10:14:13 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 10:14:13 -0500 (EST) Received: from pi.meson.org (h-66-134-26-207.NYCMNY83.covad.net [66.134.26.207]) by unicode.org (8.11.6/8.11.6) with SMTP id i1HFEDo07825 for ; Tue, 17 Feb 2004 10:14:13 -0500 Received: (qmail 15379 invoked from network); 17 Feb 2004 15:14:09 -0000 Received: from nagas.meson.org (HELO kli.org) (1000@192.168.1.101) by pi.meson.org with SMTP; 17 Feb 2004 15:14:09 -0000 Message-ID: <40322FF8.4000008@kli.org> Date: Tue, 17 Feb 2004 10:15:04 -0500 From: "Mark E. Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en, fr MIME-Version: 1.0 To: Peter Kirk CC: hebrew@unicode.org Subject: [hebrew] Re: New possibilities with ZWJ and ZWNJ References: <000501c3f264$86f76f00$0401c80a@QSM4> <009501c3f277$f7a3e810$deeefea9@Xerxes> <4031F9A7.30008@qaya.org> <403228EE.2060201@kli.org> <40322CF6.8070909@qaya.org> In-Reply-To: <40322CF6.8070909@qaya.org> X-Hebrew-Date: 25 Shevat 5764 09:49am (horae temporales) X-Enigmail-Version: 0.76.3.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1149 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Peter Kirk wrote: > On 17/02/2004 06:45, Mark E. 
Shoulson wrote: > >> Peter Kirk wrote: >> >> I'm not a native Hebrew typist (but I imagine few people are, when >> considering *pointed* Hebrew); here's my expectation: >> >> Certainly the vav-holam (holam male) goes as a single character. >> It's one unit, and should go as one. That said, once we're allowing >> backspace to remove only a diacritical mark (which we must do, and >> which is correct, but which is also somewhat odd with respect to >> ordinary typing), I can conceive of getting used to either the holam >> or the vav vanishing first--but then one can get used to practically >> anything. There's some logic to both choices: the holam is a dot on >> the vav, as thus the thing drawn last, while on the other hand, if I >> need to correct my text, chances are that I know what vowel I meant >> and accidentally wrote it "full" and not "deficient." Still, >> deleting the vav before the holam seems least intuitive of the choices. > > > > OK. I wonder if in fact the best thing is to delete the whole holam > male, even if other vowel points are deleted separately. A problem > with deleting just the holam is that a user who does that by mistake > might try to type the holam in again separately - and would end up > with vav haluma instead of holam male. We also need to consider the > analogous case of vav shruqa which looks like vav with dagesh and is > encoded the same; do we support separate deletion of dagesh, and so of > shuruq? The more I think about it, the more sure I am: yes, take out the whole holam male with one backspace, ditto a whole vav shruqa, a whole vav degusha, and anything else with a dagesh in it. If you accidentally dagesh a letter, you'll have to retype the base letter. Deleting a shuruq without the vav, or a dagesh without it, just doesn't make sense. >> >> After that, either the accent or the aleph+accent. 
Accents are rare >> beasts (for normal people typing that is, even in pointed Hebrew), so >> it's reasonable to have them obey rules incompletely. > > > Oddly enough accents are more often deleted than typed! At least, I > only type accents into demonstration texts about accents; but quite > often I paste a word from the Bible text into a context in which I > don't want accents, and so I want to delete the accent - which is > sometimes difficult. All the more reason for them to behave a little strangely. Letter+accent can make a decent (not bulletproof) case for being a grapheme cluster, but especially if they're going to be deleted frequently, that would make pasting the text in a waste of time. So it's better they be treated as separate marks, deleted alone. ~mark From peterkirk@qaya.org Tue Feb 17 11:20:11 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 11:20:12 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1HGKBo26590 for ; Tue, 17 Feb 2004 11:20:11 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 61755407223; Tue, 17 Feb 2004 16:19:38 +0000 (GMT) Message-ID: <40323F26.9090101@qaya.org> Date: Tue, 17 Feb 2004 08:19:50 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Peter Constable Cc: hebrew@unicode.org Subject: [hebrew] Re: [Fwd: New possibilities with ZWJ and ZWNJ] References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1150 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 17/02/2004 08:02, Peter Constable wrote: > ... 
> >If the positioning of pashta is nothing more than a matter of >presentation, it should be dealt with in font/rendering technologies, >not in character sequences. (I'm not saying I think it is purely a >matter of presentation -- I don't know enough about it; just that *if* >it is only...) > > >

Careful, Peter! You might find yourself committed to supporting what is necessary in your font and rendering technologies. The point is that the positioning of this mark depends on its positioning in a WORD, not just relative to a base character. That is, if it is associated with a word final base character it is positioned to the left of the base character, but if the base character is not word final the mark is positioned above it.

Well, the forms of Arabic letters similarly depend on their position in a word (roughly speaking) and OpenType and Uniscribe deal with that OK. This situation is more difficult only because of how it interacts with various other positioning rules.

This in fact ties up with the even trickier problem that the masora circle is usually centred over a WORD rather than a letter. I'm not sure how you would deal with that one, but again it is just a matter of presentation, surely! ;-)

PS In the light of your known preference for discussing such things on a list, I am sending this to the list rather than copying it to several individuals.
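
The word-dependence of pashta placement described above can be made concrete with a short Python sketch. It is a rough illustration only, assuming that base letters can be recognised by general category Lo and that the input is a single word; the function name is hypothetical, and a real implementation would live in the font and shaping engine rather than in application code. U+0599 is HEBREW ACCENT PASHTA.

```python
import unicodedata

PASHTA = "\u0599"  # HEBREW ACCENT PASHTA


def pashta_positions(word):
    """For each pashta in a single word, report whether its base letter
    is the last letter of the word ("final") or not ("medial").

    Assumption: base letters are the characters with general category
    Lo, and each mark belongs to the nearest preceding base letter.
    """
    base_indexes = [i for i, ch in enumerate(word)
                    if unicodedata.category(ch) == "Lo"]
    if not base_indexes:
        return []
    last_base = base_indexes[-1]

    positions = []
    current_base = None
    for i, ch in enumerate(word):
        if unicodedata.category(ch) == "Lo":
            current_base = i
        elif ch == PASHTA and current_base is not None:
            positions.append("final" if current_base == last_base else "medial")
    return positions


if __name__ == "__main__":
    BET, GIMEL = "\u05D1", "\u05D2"
    print(pashta_positions(BET + PASHTA + GIMEL))           # pashta on a non-final letter
    print(pashta_positions(BET + GIMEL + PASHTA))           # pashta on the final letter
    print(pashta_positions(BET + PASHTA + GIMEL + PASHTA))  # pashta written twice
```

The point of the sketch is that the same code point yields a different placement class purely from its context within the word, so the distinction can be computed from the character sequence without encoding it separately.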
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From ted.hopp@newslate.com Tue Feb 17 11:47:58 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 11:48:55 -0500 (EST) Received: from smtp03.mrf.mail.rcn.net (smtp03.mrf.mail.rcn.net [207.172.4.62]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1HGlbo32248 for ; Tue, 17 Feb 2004 11:47:57 -0500 Received: from 216-164-48-205.c3-0.gth-ubr1.lnh-gth.md.cable.rcn.com ([216.164.48.205] helo=Xerxes) by smtp03.mrf.mail.rcn.net with smtp (Exim 3.35 #4) id 1At8NG-0007QL-00; Tue, 17 Feb 2004 11:47:06 -0500 Message-ID: <00e701c3f575$ad975260$deeefea9@Xerxes> From: "Ted Hopp" To: "Peter Constable" , "Mark E. Shoulson" , "Peter Kirk" , Cc: "Jony Rosenne" , "John Hudson" References: Subject: [hebrew] Re: [Fwd: New possibilities with ZWJ and ZWNJ] Date: Tue, 17 Feb 2004 11:47:06 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1158 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1165 X-archive-position: 1151 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: ted.hopp@newslate.com Precedence: bulk X-list: hebrew On Tuesday, February 17, 2004 11:02 AM, Peter Constable wrote: > The question isn't whether the BHS exactly as printed is plain text -- > it is clearly not; for instance, it assumes a particular typeface and > calligraphic style. The question is whether all of the distinctions that > might reasonably be expected to be represented in terms of character > sequences can, in fact, be so represented. There are some distinctions > in mark positioning (e.g.
meteg to the left or right of a vowel) that > should be represented in terms of character sequences because they are > part of the editorial content of the text, and some distinctions in mark > positioning that are no more than typographic choices / presentation. Peter, Don't overstrikes constitute mark positioning that is usually "part of the editorial content of the text"? Yet we do not require that Unicode be able to encode that. (We have the "combining character" mechanism instead, which has a repertoire limited to the combining characters.) Likewise with super/subscripts, rotated text, etc. At face value, your statement above seems to argue that all text positioning that is editorial content (do you mean here: as opposed to stylistic choices devoid of semantic value?) should be encodable in Unicode. But my impression is that you would not support this argument in its most extreme form. What's the real criterion here? Ted Ted Hopp, Ph.D. ZigZag, Inc. ted.hopp@newSLATE.com +1-301-990-7453 newSLATE is your personal learning workspace ...on the web at http://www.newSLATE.com/ From mark@kli.org Tue Feb 17 21:42:11 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 17 Feb 2004 21:42:12 -0500 (EST) Received: from pi.meson.org (h-66-134-26-207.NYCMNY83.covad.net [66.134.26.207]) by unicode.org (8.11.6/8.11.6) with SMTP id i1I2g9928926 for ; Tue, 17 Feb 2004 21:42:11 -0500 Received: (qmail 31156 invoked from network); 18 Feb 2004 02:42:06 -0000 Received: from nagas.meson.org (HELO kli.org) (1000@192.168.1.101) by pi.meson.org with SMTP; 18 Feb 2004 02:42:06 -0000 Message-ID: <4032D139.90904@kli.org> Date: Tue, 17 Feb 2004 21:43:05 -0500 From: "Mark E. 
Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624 X-Accept-Language: en, fr MIME-Version: 1.0 To: Peter Constable CC: Peter Kirk , Jony Rosenne , John Hudson , Ted Hopp , hebrew@unicode.org Subject: [hebrew] Re: [Fwd: New possibilities with ZWJ and ZWNJ] References: In-Reply-To: X-Hebrew-Date: 26 Shevat 5764 09:46pm (horae temporales) X-Enigmail-Version: 0.76.3.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1152 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Peter Constable wrote: >If the positioning of pashta is nothing more than a matter of >presentation, it should be dealt with in font/rendering technologies, >not in character sequences. (I'm not saying I think it is purely a >matter of presentation -- I don't know enough about it; just that *if* >if is only...) > > It seems awfully presentation-y to me. It's purely a matter of where it is in the word, and also a matter of the typeface (some center the medial pashta and some keep it on the left, from what I've seen). I can't see a reason to encode it separately, as it's just a pashta. It complicates computation a little, as you have to be able to distinguish two pashtas in a row (on different words) from a pashta on a word that's written twice (which is the situation we're dealing with regarding the "medial" pashta). It might be easier to have a special code for the auxiliary one, but I don't know that that's a valid reason. We computational ta`amists will just have to cope. 
~mark From petercon@microsoft.com Mon Feb 23 00:25:14 2004 Received: with ECARTIS (v1.0.0; list hebrew); Mon, 23 Feb 2004 00:25:14 -0500 (EST) Received: from mail1.microsoft.com (mail1.microsoft.com [131.107.3.125]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1N5PDo14539 for ; Mon, 23 Feb 2004 00:25:14 -0500 Received: from inet-vrs-01.redmond.corp.microsoft.com ([157.54.8.27]) by mail1.microsoft.com with Microsoft SMTPSVC(6.0.3790.0); Sun, 22 Feb 2004 21:24:53 -0800 Received: from 157.54.8.155 by inet-vrs-01.redmond.corp.microsoft.com (InterScan E-Mail VirusWall NT); Sun, 22 Feb 2004 21:25:07 -0800 Received: from RED-MSG-52.redmond.corp.microsoft.com ([157.54.12.12]) by inet-hub-04.redmond.corp.microsoft.com with Microsoft SMTPSVC(6.0.3790.0); Sun, 22 Feb 2004 21:25:52 -0800 X-MimeOLE: Produced By Microsoft Exchange V6.5.7165.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: [hebrew] Re: Taamei Hamikra and vowels combined Date: Sun, 22 Feb 2004 21:25:01 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [hebrew] Taamei Hamikra and vowels combined Thread-Index: AcP0ulUb4uOm0crwRr2NTYU6tQOPMwFEr9wg From: "Peter Constable" To: X-OriginalArrivalTime: 23 Feb 2004 05:25:52.0704 (UTC) FILETIME=[81289C00:01C3F9CD] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id i1N5PDo14539 X-archive-position: 1153 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: petercon@microsoft.com Precedence: bulk X-list: hebrew > From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On > Behalf Of David Grossman > I'm particularly interested in two issues. The first is Taamim, especially > when used together with nikkud. I assume that this issue was discussed > before I joined. 
Is there a way that I can review the discussion of > combined > Taamim and Nikkud in the group archives, after which I may present some > specific questions? It would probably be much faster to simply ask your questions and have someone summarize prior discussion; one thing this list has *not* been known for is succinct statements free of repetition. ;=) Peter Constable From petercon@microsoft.com Mon Feb 23 00:33:51 2004 Received: with ECARTIS (v1.0.0; list hebrew); Mon, 23 Feb 2004 00:33:51 -0500 (EST) Received: from mail3.microsoft.com (mail3.microsoft.com [131.107.3.123]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1N5Xoo18846 for ; Mon, 23 Feb 2004 00:33:50 -0500 Received: from INET-VRS-03.redmond.corp.microsoft.com ([157.54.5.27]) by mail3.microsoft.com with Microsoft SMTPSVC(6.0.3790.0); Sun, 22 Feb 2004 21:33:46 -0800 Received: from 157.54.5.25 by INET-VRS-03.redmond.corp.microsoft.com (InterScan E-Mail VirusWall NT); Sun, 22 Feb 2004 21:33:44 -0800 Received: from RED-MSG-52.redmond.corp.microsoft.com ([157.54.12.12]) by inet-hub-03.redmond.corp.microsoft.com with Microsoft SMTPSVC(6.0.3790.0); Sun, 22 Feb 2004 21:34:24 -0800 X-MimeOLE: Produced By Microsoft Exchange V6.5.7165.0 Content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Subject: [hebrew] Re: Adjusting the location of taamim and nikkud Date: Sun, 22 Feb 2004 21:33:38 -0800 Message-ID: X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: [hebrew] Adjusting the location of taamim and nikkud Thread-Index: AcP0uk9iUnGs49g+QBq/Ke/DJ+LX0QFEzaKQ From: "Peter Constable" To: X-OriginalArrivalTime: 23 Feb 2004 05:34:24.0112 (UTC) FILETIME=[B1FB4700:01C3F9CE] Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id i1N5Xoo18846 X-archive-position: 1154 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: petercon@microsoft.com Precedence: 
bulk X-list: hebrew

> From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On > Behalf Of David Grossman > A second field of interest (in addition to my other email sent at this > time) > is the (im)possibility of adjusting the location of taamim or nikkud on > letters. > > This is important, because certain vowels or Taamim do not fall on the > correct part of the letter (or they do not fall on an esthetic part). In > addition, they may create a space. > > A good solution would be to build in the location together with the > creation > of the font.

The font technologies with which Unicode is intended to be used do, in fact, do exactly this: using font technologies like AAT, Graphite and OpenType, fonts can (and should) contain information regarding correct positioning of diacritic marks such as nikud and taamim. What these technologies don't particularly lend themselves to is giving users a way to nudge marks this way or that. But when they are implemented properly, users generally shouldn't ever need to do that.

Try out fonts designed to support Biblical Hebrew using Unicode and one of these font technologies, such as the SBL's font or Ezra SIL (both of which use OpenType); I expect you'll find the mark positioning is at least acceptable for what you need, if not excellent.
Peter Constable From davidg@macam.ac.il Mon Feb 23 18:23:42 2004 Received: with ECARTIS (v1.0.0; list hebrew); Mon, 23 Feb 2004 18:23:43 -0500 (EST) Received: from mail.macam.ac.il (mail.mofet.macam98.ac.il [192.114.206.40]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1NNNgk11570 for ; Mon, 23 Feb 2004 18:23:42 -0500 Received: from mail.macam.ac.il (localhost.localdomain [127.0.0.1]) by mail.macam.ac.il (8.12.8/8.12.8) with ESMTP id i1NNCoUM013955 for ; Tue, 24 Feb 2004 01:12:50 +0200 Received: from b199323 (dailin83.dailin2.macam98.ac.il [192.114.208.83]) by mail.macam.ac.il (8.12.8/8.12.8) with SMTP id i1NNCkuw013934 for ; Tue, 24 Feb 2004 01:12:49 +0200 Message-ID: <000b01c3fa63$54546830$53d072c0@b199323> From: "David Grossman" To: References: Subject: [hebrew] Re: Taamei Hamikra and vowels combined Date: Tue, 24 Feb 2004 01:16:50 +0200 MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2600.0000 X-MIMEOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 X-imss-version: 2.0 X-imss-result: Passed X-imss-scores: Clean:34.60894 C:9 M:1 S:5 R:5 X-imss-settings: Baseline:4 C:4 M:4 S:4 R:4 (0.1000 0.1000) X-archive-position: 1156 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: davidg@macam.ac.il Precedence: bulk X-list: hebrew Thank you, Peter. I'll try to explain the issues. 1. Khaf Sofit with a dot in it and a kametz under it (but not Kaf at the end of a word, which does not use the final form) can be very annoying. It needs to be adjusted separately for each font. 2. Yetiv (one of the taamim) comes before the word. However, it should not "shove" the existing vowel out of the way. 3. Random taamim, cholams and mapiks look OK in smaller typefaces, but awful when enlarged. 
Either they slide around when the letter is enlarged, or else the incorrect placement is not noticeable when small. 4. The left dot on the Sin merges or nearly merges with the dot on a cholam in certain cases. 5. The Rafeh on top of some Yiddish letters merges with the letter or does not appear at all. Those issues are off the top of my head at this moment. Since it is likely that nothing can be done about these quirks, there should be an easy option to manually move these items. Peter, if I don't like the spacing of headline-size letters in standard typefaces, then I can adjust the kerning. That kerning adjustment is built into any decent word processor. It helps users move letters *horizontally*. Furthermore, if I don't like the height of any letter (especially in formulas) I can adjust that height. I can also adjust the height of numbered footnote markers. If I'm not mistaken, I can also make these adjustments with HTML. Those adjustments help users move letters *vertically*. The upshot is that users can move any character up, down, right, or left. I want to recommend that the technology that allows characters to be moved in this way be expanded, so that users can do the same with nekudot and taamim. Is this request doable? David Grossman > It would probably be much faster to simply ask your questions and have > someone summarize prior discussion; one thing this list has *not* been > known for is succinct statements free of repetition. 
;=) > > > > Peter Constable From peterkirk@qaya.org Tue Feb 24 18:38:08 2004 Received: with ECARTIS (v1.0.0; list hebrew); Tue, 24 Feb 2004 18:38:09 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1ONc8h29551 for ; Tue, 24 Feb 2004 18:38:08 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id AA4F0415699; Tue, 24 Feb 2004 23:37:46 +0000 (GMT) Message-ID: <403BE05C.5080705@qaya.org> Date: Tue, 24 Feb 2004 15:38:04 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: hebrew@unicode.org Cc: Jungshik Shin Subject: [hebrew] Inverted nun HTML display problem Content-Type: multipart/mixed; boundary="------------030709050104020207080509" X-archive-position: 1157 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew This is a multi-part message in MIME format. --------------030709050104020207080509 Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit The coding manuals for the SBL Hebrew and Ezra SIL v.2 fonts both specify that inverted nun (nun hafukha) should be encoded as <nun, CGJ>, optionally followed by U+0307 to display the dot above used in BHS. This code displays correctly in some applications on Windows 2000. But see the attached HTML snippet which fails to display as required in Internet Explorer 6, Mozilla 1.6 or Netscape 7.1 - nor in Word 2002 although the inverted nun displays correctly if pasted into a Word document. 
I assume that the problem is that the rendering engine, Uniscribe (various versions have been tried) or whatever, is not passing the CGJ character to the font, perhaps because it is assumed that CGJ does not affect the display. Does anyone know a workaround so that the inverted nun can be made to display correctly in these browsers? (Sadly the PUA code for the glyph which was in earlier versions of Ezra SIL has been removed from the latest version, perhaps because of bidi property problems which were actually not too important for this character which always appears in isolation.) I realise that there may not be a workaround because this is an abuse of CGJ, which is not supposed to affect rendering in this way. The proper solution of course is to define a new Unicode character for this nun hafukha. Such a proposal is in preparation, and I trust that we can finalise the proposal in good time for the June UTC meeting. I hope it will be possible now to let this group see the proposal and comment. 
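For anyone who wants to reproduce the test case, here is a minimal Python sketch of the sequence in question. The code points U+05E0, U+034F and U+0307 are the ones used in the attached InvertedNun.html; the function names (inverted_nun, describe, to_ncr) are hypothetical helpers made up for illustration and are not part of any font package or coding manual.

```python
import unicodedata

# Code points taken from the attached HTML test page.
NUN = "\u05e0"        # HEBREW LETTER NUN
CGJ = "\u034f"        # COMBINING GRAPHEME JOINER
DOT_ABOVE = "\u0307"  # COMBINING DOT ABOVE (the BHS dot)


def inverted_nun(with_dot: bool = False) -> str:
    """Build the nun + CGJ sequence, optionally followed by the dot above."""
    seq = NUN + CGJ
    if with_dot:
        seq += DOT_ABOVE
    return seq


def describe(text: str) -> list[tuple[str, str]]:
    """List each character as (U+XXXX, Unicode character name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


def to_ncr(text: str) -> str:
    """Render the text as HTML numeric character references."""
    return "".join(f"&#x{ord(ch):04X};" for ch in text)


if __name__ == "__main__":
    seq = inverted_nun(with_dot=True)
    for code, name in describe(seq):
        print(code, name)
    print(to_ncr(seq))
```

This only checks that the right characters are in the data stream. Whether the inverted glyph appears still depends on the rendering engine passing CGJ through to the font, which is the open question here.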
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ --------------030709050104020207080509 Content-Type: text/html; name="InvertedNun.html" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="InvertedNun.html" PGhlYWQ+DQo8dGl0bGU+SW52ZXJ0ZWQgbnVuPC90aXRsZT4gIA0KPC9oZWFkPg0KPGJvZHkg Ymdjb2xvcj0iI2ZmZmZmOCIgPg0KPGgxPkludmVydGVkIG51bjwvaDE+DQo8cC8+DQo8Y2Vu dGVyPg0KTm9uLWludmVydGVkOiA8c3BhbiBkaXI9InJ0bCIgc3R5bGU9ImZvbnQtZmFtaWx5 OlNCTCBIZWJyZXc7Zm9udC1zaXplOjQwIj4mI3gwNWUwOzwvc3Bhbj4NCjwvY2VudGVyPg0K PHAvPg0KPGNlbnRlcj4NCkludmVydGVkOiA8c3BhbiBkaXI9InJ0bCIgc3R5bGU9ImZvbnQt ZmFtaWx5OlNCTCBIZWJyZXc7Zm9udC1zaXplOjQwIj4mI3gwNWUwOyYjeDAzNEY7JiN4MDMw Nzs8L3NwYW4+DQo8L2NlbnRlcj4NCjxwLz4NCjxoci8+DQoNCg== --------------030709050104020207080509-- From tiro@tiro.com Wed Feb 25 02:51:40 2004 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 25 Feb 2004 02:51:40 -0500 (EST) Received: from portal.uniserve.ca (portal.uniserve.ca [216.113.192.66]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1P7pdJ28008 for ; Wed, 25 Feb 2004 02:51:40 -0500 Received: from sec1d43.dial.uniserve.ca ([204.244.165.58] helo=tiro.com) by portal.uniserve.ca with esmtp (Exim 4.22) id 1AvtpM-000APt-Sq; Tue, 24 Feb 2004 23:51:33 -0800 Message-ID: <403C5318.60506@tiro.com> Date: Wed, 25 Feb 2004 07:47:36 +0000 From: John Hudson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Peter Kirk CC: hebrew@unicode.org Subject: [hebrew] Re: Inverted nun HTML display problem References: <403BE05C.5080705@qaya.org> In-Reply-To: <403BE05C.5080705@qaya.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1158 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew Peter Kirk wrote: > The coding 
manuals for the SBL Hebrew and Ezra SIL v.2 fonts both > specify that inverted nun (nun hafukha) should be encoded as > <nun, CGJ>, optionally followed by U+0307 to display the dot above used > in BHS. This code displays correctly in some applications on Windows > 2000. But see the attached HTML snippet which fails to display as > required in Internet Explorer 6, Mozilla 1.6 or Netscape 7.1 - nor in > Word 2002 although the inverted nun displays correctly if pasted into a > Word document. Thanks for catching this, Peter. I'd assumed that the CGJ hack wouldn't work everywhere, and agree entirely that the proper solution is to encode the nun hafukha. In the SBL Hebrew font the nun hafukha *does* have PUA encoding; actually it is double encoded: EA01: codepoint used in some earlier fonts employed by Libronix; F300: codepoint requested by SIL. John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com With a pen, text is a virtuous maiden; printed, she becomes a harlot. - Filippo della Strata, a scribe, 1473 From peterkirk@qaya.org Wed Feb 25 06:10:27 2004 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 25 Feb 2004 06:10:28 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1PBARJ01653 for ; Wed, 25 Feb 2004 06:10:27 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id 40DBD416622; Wed, 25 Feb 2004 11:10:19 +0000 (GMT) Message-ID: <403C82A1.10501@qaya.org> Date: Wed, 25 Feb 2004 03:10:25 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: John Hudson Cc: hebrew@unicode.org Subject: [hebrew] Re: Inverted nun HTML display problem References: <403BE05C.5080705@qaya.org> <403C5318.60506@tiro.com> In-Reply-To: <403C5318.60506@tiro.com> Content-Type: text/plain; charset=ISO-8859-1; 
format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1159 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 24/02/2004 23:47, John Hudson wrote: > Peter Kirk wrote: > >> The coding manuals for the SBL Hebrew and Ezra SIL v.2 fonts both >> specify that inverted nun (nun hafukha) should be encoded as >> <nun, CGJ>, optionally followed by U+0307 to display the dot above >> used in BHS. This code displays correctly in some applications on >> Windows 2000. But see the attached HTML snippet which fails to >> display as required in Internet Explorer 6, Mozilla 1.6 or Netscape >> 7.1 - nor in Word 2002 although the inverted nun displays correctly >> if pasted into a Word document. > > > Thanks for catching this, Peter. I'd assumed that the CGJ hack > wouldn't work everywhere, and agree entirely that the proper solution > is to encode the nun hafukha. > > In the SBL Hebrew font the nun hafukha *does* have PUA encoding; > actually it is double encoded: > > EA01 > codepoint used in some earlier fonts employed by Libronix > > F300 > codepoint requested by SIL > > > John Hudson > Thank you, John. I will recommend that the user I was corresponding with use F300 temporarily, as that will work with SBL Hebrew as well as with the original version of Ezra SIL. I cannot understand why the PUA codes (which worked perfectly well) were removed from the new version of Ezra SIL, thus destroying backward compatibility. 
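As a rough illustration of the temporary workaround, the following Python sketch swaps the nun + CGJ sequence for the PUA code point F300 mentioned above, and back again. It assumes a font that maps U+F300 to the nun hafukha glyph (per John's note, SBL Hebrew). The function names to_pua_fallback and from_pua_fallback are hypothetical, and text converted this way is only meaningful with such a font.

```python
NUN_CGJ = "\u05e0\u034f"   # nun + COMBINING GRAPHEME JOINER, per the coding manuals
PUA_NUN_HAFUKHA = "\uf300" # PUA code point requested by SIL, as reported in this thread


def to_pua_fallback(text: str) -> str:
    """Replace each nun + CGJ sequence with the PUA code point (temporary workaround)."""
    return text.replace(NUN_CGJ, PUA_NUN_HAFUKHA)


def from_pua_fallback(text: str) -> str:
    """Restore the nun + CGJ sequence from the PUA code point."""
    return text.replace(PUA_NUN_HAFUKHA, NUN_CGJ)


if __name__ == "__main__":
    original = NUN_CGJ + "\u0307"  # inverted nun with the BHS dot above
    fallback = to_pua_fallback(original)
    print([f"U+{ord(ch):04X}" for ch in fallback])
    assert from_pua_fallback(fallback) == original
```

Keeping the conversion reversible means the text can be moved back to the nun + CGJ sequence, or to a proper character if one is encoded later, without touching anything else in the document.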
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From elaine_keown@yahoo.com Sat Feb 28 15:24:26 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 28 Feb 2004 15:24:26 -0500 (EST) Received: from web80805.mail.yahoo.com (web80805.mail.yahoo.com [66.163.170.100]) by unicode.org (8.11.6/8.11.6) with SMTP id i1SKOQJ12935 for ; Sat, 28 Feb 2004 15:24:26 -0500 Message-ID: <20040228202423.48245.qmail@web80805.mail.yahoo.com> Received: from [66.52.170.170] by web80805.mail.yahoo.com via HTTP; Sat, 28 Feb 2004 12:24:23 PST Date: Sat, 28 Feb 2004 12:24:23 -0800 (PST) From: Elaine Keown Subject: [hebrew] Re: [Fwd: RE: [Fwd: New possibilities with ZWJ and ZWNJ]] To: Peter Kirk , hebrew@unicode.org In-Reply-To: <4031269D.5090009@qaya.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 1161 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: elaine_keown@yahoo.com Precedence: bulk X-list: hebrew Elaine Keown Point Arena Dear Peter, Jony, and List: > I agree to using the ZWJ for the medial Meteg, but > to nothing else. The > medial Meteg is like a ligature of the Meteg and the > Hataf. The right Meteg > probably needs its own character. > > Jony Before much more happens with adding to Unicode Hebrew, I'd like to see opinions on how pointing and accents are used in Modern Hebrew, especially in poetry. If one reads literary Modern Hebrew, apparently marks are added to make it clear which unusual Hebrew poetry words and exotic loan words are in a text. I think someone should be in contact with professors of modern Hebrew literature with respect to this. I'm not on the list much now, my blood pressure is just too high to deal with it--Elaine 
From peterkirk@qaya.org Sat Feb 28 19:46:56 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 28 Feb 2004 19:47:01 -0500 (EST) Received: from mail.metronet.co.uk (mail.metronet.co.uk [213.162.97.75]) by unicode.org (8.11.6/8.11.6) with ESMTP id i1T0kZJ21962 for ; Sat, 28 Feb 2004 19:46:56 -0500 Received: from qaya.org (unknown [213.162.124.237]) by mail.metronet.co.uk (MetroNet Mail) with ESMTP id D2F49415D60; Sun, 29 Feb 2004 00:45:44 +0000 (GMT) Message-ID: <4041364C.5080002@qaya.org> Date: Sat, 28 Feb 2004 16:46:04 -0800 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Elaine Keown Cc: hebrew@unicode.org Subject: [hebrew] Re: [Fwd: RE: [Fwd: New possibilities with ZWJ and ZWNJ]] References: <20040228202423.48245.qmail@web80805.mail.yahoo.com> In-Reply-To: <20040228202423.48245.qmail@web80805.mail.yahoo.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 1162 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 28/02/2004 12:24, Elaine Keown wrote: > Elaine Keown > Point Arena > >Dear Peter, Jony, and List: > > > >>I agree to using the ZWJ for the medial Meteg, but >>to nothing else. The >>medial Meteg is like a ligature of the Meteg and the >>Hataf. The right Meteg >>probably needs its own character. >> >>Jony >> >> > >Before much more happens with adding to Unicode >Hebrew, I'd like to see opinions on how pointing and >accents are used in Modern Hebrew, especially in >poetry. If one reads literary Modern Hebrew, >apparently marks are added to make it clear which >unusual Hebrew poetry words and exotic loan words are >in a text. 
> >I think someone should be in contact with professors >of modern Hebrew literature with respect to this. > >I'm not on the list much now, my blood pressure is >just too high to deal with it--Elaine > > > > Good to hear from you, Elaine, and I hope your blood pressure comes down. Jony is in a much better position than me to look at modern Hebrew issues, so I will leave them to him, and others on the list who know modern Hebrew well. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/