From kfeuerherm@wlu.ca Thu Sep 2 13:37:55 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 13:37:55 -0500 (CDT) Received: from wlu.ca (wluw5.wlu.ca [192.54.242.118]) by unicode.org (8.12.11/8.12.11) with SMTP id i82Ibskv020323 for ; Thu, 2 Sep 2004 13:37:55 -0500 Received: from WLU-MTA by wlu.ca with Novell_GroupWise; Thu, 02 Sep 2004 14:37:54 -0400 Message-Id: X-Mailer: Novell GroupWise Internet Agent 6.5.2 Beta Date: Thu, 02 Sep 2004 14:37:15 -0400 From: "Karljurgen Feuerherm" To: Subject: [hebrew] daghesh vs shureq Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2582 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: kfeuerherm@wlu.ca Precedence: bulk X-list: hebrew Hi Browsing through Gesenius for something else, I happened to find the following comment: "Waw with Dagesh cannot in our printed texts be distinguished from a waw pointed as Shureq; in the latter case the point should stand higher up." (p. 55, n. 2). Any truth to this? According to Unicode 4.0 sub 05BD, "dagesh or mapiq" = "shuruq." 
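The annotation cited above can be checked against the Unicode Character Database. What follows is only a minimal sketch using Python's standard `unicodedata` module. The informal alias "shuruq" lives in the code chart annotations, not in this module, so it is not tested here. Note that the database lists DAGESH OR MAPIQ at U+05BC; U+05BD is METEG.

```python
import unicodedata

# Look up the formal names of the two neighbouring Hebrew points.
for code_point in (0x05BC, 0x05BD):
    print(f"U+{code_point:04X}", unicodedata.name(chr(code_point)))

# In plain text, vav with dagesh and vav "pointed as shureq" are the same
# two-character sequence: the letter followed by the one combining point.
vav_with_dot = "\u05D5\u05BC"
print([unicodedata.name(ch) for ch in vav_with_dot])
```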
K From peterkirk@qaya.org Thu Sep 2 13:52:49 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 13:52:49 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i82IqjfP021380 for ; Thu, 2 Sep 2004 13:52:49 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C2wfa-00055o-Oy; Thu, 02 Sep 2004 19:50:51 +0100 Message-ID: <41376C01.1030509@qaya.org> Date: Thu, 02 Sep 2004 19:52:49 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Karljurgen Feuerherm CC: hebrew@unicode.org Subject: [hebrew] Re: daghesh vs shureq References: In-Reply-To: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2583 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 02/09/2004 19:37, Karljurgen Feuerherm wrote: >Hi > >Browsing through Gesenius for something else, I happened to find the >following comment: > >"Waw with Dagesh cannot in our printed texts be distinguished from a >waw pointed as Shureq; in the latter case the point should stand higher >up." (p. 55, n. 2). > >Any truth to this? 
> >According to Unicode 4.0 sub 05BD, "dagesh or mapiq" = "shuruq." > >K > > > Good question, and one that we have looked at before. And this is one which could have some complex implications e.g. the need to define a separate (but probably only optionally distinguished cf. qamats qatan) shuruq character. I don't remember seeing any actual evidence that anyone has ever made a regular graphical distinction between dagesh and shuruq. But I would be very interested in seeing any such evidence. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From k_isoetc@yahoo.com Thu Sep 2 14:49:37 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 14:49:37 -0500 (CDT) Received: from web53804.mail.yahoo.com (web53804.mail.yahoo.com [206.190.36.199]) by unicode.org (8.12.11/8.12.11) with SMTP id i82Jnb1P002289 for ; Thu, 2 Sep 2004 14:49:37 -0500 Message-ID: <20040902194932.74917.qmail@web53804.mail.yahoo.com> Received: from [66.114.205.249] by web53804.mail.yahoo.com via HTTP; Thu, 02 Sep 2004 12:49:32 PDT Date: Thu, 2 Sep 2004 12:49:32 -0700 (PDT) From: "E. Keown" Subject: [hebrew] Re: pointing systems, relative importance To: Michael Everson , hebrew@unicode.org In-Reply-To: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-archive-position: 2584 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: k_isoetc@yahoo.com Precedence: bulk X-list: hebrew Elaine Keown Philadelphia Dear Michael Everson and List: Michael Everson wrote, quite a while back: > Samaritan "case" is not "case" in the Unicode sense > where A=a. > "Majuscule" and "minuscule" are used in discussion > of Samaritan letterforms in written manuscripts, > but it does not imply case per se. I finally found two probably definitive references to 'majuscule' etc. in Samaritan. I also finally have a good list of *early* Samaritan mss..... 
However, I have not yet seen any of the above refs. If it's not too expensive, I may try to order microfiche of one ms. ---Elaine From mark@kli.org Thu Sep 2 14:56:27 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 14:56:27 -0500 (CDT) Received: from pi.meson.org (h-66-134-26-207.nycmny83.covad.net [66.134.26.207]) by unicode.org (8.12.11/8.12.11) with SMTP id i82JuQEY006978 for ; Thu, 2 Sep 2004 14:56:27 -0500 Received: (qmail 10093 invoked from network); 2 Sep 2004 19:56:21 -0000 Received: from nagas.meson.org (HELO kli.org) (1000@192.168.1.101) by pi.meson.org with SMTP; 2 Sep 2004 19:56:21 -0000 Message-ID: <41377AE5.4020502@kli.org> Date: Thu, 02 Sep 2004 15:56:21 -0400 From: "Mark E. Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en, fr MIME-Version: 1.0 To: Karljurgen Feuerherm CC: hebrew@unicode.org Subject: [hebrew] Re: daghesh vs shureq References: In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2585 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Karljurgen Feuerherm wrote: >Hi > >Browsing through Gesenius for something else, I happened to find the >following comment: > >"Waw with Dagesh cannot in our printed texts be distinguished from a >waw pointed as Shureq; in the latter case the point should stand higher >up." (p. 55, n. 2). > >Any truth to this? > >According to Unicode 4.0 sub 05BD, "dagesh or mapiq" = "shuruq." > In *almost* all typography I've ever seen, dagesh and shuruq are indistinguishable. 
I did once post here (and can again) scans from Koren's prayer-book which actually does distinguish, if you look really carefully, but in the opposite direction from Gesenius: the vav-dagesh dot is higher than the shuruq. But that's way out there, very unusual typography. I have never noticed a difference in MSS, but I also never looked for one. I'm satisfied with dagesh=shuruq, though it does make some analysis difficult. ~mark From peterkirk@qaya.org Thu Sep 2 15:59:00 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 15:59:01 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i82Kx02Z031461 for ; Thu, 2 Sep 2004 15:59:00 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C2ydk-0008UK-Is; Thu, 02 Sep 2004 21:57:04 +0100 Message-ID: <41378998.8010800@qaya.org> Date: Thu, 02 Sep 2004 21:59:04 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: "Mark E. 
Shoulson" CC: hebrew@unicode.org Subject: [hebrew] Re: daghesh vs shureq References: <41377AE5.4020502@kli.org> In-Reply-To: <41377AE5.4020502@kli.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2586 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 02/09/2004 20:56, Mark E. Shoulson wrote: > Karljurgen Feuerherm wrote: > >> Hi >> >> Browsing through Gesenius for something else, I happened to find the >> following comment: >> >> "Waw with Dagesh cannot in our printed texts be distinguished from a >> waw pointed as Shureq; in the latter case the point should stand higher >> up." (p. 55, n. 2). >> >> Any truth to this? >> >> According to Unicode 4.0 sub 05BD, "dagesh or mapiq" = "shuruq." >> > In *almost* all typography I've ever seen, dagesh and shuruq are > indistinguishable. I did once post here (and can again) scans from > Koren's prayer-book which actually does distinguish, if you look > really carefully, but in the opposite direction from Gesenius: the > vav-dagesh dot is higher than the shuruq. But that's way out there, > very unusual typography. > > I have never noticed a difference in MSS, but I also never looked for > one. > > I'm satisfied with dagesh=shuruq, though it does make some analysis > difficult. 
> > ~mark > I wonder if we ought to seriously consider proposing (but not as an urgent matter) a new separate shuruq character alongside a second sheva and perhaps a second dagesh, perhaps also silluq distinct from meteg. For it does seem that these are regularly distinguished at least by one publisher. They are probably used distinctly a lot more often than some accepted Unicode characters are used at all e.g. some which were recently accepted on the basis of usage in a couple of very specialised and rather old books on African phonetics, and some Greek marks which seem to be attested in only two or three ancient MSS. These characters would of course be optional (and I would not expect them to be supported by Israeli standards), but they would meet the needs of people who wish to make certain distinctions, whether for display or for analysis. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From mark@kli.org Thu Sep 2 16:19:44 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 16:19:44 -0500 (CDT) Received: from pi.meson.org (h-66-134-26-207.nycmny83.covad.net [66.134.26.207]) by unicode.org (8.12.11/8.12.11) with SMTP id i82LJhvt000592 for ; Thu, 2 Sep 2004 16:19:44 -0500 Received: (qmail 316 invoked from network); 2 Sep 2004 01:19:40 -0000 Received: from nagas.meson.org (HELO kli.org) (1000@192.168.1.101) by pi.meson.org with SMTP; 2 Sep 2004 01:19:40 -0000 Message-ID: <41378E6A.3010905@kli.org> Date: Thu, 02 Sep 2004 17:19:38 -0400 From: "Mark E. Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en, fr MIME-Version: 1.0 To: "E. 
Keown" CC: Michael Everson , hebrew@unicode.org Subject: [hebrew] Re: pointing systems, relative importance References: <20040902194932.74917.qmail@web53804.mail.yahoo.com> In-Reply-To: <20040902194932.74917.qmail@web53804.mail.yahoo.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2587 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew I have Alan Crown's monographs on "majuscule" and "minuscule" Samaritan writing, and I've seen a lot of examples of both, and also how they're used and where and when. I really don't see a case distinction there: it's a font-choice. There are no rules about "OK, use majuscule for these words, or at the beginning of a sentence, or..." it's "use majuscule when you're writing majuscules and minuscules when you're writing minuscules." It works like Rashi script for Hebrew, the way I see it. It is true that I've seen a poem (contemporary) written in minuscule with the acrostic picked out in majuscule, but I've seen tricks no stranger done with Rashi and Square script, or Roman and Blackletter. In general, documents are written in majuscule or in minuscule, or maybe in one with headings of the other (which was also done with uncials and half-uncials in English, and they're not cases either). I can't find any true case-relationship anywhere, with the possible exception of typewriters, but that proves nothing at all (just a convenience). It's not case. It's just two distinct styles of writing, not even significantly more distinct than Square vs Rashi or Cursive. ~mark P.S. Now, for true case in Hebrew, one could look at the Schonfield script, at http://www.geocities.com/snortar/schonfield.html ... E. 
Keown wrote: > Elaine Keown > Philadelphia > >Dear Michael Everson and List: > >Michael Everson wrote, quite a while back: > > >>Samaritan "case" is not "case" in the Unicode sense >>where A=a. >>"Majuscule" and "minuscule" are used in discussion >>of Samaritan letterforms in written manuscripts, >>but it does not imply case per se. >> >> > >I finally found two probably definitive references to >'majuscule' etc. in Samaritan. I also finally have a >good list of *early* Samaritan mss..... > >However, I have not yet seen any of the above refs. >If it's not too expensive, I may try to order >microfiche of one ms. >---Elaine > > From peterkirk@qaya.org Thu Sep 2 17:34:09 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 17:34:09 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i82MY903009713 for ; Thu, 2 Sep 2004 17:34:09 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C307q-0002Qj-PB for hebrew@unicode.org; Thu, 02 Sep 2004 23:32:15 +0100 Message-ID: <41379FE6.1040501@qaya.org> Date: Thu, 02 Sep 2004 23:34:14 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Hebrew List Subject: [hebrew] Code point allocations for HOLAM HASER FOR VAV and QAMATS QATAN References: <200409021925.i82JPGBs026135@unicode.org> In-Reply-To: <200409021925.i82JPGBs026135@unicode.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org 
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2588 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew I mentioned on 21st August my proposal in http://www.qaya.org/academic/hebrew/Holam-Qamats-Allocations.pdf to reallocate the new Holam character to the gap in the existing Hebrew vowel points at 05BA, and move QAMATS QATAN, which was provisionally allocated there, to another code point. This proposal has been formally posted as L2/04-346. I also understand that there is general agreement on this list in favour of this proposal - except for the final "NOTE". Would anyone like to help me to revise and refine this proposal (including removing the "NOTE") and join me as a joint proposer? Or would it be better simply to let this go ahead as it stands? By the way, I am not currently planning to make any further submissions concerning HEBREW POINT HOLAM HASER FOR VAV, as I have reluctantly concluded that in the current situation this proposal is the least of the evils which would be acceptable to the UTC. 
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Thu Sep 2 17:45:37 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 17:45:37 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i82Mjbto013892 for ; Thu, 2 Sep 2004 17:45:37 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C30Iw-0002eg-A5 for hebrew@unicode.org; Thu, 02 Sep 2004 23:43:43 +0100 Message-ID: <4137A296.3030503@qaya.org> Date: Thu, 02 Sep 2004 23:45:42 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Hebrew List Subject: [hebrew] Compatibility mappings for new Hebrew points References: <41278393.9000100@qaya.org> In-Reply-To: <41278393.9000100@qaya.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2589 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 21/08/2004 18:17, Peter Kirk wrote: > ... 
> 8) Because there will in any case be some "improper" texts, at least > for a transitional period, it is important that, for most processing > purposes except for rendering, "improper" texts are folded with > correctly represented ones, i.e. that the new character is treated as > equivalent to the existing HOLAM. One way of facilitating this, and > avoiding the need for additional programming and so additional > disunification costs, would be to give the new character a > compatibility decomposition to the existing one, even though (like > many existing compatibility decomposable characters) it is not strictly > a compatibility character. > I wrote the above about the proposed new HOLAM HASER FOR VAV, and no one has dissented from it. In many ways the same considerations apply to QAMATS QATAN, as well as to the other variant Hebrew points which might be proposed later, and which I mentioned in a recent posting. Would it be sensible to write a formal proposal to the UTC to add compatibility mappings from HOLAM HASER FOR VAV and QAMATS QATAN respectively to the existing HOLAM and QAMATS characters? I note that at this stage this can be done without breaking the stability policy, but this window of opportunity may close soon. But would such a proposal have a realistic chance of acceptance, given that these are not strictly compatibility characters? Would anyone like to join me in making this proposal? Or is anyone in fact already making such a proposal? 
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From mark@kli.org Thu Sep 2 20:01:23 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 20:01:23 -0500 (CDT) Received: from pi.meson.org (h-66-134-26-207.nycmny83.covad.net [66.134.26.207]) by unicode.org (8.12.11/8.12.11) with SMTP id i8311MtV014340 for ; Thu, 2 Sep 2004 20:01:23 -0500 Received: (qmail 5396 invoked from network); 2 Sep 2004 05:01:19 -0000 Received: from nagas.meson.org (HELO kli.org) (1000@192.168.1.101) by pi.meson.org with SMTP; 2 Sep 2004 05:01:19 -0000 Message-ID: <4137C25D.3050008@kli.org> Date: Thu, 02 Sep 2004 21:01:17 -0400 From: "Mark E. Shoulson" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040113 X-Accept-Language: en, fr MIME-Version: 1.0 To: Peter Kirk CC: hebrew@unicode.org Subject: [hebrew] Re: daghesh vs shureq References: <41377AE5.4020502@kli.org> <41378998.8010800@qaya.org> In-Reply-To: <41378998.8010800@qaya.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2590 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew Peter Kirk wrote: > I wonder if we ought to seriously consider proposing (but not as an > urgent matter) a new separate shuruq character alongside a second > sheva and perhaps a second dagesh, perhaps also silluq distinct from > meteg. For it does seem that these are regularly distinguished at > least by one publisher. They are probably used distinctly a lot more > often than some accepted Unicode characters are used at all e.g. some > which were recently accepted on the basis of usage in a couple of very > specialised and rather old books on African phonetics, and some Greek > marks which seem to be attested in only two or three ancient MSS. 
> These characters would of course be optional (and I would not expect > them to be supported by Israeli standards), but they would meet the > needs of people who wish to make certain distinctions, whether for > display or for analysis. Maybe. But my initial thinking is that this is the sort of thing that God created the PUA for. If people want to make distinctions not made in text, but made by semantics, so as to facilitate whatever analysis they're doing, they'd best do it on their own terms and their own time. Same with mayela vs. tipeha: it would be a lot easier if they were different characters, but they just don't seem to be. People will just have to designate some PUA char as mayela for the purposes of their analysis when they recode their texts (which they'd have to do anyway, no matter what is done). There's room for argument here; I may be being overly harsh. But at first glance it does seem to me that this is a case of Unicode (in the sense of plain-text and encoded characters) not being the right tool for the job. ~mark From rick@unicode.org Thu Sep 2 20:14:55 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 20:14:55 -0500 (CDT) Received: from izanami (ip-216-36-75-240.dsl.sjc.megapath.net [216.36.75.240]) by unicode.org (8.12.11/8.12.11) with SMTP id i831Etpm016168 for ; Thu, 2 Sep 2004 20:14:55 -0500 Message-Id: <200409030114.i831Etpm016168@unicode.org> To: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points In-Reply-To: <41278393.9000100@qaya.org> Date: Thu, 2 Sep 2004 18:14:55 -0700 From: "rick@unicode.org" received: by Apple.Mailer (2.95.2) X-archive-position: 2591 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rick@unicode.org Precedence: bulk X-list: hebrew Referring to Peter's proposal... 
> Would it be sensible to write a formal proposal to the UTC to add > compatibility mappings from HOLAM HASER FOR VAV and QAMATS QATAN > respectively to the existing HOLAM and QAMATS characters? I don't think this is worth pursuing. I don't have an agenda, I'm just thinking in general terms. Making a compatibility equivalent "in the meantime" wouldn't hit the street any earlier than the new character itself, really. So, as an interim measure, I don't see its utility that way. Aside from that, making a compatibility equivalence would (at least partially) defeat the purpose of having the thing in the first place. Compatibility decompositions are generally a signal that if we had our druthers, we'd not have encoded one or the other of the things because they're really the same thing. Rick From kenw@sybase.com Thu Sep 2 20:35:36 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 20:35:36 -0500 (CDT) Received: from inergen.sybase.com (inergen.sybase.com [192.138.151.43]) by unicode.org (8.12.11/8.12.11) with ESMTP id i831ZZPv024369; Thu, 2 Sep 2004 20:35:35 -0500 Received: from smtp2.sybase.com (sybgate2 [10.22.97.85]) by inergen.sybase.com with ESMTP id i831ZUJ15017; Thu, 2 Sep 2004 18:35:30 -0700 (PDT) Received: from olympus-dublin.sybase.com (localhost [127.0.0.1]) by smtp2.sybase.com with ESMTP id SAA21617; Thu, 2 Sep 2004 18:35:29 -0700 (PDT) Received: from birdie.sybase.com (birdie.sybase.com [10.22.85.43]) by olympus-dublin.sybase.com (8.11.7p1+Sun/8.10.2) with ESMTP id i831ZT127249; Thu, 2 Sep 2004 18:35:29 -0700 (PDT) Received: from birdie (birdie [10.22.85.43]) by birdie.sybase.com (8.11.6+Sun/8.11.6) with SMTP id i831ZTF04927; Thu, 2 Sep 2004 18:35:29 -0700 (PDT) Message-Id: <200409030135.i831ZTF04927@birdie.sybase.com> Date: Thu, 2 Sep 2004 18:35:29 -0700 (PDT) From: Kenneth Whistler Reply-To: Kenneth Whistler Subject: [hebrew] Re: Compatibility mappings for new Hebrew points To: rick@unicode.org Cc: hebrew@unicode.org MIME-Version: 1.0 
Content-Type: TEXT/plain; charset=us-ascii Content-MD5: tUAooz3yJ4UfogEgKw/V9g== X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.4.6_06 SunOS 5.8 sun4u sparc X-archive-position: 2592 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: kenw@sybase.com Precedence: bulk X-list: hebrew Rick responded: > Referring to Peter's proposal... > > > Would it be sensible to write a formal proposal to the UTC to add > > compatibility mappings from HOLAM HASER FOR VAV and QAMATS QATAN > > respectively to the existing HOLAM and QAMATS characters? > > I don't think this is worth pursuing. I don't have an agenda, I'm just > thinking in general terms. > > Making a compatibility equivalent "in the meantime" wouldn't hit the > street any earlier than the new character itself, really. So, as an interim > measure, I don't see its utility that way. In addition to these considerations, you also need to consider what the *actual* impact of having a compatibility decomposition would be. Not many processes would automatically do compatibility normalizations (NFKC, NFKD) -- they conflate too much, and are much less useful than the canonical normalizations (NFC, NFD). There are exceptions, of course, like IDNs, but are you *really* *really* worried about the use of HOLAM HASER FOR VAV in Hebrew domain names? Come on. Equivalencing and folding for Unicode characters is a much broader topic than just the list of compatibility mappings in the Unicode Character Database. Many important equivalences are *not* enshrined in compatibility mappings, and not all compatibility mappings are useful for all kinds of equivalencing or folding. This isn't some magic key which will automatically make software equate all the right things together for all purposes. 
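The difference between canonical and compatibility normalization, and the point that folding need not depend on decomposition mappings at all, can be sketched with Python's standard `unicodedata` module. This is an illustration under stated assumptions, not anything proposed in the thread: U+05BA and U+05C7 are the code points that HOLAM HASER FOR VAV and QAMATS QATAN received in later versions of the standard (they were not settled at the time of this discussion), and `FOLD_TABLE` and `fold_hebrew_points` are hypothetical names for an application-level folding.

```python
import unicodedata

# U+FB21 HEBREW LETTER WIDE ALEF carries a compatibility (<font>) decomposition
# to U+05D0 HEBREW LETTER ALEF, so only the NFK* forms fold it.
wide_alef = "\uFB21"
print(unicodedata.decomposition(wide_alef))                  # <font> 05D0
print(unicodedata.normalize("NFC", wide_alef) == wide_alef)  # True: canonical form leaves it alone
print(unicodedata.normalize("NFKC", wide_alef) == "\u05D0")  # True: compatibility form folds it

# Hypothetical application-level folding table (not part of any standard):
# map the two variant points onto the older characters, e.g. for searching.
FOLD_TABLE = {
    0x05BA: 0x05B9,  # HOLAM HASER FOR VAV -> HOLAM
    0x05C7: 0x05B8,  # QAMATS QATAN -> QAMATS
}


def fold_hebrew_points(text: str) -> str:
    """Return text with the variant points replaced by their base points."""
    return text.translate(FOLD_TABLE)


word = "\u05D5\u05BA"  # vav + HOLAM HASER FOR VAV
print(fold_hebrew_points(word) == "\u05D5\u05B9")   # True: the custom folding equates them
print(unicodedata.normalize("NFKC", word) == word)  # True: no decomposition mapping exists
```

A table like this can be applied only where folding is wanted (search, comparison) while the stored text keeps the distinction, which is the kind of purpose-specific equivalence described above.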
--Ken From rosennej@qsm.co.il Thu Sep 2 22:49:25 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 22:49:25 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i833nLjt025400 for ; Thu, 2 Sep 2004 22:49:25 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i833nKHE078366 for ; Thu, 2 Sep 2004 20:49:20 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.3.165] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id lNK0HSQ2 authenticated by POP; Thu, 02 Sep 2004 20:49:19 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: daghesh vs shureq Date: Fri, 3 Sep 2004 06:49:17 +0300 Message-ID: <000401c49168$fdb1f0e0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 In-Reply-To: <41378998.8010800@qaya.org> Importance: Normal X-archive-position: 2593 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew Use the PUA. This is what it was intended for. You can make any distinction you want. See TUS 13.5. Jony > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of Peter Kirk > Sent: Thursday, September 02, 2004 11:59 PM > To: Mark E. Shoulson > Cc: hebrew@unicode.org > Subject: [hebrew] Re: daghesh vs shureq > > > On 02/09/2004 20:56, Mark E. 
Shoulson wrote: > > > Karljurgen Feuerherm wrote: > > > >> Hi > >> > >> Browsing through Gesenius for something else, I happened > to find the > >> following comment: > >> > >> "Waw with Dagesh cannot in our printed texts be > distinguished from a > >> waw pointed as Shureq; in the latter case the point should stand > >> higher up." (p. 55, n. 2). > >> > >> Any truth to this? > >> > >> According to Unicode 4.0 sub 05BD, "dagesh or mapiq" = "shuruq." > >> > > In *almost* all typography I've ever seen, dagesh and shuruq are > > indistinguishable. I did once post here (and can again) scans from > > Koren's prayer-book which actually does distinguish, if you look > > really carefully, but in the opposite direction from Gesenius: the > > vav-dagesh dot is higher than the shuruq. But that's way > out there, > > very unusual typography. > > > > I have never noticed a difference in MSS, but I also never > looked for > > one. > > > > I'm satisfied with dagesh=shuruq, though it does make some analysis > > difficult. > > > > ~mark > > > I wonder if we ought to seriously consider proposing (but not as an > urgent matter) a new separate shuruq character alongside a > second sheva > and perhaps a second dagesh, perhaps also silluq distinct from meteg. > For it does seem that these are regularly distinguished at > least by one > publisher. They are probably used distinctly a lot more often > than some > accepted Unicode characters are used at all e.g. some which were > recently accepted on the basis of usage in a couple of very > specialised > and rather old books on African phonetics, and some Greek marks which > seem to be attested in only two or three ancient MSS. These > characters > would of course be optional (and I would not expect them to > be supported > by Israeli standards), but they would meet the needs of > people who wish > to make certain distinctions, whether for display or for analysis. 
> > -- > Peter Kirk > peter@qaya.org (personal) > peterkirk@qaya.org (work) > http://www.qaya.org/ > > > > > From rosennej@qsm.co.il Thu Sep 2 23:17:05 2004 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 02 Sep 2004 23:17:05 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i834H4Yb003405 for ; Thu, 2 Sep 2004 23:17:05 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i834H3HE082646 for ; Thu, 2 Sep 2004 21:17:03 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.3.165] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id xUL0TZR2 authenticated by POP; Thu, 02 Sep 2004 21:17:02 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Fri, 3 Sep 2004 07:16:59 +0300 Message-ID: <000501c4916c$dcc59bd0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 In-Reply-To: <200409030114.i831Etpm016168@unicode.org> Importance: Normal X-archive-position: 2594 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of rick@unicode.org > Sent: Friday, September 03, 2004 4:15 AM > To: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings for new Hebrew points > > > Referring to Peter's proposal... 
> > > Would it be sensible to write a formal proposal to the UTC to add > > compatibility mappings from HOLAM HASER FOR VAV and QAMATS QATAN > > respectively to the existing HOLAM and QAMATS characters? > > I don't think this is worth pursuing. I don't have an agenda, > I'm just > thinking in general terms. > > Making a compatibility equivalent "in the meantime" wouldn't hit the > street any earlier than the new character itself, really. So, > as an interim > measure, I don't see its utility that way. > > Aside from that, making a compatibility equivalence would (at least > partially) defeat the purpose of having the thing in the > first place. > Compatibility decompositions are generally a signal that if > we had our > druthers, we'd not have encoded one or the other of the > things because > they're really the same thing. It should be canonical equivalence. Compatibility equivalence is a compromise. Anything else means "Be careful when you use these characters. They are only for presentation, not for interchange". Jony > > Rick > > > > From peterkirk@qaya.org Fri Sep 3 04:16:04 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 04:16:04 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i839G4os024865 for ; Fri, 3 Sep 2004 04:16:04 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3A95-0006Cx-Ib; Fri, 03 Sep 2004 10:14:12 +0100 Message-ID: <41383657.2020307@qaya.org> Date: Fri, 03 Sep 2004 10:16:07 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: "Mark E. 
Shoulson" CC: hebrew@unicode.org Subject: [hebrew] Re: daghesh vs shureq References: <41377AE5.4020502@kli.org> <41378998.8010800@qaya.org> <4137C25D.3050008@kli.org> In-Reply-To: <4137C25D.3050008@kli.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2595 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 03/09/2004 02:01, Mark E. Shoulson wrote: > ... > Maybe. But my initial thinking is that this is the sort of thing that > God created the PUA for. ... Understood. But if God had really created the PUA, he would have made it usable for the RTL languages he prefers to communicate in. :-) As it is, the PUA works only for LTR base characters and is hopeless for combining marks used in RTL languages. > ... If people want to make distinctions not made in text, but made by > semantics, so as to facilitate whatever analysis they're doing, they'd > best do it on their own terms and their own time. Same with mayela > vs. tipeha: it would be a lot easier if they were different > characters, but they just don't seem to be. People will just have to > designate some PUA char as mayela for the purposes of their analysis > when they recode their texts (which they'd have to do anyway, no > matter what is done). 
I am not talking about characters which are never distinguished in rendering (that's a very different matter), but about characters which at least one important publisher has chosen to represent with distinct glyphs. Other characters have been accepted for Unicode on the basis of consistent use by a single publisher e.g. th with strikethrough (if I remember correctly) which is attested in just one range of English dictionaries. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Fri Sep 3 04:18:54 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 04:18:54 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i839IsN1025315; Fri, 3 Sep 2004 04:18:54 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3ABr-0006Lm-7X; Fri, 03 Sep 2004 10:17:03 +0100 Message-ID: <41383705.9090400@qaya.org> Date: Fri, 03 Sep 2004 10:19:01 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: "rick@unicode.org" CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <200409030114.i831Etpm016168@unicode.org> In-Reply-To: <200409030114.i831Etpm016168@unicode.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org 
X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2596 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 03/09/2004 02:14, rick@unicode.org wrote: >Referring to Peter's proposal... > > > >>Would it be sensible to write a formal proposal to the UTC to add >>compatibility mappings from HOLAM HASER FOR VAV and QAMATS QATAN >>respectively to the existing HOLAM and QAMATS characters? >> >> > >I don't think this is worth pursuing. I don't have an agenda, I'm just >thinking in general terms. > >Making a compatibility equivalent "in the meantime" wouldn't hit the >street any earlier than the new character itself, really. So, as an interim >measure, I don't see its utility that way. > > Rick, you misunderstood me here. I am not suggesting anything "in the meantime", but that the new characters, when they are accepted, should have a permanent compatibility mapping to the existing one. >Aside from that, making a compatibility equivalence would (at least >partially) defeat the purpose of having the thing in the first place. >Compatibility decompositions are generally a signal that if we had our >druthers, we'd not have encoded one or the other of the things because >they're really the same thing. > > I'll answer this point along with Ken's. 
> Rick > > > > > > > -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Fri Sep 3 04:39:04 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 04:39:04 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i839cuVj029458 for ; Fri, 3 Sep 2004 04:39:04 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3AVH-0006kf-DQ; Fri, 03 Sep 2004 10:37:07 +0100 Message-ID: <41383BB9.5070707@qaya.org> Date: Fri, 03 Sep 2004 10:39:05 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Jony Rosenne CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <000501c4916c$dcc59bd0$0401c80a@QSM4> In-Reply-To: <000501c4916c$dcc59bd0$0401c80a@QSM4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2597 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 03/09/2004 05:16, Jony Rosenne wrote: >> ... 
>> >>Aside from that, making a compatibility equivalence would (at least >>partially) defeat the purpose of having the thing in the >>first place. >>Compatibility decompositions are generally a signal that if >>we had our >>druthers, we'd not have encoded one or the other of the >>things because >>they're really the same thing. >> >> > >It should be canonical equivalence. ... > But making a canonical equivalence would *totally* defeat the purpose of having the thing in the first place, by defining it as identical to what already exists. Even rendering engines are supposed to fold canonically equivalent characters, so they cannot be used for distinctions even in presentation only. >... Compatibility equivalence is a >compromise. ... > Agreed. In this case it should be taken as meaning something like that these are special purpose presentation variants, perhaps that they are the sort of thing which should have been represented with variation selectors but couldn't be because these are not usable with combining characters. I understand that this is not what compatibility equivalence is intended for. But there are other cases where compatibility equivalence has been used for variant characters where there is no question of the characters being encoded only to cover up past mistakes, for example 017F long s and Greek beta, theta, phi etc symbols. If long s can have a compatibility equivalence to regular s (although the two s forms are used contrastively in texts where both are used e.g. to disambiguate the two senses of German "Wachstube"), then there is no good reason why the new Hebrew points cannot have compatibility equivalents to the existing ones. >... Anything else means "Be careful when you use these characters. >They are only for presentation, not for interchange". 
> > -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Fri Sep 3 05:09:17 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 05:09:17 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83A9HlG003242 for ; Fri, 3 Sep 2004 05:09:17 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3Aye-0007kv-5E; Fri, 03 Sep 2004 11:07:28 +0100 Message-ID: <413842C7.9040205@qaya.org> Date: Fri, 03 Sep 2004 11:09:11 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Kenneth Whistler CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <200409030135.i831ZTF04927@birdie.sybase.com> In-Reply-To: <200409030135.i831ZTF04927@birdie.sybase.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2598 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 03/09/2004 02:35, Kenneth Whistler wrote: > ... 
> > >In addition to these considerations, you also need to consider >what would the *actual* impact of having a compatibility >decomposition be. Not many processes would automatically >do compatibility normalizations (NFKC, NFKD) -- they conflate >too much, and are much less useful than the canonical >normalizations (NFC, NFD). There are exceptions, of >course, like IDN's, but are you *really* *really* worried >about the use of HOLAM HASER FOR VAV in Hebrew domain names? >Come on. > > My main concern is in fact with collation, see below, not with IDN's. But then, if Hebrew points are to be permitted at all in IDN's there is a real chance that someone will want to use HHFV in one, either because it is the correct spelling of their personal or business etc name (mitzvot.org is registered; mitzvot.com is currently for sale but has been registered; these and their Hebrew equivalents with HHFV are obvious choices for Jewish religious sites), or because they are trying to spoof an existing one. So, although this was not my primary concern, it is a valid one. Can the Israeli domain registry solve this simply by banning HHFV etc in all domain names? That would be one solution; compatibility equivalence would be another one. >Equivalencing and folding for Unicode characters is a much >broader topic than just the list of compatibility >mappings in the Unicode Character Database. Many important >equivalences are *not* enshrined in compatibility mappings, >and not all compatibility mappings are useful for all >kinds of equivalencing or folding. This isn't some >magic key which will automatically make software equate all >the right things together for all purposes. 
> > > I realise that compatibility mapping does not in fact do everything which might need to be done for correct processing of these characters - although it does give a clear indication to software developers that these characters should be considered equivalent, much clearer than a text note which might not be understood by those who don't know the script. This is why I was pressing before for making the distinctions in a way which is supposed to make all non-rendering software (unless specially set up) apply the folding - adding default ignorable characters. I realise that the pairs of characters need to be treated as equivalent in many additional ways. Of course many of these equivalences come automatically by giving them the same properties. There may be others which should ideally be defined in the standard. For example, in general in every case where HOLAM is explicitly mentioned in the standard, its data tables, its technical reports etc, HHFV should be listed together with it, and similarly for QAMATS QATAN etc. There is to me one really significant advantage to be gained by compatibility equivalence, and that is that it brings the characters together for collation, in the DUCET. When we discussed Greek ARCHAIC KOPPA a few months ago (a case where a compatibility equivalence probably should have been defined but it was not), Ken, you made it clear to me that DUCET equivalence comes automatically with compatibility equivalence, but is hard to add to DUCET where there is no compatibility equivalence. Well, I and probably others consider it very important that at least for collation purposes HHFV is by default folded with HOLAM, and QAMATS QATAN with QAMATS etc. And it seems that this folding will automatically become the default if there is a compatibility equivalence; whereas it would be a hard fight to have this folding added to DUCET if there is no compatibility equivalence.
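To make the long s precedent concrete, here is a minimal Python sketch using only the standard unicodedata module. It says nothing about HHFV itself, which has no code point or properties at the time of this thread; it only shows what a compatibility mapping does for a character that is already encoded, and why a naive comparison key built on NFKD brings the two s forms together. Real collation under the UCA works from DUCET weights rather than from normalization alone, so this is a rough analogy at best, and compat_key is a made-up helper name.

```python
import unicodedata

# U+017F LATIN SMALL LETTER LONG S has a <compat> mapping to plain "s".
LONG_S = "\u017f"


def compat_key(text: str) -> str:
    """Hypothetical comparison key: compatibility-decompose the text (NFKD)."""
    return unicodedata.normalize("NFKD", text)


# The mapping as recorded in the Unicode Character Database.
mapping = unicodedata.decomposition(LONG_S)

# Canonical normalization leaves long s alone, so the distinction survives NFC.
survives_nfc = unicodedata.normalize("NFC", LONG_S) == LONG_S

# Compatibility normalization folds it, so both spellings get the same key.
same_key = compat_key("Wach\u017ftube") == compat_key("Wachstube")

print(mapping, survives_nfc, same_key)
```

The point of the sketch is the asymmetry: the two spellings stay distinct in interchange (NFC), yet any process that chooses to apply compatibility folding treats them as one.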
While I remember, an additional compatibility equivalence which should be considered is ATNAH HAFUKH to YERAH BEN YOMO. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From kfeuerherm@wlu.ca Fri Sep 3 06:57:49 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 06:57:49 -0500 (CDT) Received: from wlu.ca (wluw5.wlu.ca [192.54.242.118]) by unicode.org (8.12.11/8.12.11) with SMTP id i83BvmjP018036 for ; Fri, 3 Sep 2004 06:57:49 -0500 Received: from WLU-MTA by wlu.ca with Novell_GroupWise; Fri, 03 Sep 2004 07:57:48 -0400 Message-Id: X-Mailer: Novell GroupWise Internet Agent 6.5.2 Beta Date: Fri, 03 Sep 2004 07:57:33 -0400 From: "Karljurgen Feuerherm" To: Subject: [hebrew] Re: daghesh vs shureq Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Content-Disposition: inline X-archive-position: 2599 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: kfeuerherm@wlu.ca Precedence: bulk X-list: hebrew Sure. But that of course begs the original question--whether or not real plain text does make the distinction (just to be overly picky rather than overly harsh... :) ) K >>> "Mark E. Shoulson" 02/09/2004 9:01:17 pm >>> (Re shuruq vs daghesh and PUA) If people want to make distinctions not made in text, but made by semantics, so as to facilitate whatever analysis they're doing, they'd best do it on their own terms and their own time. 
~mark From verdy_p@wanadoo.fr Fri Sep 3 13:06:06 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 13:06:06 -0500 (CDT) Received: from mwinf0108.wanadoo.fr (smtp1.wanadoo.fr [193.252.22.30]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83I65HU027199 for ; Fri, 3 Sep 2004 13:06:06 -0500 Received: from VENGEROV (ANantes-252-1-37-20.w82-126.abo.wanadoo.fr [82.126.66.20]) by mwinf0108.wanadoo.fr (SMTP Server) with ESMTP id 8CA3724004C4; Fri, 3 Sep 2004 20:05:57 +0200 (CEST) Message-ID: <02ef01c491e0$a9a15180$14427e52@VENGEROV> From: "Philippe Verdy" To: "Jony Rosenne" Cc: "Hebrew List" References: <000401c49168$fdb1f0e0$0401c80a@QSM4> Subject: [hebrew] Re: daghesh vs shureq Date: Fri, 3 Sep 2004 19:51:38 +0200 MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset="iso-8859-1"; reply-type=original Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2900.2180 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 X-archive-position: 2600 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: hebrew From: "Jony Rosenne" > Use the PUA. This is what it was intended for. You can make any > distinction > you want. See TUS 13.5. If this is a private need for such a distinction, based on private considerations of Hebrew lexical or semantic or phonetic or grammatical distinctions, then yes, a PUA is appropriate there. But I would use a PUA not to encode a distinct character, but to annotate it, after encoding the default standard (unified) character. This way, an application that doesn't understand or know this private convention will still be able to handle that text, by ignoring those annotation PUAs. The only problem is that PUAs are, by default, only base characters, i.e. starters of combining sequences.
You can't easily annotate a combining character that is part of a longer combining sequence, unless you insert those annotation PUAs at the end of the whole combining sequence (this solution is still possible, if PUAs are correctly defined to designate unambiguously the combining character or base character to which they apply in the whole combining sequence: for example the SHURUQ versus DAGESH annotation could use two PUAs whose meaning would be significant for the standard DAGESH combining character that is part of the previous combining sequence). However, if the PUA implies a distinct (and coherent) rendering in texts that must include the distinction between distinct occurrences of each of the variants, this is not an annotation, and this merits some other encoding solution. This is the kind of issue that occurs with QAMATS QATAN, or with VAV HALUMA versus HOLAM MALE. In that case, an author could still privately use a solution based on ZW(N)J or CGJ, even if the resulting encoding is not one supported by the documented Unicode conventions: this won't be an illegal text, but it will use a convention which will have undefined semantics in Unicode. The risk exists however that Unicode documents a new standard semantic for this usage, which would break such existing private conventions (fonts or renderers using these past conventions would need to be upgraded, or modified to include some "feature" which is off by default but can be activated if a user needs the past legacy encoding conventions when rendering legacy texts).
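The two properties of PUA code points that this thread keeps running into (they are left-to-right by default, and they are base characters rather than combining marks) can be checked directly, and the annotation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: assigning U+E000 to mean "this dagesh is a shuruq" is a hypothetical private convention invented for the example, and strip_annotations is a made-up helper, not part of any standard or library.

```python
import unicodedata

VAV = "\u05d5"       # HEBREW LETTER VAV
DAGESH = "\u05bc"    # HEBREW POINT DAGESH OR MAPIQ
PUA_MARK = "\ue000"  # hypothetical private assignment: "the dagesh here is a shuruq"


def describe(ch: str):
    """Return (general category, canonical combining class, bidi class)."""
    return (
        unicodedata.category(ch),
        unicodedata.combining(ch),
        unicodedata.bidirectional(ch),
    )


def strip_annotations(text: str) -> str:
    """Drop every private-use code point, as an unaware application might."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Co")


# The annotation goes after the whole combining sequence, since a PUA
# code point is a starter and would otherwise split the sequence.
annotated = VAV + DAGESH + PUA_MARK

print(describe(PUA_MARK))  # category Co, combining class 0, bidi class L
print(describe(DAGESH))    # a nonspacing mark with a non-zero combining class
print(strip_annotations(annotated) == VAV + DAGESH)
```

An application that knows the convention can read the annotation; one that does not can strip or ignore it and still be left with ordinary vav plus dagesh.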
From kenw@sybase.com Fri Sep 3 14:10:13 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 14:10:13 -0500 (CDT) Received: from inergen.sybase.com (inergen.sybase.com [192.138.151.43]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83JAC5J001601 for ; Fri, 3 Sep 2004 14:10:12 -0500 Received: from smtp2.sybase.com (sybgate2 [10.22.97.85]) by inergen.sybase.com with ESMTP id i83JA7J13000; Fri, 3 Sep 2004 12:10:07 -0700 (PDT) Received: from olympus-dublin.sybase.com (localhost [127.0.0.1]) by smtp2.sybase.com with ESMTP id MAA14802; Fri, 3 Sep 2004 12:10:06 -0700 (PDT) Received: from birdie.sybase.com (birdie.sybase.com [10.22.85.43]) by olympus-dublin.sybase.com (8.11.7p1+Sun/8.10.2) with ESMTP id i83JA6104849; Fri, 3 Sep 2004 12:10:06 -0700 (PDT) Received: from birdie (birdie [10.22.85.43]) by birdie.sybase.com (8.11.6+Sun/8.11.6) with SMTP id i83JA0F05701; Fri, 3 Sep 2004 12:10:06 -0700 (PDT) Message-Id: <200409031910.i83JA0F05701@birdie.sybase.com> Date: Fri, 3 Sep 2004 12:10:00 -0700 (PDT) From: Kenneth Whistler Reply-To: Kenneth Whistler Subject: [hebrew] Re: Compatibility mappings for new Hebrew points To: peterkirk@qaya.org Cc: hebrew@unicode.org MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Content-MD5: 2c/IU7SAcv+dMvq9+2M3QA== X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.4.6_06 SunOS 5.8 sun4u sparc X-archive-position: 2601 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: kenw@sybase.com Precedence: bulk X-list: hebrew Peter, > My main concern is in fact with collation, see below, not with IDN's. Then the concern should be addressed in the UCA and the DUCET. 
> But then, if Hebrew points are to be permitted at all in IDN's there is > a real chance that someone will want to use HHFV in one, either because > it is the correct spelling of their personal or business etc name > (mitzvot.org is registered; mitzvot.com is currently for sale but has > been registered; these and their Hebrew equivalents with HHFV are > obvious choices for Jewish religious sites), or because they are trying > to spoof an existing one. So, although this was not my primary concern, > it is a valid one. Can the Israeli domain registry solve this simply by > banning HHFV etc in all domain names? That would be one solution; > compatibility equivalence would be another one. Isn't it more likely that the Israeli domain registry will simply not allow points at all in domain names? I mean, the potential problems for having pointed and unpointed versions of the "same" domain name seem to me to be an obviously orders of magnitude more serious issue than worrying about HOLAM and HHFV as a *particular* case. > I realise that compatibility mapping does not in fact do everything > which might need to be done for correct processing of these characters - > although it does give a clear indication to software developers that > these characters should be considered equivalent, much clearer than a > text note which might not be understood by those who don't know the > script. But HHFV is only going to be used by people who *do* know the script. I realize that software implementers may not be in the same situation in many cases, but the Hebrew points, in most cases, are going to be implemented by font developers and application makers who are already conversant with the script and who would be expected to pay attention to these issues. 
Focussing on getting them the appropriate information would seem to me to be in order, and that would be best addressed by a comprehensive technical note on issues for Hebrew point and accent representation and rendering, rather than worrying about one particular compatibility equivalence. > There is to me one really significant advantage to be gained by > compatibility equivalence, and that is that it brings the characters > together for collation, in the DUCET. When we discussed Greek ARCHAIC > KOPPA a few months ago (a case where a compatibility equivalence > probably should have been defined but it was not), Ken, you made it > clear to me that DUCET equivalence comes automatically with > compatibility equivalence, but is hard to add to DUCET where there is no > compatibility equivalence. Actually, for some kinds of equivalences it isn't *technically* difficult to add them to DUCET in the absence of a formal compatibility equivalence. The software that generates the DUCET from basic Unicode data has mechanisms that recognize artificially specified "compatibility" equivalences, in addition to the normative compatibility equivalences listed in UnicodeData.txt. It does so precisely to handle instances like this, where desired collation behavior may require conflating two characters by default, even when they do not have a canonical or compatibility mapping in UnicodeData.txt. The difficulty is more clerical and procedural -- coming to consensus in the committee as to what the desired default is for newly added characters and keeping track of all the decisions and opinions when rolling up new versions of DUCET for repertoire additions. > Well, I and probably others consider it very > important that at least for collation purposes HHFV is by default folded > with HOLAM, and QAMATS QATAN with QAMATS etc. O.k. -- noted. 
> And it seems that this > folding will automatically become the default if there is a > compatibility equivalence; whereas it would be a hard fight to have this > folding added to DUCET if there is no compatibility equivalence. > > While I remember, an additional compatibility equivalence which should > be considered is ATNAH HAFUKH to YERAH BEN YOMO. The *best* way to make this happen is to develop a specific proposal for equivalencing in the DUCET for these *specific* instances, and present it to the UTC for decision. At that point the UTC will have a mechanism, through minutes and action items, to keep track of such decisions and roll them into future versions of DUCET. Just expressing an opinion on this list as to what the collation behavior should be won't, by itself, cause anything to happen. That said, I'm not necessarily opposed to having a compatibility mapping defined in UnicodeData.txt for HHFV. But the appropriate context for ensuring that *that* happens will be the eventual beta review of the data files for the next version of the Unicode Standard (probably available early next year). When that review starts, feedback specifically on those data files can then be collected and summarized for UTC decisions on tweaking the data files before their final release. Everyone on the Hebrew list should be watching for that time and then review very carefully all character properties associated with any Hebrew character additions to the standard.
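As a rough illustration of folding two characters together for comparison without any formal compatibility mapping, the following Python sketch applies an explicit fold table before building a key. It is not how the DUCET is generated and it is not an implementation of the UCA; it is a toy stand-in for the idea of a separately specified equivalence. FOLD and fold_key are hypothetical names, and the code points U+05BA and U+05C7 are assumptions from the viewpoint of this thread: they are where HOLAM HASER FOR VAV and QAMATS QATAN were encoded in a later version of the standard.

```python
import unicodedata

# Hypothetical fold table, keyed by code point as str.translate expects.
# Assumption: U+05BA / U+05C7 are the later-assigned code points for the
# proposed characters; they have no decomposition mapping of their own.
FOLD = {
    0x05BA: 0x05B9,  # HOLAM HASER FOR VAV -> HOLAM
    0x05C7: 0x05B8,  # QAMATS QATAN        -> QAMATS
}


def fold_key(text: str) -> str:
    """Toy comparison key: canonical decomposition, then the explicit folds."""
    return unicodedata.normalize("NFD", text).translate(FOLD)


vav_hhfv = "\u05d5\u05ba"   # vav + HOLAM HASER FOR VAV
vav_holam = "\u05d5\u05b9"  # vav + HOLAM

# Distinct in the stored text, identical once the fold table is applied.
print(vav_hhfv != vav_holam, fold_key(vav_hhfv) == fold_key(vav_holam))
```

The stored text keeps the distinction; only the key used for comparison conflates the pair, which is the behaviour being asked for as a collation default.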
--Ken From rosennej@qsm.co.il Fri Sep 3 14:30:33 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 14:30:33 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83JUXKF007249 for ; Fri, 3 Sep 2004 14:30:33 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i83JUVHE015575 for ; Fri, 3 Sep 2004 12:30:32 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [217.132.174.39] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id 8340e7A2 authenticated by POP; Fri, 03 Sep 2004 12:30:30 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Fri, 3 Sep 2004 22:30:27 +0300 Message-ID: <000601c491ec$78809aa0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 In-Reply-To: <200409031910.i83JA0F05701@birdie.sybase.com> Importance: Normal X-archive-position: 2602 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of Kenneth Whistler > Sent: Friday, September 03, 2004 10:10 PM > To: peterkirk@qaya.org > Cc: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings for new Hebrew points > ... > > > But HHFV is only going to be used by people who *do* know the > script. 
I realize that software implementers may not be in > the same situation in many cases, but the Hebrew points, in > most cases, are going to be implemented by font developers > and application makers who are already conversant with the > script and who would be expected to pay attention to these > issues. Focussing on getting them the appropriate information > would seem to me to be in order, and that would be best > addressed by a comprehensive technical note on issues for > Hebrew point and accent representation and rendering, rather > than worrying about one particular compatibility equivalence. > People who *do* know the Hebrew script will not accept the HHFV proposal. If font makers feel bound to implement it they may have a problem with most of their Hebrew users. ... > --Ken > > > > > From everson@evertype.com Fri Sep 3 14:57:43 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 14:57:43 -0500 (CDT) Received: from ni-mail2.dna.utvinternet.net (ni-mail2.dna.utvinternet.net [194.46.8.26]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83Jvf8c016356 for ; Fri, 3 Sep 2004 14:57:42 -0500 Received: from [10.0.1.3] (unverified [195.218.109.225]) by ni-mail2.dna.utvinternet.net (Vircom SMTPRS 3.2.315.0) with ESMTP id for ; Fri, 3 Sep 2004 20:57:40 +0100 Mime-Version: 1.0 X-Sender: evr001@mail.dna.ie Message-Id: In-Reply-To: <000601c491ec$78809aa0$0401c80a@QSM4> References: <000601c491ec$78809aa0$0401c80a@QSM4> Date: Fri, 3 Sep 2004 20:56:15 +0100 To: From: Michael Everson Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-archive-position: 2603 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 22:30 +0300 2004-09-03, Jony Rosenne wrote: >People who *do* know the Hebrew script will not accept the HHFV proposal. Why?
Be specific, and back up your claims. -- Michael Everson * * Everson Typography * * http://www.evertype.com From rosennej@qsm.co.il Fri Sep 3 17:22:44 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 17:22:44 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83MMh8x010116 for ; Fri, 3 Sep 2004 17:22:44 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i83MMgHE056590 for ; Fri, 3 Sep 2004 15:22:42 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [217.132.174.39] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id iiE0EnK2 authenticated by POP; Fri, 03 Sep 2004 15:22:41 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 01:22:37 +0300 Message-ID: <000f01c49204$85cbb1a0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 In-Reply-To: Importance: Normal X-archive-position: 2604 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of Michael Everson > Sent: Friday, September 03, 2004 10:56 PM > To: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings for new Hebrew points > > > At 22:30 +0300 2004-09-03, Jony Rosenne wrote: > > >People who *do* know the Hebrew script will not accept the HHFV > >proposal. > > Why? Be specific, and back up your claims. I see no need to repeat myself. 
Jony > -- > Michael Everson * * Everson Typography * * http://www.evertype.com > > From everson@evertype.com Fri Sep 3 17:34:11 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 17:34:11 -0500 (CDT) Received: from ni-mail3.dna.utvinternet.net (ni-mail3.dna.utvinternet.net [194.46.8.37]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83MY9R4010949 for ; Fri, 3 Sep 2004 17:34:11 -0500 Received: from [10.0.1.3] (unverified [195.218.109.225]) by ni-mail3.dna.utvinternet.net (Vircom SMTPRS 3.2.315.0) with ESMTP id ; Fri, 3 Sep 2004 23:34:16 +0100 Mime-Version: 1.0 X-Sender: evr001@mail.dna.ie Message-Id: In-Reply-To: <000f01c49204$85cbb1a0$0401c80a@QSM4> References: <000f01c49204$85cbb1a0$0401c80a@QSM4> Date: Fri, 3 Sep 2004 23:29:36 +0100 To: "Jony Rosenne" From: Michael Everson Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Cc: Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-archive-position: 2605 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 01:22 +0300 2004-09-04, Jony Rosenne wrote: > > >People who *do* know the Hebrew script will not accept the HHFV >> >proposal. >> >> Why? Be specific, and back up your claims. > >I see no need to repeat myself. Fine, then. You have never once given an actual argument that made sense or was backed up by anything. Even logic. People who *do* make the kinds of distinctions like QAMATS QATAN know Hebrew script perfectly well. They are Israeli publishers. People who *do* make a distinction with regard to two dot placements and VAV know Hebrew script perfectly well. They are Israeli publishers. HHFV makes the distinction they make while leaving 99% of current data COMPLETELY UNTOUCHED.
-- Michael Everson * * Everson Typography * * http://www.evertype.com From peterkirk@qaya.org Fri Sep 3 18:00:38 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 18:00:38 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i83N0cel014954 for ; Fri, 3 Sep 2004 18:00:38 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3N17-0006my-EK; Fri, 03 Sep 2004 23:58:49 +0100 Message-ID: <4138F791.6010502@qaya.org> Date: Sat, 04 Sep 2004 00:00:33 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Kenneth Whistler CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <200409031910.i83JA0F05701@birdie.sybase.com> In-Reply-To: <200409031910.i83JA0F05701@birdie.sybase.com> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2606 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 03/09/2004 20:10, Kenneth Whistler wrote: > ... 
> > >The *best* way to make this happen is to develop a specific >proposal for equivalencing in the DUCET for these *specific* >instances, and present it to the UTC for decision. At that >point the UTC will have a mechanism, through minutes and >action items, to keep track of such decisions and roll them >into future versions of DUCET. Just expressing an opinion >on this list as to what the collation behavior should be >won't, by itself, cause anything to happen. > > Thank you, Ken, for all of this including what I have snipped. Well, I did make a specific proposal earlier this year for equivalencing in the DUCET, http://www.qaya.org/academic/greek/Koppa-proposal.pdf, for equivalencing U+03D8 GREEK LETTER ARCHAIC KOPPA with U+03DE GREEK LETTER KOPPA and U+03D9 GREEK SMALL LETTER ARCHAIC KOPPA with U+03DF GREEK SMALL LETTER KOPPA. (By the way, did the UTC ever look at this? I never received any feedback.) And you told me that this was unlikely to be accepted because there was no easy way to include such equivalences in DUCET when they were not compatibility equivalences. It was because of that that I decided to suggest compatibility equivalences for these Hebrew characters. But has the situation changed since then? >That said, I'm not necessarily opposed to having a compatibility >mapping defined in UnicodeData.txt for HHFV. But the appropriate >context for ensuring that *that* happens will be the eventual >beta review of the data files for the next version of the >Unicode Standard (probably available early next year). When that >review starts, feedback specifically on those data files can >then be collected and summarized for UTC decisions on tweaking >the data files before their final release. Everyone on the >Hebrew list should be watching for that time and then review >very carefully all character properties associated with any >Hebrew character additions to the standard. > > Indeed we will watch carefully. 
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From kenw@sybase.com Fri Sep 3 20:44:58 2004 Received: with ECARTIS (v1.0.0; list hebrew); Fri, 03 Sep 2004 20:44:58 -0500 (CDT) Received: from fm200.sybase.com (fm200.sybase.com [192.138.151.122]) by unicode.org (8.12.11/8.12.11) with ESMTP id i841ijPY000463 for ; Fri, 3 Sep 2004 20:44:58 -0500 Received: from smtp2.sybase.com (sybgate2.sybase.com [10.22.97.85]) by fm200.sybase.com with ESMTP id i841idQ17312; Fri, 3 Sep 2004 18:44:39 -0700 (PDT) Received: from olympus-dublin.sybase.com (localhost [127.0.0.1]) by smtp2.sybase.com with ESMTP id SAA00669; Fri, 3 Sep 2004 18:44:39 -0700 (PDT) Received: from birdie.sybase.com (birdie.sybase.com [10.22.85.43]) by olympus-dublin.sybase.com (8.11.7p1+Sun/8.10.2) with ESMTP id i841ic106963; Fri, 3 Sep 2004 18:44:38 -0700 (PDT) Received: from birdie (birdie [10.22.85.43]) by birdie.sybase.com (8.11.6+Sun/8.11.6) with SMTP id i841iXF05945; Fri, 3 Sep 2004 18:44:38 -0700 (PDT) Message-Id: <200409040144.i841iXF05945@birdie.sybase.com> Date: Fri, 3 Sep 2004 18:44:33 -0700 (PDT) From: Kenneth Whistler Reply-To: Kenneth Whistler Subject: [hebrew] Re: Compatibility mappings for new Hebrew points To: peterkirk@qaya.org Cc: hebrew@unicode.org MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Content-MD5: oCt4O5O4+uSqmiGD24pjgw== X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.4.6_06 SunOS 5.8 sun4u sparc X-archive-position: 2607 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: kenw@sybase.com Precedence: bulk X-list: hebrew Peter, > Thank you, Ken, for all of this including what I have snipped. 
> > Well, I did make a specific proposal earlier this year for equivalencing > in the DUCET, http://www.qaya.org/academic/greek/Koppa-proposal.pdf, for > equivalencing U+03D8 GREEK LETTER ARCHAIC KOPPA with U+03DE GREEK LETTER > KOPPA and U+03D9 GREEK SMALL LETTER ARCHAIC KOPPA with U+03DF GREEK > SMALL LETTER KOPPA. (By the way, did the UTC ever look at this? Yes. > I never > received any feedback.) The minutes of the June meeting haven't been posted yet, but it was considered at the Toronto meeting. The upshot of that and another document were that the UTC decided to modify the DUCET weighting of the Greek letter SAN but not the weighting of KOPPA. And the reason wasn't the difficulty of actually making the change in the table, but differences of opinion regarding what the default should be for KOPPA. > And you told me that this was unlikely to be > accepted because there was no easy way to include such equivalences in > DUCET when they were not compatibility equivalences. It was because of > that that I decided to suggest compatibility equivalences for these > Hebrew characters. But has the situation changed since then? Not really. Indeed the UTC editorial committee is not tasked with coming up with a method for tracking such changes into the DUCET, and any such change for HOLAM would just be another instance. 
--Ken From rosennej@qsm.co.il Sat Sep 4 00:52:41 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 00:52:41 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i845qec5004087 for ; Sat, 4 Sep 2004 00:52:41 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i845qdHE022149 for ; Fri, 3 Sep 2004 22:52:39 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [217.132.27.23] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id Cl50ipB2 authenticated by POP; Fri, 03 Sep 2004 22:52:38 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 08:52:27 +0300 Message-ID: <000601c49243$5d51e9d0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 In-Reply-To: Importance: Normal Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id i845qec5004087 X-archive-position: 2608 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew I have to repeat myself, after all. > -----Original Message----- > From: Michael Everson [mailto:everson@evertype.com] > Sent: Saturday, September 04, 2004 1:30 AM > To: Jony Rosenne > Cc: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings for new Hebrew points > > > At 01:22 +0300 2004-09-04, Jony Rosenne wrote: > > > > >People who *do* know the Hebrew script will not accept the HHFV > >> >proposal. > >> > >> Why? Be specific, and back up your claims. > > > >I see no need to repeat myself. > > Fine, then. 
You have never once given an actual argument that made > sense or was backed up by anything. Even logic. > > People who *do* make the kinds of distinctions like QAMATS QATAN know > Hebrew script perfectly well. They are Israeli publishers. > > People who *do* make a distiction with regard to two dot placements > and VAV know Hebrew script perfectly well. They are Israeli > publishers. You are not addressing the issue - it isn't whether there is a distinction, it is how to encode it. > > HHFV makes the distinction they make while leaving 99% of current > data COMPLETELY UNTOUCHED. This number is irrelevant. The few current users who make the distinction in non standard ways will be able to convert to any decision. The issue is how to make a useable optional distinction. Jony > -- > Michael Everson * * Everson Typography * * http://www.evertype.com > > From tiro@tiro.com Sat Sep 4 01:12:24 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 01:12:24 -0500 (CDT) Received: from portal.uniserve.ca (portal.uniserve.ca [216.113.192.66]) by unicode.org (8.12.11/8.12.11) with ESMTP id i846COWx005814 for ; Sat, 4 Sep 2004 01:12:24 -0500 Received: from sec2d8.dial.uniserve.ca ([204.244.165.71] helo=tiro.com) by portal.uniserve.ca with esmtp (Exim 4.22) id 1C3Tmc-0005FJ-4S for hebrew@unicode.org; Fri, 03 Sep 2004 23:12:18 -0700 Message-ID: <41395CB9.1080303@tiro.com> Date: Fri, 03 Sep 2004 23:12:09 -0700 From: John Hudson User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031205 Thunderbird/0.4 X-Accept-Language: en-us, en MIME-Version: 1.0 CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <000601c49243$5d51e9d0$0401c80a@QSM4> In-Reply-To: <000601c49243$5d51e9d0$0401c80a@QSM4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 2609 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: 
hebrew-bounce@unicode.org X-original-sender: tiro@tiro.com Precedence: bulk X-list: hebrew Jony Rosenne wrote: > The issue is how to make a useable optional distinction. I think the issue for some time now has been disagreement over what 'optional' means. John Hudson -- Tiro Typeworks www.tiro.com Vancouver, BC tiro@tiro.com Currently reading: The Mass in slow motion, by Ronald Knox Hebrew manuscripts of the Middle Ages, by Colette Sirat From rosennej@qsm.co.il Sat Sep 4 03:23:46 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 03:23:46 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i848NjaQ025685 for ; Sat, 4 Sep 2004 03:23:46 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i848NjHE041907 for ; Sat, 4 Sep 2004 01:23:45 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.7.2] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id stA0OyG2 authenticated by POP; Sat, 04 Sep 2004 01:23:43 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 11:23:40 +0300 Message-ID: <000001c49258$7ca91690$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 In-Reply-To: <41395CB9.1080303@tiro.com> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 X-archive-position: 2610 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew Yes, this is one of the issues, whether the reader has a voice or only the typesetter/editor/author. 
Another is compatibility with systems that choose not to support these options - but there is a difference here: Holam Haser is optional in the sense that it is an established optional distinction for fine printing and an established practice to ignore it in everyday use. The Qamats Qatan distinction is definitely non standard and only used rarely and under special circumstances, such as books for people who don't know Hebrew too well, plus there are several cases where there is a disagreement whether a certain Qamats is Qatan or Gadol. From a typographical point of view, these problems don't matter, all you want is to reproduce a certain text with fidelity, but for interchange and other tasks they certainly do. Kind of like the difference between PDF and HTML. Mandating that the author chooses the presentation is parallel to upgrading HTML to PDF. The Unicode Standard does mention "exchange of text data" and "a consistent way" on page 1. It continues to discuss "interoperability". It does not emphasize typography. To sum up: QQ should be canonically equivalent to Qamats. HHFV should be HH and should be canonically equivalent to Holam. Jony > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of John Hudson > Sent: Saturday, September 04, 2004 9:12 AM > Cc: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings for new Hebrew points > > > Jony Rosenne wrote: > > > The issue is how to make a useable optional distinction. > > I think the issue for some time now has been disagreement > over what 'optional' means. 
> > John Hudson > > -- > > Tiro Typeworks www.tiro.com > Vancouver, BC tiro@tiro.com > > Currently reading: > The Mass in slow motion, by Ronald Knox > Hebrew manuscripts of the Middle Ages, by Colette Sirat > > > From peterkirk@qaya.org Sat Sep 4 05:42:41 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 05:42:41 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84Ageeg021675 for ; Sat, 4 Sep 2004 05:42:41 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3XyV-00022T-4d; Sat, 04 Sep 2004 11:40:51 +0100 Message-ID: <41399C18.1070601@qaya.org> Date: Sat, 04 Sep 2004 11:42:32 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Jony Rosenne CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <000001c49258$7ca91690$0401c80a@QSM4> In-Reply-To: <000001c49258$7ca91690$0401c80a@QSM4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2611 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 04/09/2004 09:23, Jony Rosenne wrote: >Yes, this is one of the issues, whether 
the reader has a voice or only the >typesetter/editor/author. > >Another is compatibility with systems that choose not to support these >options - but there is a difference here: Holam Haser is optional in the >sense that it is an established optional distinction for fine printing and >an established practice to ignore it in everyday use. The Qamats Qatan >distinction is definitely non standard and only used rarely and under >special circumstances, such as books for people who don't know Hebrew too >well, plus there are several cases where there is a disagreement whether a >certain Qamats is Qatan or Gadol. > >>From a typographical point of view, these problems don't matter, all you >want is to reproduce a certain text with fidelity, but for interchange and >other tasks they certainly do. Kind of like the difference between PDF and >HTML. Mandating that the author chooses the presentation is parallel to >upgrading HTML to PDF. > > The important point here is that there are existing texts (in the HHFV case, including the Bible, poetry and educational materials) in which the author has *already*, maybe centuries ago, *chosen* to specify a distinct Vav Haluma. And so to reproduce such texts with fidelity a separate HHFV or HH character is needed (and not one which is canonically equivalent to an existing character as in this absolutely no semantic distinction is possible). In such cases, there is no question of mandating the author to choose the presentation; the author has already chosen to do so. And the author expects that these distinctions will be presented to the readers, at least unless particular readers (and not the rendering system, in an ideal world) choose to neutralise the distinction. I guess that if particular readers never want to make the distinction they can choose to use a less fully featured rendering system which neutralises it, or even one which ignores all points according to the Israeli standard. 
But given that rendering systems supporting fully pointed Hebrew are now becoming available not only on Windows but also on other platforms (e.g. recently in Pango, see http://news.gmane.org/gmane.linux.region.israel.ivrix.discuss, Dov Grobgeld's message of 22 August 2004), I would expect rendering systems which do not support pointing at all to gradually become a thing of the past, although they might offer to hide pointing as an option. >The Unicode Standard does mention "exchange of text data" and "a consistent >way" on page 1. It continues to discuss "interoperability". It does not >emphasize typography. > > Consistent exchange of text data which makes the Holam Male/Vav Haluma, Qamats Qatan/Qamats Gadol etc distinctions requires that these distinctions are made in the Unicode characters in a way which is preserved under canonical normalisation. >To sum up: QQ should be canonically equivalent to Qamats. HHFV should be HH >and should be canonically equivalent to Holam. > > Jony, this is *precisely* the same as saying that QQ and HHFV/HH should not be encoded as separate characters at all. If that is what you mean, say so, and ask the UTC to reverse its decisions to encode separate characters. But stop wasting everyone's time by proposing "canonically equivalent characters" which are no more than alternative codes for the same character.
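The normalisation point can be made concrete with a minimal sketch in Python, using only the standard unicodedata module. The two pairs below are existing canonically equivalent sequences picked purely as stand-ins for illustration (they are not taken from the proposals discussed here), and the helper names are hypothetical. Each pair is distinct as raw code points but becomes identical once the text passes through NFC, which is what "no more than alternative codes for the same character" amounts to in practice.

```python
import unicodedata


def nfc(text: str) -> str:
    """Return the NFC (canonical composition) form of text."""
    return unicodedata.normalize("NFC", text)


# Illustrative stand-ins (assumptions chosen for this sketch): in each pair
# the first string is canonically equivalent to the second.
ANGSTROM_SIGN = "\u212b"         # ANGSTROM SIGN
A_WITH_RING = "\u00c5"           # LATIN CAPITAL LETTER A WITH RING ABOVE
VAV_HOLAM_FORM = "\ufb4b"        # HEBREW LETTER VAV WITH HOLAM (presentation form)
VAV_PLUS_HOLAM = "\u05d5\u05b9"  # HEBREW LETTER VAV + HEBREW POINT HOLAM


def distinct_after_nfc(a: str, b: str) -> bool:
    """True if a and b are still different strings after NFC."""
    return nfc(a) != nfc(b)


if __name__ == "__main__":
    pairs = [(ANGSTROM_SIGN, A_WITH_RING), (VAV_HOLAM_FORM, VAV_PLUS_HOLAM)]
    for a, b in pairs:
        # First value: distinct as stored; second: distinct after normalisation.
        print(a != b, distinct_after_nfc(a, b))
```

Any process that normalises text, whether a database, a web form, or a file conversion, would erase a distinction carried only by such a pair, without the author or reader being asked.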
-- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Sat Sep 4 06:01:25 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 06:01:25 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84B1P5b023347 for ; Sat, 4 Sep 2004 06:01:25 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3YGg-0002Xn-IX; Sat, 04 Sep 2004 11:59:39 +0100 Message-ID: <4139A081.9030306@qaya.org> Date: Sat, 04 Sep 2004 12:01:21 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Jony Rosenne CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <000601c49243$5d51e9d0$0401c80a@QSM4> In-Reply-To: <000601c49243$5d51e9d0$0401c80a@QSM4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2612 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 04/09/2004 06:52, Jony Rosenne wrote: > ... > >>At 01:22 +0300 2004-09-04, Jony Rosenne wrote: >> >> >> >>> > >People who *do* know the Hebrew script will not accept the HHFV >>> >>> >>>> >proposal. 
>>>> >>>> >>>> This one person did not accept it for a long time, because it seems so counter-intuitive to anyone who knows the script. But eventually I allowed myself to be persuaded that it causes significantly less backward compatibility problems than the alternatives on the table i.e. HH and HMD, and that the illogicality can if desired be hidden from end users by very simple keyboard intelligence. > ... > >>HHFV makes the distinction they make while leaving 99% of current >>data COMPLETELY UNTOUCHED. >> >> > >This number is irrelevant. The few current users who make the distinction in >non standard ways will be able to convert to any decision. > >The issue is how to make a useable optional distinction. > > > Indeed, and in a way which minimises backwards compatibility problems. But, Jony, you have not answered the following points which I made on this list on 21st August: > 6) There is a real danger that there will be a continuing situation in > which two different representations are used for Holam Haser when not > with Vav (or, with the HMD proposal, for the Holam Male dot). While > with each proposal there is a clear specification of which Unicode > character should be used in which case, there are likely to be > "improper" texts, i.e. ones which do not follow this specification. > With the HH and HMD proposals, such "improper" texts include existing > Unicode texts which have not been converted to the new representation, > legacy character set texts which are converted to Unicode with the > current mappings, and new texts which are created with current > practices and keyboard drivers etc. All of these are essentially > transitional issues, which should be resolved with time as long as > there is a commitment to update existing texts, mappings and > keyboarding practices. With the HHFV proposal, none of these types of > text will be "improper"; this is a significant advantage of HHFV. ... 
> 9) Legacy character set mappings will need to be changed for the HH or > HMD proposals, if Holam is to be represented as specified in texts > mapped to Unicode. The amended mappings will be complex or > context-dependent. No change in these mappings is required with HHFV, > as Vav Haluma is not supported by legacy character sets. This would > seem to be a significant practical advantage of HHFV. Canonical equivalence would solve these "improper" text and legacy mapping problems. But the cost of this is that the distinction between the two Holams is entirely neutralised, and the decision already made by the UTC to make the distinction in plain text is reversed in that there is defined to be no plain text distinction between canonical equivalents. Do you have any other answer to these two arguments that these are significant practical advantages of HHFV over HH and HMD? -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From rosennej@qsm.co.il Sat Sep 4 09:52:40 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 09:52:40 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84EqeoH002672 for ; Sat, 4 Sep 2004 09:52:40 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i84EqcHE004153 for ; Sat, 4 Sep 2004 07:52:38 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.97.63] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id s410K5E0 authenticated by POP; Sat, 04 Sep 2004 07:52:35 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 17:52:31 +0300 Message-ID: <000801c4928e$d066dcf0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 
10.0.6626 In-Reply-To: <4139A081.9030306@qaya.org> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id i84EqeoH002672 X-archive-position: 2613 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: Peter Kirk [mailto:peterkirk@qaya.org] > Sent: Saturday, September 04, 2004 2:01 PM > To: Jony Rosenne > Cc: hebrew@unicode.org > Subject: Re: [hebrew] Re: Compatibility mappings for new Hebrew points > > > On 04/09/2004 06:52, Jony Rosenne wrote: > > > ... > > > >>At 01:22 +0300 2004-09-04, Jony Rosenne wrote: > >> > >> > >> > >>> > >People who *do* know the Hebrew script will not accept the HHFV > >>> > >>> > >>>> >proposal. > >>>> > >>>> > >>>> > This one person did not accept it for a long time, because it > seems so > counter-intuitive to anyone who knows the script. But eventually I > allowed myself to be persuaded that it causes significantly less > backward compatibility problems than the alternatives on the > table i.e. > HH and HMD, and that the illogicality can if desired be > hidden from end > users by very simple keyboard intelligence. There is no keyboard intelligence in Hebrew keyboard layouts. Backwards compatibility is an artificial issue, because the small number of relevant cases can easily convert to any decision. > > > ... > > > >>HHFV makes the distinction they make while leaving 99% of current > >>data COMPLETELY UNTOUCHED. > >> > >> > > > >This number is irrelevant. The few current users who make the > >distinction in non standard ways will be able to convert to any > >decision. > > > >The issue is how to make a useable optional distinction. > > > > > > > Indeed, and in a way which minimises backwards compatibility > problems. 
> But, Jony, you have not answered the following points which I made on > this list on 21st August: > > > 6) There is a real danger that there will be a continuing > situation in > > which two different representations are used for Holam > Haser when not > > with Vav (or, with the HMD proposal, for the Holam Male dot). While > > with each proposal there is a clear specification of which Unicode > > character should be used in which case, there are likely to be > > "improper" texts, i.e. ones which do not follow this specification. > > With the HH and HMD proposals, such "improper" texts > include existing > > Unicode texts which have not been converted to the new > representation, > > legacy character set texts which are converted to Unicode with the > > current mappings, and new texts which are created with current > > practices and keyboard drivers etc. All of these are essentially > > transitional issues, which should be resolved with time as long as > > there is a commitment to update existing texts, mappings and > > keyboarding practices. With the HHFV proposal, none of > these types of > > text will be "improper"; this is a significant advantage of HHFV. Whatever it is called, these practical considerations mean that when it is used on any other letter rather than Vav, it will be treated as Holam Haser, whatever the standard says. So why encumber the standard, font designers and users with a meaningless and unenforceable semantic that corresponds to nothing in the Hebrew script? People will be able to understand that there is a separate code for Holam Haser should they wish to make the distinction, but how can we explain that we have a different Holam Haser for Vav and a different Holam Haser for Bet, and that if one makes a spelling mistake and edits the Bet to a Vav or vice versa one has to edit the Holam Haser too? > > ... 
> > > 9) Legacy character set mappings will need to be changed > for the HH or > > HMD proposals, if Holam is to be represented as specified in texts > > mapped to Unicode. The amended mappings will be complex or > > context-dependent. No change in these mappings is required > with HHFV, > > as Vav Haluma is not supported by legacy character sets. This would > > seem to be a significant practical advantage of HHFV. > > > Canonical equivalence would solve these "improper" text and legacy > mapping problems. But the cost of this is that the > distinction between > the two Holams is entirely neutralised, and the decision > already made by > the UTC to make the distinction in plain text is reversed in > that there > is defined to be no plain text distinction between canonical > equivalents. The Holam Haser distinction is not entirely neutralized, it is affected by canonical equivalence only when it is normalized, and this should happen only in relation to processing. If you want to keep the distinction, don't normalize. > > Do you have any other answer to these two arguments that these are > significant practical advantages of HHFV over HH and HMD? I am not proposing HMD. I suggested to change the name of HHFV to HH to correspond better with reality. Essentially this leaves the proposal unchanged, except for the point discussed above. 
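The idea that the distinction should disappear only "in relation to processing" could also be met without canonical equivalence, by an application-level fold applied at comparison time. The following is a rough sketch of that alternative in Python, not a description of either proposal. It assumes the code points that were later assigned in the Unicode Standard (U+05BA HEBREW POINT HOLAM HASER FOR VAV and U+05C7 HEBREW POINT QAMATS QATAN); the table and function names are hypothetical.

```python
import unicodedata

# Hypothetical fold table for this sketch: map each optional point to its
# everyday counterpart. This is an application choice, not a Unicode property.
OPTIONAL_POINT_FOLD = str.maketrans({
    "\u05ba": "\u05b9",  # HOLAM HASER FOR VAV -> HOLAM
    "\u05c7": "\u05b8",  # QAMATS QATAN        -> QAMATS
})


def fold_optional_points(text: str) -> str:
    """Normalise to NFC, then fold the optional points for loose matching."""
    return unicodedata.normalize("NFC", text).translate(OPTIONAL_POINT_FOLD)


def same_for_search(a: str, b: str) -> bool:
    """Compare two strings while ignoring the optional distinctions."""
    return fold_optional_points(a) == fold_optional_points(b)


if __name__ == "__main__":
    vav_haluma = "\u05d5\u05ba"  # VAV + HOLAM HASER FOR VAV, as stored
    vav_holam = "\u05d5\u05b9"   # VAV + HOLAM, the undifferentiated spelling

    # Normalisation alone leaves the stored distinction intact.
    print(unicodedata.normalize("NFC", vav_haluma) == vav_haluma)
    # The fold is applied only when the application asks for it.
    print(same_for_search(vav_haluma, vav_holam))
```

With this design the stored text keeps the author's distinction through any normalisation, and a search or comparison routine decides for itself whether to ignore it.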
Jony > > -- > Peter Kirk > peter@qaya.org (personal) > peterkirk@qaya.org (work) > http://www.qaya.org/ > > > From rosennej@qsm.co.il Sat Sep 4 10:00:04 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 10:00:04 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84F03hO003128 for ; Sat, 4 Sep 2004 10:00:04 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i84F02HE005356 for ; Sat, 4 Sep 2004 08:00:02 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.97.63] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id HO10jOE0 authenticated by POP; Sat, 04 Sep 2004 08:00:00 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 17:59:55 +0300 Message-ID: <000d01c4928f$d9088790$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 In-Reply-To: <41399C18.1070601@qaya.org> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id i84F03hO003128 X-archive-position: 2614 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: Peter Kirk [mailto:peterkirk@qaya.org] > Sent: Saturday, September 04, 2004 1:43 PM > To: Jony Rosenne > Cc: hebrew@unicode.org > Subject: Re: [hebrew] Re: Compatibility mappings for new Hebrew points > > > On 04/09/2004 09:23, Jony Rosenne wrote: > > >Yes, this is one of the issues, whether the reader has a > voice or only > >the typesetter/editor/author. 
> > > >Another is compatibility with systems that choose not to > support these > >options - but there is a difference here: Holam Haser is optional in > >the sense that it is an established optional distinction for fine > >printing and an established practice to ignore it in > everyday use. The > >Qamats Qatan distinction is definitely non standard and only used > >rarely and under special circumstances, such as books for people who > >don't know Hebrew too well, plus there are several cases > where there is > >a disagreement whether a certain Qamats is Qatan or Gadol. > > > >>From a typographical point of view, these problems don't > matter, all > >>you > >want is to reproduce a certain text with fidelity, but for > interchange > >and other tasks they certainly do. Kind of like the > difference between > >PDF and HTML. Mandating that the author chooses the presentation is > >parallel to upgrading HTML to PDF. > > > > > > The important point here is that there are existing texts (in > the HHFV > case, including the Bible, poetry and educational materials) in which > the author has *already*, maybe centuries ago, *chosen* to specify a > distinct Vav Haluma. And so to reproduce such texts with fidelity a > separate HHFV or HH character is needed (and not one which is > canonically equivalent to an existing character as in this > absolutely no > semantic distinction is possible). In such cases, there is no > question > of mandating the author to choose the presentation; the author has > already chosen to do so. And the author expects that these > distinctions > will be presented to the readers, at least unless particular readers > (and not the rendering system, in an ideal world) choose to > neutralise > the distinction. The user selects the rendering system that suits him. If you do not want this, use PDF, not HTML. 
> > I guess that if particular readers never want to make the distinction > they can choose to use a less fully featured rendering system which > neutralises it, or even one which ignores all points according to the > Israeli standard. But given that rendering systems supporting fully > pointed Hebrew are now becoming available not only on Windows (e.g. > recently available in pango, see > http://news.gmane.org/gmane.linux.region.israel.ivrix.discuss, Dov > Grobgeld's message of 22 August 2004), I would expect > rendering systems > which do not support pointing at all to gradually become something of > the past, although they might offer to hide pointing as an option. I don't share this expectation. Most Hebrew users do not need or want points, they just get in the way. I expect future systems that have an option to show points, hiding being the default. > > >The Unicode Standard does mention "exchange of text data" and "a > >consistent way" on page 1. It continues to discuss > "interoperability". > >It does not emphasize typography. > > > > > > Consistent exchange of text data which makes the Holam > Male/Vav Haluma, > Qamats Qatan/Qamats Gadol etc distinctions requires that these > distinctions are made in the Unicode characters in a way which is > preserved under canonical normalisation. > > >To sum up: QQ should be canonically equivalent to Qamats. > HHFV should > >be HH and should be canonically equivalent to Holam. > > > > > > Jony, this is *precisely* the same as saying that QQ and > HHFV/HH should > not be encoded as separate characters at all. If that is what > you mean, > say so, and ask the UTC to reverse its decisions to encode separate > characters. But stop wasting everyone's time by proposing > "canonically > equivalent characters" which are no more than alternative > codes for the > same character. QQ is simply a glyph variant, an alternate glyph for the same character. Other glyph variants have canonical equivalence, why should QQ be different? 
Jony > > -- > Peter Kirk > peter@qaya.org (personal) > peterkirk@qaya.org (work) > http://www.qaya.org/ > > > From everson@evertype.com Sat Sep 4 10:07:25 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 10:07:25 -0500 (CDT) Received: from ni-mail2.dna.utvinternet.net (ni-mail2.dna.utvinternet.net [194.46.8.26]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84F7NVh003573 for ; Sat, 4 Sep 2004 10:07:24 -0500 Received: from [10.0.1.3] (unverified [194.46.85.47]) by ni-mail2.dna.utvinternet.net (Vircom SMTPRS 3.2.315.0) with ESMTP id for ; Sat, 4 Sep 2004 16:07:22 +0100 Mime-Version: 1.0 X-Sender: evr001@mail.dna.ie Message-Id: In-Reply-To: <000d01c4928f$d9088790$0401c80a@QSM4> References: <000d01c4928f$d9088790$0401c80a@QSM4> Date: Sat, 4 Sep 2004 16:07:21 +0100 To: hebrew@unicode.org From: Michael Everson Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-archive-position: 2615 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 17:59 +0300 2004-09-04, Jony Rosenne wrote: >I don't share this expectation. Most Hebrew users do not need or want >points, they just get in the way. So what? We aren't talking about people who DON'T use points. >QQ is simply a glyph variant, an alternate glyph for the same character. No, it is not. This is an absurd suggestion. HEBREW POINT QAMATS QATAN *derived* from HEBREW POINT QAMATS, but it is a different character. >Other glyph variants have canonical equivalence, why should QQ be different? It is not a glyph variant. What you are saying is nonsense. 
-- Michael Everson * * Everson Typography * * http://www.evertype.com From peterkirk@qaya.org Sat Sep 4 11:18:11 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 11:18:11 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84GIBqH009374 for ; Sat, 4 Sep 2004 11:18:11 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3dDD-00033o-PN; Sat, 04 Sep 2004 17:16:24 +0100 Message-ID: <4139EABF.5070203@qaya.org> Date: Sat, 04 Sep 2004 17:18:07 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Jony Rosenne CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <000d01c4928f$d9088790$0401c80a@QSM4> In-Reply-To: <000d01c4928f$d9088790$0401c80a@QSM4> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2616 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 04/09/2004 15:59, Jony Rosenne wrote: > ... > >QQ is simply a glyph variant, an alternate glyph for the same character. >Other glyph variants have canonical equivalence, why should QQ be different? 
> > > Michael has responded to the first point here. Whether you like it or not, the UTC has decided that QQ is a distinct character, but you do still have the chance to reverse this decision during later parts of the standardisation process. I will respond to the second point: glyph *variants* do not have canonical equivalents, but only glyphs which are supposed to be *identical*, and semantically identical. Glyph variants, if recognised by Unicode at all, are either variation sequences (which are *not* canonically equivalent to their base character alone) or separate characters sometimes with a *compatibility* equivalence. This is where QQ fits, if it is encoded at all. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From peterkirk@qaya.org Sat Sep 4 11:59:14 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 11:59:14 -0500 (CDT) Received: from pan.hu-pan.com (hu-pan.com [67.15.6.3]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84GxETR012570 for ; Sat, 4 Sep 2004 11:59:14 -0500 Received: from [213.162.124.237] (helo=[127.0.0.1]) by pan.hu-pan.com with esmtpa (Exim 4.42) id 1C3dqy-00041M-IQ; Sat, 04 Sep 2004 17:57:28 +0100 Message-ID: <4139F45F.2000809@qaya.org> Date: Sat, 04 Sep 2004 17:59:11 +0100 From: Peter Kirk User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 X-Accept-Language: en-gb, en, en-us, az, ru, tr, he, el, fr, de MIME-Version: 1.0 To: Jony Rosenne CC: hebrew@unicode.org Subject: [hebrew] Re: Compatibility mappings for new Hebrew points References: <000801c4928e$d066dcf0$0401c80a@QSM4> In-Reply-To: <000801c4928e$d066dcf0$0401c80a@QSM4> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-cPanel-MailScanner-Information: Please contact the ISP for more information X-cPanel-MailScanner: Found to be clean X-cPanel-MailScanner-SpamCheck: X-MailScanner-From: peterkirk@qaya.org X-AntiAbuse: This header was added to track abuse, 
please include it with any abuse report X-AntiAbuse: Primary Hostname - pan.hu-pan.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - qaya.org X-Source: X-Source-Args: X-Source-Dir: X-archive-position: 2617 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: peterkirk@qaya.org Precedence: bulk X-list: hebrew On 04/09/2004 15:52, Jony Rosenne wrote: > ... > >>This one person did not accept it for a long time, because it >>seems so >>counter-intuitive to anyone who knows the script. But eventually I >>allowed myself to be persuaded that it causes significantly less >>backward compatibility problems than the alternatives on the >>table i.e. >>HH and HMD, and that the illogicality can if desired be >>hidden from end >>users by very simple keyboard intelligence. >> >> > >There is no keyboard intelligence in Hebrew keyboard layouts. > > OK, but it can easily be added. >Backwards compatibility is an artificial issue, because the small number of >relevant cases can easily convert to any decision. > > > The number of relevant cases is not small, but consists of all existing pointed Hebrew texts. Google just found me 14,500 texts with the pointed Hebrew word לֹא LO' (presumably found in nearly every pointed Hebrew text) (by the way, compared with 3,840,000 for the unpointed version of this word), and that is besides many texts which are not on the Internet. If the HH option is chosen, the correct strategy for conversion of all these legacy texts and pre-conversion Unicode texts has to be to convert all HOLAMs which do not follow VAV to HH. That is not a trivial matter because it requires a context-dependent conversion mapping. > ... > > >Whatever it is called, these practical considerations mean that when it is >used on any other letter rather than Vav, it will be treated as Holam Haser, >whatever the standard says.
So why encumber the standard, font designers and >users with a meaningless and unenforceable semantic that corresponds to >nothing in the Hebrew script? People will be able to understand that there >is a separate code for Holam Haser should they wish to make the distinction, ... > > Here we come upon my fundamental objection to your approach. You seem to consider that it is acceptable if some people spell very common words like לֹא LO' with HOLAM and some people spell it with HH. That would be acceptable if the two characters were canonically equivalent, but as I have already demonstrated this is impossible. It is therefore necessary to define clearly that one of these spellings is correct and the other is incorrect. Otherwise the Hebrew language descends into chaos. >... but how can we explain that we have a different Holam Haser for Vav and a >different Holam Haser for Bet, ... > Well, you don't have to if you use an intelligent keyboard. On the other hand, as you repeatedly insist, this is an issue for only a very few people who actually want to write Vav Haluma distinctively, and these people (at least if they are Israelis rather than western biblical scholars) will realise that they are trying to write something rather unusual and so be prepared to use a separate code. So the problem is not as serious as it seems at first. >... and that if one makes a spelling mistake and >edits the Bet to a Vav or vice versa one has to edit the Holam Haser too? > > > I see the problem here. Well, TUS does recommend that editing software works with complete grapheme clusters. But in practice, if this isn't done, what will happen is that if Bet is edited to Vav but still has HOLAM, a Holam Male will appear, which should be visually distinct from Vav Haluma and so easily corrected; and if Vav is edited to Bet but still has HHFV, the rendering engine will be able to visually indicate the illegal sequence, as currently happens with sin and shin dots not following shin. 
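The check a rendering engine or editor would need in order to flag an HHFV that has ended up on a letter other than Vav can be sketched as follows. This is only an illustration under stated assumptions: it takes U+05BA as the code point for HHFV (the position later versions of Unicode assign to HEBREW POINT HOLAM HASER FOR VAV, which was not yet fixed when this thread was written), and the function names are hypothetical.

```python
import unicodedata

VAV = "\u05D5"
HHFV = "\u05BA"  # assumption: code point of HOLAM HASER FOR VAV


def base_index(text: str, i: int) -> int:
    """Index of the nearest character before position i that is not a
    nonspacing mark, or -1 if there is none."""
    j = i - 1
    while j >= 0 and unicodedata.category(text[j]) == "Mn":
        j -= 1
    return j


def misplaced_hhfv(text: str) -> list:
    """Positions of every HHFV whose base letter is not VAV."""
    bad = []
    for i, ch in enumerate(text):
        if ch == HHFV:
            j = base_index(text, i)
            if j < 0 or text[j] != VAV:
                bad.append(i)
    return bad


if __name__ == "__main__":
    print(misplaced_hhfv("\u05D5\u05BA"))  # HHFV on Vav: nothing to flag
    print(misplaced_hhfv("\u05D1\u05BA"))  # HHFV on Bet: position 1 flagged
```

Skipping back over other marks means a Vav that also carries, say, a dagesh is still recognised as the base letter.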
There will be far more confusion when editing texts if no one knows whether a Holam Haser following a Bet is HOLAM or HH because both are legal. > > >> ... >> >>Canonical equivalence would solve these "improper" text and legacy >>mapping problems. But the cost of this is that the >>distinction between >>the two Holams is entirely neutralised, and the decision >>already made by >>the UTC to make the distinction in plain text is reversed in >>that there >>is defined to be no plain text distinction between canonical >>equivalents. >> >> > >The Holam Haser distinction is not entirely neutralized, it is affected by >canonical equivalence only when it is normalized, and this should happen >only in relation to processing. If you want to keep the distinction, don't >normalize. > > > Normalisation is not an issue for processing only. The standard clearly permits canonical equivalents to be folded for transmission and rendering as well as for processing (conformance clauses C9 and C10, pp. 59-60), and specifically recommends this for rendering (section 5.13, p.127). It is not permitted to assume that text will not be normalised when one doesn't want it to be. Jony, whether you like it or not, your suggestion of a canonical equivalence is equivalent (indeed, canonically equivalent) to suggesting that there should be no new character at all. You earlier agreed that a plain text distinction is required. But if you call for a canonical equivalence, you are saying that there should not be a plain text distinction, but only an alternative code for the same object. >>Do you have any other answer to these two arguments that these are >>significant practical advantages of HHFV over HH and HMD? >> >> > >I am not proposing HMD. I suggested to change the name of HHFV to HH to >correspond better with reality. Essentially this leaves the proposal >unchanged, except for the point discussed above. 
> > > I take it that the change (other than of name) which you suggest is to allow the new character to be used as an alternative for Holam Haser with base characters other than Vav. Well, to summarise, my objection is to the existence of an alternative legal spelling which is not canonically equivalent. -- Peter Kirk peter@qaya.org (personal) peterkirk@qaya.org (work) http://www.qaya.org/ From rosennej@qsm.co.il Sat Sep 4 12:01:32 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 12:01:32 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84H1WRv012724 for ; Sat, 4 Sep 2004 12:01:32 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i84H1UHE017810 for ; Sat, 4 Sep 2004 10:01:30 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.31.148] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id Dd40jhA2 authenticated by POP; Sat, 04 Sep 2004 10:01:27 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 20:01:23 +0300 Message-ID: <000001c492a0$d0e381d0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 In-Reply-To: Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180 X-archive-position: 2618 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of Michael Everson > Sent: Saturday, September 04, 2004 6:07 PM > To: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings 
for new Hebrew points > > > At 17:59 +0300 2004-09-04, Jony Rosenne wrote: > > >I don't share this expectation. Most Hebrew users do not > need or want > >points, they just get in the way. > > So what? We aren't talking about people who DON'T use points. The question was raised as to whether it is proper for the rendering system to hide points. > > >QQ is simply a glyph variant, an alternate glyph for the same > >character. > > No, it is not. This is an absurd suggestion. HEBREW POINT QAMATS > QATAN *derived* from HEBREW POINT QAMATS, but it is a different > character. > > >Other glyph variants have canonical equivalence, why should QQ be > >different? > > It is not a glyph variant. What you are saying is nonsense. Whatever. What about interchange? Jony > -- > Michael Everson * * Everson Typography * * http://www.evertype.com > > > From rosennej@qsm.co.il Sat Sep 4 12:48:26 2004 Received: with ECARTIS (v1.0.0; list hebrew); Sat, 04 Sep 2004 12:48:26 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.12.11/8.12.11) with ESMTP id i84HmPwm020411 for ; Sat, 4 Sep 2004 12:48:26 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.12.9p2/8.12.9) with SMTP id i84HmNHE022278 for ; Sat, 4 Sep 2004 10:48:23 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [212.235.31.148] (via account qsm.co.il) by mx-out.daemonmail.net with ESMTP id In50orB2 authenticated by POP; Sat, 04 Sep 2004 10:48:20 -0700 (PDT) From: "Jony Rosenne" To: Subject: [hebrew] Re: Compatibility mappings for new Hebrew points Date: Sat, 4 Sep 2004 20:48:15 +0300 Message-ID: <000201c492a7$5cc53bc0$0401c80a@QSM4> MIME-Version: 1.0 Content-Type: text/plain; charset="windows-1255" X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6626 In-Reply-To: <4139F45F.2000809@qaya.org> Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE 
V6.00.2900.2180 Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id i84HmPwm020411 X-archive-position: 2619 X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of Peter Kirk > Sent: Saturday, September 04, 2004 7:59 PM > To: Jony Rosenne > Cc: hebrew@unicode.org > Subject: [hebrew] Re: Compatibility mappings for new Hebrew points > > > On 04/09/2004 15:52, Jony Rosenne wrote: > > > ... > > > >>This one person did not accept it for a long time, because it > >>seems so > >>counter-intuitive to anyone who knows the script. But eventually I > >>allowed myself to be persuaded that it causes significantly less > >>backward compatibility problems than the alternatives on the > >>table i.e. > >>HH and HMD, and that the illogicality can if desired be > >>hidden from end > >>users by very simple keyboard intelligence. > >> > >> > > > >There is no keyboard intelligence in Hebrew keyboard layouts. > > > > > > OK, but it can easily be added. Easily said. > > >Backwards compatibility is an artificial issue, because the small > >number of relevant cases can easily convert to any decision. > > > > > > > The number of relevant cases is not small, but consists of > all existing > pointed Hebrew texts. Google just found me 14,500 texts with > the pointed > Hebrew word לֹא LO' (presumably found in nearly every pointed Hebrew > text) (by the way, compared with 3,840,000 for the unpointed > version of > this word), and that is besides many texts which are not on the > Internet. If the HH option is chosen, the correct strategy for > conversion of all these legacy texts and pre-conversion Unicode texts > has to be to convert all HOLAMs which do not follow VAV to HH.
> That is not > a trivial matter because it requires context-dependent > conversion mapping. Why do you think these people want to make the distinction? How many of them are aware of it? The point is that they will mostly not change their ways, even for Vav Haluma. We need some sort of a bridge between the two alternative spellings of Vav Haluma. > > > ... > > > > > >Whatever it is called, these practical considerations mean > that when it > >is used on any other letter rather than Vav, it will be treated as > >Holam Haser, whatever the standard says. So why encumber the > standard, > >font designers and users with a meaningless and > unenforceable semantic > >that corresponds to nothing in the Hebrew script? People > will be able > >to understand that there is a separate code for Holam Haser > should they > >wish to make the distinction, ... > > > > > > Here we come upon my fundamental objection to your approach. > You seem to > consider that it is acceptable if some people spell very common words > like לֹא LO' with HOLAM and some people spell it with HH. > That would be > acceptable if the two characters were canonically equivalent, > but as I > have already demonstrated this is impossible. It is therefore > necessary > to define clearly that one of these spellings is correct and > the other > is incorrect. Otherwise the Hebrew language descends into chaos. This is exactly the point, with the additional observation that it is not possible to dictate to people when to use HHFV and when to use Holam. They will use them both at random. So for interchange we must expect either, and this is what normalization is for, whether canonical or compatibility. > > >... but how can we explain that we have a different Holam > Haser for Vav > >and a different Holam Haser for Bet, ... > > > > Well, you don't have to if you use an intelligent keyboard. This is a pie in the sky. Hebrew keyboards are straight keystroke to character things.
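For reference, the "keyboard intelligence" being argued over amounts to choosing the code for a single Holam Haser key from the letter it lands on. The following is a rough sketch only, not a description of any existing keyboard driver: it assumes U+05BA for HHFV (the position later versions of Unicode give to HEBREW POINT HOLAM HASER FOR VAV), and holam_haser_keystroke is a hypothetical name.

```python
import unicodedata

VAV = "\u05D5"
HOLAM = "\u05B9"
HHFV = "\u05BA"  # assumption: code point of HOLAM HASER FOR VAV


def holam_haser_keystroke(buffer: str) -> str:
    """Append the code for a 'Holam Haser' keypress to the text typed so far.

    The base letter is found by skipping back over any nonspacing marks
    already typed on it. On Vav the key yields HHFV; on any other letter
    (or with no letter at all) it yields the ordinary HOLAM.
    """
    j = len(buffer) - 1
    while j >= 0 and unicodedata.category(buffer[j]) == "Mn":
        j -= 1
    on_vav = j >= 0 and buffer[j] == VAV
    return buffer + (HHFV if on_vav else HOLAM)


if __name__ == "__main__":
    typed = holam_haser_keystroke("\u05D1")   # Bet, then the key
    print([hex(ord(c)) for c in typed])
    typed = holam_haser_keystroke("\u05D5")   # Vav, then the key
    print([hex(ord(c)) for c in typed])
```

The logic itself is small; whether existing Hebrew keyboard layouts, which map one keystroke to one character, can host it is the practical question raised above.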
> On the other > hand, as you repeatedly insist, this is an issue for only a very few > people who actually want to write Vav Haluma distinctively, and these > people (at least if they are Israelis rather than western biblical > scholars) will realise that they are trying to write something rather > unusual and so be prepared to use a separate code. So the > problem is not > as serious as it seems at first. > > >... and that if one makes a spelling mistake and > >edits the Bet to a Vav or vice versa one has to edit the Holam Haser > >too? > > > > > > > I see the problem here. Well, TUS does recommend that editing > software > works with complete grapheme clusters. This is not practical for Hebrew. > But in practice, if this isn't > done, what will happen is that if Bet is edited to Vav but still has > HOLAM, a Holam Male will appear, which should be visually > distinct from > Vav Haluma and so easily corrected; and if Vav is edited to Bet but > still has HHFV, the rendering engine will be able to visually > indicate > the illegal sequence, as currently happens with sin and shin dots not > following shin. > > There will be