From everson@evertype.com Wed Jul 11 08:10:17 2007
Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 08:12:01 -0500 (CDT)
Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33])
    by unicode.org (8.13.4/8.12.11) with ESMTP id l6BDAH7t007979
    for ; Wed, 11 Jul 2007 08:10:17 -0500
Received: from [88.81.100.235] (helo=[192.168.1.134])
    by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from )
    id 1I8bxL-00072T-PD for hebrew@unicode.org; Wed, 11 Jul 2007 14:10:12 +0100
Mime-Version: 1.0
Message-Id:
Date: Wed, 11 Jul 2007 14:09:35 +0100
To: Hebrew Discussion
From: Michael Everson
Subject: [hebrew] Draft of Samaritan proposal
Content-Type: text/plain; charset="us-ascii" ; format="flowed"
X-AntiAbuse: This header was added to track abuse, please include it with any abuse report
X-AntiAbuse: Primary Hostname - white.dnsireland.com
X-AntiAbuse: Original Domain - unicode.org
X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12]
X-AntiAbuse: Sender Address Domain - evertype.com
X-archive-position: 3008
X-Approved-By: cowan@ccil.org
X-ecartis-version: Ecartis v1.0.0
Sender: hebrew-bounce@unicode.org
Errors-to: hebrew-bounce@unicode.org
X-original-sender: everson@evertype.com
Precedence: bulk
X-list: hebrew

Please see
http://www.evertype.com/standards/iso10646/pdf/n3xxx-samaritan.pdf
and comment.

As always punctuation is the complex issue. (Sigh.)

Thanks.
--
Michael Everson * http://www.evertype.com

From cowan@ccil.org Wed Jul 11 10:23:43 2007
Date: Wed, 11 Jul 2007 11:23:33 -0400
From: John Cowan
To: Michael Everson
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <20070711152333.GF28262@mercury.ccil.org>

Michael Everson scripsit:

> Please see
> http://www.evertype.com/standards/iso10646/pdf/n3xxx-samaritan.pdf
> and comment.
>
> As always punctuation is the complex issue. (Sigh.)

First, an editorial note. Most of the document speaks (rightly) of
VOWEL SIGNs, but in the chart on p. 16 we hear of POINTs. VOWEL SIGNs
they should be.

Since this is a contemporary-use script, I'm not going to argue that
it should be unified with any other 22-character West Semitic abjad.
And I hope no one else does either, or Your Humble Moderator is going
to have to get busy.

But what does concern me is the double encoding of vowels. This is not a
situation like Indic, where the initial vowels are nothing like the vowel
marks: the initial vowels are glyphically identical with the vowel marks,
but encoded separately because Unicode combining marks must have a base.

Calling the current plan Plan A, I propose two alternative plans:

Plan B1: Use combining marks only, and add a SAMARITAN ZERO-WIDTH
CONSONANT as a new base character for use before an initial vowel.
This would lengthen texts slightly, but would be a regular and familiar
situation.

Plan B2: Use modifier letters only, relying on font kerning to move
the vowel letters slightly to the right when preceded by a consonant.
This solution is less artificial, but more unusual; on the other hand,
it might be more legible in environments like Windows, where initially
there would be no support for Samaritan combining characters in Uniscribe.

--
John Cowan    http://ccil.org/~cowan    cowan@ccil.org
The Penguin shall hunt and devour all that is crufty, gnarly and
bogacious; all code which wriggles like spaghetti, or is infested with
blighting creatures, or is bound by grave and perilous Licences shall it
capture. And in capturing shall it replicate, and in replicating shall
it document, and in documentation shall it bring freedom, serenity and
most cool froodiness to the earth and all who code therein.
        --Gospel of Tux

From cowan@ccil.org Wed Jul 11 11:22:57 2007
Date: Wed, 11 Jul 2007 12:22:51 -0400
From: John Cowan
To: hebrew-repost@unicode.org
Subject: [hebrew] [peter@qaya.org: Re: Re: Draft of Samaritan proposal]
Message-ID: <20070711162251.GH28262@mercury.ccil.org>

----- Forwarded message from Peter Kirk -----

Date: Wed, 11 Jul 2007 16:55:33 +0100
From: Peter Kirk
To: John Cowan
CC: Michael Everson, hebrew@unicode.org
Subject: Re: [hebrew] Re: Draft of Samaritan proposal

On 11/07/2007 16:23, John Cowan wrote:
>...
>Since this is a contemporary-use script, I'm not going to argue that
>it should be unified with any other 22-character West Semitic abjad.
>And I hope no one else does either, or Your Humble Moderator is going
>to have to get busy.
>
Don't worry, John, I won't argue this, although I am still lurking, and
whatever I may think about consistency with what has been done with
other scripts. I am glad that an apparently good proposal for Samaritan
is being put forward.

One small point: there is a reference to the modern Samaritans living in
Israel, but many of them (not all) in fact live in the occupied
territories of the West Bank, so you need to use the politically correct
terminology here.

--
Peter Kirk
E-mail: peter@qaya.org
Blog: http://www.qaya.org/blog/
Website: http://www.qaya.org/

----- End forwarded message -----

--
John Cowan    cowan@ccil.org    http://ccil.org/~cowan
If a traveler were informed that such a man [as Lord John Russell] was
leader of the House of Commons, he may well begin to comprehend how the
Egyptians worshiped an insect.  --Benjamin Disraeli

From john@tiro.ca Wed Jul 11 11:27:52 2007
Date: Wed, 11 Jul 2007 09:26:37 -0700
From: John Hudson
To: John Cowan
Cc: Michael Everson, hebrew@unicode.org
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-id: <469504BD.908@tiro.ca>
In-reply-to: <20070711152333.GF28262@mercury.ccil.org>

John Cowan wrote:

> Plan B2: Use modifier letters only, relying on font kerning to move
> the vowel letters slightly to the right when preceded by a consonant.
> This solution is less artificial, but more unusual; on the other hand,
> it might be more legible in environments like Windows, where initially
> there would be no support for Samaritan combining characters in Uniscribe.

This would only work if the 'modifier letters' always needed to shift
exactly the same distance to the right to be optimally placed over any
preceding consonant, which I doubt is the case. If the distance varies,
the kerning will cause either collisions or gaps for the consonant
glyphs.

One of the main reasons why combining marks are zero width is so that
they can be positioned on bases using anchor attachments independently
of the spacing and kerning of those bases.

John Hudson

--
Tiro Typeworks        www.tiro.com
Gulf Islands, BC      tiro@tiro.com

We say our understanding measures how things are,
and likewise our perception, since that is how we
find our way around, but in fact these do not measure.
They are measured.
   -- Aristotle, Metaphysics

From verdy_p@wanadoo.fr Wed Jul 11 10:54:36 2007
Date: Wed, 11 Jul 2007 17:54:24 +0200
From: "Philippe Verdy"
To: "'John Cowan'", "'Michael Everson'"
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <01d501c7c3d3$c0df3d80$0a01a8c0@rodage.dyndns.org>
In-Reply-To: <20070711152333.GF28262@mercury.ccil.org>

John Cowan wrote:
> Plan B2: Use modifier letters only, relying on font kerning to move
> the vowel letters slightly to the right when preceded by a consonant.
> This solution is less artificial, but more unusual; on the other hand,
> it might be more legible in environments like Windows, where initially
> there would be no support for Samaritan combining characters in Uniscribe.

How will Plan B2 work magically in Windows without updating Uniscribe?

Isn't Samaritan a right-to-left script that will require Uniscribe support
in every case, otherwise the Bidi algorithm (including mirroring issues)
won't work correctly and the directionality will be uniformly left-to-right?

From cowan@ccil.org Wed Jul 11 13:56:04 2007
Date: Wed, 11 Jul 2007 14:55:49 -0400
From: John Cowan
To: Philippe Verdy
Cc: "'John Cowan'", "'Michael Everson'", hebrew@unicode.org
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <20070711185549.GA24331@mercury.ccil.org>
In-Reply-To: <01d501c7c3d3$c0df3d80$0a01a8c0@rodage.dyndns.org>

Philippe Verdy scripsit:

> Isn't Samaritan a right-to-left script that will require Uniscribe
> support in every case, otherwise the Bidi algorithm (including mirroring
> issues) won't work correctly and the directionality will be uniformly
> left-to-right?

Samaritan is indeed RTL, but it is in the reserved RTL range.
See http://unicode.org/Public/UNIDATA/extracted/DerivedBidiClass.txt
which says:

# The unassigned characters that default to R are:
#   Hebrew, Cypriot_Syllabary, Kharoshthi, and the ranges \u07C0-\u08FF
#   \uFB1D-\uFB4F \U00010840-\U000109FF \U00010A60-\U00010FFF

I don't know if current versions of Uniscribe actually take advantage
of this.

--
A few times, I did some exuberant stomping about,      John Cowan
like a hippo auditioning for Riverdance, though        cowan@ccil.org
I stopped when I thought I heard something at          http://ccil.org/~cowan
the far side of the room falling over in rhythm
with my feet.  -- Joseph Zitt

From everson@evertype.com Wed Jul 11 11:56:45 2007
Date: Wed, 11 Jul 2007 17:55:30 +0100
From: Michael Everson
To: hebrew@unicode.org
Subject: [hebrew] Re: Draft of Samaritan proposal
In-Reply-To: <4694FD75.6070002@qaya.org>

At 16:55 +0100 2007-07-11, Peter Kirk wrote:

>One small point: there is a reference to the modern Samaritans
>living in Israel, but many of them (not all) in fact live in the
>occupied territories of the West Bank, so you need to use the
>politically correct terminology here.

Propose a sentence or the text will stet.
--
Michael Everson * http://www.evertype.com

From dean.snyder@jhu.edu Wed Jul 11 12:21:21 2007
Date: Wed, 11 Jul 2007 13:21:32 -0400
From: "Dean Snyder"
To: "Hebrew List"
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-Id: <20070711172132.510376227@smtp.johnshopkins.edu>

Michael Everson wrote at 2:09 PM on Wednesday, July 11, 2007:

>Please see
>http://www.evertype.com/standards/iso10646/pdf/n3xxx-samaritan.pdf
>and comment.

Michael Everson & Mark Shoulson wrote on page 1:

"The destruction of the First Temple and the exile of educated
Hebrew-speakers to Babylonia (a province of the Persian empire)
changed things greatly, ..."

The destruction of the temple and exile of Jews to Babylonia occurred
under the reign of the Babylonian emperor, Nebuchadnezzar; the
Medo-Persians had not yet conquered Babylon.

------------------------------------------------

John Cowan wrote at 11:23 AM on Wednesday, July 11, 2007:

>Calling the current plan Plan A, I propose two alternative plans:
>
>Plan B1: Use combining marks only, and add a SAMARITAN ZERO-WIDTH
>CONSONANT as a new base character for use before an initial vowel. ...
>
>Plan B2: Use modifier letters only, relying on font kerning to move
>the vowel letters slightly to the right when preceded by a consonant. ...

I suggest Plan C: break with the mistakes of the past made with Hebrew,
Arabic, et al., and encode the Samaritan vowels as (can you imagine it?)
stand-alone vowels, i.e., not as combining characters. Leave ligation
issues to font technologies and rendering engines.

Dean A. Snyder

Associate Research Scholar
Manager, Digital Hammurabi Project
Technology Consultant, Neo-Babylonian Trial Procedure Project

Computer Science Department, Whiting School of Engineering
420 Wyman Park Building, 3400 North Charles Street
Johns Hopkins University
Baltimore, Maryland, USA 21218

cell: 717 817-4897
www.jhu.edu/digitalhammurabi/
www.neh.gov/news/awards/researchawards_052006.html

From everson@evertype.com Wed Jul 11 14:56:36 2007
Date: Wed, 11 Jul 2007 20:53:24 +0100
From: Michael Everson
Subject: [hebrew] Re: Draft of Samaritan proposal
In-Reply-To: <20070711172132.510376227@smtp.johnshopkins.edu>

At 13:21 -0400 2007-07-11, Dean Snyder wrote:
>Michael Everson & Mark Shoulson wrote on page 1:
>
>"The destruction of the First Temple and the exile of educated
>Hebrew-speakers to Babylonia (a province of the Persian empire)
>changed things greatly, ..."
>
>The destruction of the temple and exile of Jews to Babylonia occurred
>under the reign of the Babylonian emperor, Nebuchadnezzar; the
>Medo-Persians had not yet conquered Babylon.

Pray offer an amendment. And dates to back up your assertion.
--
Michael Everson * http://www.evertype.com

From cowan@ccil.org Wed Jul 11 15:03:59 2007
Date: Wed, 11 Jul 2007 16:03:58 -0400
From: John Cowan
To: Dean Snyder
Cc: Hebrew List
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <20070711200358.GG24331@mercury.ccil.org>
In-Reply-To: <20070711172132.510376227@smtp.johnshopkins.edu>

Dean Snyder scripsit:

> The destruction of the temple and exile of Jews to Babylonia occurred
> under the reign of the Babylonian emperor, Nebuchadnezzar; the
> Medo-Persians had not yet conquered Babylon.

Quite so. The First Temple was destroyed in 586 BCE, whereas the
conquest of Babylon (and the restoration of the Jews) did not occur
until 539 BCE. I propose the following text:

    The destruction of the First Temple and the exile of educated
    Hebrew-speakers to Babylonia changed things greatly, according
    to Naveh (p. 78).
    Later generations returned to Judah, by then a Persian province,

> >Plan B2: Use modifier letters only, relying on font kerning to move
> >the vowel letters slightly to the right when preceded by a consonant. ...
>
> I suggest Plan C: break with the mistakes of the past made with Hebrew,
> Arabic, et al., and encode the Samaritan vowels as (can you imagine it?)
> stand-alone vowels, i.e., not as combining characters. Leave ligation
> issues to font technologies and rendering engines.

This is the same as B2: modifier letters are not combining characters.
They are simply smaller than other letters (and typically caseless;
all Samaritan is caseless of course).

--
Do I contradict myself?                         John Cowan
Very well then, I contradict myself.            cowan@ccil.org
I am large, I contain multitudes.               http://www.ccil.org/~cowan
        --Walt Whitman, Leaves of Grass

From cowan@ccil.org Wed Jul 11 15:27:12 2007
Date: Wed, 11 Jul 2007 16:27:09 -0400
From: John Cowan
To: John Hudson
Cc: John Cowan, Michael Everson, hebrew@unicode.org
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <20070711202709.GH24331@mercury.ccil.org>
In-Reply-To: <469504BD.908@tiro.ca>

John Hudson scripsit:

> This would only work if the 'modifier letters' always needed to shift
> exactly the same distance to the right to be optimally placed over any
> preceding consonant, which I doubt is the case. If the distance varies,
> the kerning will cause either collisions or gaps for the consonant
> glyphs.

I don't understand. I thought that the whole point of kerning pairs
was to allow a variable amount of space (positive or negative) between
specified pairs of letters.

Note that Samaritan vowel signs, unlike Hebrew and Arabic ones, are
decidedly after (not just above) the letters, so they do occupy some
amount of horizontal space.

--
It was impossible to inveigle                   John Cowan
Georg Wilhelm Friedrich Hegel                   http://www.ccil.org/~cowan
Into offering the slightest apology
For his Phenomenology.          --W. H. Auden, from "People" (1953)

From everson@evertype.com Wed Jul 11 15:26:38 2007
Date: Wed, 11 Jul 2007 21:21:46 +0100
From: Michael Everson
To: Hebrew List
Subject: [hebrew] Re: Draft of Samaritan proposal
In-Reply-To: <20070711200358.GG24331@mercury.ccil.org>

At 16:03 -0400 2007-07-11, John Cowan wrote:

> The destruction of the First Temple and the exile of educated
> Hebrew-speakers to Babylonia changed things greatly, according
> to Naveh (p. 78). Later generations returned to Judah, by then
> a Persian province,

Thank you for your correction.
--
Michael Everson * http://www.evertype.com

From rosennej@qsm.co.il Wed Jul 11 10:49:37 2007
Date: Wed, 11 Jul 2007 18:49:19 +0300
From: "Jonathan Rosenne"
To: "'Hebrew Discussion'"
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <002401c7c3d3$13e79b90$6502a8c0@QSM8>

Why is this on the Hebrew list?
If you wish to claim that the Samaritan alphabet is a distinct alphabet
it should not be on this list.

Jony

> -----Original Message-----
> From: hebrew-bounce@unicode.org
> [mailto:hebrew-bounce@unicode.org] On Behalf Of Michael Everson
> Sent: Wednesday, July 11, 2007 4:10 PM
> To: Hebrew Discussion
> Subject: [hebrew] Draft of Samaritan proposal
>
>
> Please see
> http://www.evertype.com/standards/iso10646/pdf/n3xxx-samaritan.pdf
> and comment.
>
> As always punctuation is the complex issue. (Sigh.)
>
> Thanks.
> --
> Michael Everson * http://www.evertype.com
>
>

From cowan@ccil.org Wed Jul 11 16:01:19 2007
Date: Wed, 11 Jul 2007 17:01:17 -0400
From: John Cowan
To: Jonathan Rosenne
Cc: "'Hebrew Discussion'"
Subject: [hebrew] Re: Draft of Samaritan proposal
Message-ID: <20070711210117.GK24331@mercury.ccil.org>
In-Reply-To: <002401c7c3d3$13e79b90$6502a8c0@QSM8>

Jonathan Rosenne scripsit:

> Why is this on the Hebrew list? If you wish to claim that the Samaritan
> alphabet is a distinct alphabet it should not be on this list.

Because it's relevant to Hebraicists; the Hebrew list is not for
Hebrew alone.

--
Cash registers don't really add and subtract;           John Cowan
        they only grind their gears.                    cowan@ccil.org
But then they don't really grind their gears, either;
        they only obey the laws of physics.  --Unknown

From everson@evertype.com Wed Jul 11 15:41:54 2007
Date: Wed, 11 Jul 2007 21:41:37 +0100
From: Michael Everson
To: hebrew@unicode.org
Subject: [hebrew] Re: Draft of Samaritan proposal
In-Reply-To: <20070711152333.GF28262@mercury.ccil.org>

At 11:23 -0400 2007-07-11, John Cowan wrote:

>First, an editorial note. Most of the document speaks (rightly) of
>VOWEL SIGNs, but in the chart on p. 16 we hear of POINTs. VOWEL SIGNs
>they should be.

Ceartaithe.

>Since this is a contemporary-use script, I'm not going to argue that
>it should be unified with any other 22-character West Semitic abjad.

Nor should you. This is another identified major node, and our
practice is to encode those nodes. See N2311.
>But what does concern me is the double encoding of vowels. This is not a >situation like Indic, where the initial vowels are nothing like the vowel >marks: the initial vowels are glyphically identical with the vowel marks, >but encoded separately because Unicode combining marks must have a base. That's the reality of the writing system. The writing system uses combining characters. But an initial vowel is written with a (narrowly) spacing character. The UCS has combining characters and spacing modifier letters. The proposed encoding doesn't create anything new. >Plan B1: Use combining marks only, and add a SAMARITAN ZERO-WIDTH >CONSONANT as a new base character for use before an initial vowel. >This would lengthen texts slightly, but would be a regular and familiar >situation. That is INVISIBLE LETTER and the UTC doesn't like it. >Plan B2: Use modifier letters only, relying on font kerning to move >the vowel letters slightly to the right when preceded by a consonant. >This solution is less artificial, but more unusual; on the other hand, >it might be more legible in environments like Windows, where initially >there would be no support for Samaritan combining characters in Uniscribe. You will be guaranteed dumb fonts that will space all of the vowel modifier letters incorrectly. Combining characters is better. 
-- Michael Everson * http://www.evertype.com From everson@evertype.com Wed Jul 11 16:09:49 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 16:17:02 -0500 (CDT) Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6BL9n9K018890 for ; Wed, 11 Jul 2007 16:09:49 -0500 Received: from [88.81.100.235] (helo=[192.168.1.134]) by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from ) id 1I8jRQ-0005tr-8I for hebrew@unicode.org; Wed, 11 Jul 2007 22:09:45 +0100 Mime-Version: 1.0 Message-Id: In-Reply-To: <002401c7c3d3$13e79b90$6502a8c0@QSM8> References: <002401c7c3d3$13e79b90$6502a8c0@QSM8> Date: Wed, 11 Jul 2007 22:07:40 +0100 To: From: Michael Everson Subject: [hebrew] Re: Draft of Samaritan proposal Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - white.dnsireland.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - evertype.com X-archive-position: 3023 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 18:49 +0300 2007-07-11, Jonathan Rosenne wrote: >Why is this on the Hebrew list? If you wish to claim that the Samaritan >alphabet is a distinct alphabet it should not be on this list. Jony, really. They live in Israel and the West Bank. They revere the Pentateuch, even. And their script is closely related to Hebrew and the people who know and care about these things are on this list. 
-- Michael Everson * http://www.evertype.com From john@tiro.ca Wed Jul 11 16:29:47 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 16:37:55 -0500 (CDT) Received: from pd2mo1so.prod.shaw.ca (shawidc-mo1.cg.shawcable.net [24.71.223.10]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6BLTl6Q023475 for ; Wed, 11 Jul 2007 16:29:47 -0500 Received: from pd4mr2so.prod.shaw.ca (pd4mr2so-qfe3.prod.shaw.ca [10.0.141.213]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0JL100C72ABUMG60@l-daemon> for hebrew@unicode.org; Wed, 11 Jul 2007 15:28:42 -0600 (MDT) Received: from pn2ml1so.prod.shaw.ca ([10.0.121.145]) by pd4mr2so.prod.shaw.ca (Sun Java System Messaging Server 6.2-7.05 (built Sep 5 2006)) with ESMTP id <0JL100LVXABOUQ81@pd4mr2so.prod.shaw.ca> for hebrew@unicode.org; Wed, 11 Jul 2007 15:28:36 -0600 (MDT) Received: from [192.168.1.101] ([70.66.9.120]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0JL100GK7ABNID00@l-daemon> for hebrew@unicode.org; Wed, 11 Jul 2007 15:28:35 -0600 (MDT) Date: Wed, 11 Jul 2007 14:28:33 -0700 From: John Hudson Subject: [hebrew] Re: Draft of Samaritan proposal In-reply-to: <20070711202709.GH24331@mercury.ccil.org> To: John Cowan Cc: Michael Everson , hebrew@unicode.org Message-id: <46954B81.2010605@tiro.ca> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed Content-transfer-encoding: 7bit References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) X-archive-position: 3024 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: john@tiro.ca Precedence: bulk X-list: hebrew John Cowan wrote: >> This would only work if the 'modifier letters' always needed to shift >> exactly the same distance to the right to be 
optimally placed over any >> preceding consonant, which I doubt is the case. If the distance varies, >> the kerning will cause either collisions or gaps for the consonant >> glyphs. > I don't understand. I thought that the whole point of kerning pairs > was to allow a variable amount of space (positive or negative) between > specified pairs of letters. Yes, but if one of the glyphs in that pair were zero-width -- as I was presuming to be the case with dependent marks -- then pair kerning affects the relationship of the glyph on either side of the zero-width glyph in ways that are likely to be undesirable. This is why mark positioning is normally and properly handled independently of base kerning (with possible exceptions in some scripts, e.g. kerning tall Thai vowels off preceding mark stacks). > Note that Samaritan vowel signs, unlike > Hebrew and Arabic ones, are decidedly after (not just above) the letters, > so they do occupy some amount of horizontal space. If they are not zero-width, then they would perhaps not be positioned using mark anchor attachment. But if the advance width is narrow, then the above kerning issue may still present problems and require contextual kerning, which can get messy. I have not had a chance to review the Samaritan proposal yet, so these comments are based on basic font handling. John Hudson -- Tiro Typeworks www.tiro.com Gulf Islands, BC tiro@tiro.com We say our understanding measures how things are, and likewise our perception, since that is how we find our way around, but in fact these do not measure. They are measured. 
-- Aristotle, Metaphysics From everson@evertype.com Wed Jul 11 16:36:37 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 16:39:36 -0500 (CDT) Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6BLabi3024707 for ; Wed, 11 Jul 2007 16:36:37 -0500 Received: from [88.81.100.235] (helo=[192.168.1.134]) by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from ) id 1I8jrM-0006Gl-Rv for hebrew@unicode.org; Wed, 11 Jul 2007 22:36:33 +0100 Mime-Version: 1.0 Message-Id: In-Reply-To: <46954B81.2010605@tiro.ca> References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> Date: Wed, 11 Jul 2007 22:33:37 +0100 To: hebrew@unicode.org From: Michael Everson Subject: [hebrew] Re: Draft of Samaritan proposal Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - white.dnsireland.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - evertype.com X-archive-position: 3025 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 14:28 -0700 2007-07-11, John Hudson wrote: >I have not had a chance to review the Samaritan proposal yet, so >these comments are based on basic font handling. Let's say that circumflex means "a" and we want to write the word ABRACADABRA in Samaritan. We would write ^BR^C^D^BR^. The problem is that the first one has no letter to attach to. So we propose a spacing modifier ^ for that and combining ^ for the others. 
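The ABRACADABRA scheme above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the proposal: it uses U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT and U+0302 COMBINING CIRCUMFLEX ACCENT as stand-ins for the spacing and combining forms of the vowel, and the function name encode_plan_a is hypothetical.

```python
import unicodedata

# Stand-ins only: these are Latin-script characters, not Samaritan ones.
SPACING_VOWEL = "\u02C6"    # MODIFIER LETTER CIRCUMFLEX ACCENT (spacing form)
COMBINING_VOWEL = "\u0302"  # COMBINING CIRCUMFLEX ACCENT (combining form)


def encode_plan_a(word, vowel="A"):
    """Encode `vowel` as a spacing modifier word-initially, as a combining mark elsewhere."""
    out = []
    for index, char in enumerate(word):
        if char != vowel:
            out.append(char)              # consonants pass through unchanged
        elif index == 0:
            out.append(SPACING_VOWEL)     # no base letter to attach to
        else:
            out.append(COMBINING_VOWEL)   # attaches to the preceding letter
    return "".join(out)


encoded = encode_plan_a("ABRACADABRA")
for char in encoded:
    # Print code point, General_Category and name for each character.
    print(f"U+{ord(char):04X} {unicodedata.category(char)} {unicodedata.name(char)}")
```

The listing makes the double encoding visible: the same vowel appears once with General_Category Lm and four times with Mn.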
-- Michael Everson * http://www.evertype.com From mark@kli.org Wed Jul 11 18:02:02 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 18:05:11 -0500 (CDT) Received: from pi.meson.org (pi.meson.org [66.134.26.207]) by unicode.org (8.13.4/8.12.11) with SMTP id l6BN21oE013135 for ; Wed, 11 Jul 2007 18:02:01 -0500 Received: (qmail 8549 invoked from network); 11 Jul 2007 23:01:57 -0000 Received: from nagas.meson.org (HELO ?192.168.1.101?) (1000@192.168.1.101) by pi.meson.org with SMTP; 11 Jul 2007 23:01:57 -0000 Message-ID: <46956164.1020302@kli.org> Date: Wed, 11 Jul 2007 19:01:56 -0400 From: "Mark E. Shoulson" User-Agent: Thunderbird 1.5.0.12 (X11/20070509) MIME-Version: 1.0 To: John Hudson CC: John Cowan , Michael Everson , hebrew@unicode.org Subject: [hebrew] Re: Draft of Samaritan proposal References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> In-Reply-To: <46954B81.2010605@tiro.ca> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3026 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew John Hudson wrote: > John Cowan wrote: > >> Note that Samaritan vowel signs, unlike >> Hebrew and Arabic ones, are decidedly after (not just above) the >> letters, >> so they do occupy some amount of horizontal space. > > If they are not zero-width, then they would perhaps not be positioned > using mark anchor attachment. But if the advance width is narrow, then > the above kerning issue may still present problems and require > contextual kerning, which can get messy. Welll... They seem to be zero-width, but are written decidedly on the left "shoulder" of the letters (usually). 
And when there are several of them, they line up side by side and actually push the next letter away, thus acting like they are spacing. So maybe they're spacing but only very very little, or zero (and they kern between themselves?) > I have not had a chance to review the Samaritan proposal yet, so these > comments are based on basic font handling. Basically the problem is that Samaritan vowels are *mostly* like ordinary Hebrew vowels, except that in a few cases they can actually appear *before* the first letter of the word (for an epenthetic vowel). Sometimes you wind up with some conflict depending on how you resolve this problem. For example, the word "women" is NUN-SHIN-YOD-MEM (using Hebrew letter-names for familiarity's sake). In Samaritan pronunciation, this is vocalized as "inshem", with an "i" vowel *before* the NUN, which then carries a null-vowel in the usual way (on the left), followed by the SHIN with the "e" vowel (and the YOD has no vowel). Using a modifier letter here makes a certain amount of sense, more sense than using a combining character that comes after the NUN but appears before it, especially since the NUN has another vowel to carry. But when the definite article "aa-" is added (the HEH is silent in Samaritan pronunciation), we get "aa-inshem" (with a glottal stop of some kind separating the vowels), that is, HEH+AA, then the "I" and the NUN+null and so on as above. So this is a case of two vowels on the HEH, and maybe would be coded as HEH+AA+I, which is okay, except that intuitively we seem to have changed the spelling of the base word somehow. Anyone got any better ideas? 
~mark From everson@evertype.com Wed Jul 11 18:14:27 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 18:30:30 -0500 (CDT) Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6BNERxp015921 for ; Wed, 11 Jul 2007 18:14:27 -0500 Received: from [88.81.100.235] (helo=[192.168.1.134]) by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from ) id 1I8lNu-0006jO-Jw; Thu, 12 Jul 2007 00:14:15 +0100 Mime-Version: 1.0 Message-Id: In-Reply-To: <46956164.1020302@kli.org> References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> <46956164.1020302@kli.org> Date: Thu, 12 Jul 2007 00:13:58 +0100 To: "Mark E. Shoulson" , John Hudson From: Michael Everson Subject: [hebrew] Re: Draft of Samaritan proposal Cc: John Cowan , hebrew@unicode.org Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - white.dnsireland.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - evertype.com X-archive-position: 3027 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 19:01 -0400 2007-07-11, Mark E. Shoulson wrote: >Welll... They seem to be zero-width, You mean "non-spacing" i.e. "combining" >but are written decidedly on the left "shoulder" of the letters >(usually). And when there are several of them, they line up side by >side and actually push the next letter away, thus acting like they >are spacing. Not so different from Greek. 
>Basically the problem is that Samaritan vowels are *mostly* like >ordinary Hebrew vowels, except that in a few cases they can actually >appear *before* the first letter of the word (for an epenthetic >vowel). That's the modifier letter. >Sometimes you wind up with some conflict depending on how you >resolve this problem. For example, the word "women" is >NUN-SHIN-YOD-MEM (using Hebrew letter-names for familiarity's sake). >In Samaritan pronunciation, this is vocalized as "inshem", with an >"i" vowel *before* the NUN, which then carries a null-vowel in the >usual way (on the left), followed by the SHIN with the "e" vowel >(and the YOD has no vowel). Mod-I + N + SUKUN + SH + E + Y + M. >Using a modifier letter here makes a certain amount of sense, more >sense than using a combining character that comes after the NUN but >appears before it, especially since the NUN has another vowel to >carry. Exactly. >But when the definite article "aa-" is added (the HEH is silent in >Samaritan pronunciation), we get "aa-inshem" (with a glottal stop of >some kind separating the vowels), that is, HEH+AA, then the "I" and >the NUN+null and so on as above. So this is a case of two vowels on >the HEH, and maybe would be coded as HEH+AA+I, which is okay, except >that intuitively we seem to have changed the spelling of the base >word somehow. We'd need an example. 
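The sequence given above for "inshem" (Mod-I + N + SUKUN + SH + E + Y + M) can be written out character by character. This is a sketch under an assumption: it uses the character names of the Samaritan block as later published in the Unicode Standard, which may differ from the names in the draft chart under discussion, and it needs a Python build whose unicodedata knows that block.

```python
import unicodedata

# Names as published in the Unicode Standard's Samaritan block (an assumption
# relative to the 2007 draft, whose chart may have used other names).
INSHEM_NAMES = [
    "SAMARITAN MODIFIER LETTER I",  # Mod-I: spacing vowel before the NUN
    "SAMARITAN LETTER NUN",         # N
    "SAMARITAN VOWEL SIGN SUKUN",   # SUKUN: null vowel on the NUN
    "SAMARITAN LETTER SHAN",        # SH
    "SAMARITAN VOWEL SIGN E",       # E
    "SAMARITAN LETTER YUT",         # Y (carries no vowel)
    "SAMARITAN LETTER MIM",         # M
]

# Build the string in logical (reading) order; display order is right-to-left.
inshem = "".join(unicodedata.lookup(name) for name in INSHEM_NAMES)

for char in inshem:
    # Show code point, General_Category and name of each character.
    print(f"U+{ord(char):04X} {unicodedata.category(char)} {unicodedata.name(char)}")
```

Only the first character is a spacing modifier letter; the two vowel signs that follow consonants are combining marks.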
-- Michael Everson * http://www.evertype.com From verdy_p@wanadoo.fr Wed Jul 11 18:30:20 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 18:31:11 -0500 (CDT) Received: from smtp24.orange.fr (smtp24.orange.fr [193.252.22.28]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6BNUKjL019040 for ; Wed, 11 Jul 2007 18:30:20 -0500 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2447.orange.fr (SMTP Server) with ESMTP id 5FA7A1C0008E for ; Thu, 12 Jul 2007 01:30:14 +0200 (CEST) Received: from HARNON (APoitiers-156-1-94-116.w86-221.abo.wanadoo.fr [86.221.101.116]) by mwinf2447.orange.fr (SMTP Server) with ESMTP id EE47D1C00088; Thu, 12 Jul 2007 01:30:13 +0200 (CEST) X-ME-UUID: 20070711233013976.EE47D1C00088@mwinf2447.orange.fr Reply-To: From: "Philippe Verdy" To: "'John Cowan'" Cc: "'Michael Everson'" , References: <20070711152333.GF28262@mercury.ccil.org> <01d501c7c3d3$c0df3d80$0a01a8c0@rodage.dyndns.org> <20070711185549.GA24331@mercury.ccil.org> Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 01:30:07 +0200 Organization: Ordinateur Personnel Message-ID: <021101c7c413$6adbc940$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <20070711185549.GA24331@mercury.ccil.org> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 Thread-Index: AcfD7RuJuzB/dyckSna2dDYuPnRwMwAIsuAQ Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l6BNUKjL019040 X-archive-position: 3028 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: hebrew John Cowan [mailto:cowan@ccil.org] > Sent: Wednesday, 11 July 2007 20:56 > To: Philippe Verdy > Cc: 'John Cowan'; 'Michael Everson'; hebrew@unicode.org > Subject: Re: [hebrew] Re: Draft of Samaritan proposal > > 
Philippe Verdy scripsit: > > > Isn't Samaritan a right-to-left script that will require Uniscribe > > support in every case, otherwise the Bidi algorithm (including mirroring > > issues) won't work correctly and the directionality will be uniformly > > left-to-right? > > Samaritan is indeed RTL, but it is in the reserved RTL range. > See http://unicode.org/Public/UNIDATA/extracted/DerivedBidiClass.txt > which says: > > # The unassigned characters that default to R are: > # Hebrew, Cypriot_Syllabary, Kharoshthi, and the ranges \u07C0-\u08FF > # \uFB1D-\uFB4F \U00010840-\U000109FF \U00010A60-\U00010FFF > > I don't know if current versions of Uniscribe actually take advantage > of this. I am still not sure that the script's block will be in the BMP within this area. It is what Michael suggests in question 6: > > 6a. After giving due considerations to the principles in the P&P > > document must the proposed characters be entirely in the BMP? > Yes. > > 6b. If YES, is a rationale provided? > Yes. > > 6c. If YES, reference. > Accordance with the Roadmap; RTL script with modern use. But the formal assignment of the block is still not in the proposal. In fact, there's an indication given indirectly within the proposed chart (which indicates 0800-083F) but, as you said, it is not certain that the default BiDi properties are used by Uniscribe. And the "modern use" may still be contested (otherwise, many other extinct scripts could equally be considered "modern use", based only on the fact that someone is working with them today to make the encoding proposal); there are lots of people working with ancient Egyptian hieroglyphs, but this does not make them "modern use". There's only one sample provided (from the weekly Samaritan News A.B.) but no comment about the actual language used by this weekly paper: this may be a sample illustrative page within a cultural section of the paper (some indications: this is page 24 only, there are decorating frames.) 
All other samples are from historical manuscripts, or descriptions of the Samaritan script within another modern language. What may be criticized in the proposal is that the samples are dated from the date of publication of the book that contains them, not the date when the artistic work or text was actually created before being reproduced in that publication. All these samples seem to come from palaeographic studies by Orientalist researchers. From verdy_p@wanadoo.fr Wed Jul 11 18:44:14 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 19:09:16 -0500 (CDT) Received: from smtp24.orange.fr (smtp24.orange.fr [193.252.22.28]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6BNiDls021589 for ; Wed, 11 Jul 2007 18:44:14 -0500 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2447.orange.fr (SMTP Server) with ESMTP id 6A5E41C00087 for ; Thu, 12 Jul 2007 01:44:08 +0200 (CEST) Received: from HARNON (APoitiers-156-1-94-116.w86-221.abo.wanadoo.fr [86.221.101.116]) by mwinf2447.orange.fr (SMTP Server) with ESMTP id 1916C1C00081; Thu, 12 Jul 2007 01:44:08 +0200 (CEST) X-ME-UUID: 20070711234408103.1916C1C00081@mwinf2447.orange.fr Reply-To: From: "Philippe Verdy" To: "'Michael Everson'" , "'Hebrew Discussion'" References: Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 01:44:01 +0200 Organization: Ordinateur Personnel Message-ID: <021201c7c415$5bffb330$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" X-Mailer: Microsoft Office Outlook 11 In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 Thread-Index: AcfDvpJbvX7SH9cRRrmZZ5DNqGX5tgAVeNqw Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l6BNiDls021589 X-archive-position: 3029 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: 
hebrew There's a small editorial typo in section "4." of the proposal: 4. Vowels and other marks of pronunciation. (...) Users concerned with spoofing possibilities should note the similarity between "’" MODIFIER LETTER SHORT A and "◌<" VOWEL SIGN SHORT A and between "<" MODIFIER LETTER I and "◌<" VOWEL SIGN I. Obviously, the glyph shown in the first comparison for VOWEL SIGN SHORT A is actually the glyph of the proposed VOWEL SIGN I. Philippe. > -----Original Message----- Michael Everson wrote: > > Please see > http://www.evertype.com/standards/iso10646/pdf/n3xxx-samaritan.pdf > and comment. From verdy_p@wanadoo.fr Wed Jul 11 20:09:54 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 22:22:16 -0500 (CDT) Received: from smtp24.orange.fr (smtp24.orange.fr [193.252.22.27]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C19sWN029735 for ; Wed, 11 Jul 2007 20:09:54 -0500 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2453.orange.fr (SMTP Server) with ESMTP id 78A3A1C00088 for ; Thu, 12 Jul 2007 03:09:48 +0200 (CEST) Received: from HARNON (APoitiers-156-1-94-116.w86-221.abo.wanadoo.fr [86.221.101.116]) by mwinf2453.orange.fr (SMTP Server) with ESMTP id 09DF11C00084; Thu, 12 Jul 2007 03:09:47 +0200 (CEST) X-ME-UUID: 20070712010948406.09DF11C00084@mwinf2453.orange.fr Reply-To: From: "Philippe Verdy" To: "'John Cowan'" , "'Dean Snyder'" Cc: "'Hebrew List'" References: <20070711172132.510376227@smtp.johnshopkins.edu> <20070711200358.GG24331@mercury.ccil.org> Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 03:09:41 +0200 Organization: Ordinateur Personnel Message-ID: <021301c7c421$539f1d50$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <20070711200358.GG24331@mercury.ccil.org> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 Thread-Index: AcfD+I5M0ErujbyFQpKc0057sFWLhAAI9b0Q X-archive-position: 3030 
X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: hebrew From John Cowan: > This is the same as B2: modifier letters are not combining characters. > They are simply smaller than other letters (and typically caseless; > all Samaritan is caseless of course). The actual difference between combining characters and modifier letters is that combining characters cannot be used reliably and meaningfully in isolation, without a base letter. By contrast, modifier letters, although they are generally used to alter some other nearby letter or grapheme cluster, may appear in isolation, and the associated letters are not modified significantly (they still keep their phonological identity, even if there's a minor phonetic variation). Using the term "modifier letter" for two vowels may seem a bit abusive, because they are in fact not modifying the consonant at all; in the leading position, they do not modify anything and stand on their own in the samples provided. Why aren't they simply letters? The most probable reason is that Samaritans make no distinction between them and the other vowels. Given the graphical features shown, I also have doubts that the other vowels should be treated as combining diacritics. Encoding them all as "modifier letters" (even if the term seems abusive) is probably the best solution, given that they are rendered with kerning, and not always above the left side of the consonant they follow, but sometimes also on the right side of the consonant they precede (as seen in figure 8). Looking at figure 8, it really seems that the vowels were all added after writing the whole words, on top of them. 
Most often they are not stacked, but there are a few places in this figure where stacking was apparently necessary (because another base letter on the left has a higher part slightly kerned above the base letter that it follows; see the end of line 3). In other cases, the multiple vowels may have been written after completing the middle of the word, causing a small gap before the final letter (or this letter may have been added afterward). Many things in this figure really suggest that vowels are treated as a separate flow, one good reason to think they are not true diacritics. We can't encode them as normal letters given that this flow is optional. So why not encode them all as modifier letters? This would still reveal their optionality, but this may be confirmed within collation rules. The fact that vowels may alter the distance between base letters, creating gaps in some cases, also suggests that they are not true diacritics. So all could be treated through simple kerning pairs in fonts. For monospaced fonts in terminals or editors, in fact I see nothing wrong if the vowels take a complete cell with their own non-zero advance: this already occurs in the historic figures shown in the proposal. This may look ugly, but it is already happening in some cases. But the background rationale for the choice proposed by Michael may be the proximity of the script to Hebrew, whose vowels are treated as diacritics. 
From kenw@sybase.com Wed Jul 11 20:19:26 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 22:22:16 -0500 (CDT) Received: from fm200.sybase.com (fm200.sybase.com [192.138.151.122]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C1JPdv004388 for ; Wed, 11 Jul 2007 20:19:25 -0500 Received: from smtp2.sybase.com (sybgate2.sybase.com [10.22.97.85]) by fm200.sybase.com with ESMTP id l6C1JKl23608; Wed, 11 Jul 2007 18:19:20 -0700 (PDT) Received: from atlantis-new.sybase.com (localhost [127.0.0.1]) by smtp2.sybase.com with ESMTP id l6C1JJH28929; Wed, 11 Jul 2007 18:19:19 -0700 (PDT) Received: from birdie.sybase.com (birdie.sybase.com [10.22.85.43]) by atlantis-new.sybase.com (8.13.7+Sun/8.13.7) with ESMTP id l6C1Ix7E003027; Wed, 11 Jul 2007 18:18:59 -0700 (PDT) Received: from birdie (birdie [10.22.85.43]) by birdie.sybase.com (8.11.6+Sun/8.11.6) with SMTP id l6C1IxW06378; Wed, 11 Jul 2007 18:18:59 -0700 (PDT) Message-Id: <200707120118.l6C1IxW06378@birdie.sybase.com> Date: Wed, 11 Jul 2007 18:18:59 -0700 (PDT) From: Kenneth Whistler Reply-To: Kenneth Whistler Subject: [hebrew] Re: Draft of Samaritan proposal To: cowan@ccil.org Cc: hebrew@unicode.org, kenw@sybase.com MIME-Version: 1.0 Content-Type: TEXT/plain; charset=us-ascii Content-MD5: ThnCtVJXj68Rmf/9JgOqYw== X-Mailer: dtmail 1.3.0 @(#)CDE Version 1.4.6_06 SunOS 5.8 sun4u sparc X-archive-position: 3031 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: kenw@sybase.com Precedence: bulk X-list: hebrew John Cowan started off this discussion with: > But what does concern me is the double encoding of vowels. Me, too. > This is not a > situation like Indic, where the initial vowels are nothing like the vowel > marks: the initial vowels are glyphically identical with the vowel marks, > but encoded separately because Unicode combining marks must have a base. Exactly. 
So Plan A is an encoding hack, which doesn't follow from the logic of the Samaritan script per se, but follows from the structure of the Unicode encoding, given a determination that the vowel signs are combining marks (which seems justified, given their general behavior and their relationship to Hebrew vowelling), added to the need for Unicode combining marks to have a base in order to result in well-formed combining character sequences. > > Calling the current plan Plan A, I propose two alternative plans: > > Plan B1: Use combining marks only, and add a SAMARITAN ZERO-WIDTH > CONSONANT as a new base character for use before an initial vowel. > This would lengthen texts slightly, but would be a regular and familiar > situation. This would be neither ZERO-WIDTH nor a CONSONANT, so calling it that would be a bit of a misnomer. If something like this were to be added to the encoding, I would suggest instead: SAMARITAN BLANK BASE, since its glyph would be blank, and its function would be to serve as a base character, not as a consonant. As Michael pointed out, this is getting very close again to the concept of a generic INVISIBLE LETTER, which hasn't passed muster yet in the UTC, although it hasn't actually been rejected as a concept yet, either. So the problem with proposing a SAMARITAN BLANK BASE character would be that it would immediately raise all the issues about a generic character for this functionality. The UTC is unlikely to want to encode another blank base character each time this kind of display behavior shows up in a script. And if advocating for Plan B1, one needs to first analyze and either turn a thumbs up or down on the following options as well: Plan B1a: Use NBSP as the base. Plan B1b: Use NNBSP as the base. Note that both U+00A0 NBSP and U+202F NNBSP are Grapheme_Base=True precisely for use in this kind of combining mark display. NBSP (or NNBSP) followed by a combining mark *is* a well-formed combining character sequence. 
NNBSP might be the better choice, because these initial vowelings don't appear to require abnormally wide spaces to sit on. Of course, U+0020 SPACE is also a Grapheme_Base, but it has all the wrong breaking properties for this kind of functionality. But NBSP and NNBSP are lb=GL, which is o.k. for linebreaking. Wordbreaking wouldn't be correct by default, but it would be straightforward to fix for Samaritan: you would want NNBSP + NSM --> ALetter, for the purposes of wordbreaking. The advantage of going with NNBSP, besides it simply displaying correctly immediately in a properly constructed implementation, is that it keeps the i and a vowels as a single character, not requiring special equivalencing in the collation algorithm or other matching algorithms, and it doesn't require arguing for any *new* oddball character functionality -- since the character is already encoded. > Plan B2: Use modifier letters only, relying on font kerning to move > the vowel letters slightly to the right when preceded by a consonant. > This solution is less artificial, but more unusual; on the other hand, > it might be more legible in environments like Windows, where initially > there would be no support for Samaritan combining characters in Uniscribe. I agree with John Hudson's general critique of heading this direction. Functionally and historically these are combining marks, and their display is also handled most generically, I think, within the context of the kind of behavior already dealt with in fonts for combining marks. The existence of two of these vowel marks in Samaritan being displayed ahead of the consonants, with no visible base, is not a strong enough reason, in my reckoning, to introduce yet another paradigm, as it were, for dealing with combining marks whose display departs a little from regularly stacking nonspacing marks (directly) above base letters. 
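Plan B1b as described above can be sketched as follows. This is a sketch under stated assumptions, not an implementation of anything in the proposal: U+0302 COMBINING CIRCUMFLEX ACCENT stands in for a Samaritan vowel sign (as in the ABRACADABRA example elsewhere in this thread), and the function name encode_plan_b1b is hypothetical.

```python
import unicodedata

NNBSP = "\u202F"            # NARROW NO-BREAK SPACE, the existing blank base
COMBINING_VOWEL = "\u0302"  # COMBINING CIRCUMFLEX ACCENT (stand-in vowel sign)


def encode_plan_b1b(word, vowel="A"):
    """Encode every vowel as the same combining mark; give an initial vowel an NNBSP base."""
    out = []
    for index, char in enumerate(word):
        if char != vowel:
            out.append(char)                      # consonants pass through
        elif index == 0:
            out.append(NNBSP + COMBINING_VOWEL)   # blank base + mark
        else:
            out.append(COMBINING_VOWEL)           # mark on the preceding letter
    return "".join(out)


encoded = encode_plan_b1b("ABRACADABRA")
for char in encoded:
    # Print code point, General_Category and name for each character.
    print(f"U+{ord(char):04X} {unicodedata.category(char)} {unicodedata.name(char)}")
```

Here every occurrence of the vowel is the same character, so a search for the vowel sign finds all five instances without any special equivalence between a spacing and a combining form. The word-break tailoring mentioned above is not modelled in this sketch.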
Also, from all the evidence of the figures in the proposal, Samaritan also follows the conventions of Hebrew in normally being written without vowel signs at all. I wouldn't want what is clearly a secondary tier of rendering behavior for the writing system to turn what is nominally a very straightforward RtL script into something which we treat as overly complicated and model-breaking for the standard. I would like someone to attempt to offer a convincing argument against Plan B1b (use NNBSP) -- that is simply encoding the vowel signs once and specifying a convention of using the existing NNBSP when you need to spell a Samaritan word with an initial i- or a- vowelling. --Ken From verdy_p@wanadoo.fr Wed Jul 11 20:54:18 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 22:22:31 -0500 (CDT) Received: from smtp24.orange.fr (smtp24.orange.fr [193.252.22.27]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C1sHLa019387 for ; Wed, 11 Jul 2007 20:54:18 -0500 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2411.orange.fr (SMTP Server) with ESMTP id 440E41C00082 for ; Thu, 12 Jul 2007 03:54:12 +0200 (CEST) Received: from HARNON (APoitiers-156-1-94-116.w86-221.abo.wanadoo.fr [86.221.101.116]) by mwinf2411.orange.fr (SMTP Server) with ESMTP id F3B7B1C00081; Thu, 12 Jul 2007 03:54:11 +0200 (CEST) X-ME-UUID: 20070712015411998.F3B7B1C00081@mwinf2411.orange.fr Reply-To: From: "Philippe Verdy" To: "'Michael Everson'" , References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 03:54:05 +0200 Organization: Ordinateur Personnel Message-ID: <021c01c7c427$87657390$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Mailer: Microsoft Office Outlook 11 In-Reply-To: X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 
Thread-Index: AcfEBRDa1r8pIuvxSFWM8fDD01eKxAAIOU7w
X-archive-position: 3032
X-Approved-By: cowan@ccil.org
X-ecartis-version: Ecartis v1.0.0
Sender: hebrew-bounce@unicode.org
Errors-to: hebrew-bounce@unicode.org
X-original-sender: verdy_p@wanadoo.fr
Precedence: bulk
X-list: hebrew

Michael Everson wrote:
>
> >I have not had a chance to review the Samaritan proposal yet, so
> >these comments are based on basic font handling.
>
> Let's say that circumflex means "a" and we want to write the word
> ABRACADABRA in Samaritan. We would write ^BR^C^D^BR^. The problem is
> that the first one has no letter to attach to. So we propose a
> spacing modifier ^ for that and combining ^ for the others.

And this becomes nonsense, because you are disunifying the same "^" vowel. There's no demonstration that Samaritan vowels actually "attach" to the base consonant. I see more evidence that they are completely separate and not logically bound to the consonant above which they MAY appear.

So I would simply encode them all as base characters, i.e. as (optional) "letter modifiers". This will stop the problem of modified orthographies with compound words, such as the one suggested with the definite article "aa-" before a word starting with a vowel with no intermediate consonant.

All the rest may be solved by kerning pairs in fonts (which would optionally group a consonant+vowel pair with GSUB to allow further kerning of this group with another consonant after it, i.e. on the left, if such grouping is possible). When a sequence like exists, it is likely that only the first group will be substituted into a composed glyph into followed by another glyph for that will likely not kern into the following consonant but will keep its intrinsic minimum spacing.
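
For readers following the "combining mark versus modifier letter versus plain letter" distinction in this exchange, the sketch below looks up the relevant general categories with Python's standard `unicodedata` module. It assumes already-encoded characters as stand-ins, since no Samaritan code points exist at the time of this thread: U+02B9 for a modifier letter, Hebrew HIRIQ for a vowel point, and Hebrew ALEF for an ordinary letter.

```python
import unicodedata

# Stand-in characters; none of these is a Samaritan character.
STAND_INS = ["\u02B9", "\u05B4", "\u05D0"]

# Map each character's Unicode name to (general category, combining class).
properties = {
    unicodedata.name(ch): (unicodedata.category(ch), unicodedata.combining(ch))
    for ch in STAND_INS
}

for name, (category, ccc) in properties.items():
    # Lm = modifier letter (spacing), Mn = nonspacing mark, Lo = other letter.
    print(f"{name}: category={category}, combining class={ccc}")
```

The choice between the plans discussed here is largely a choice of which of these categories the Samaritan vowels would be given.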
From dean.snyder@jhu.edu Wed Jul 11 22:10:23 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 22:22:49 -0500 (CDT) Received: from ipex4.johnshopkins.edu (ipex4.johnshopkins.edu [128.220.161.141]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C3ANa1022335 for ; Wed, 11 Jul 2007 22:10:23 -0500 X-IronPort-Anti-Spam-Filtered: true X-IronPort-Anti-Spam-Result: AgAAACo4lUZHOsDxQWdsb2JhbAANjyMBAQE9 X-IronPort-AV: E=Sophos;i="4.16,529,1175486400"; d="scan'208"; Received: from c-71-58-192-241.hsd1.pa.comcast.net (HELO [192.168.1.103]) ([71.58.192.241]) by ipex4.johnshopkins.edu with ESMTP/TLS/DHE-RSA-AES256-SHA; 11 Jul 2007 23:10:06 -0400 From: "Dean Snyder" To: " Hebrew List" Subject: [hebrew] Re: Draft of Samaritan proposal Date: Wed, 11 Jul 2007 23:10:24 -0400 Message-Id: <20070712031024.1779461035@smtp.johnshopkins.edu> In-Reply-To: <20070711200358.GG24331@mercury.ccil.org> References: <20070711172132.510376227@smtp.johnshopkins.edu> <20070711200358.GG24331@mercury.ccil.org> X-Mailer: CTM PowerMail version 5.5.2 build 4475 English (PPC) MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit X-archive-position: 3033 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: dean.snyder@jhu.edu Precedence: bulk X-list: hebrew John Cowan wrote at 4:03 PM on Wednesday, July 11, 2007: >Dean Snyder scripsit: >> I suggest Plan C: break with the mistakes of the past made with Hebrew, >> Arabic, et al., and encode the Samaritan vowels as (can you imagine it?) >> stand-alone vowels, i.e., not as combining characters. Leave ligation >> issues to font technologies and rendering engines. > >This is the same as B2: >modifier letters are not combining characters. My Plan C is not the same as your Plan B2. Modifier letters are free- standing spacing characters, something one cannot claim for the Samaritan vowels. 
In Unicode parlance, Samaritan vowels are no more modifier letters than a, e, i, o, & u are, and therefore should not be encoded as such. They are, like Hebrew vowels, simply letters that are usually, but not always, non-spacing, and their Unicode properties should reflect their real-world properties. Dean A. Snyder Associate Research Scholar Manager, Digital Hammurabi Project Technology Consultant, Neo-Babylonian Trial Procedure Project Computer Science Department, Whiting School of Engineering 420 Wyman Park Building, 3400 North Charles Street Johns Hopkins University Baltimore, Maryland, USA 21218 cell: 717 817-4897 www.jhu.edu/digitalhammurabi/ www.neh.gov/news/awards/researchawards_052006.html From verdy_p@wanadoo.fr Wed Jul 11 22:42:14 2007 Received: with ECARTIS (v1.0.0; list hebrew); Wed, 11 Jul 2007 22:46:26 -0500 (CDT) Received: from smtp28.orange.fr (smtp28.orange.fr [80.12.242.99]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C3gDcB006394 for ; Wed, 11 Jul 2007 22:42:13 -0500 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2829.orange.fr (SMTP Server) with ESMTP id B28957000087 for ; Thu, 12 Jul 2007 05:42:07 +0200 (CEST) Received: from HARNON (APoitiers-156-1-94-116.w86-221.abo.wanadoo.fr [86.221.101.116]) by mwinf2829.orange.fr (SMTP Server) with ESMTP id 54DE67000082; Thu, 12 Jul 2007 05:42:07 +0200 (CEST) X-ME-UUID: 20070712034207347.54DE67000082@mwinf2829.orange.fr Reply-To: From: "Philippe Verdy" To: "'Kenneth Whistler'" , Cc: References: <200707120118.l6C1IxW06378@birdie.sybase.com> Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 05:41:59 +0200 Organization: Ordinateur Personnel Message-ID: <000001c7c436$9a864f80$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <200707120118.l6C1IxW06378@birdie.sybase.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 Thread-Index: AcfENRK8WiCAtbbHSYW56AbyY3fSFQAATifA 
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l6C3gDcB006394
X-archive-position: 3034
X-Approved-By: cowan@ccil.org
X-ecartis-version: Ecartis v1.0.0
Sender: hebrew-bounce@unicode.org
Errors-to: hebrew-bounce@unicode.org
X-original-sender: verdy_p@wanadoo.fr
Precedence: bulk
X-list: hebrew

Kenneth Whistler wrote:
> Sent: Thursday, 12 July 2007 03:19
> To: cowan@ccil.org
> Cc: hebrew@unicode.org; kenw@sybase.com
> Subject: [hebrew] Re: Draft of Samaritan proposal
>
> John Cowan started off this discussion with:
>
> > But what does concern me is the double encoding of vowels.
>
> Me, too.
> (...)
> I would like someone to attempt to offer a convincing argument
> against Plan B1b (use NNBSP) -- that is simply encoding the
> vowel signs once and specifying a convention of using the
> existing NNBSP when you need to spell a Samaritan word with
> an initial i- or a- vowelling.

So why not accept the two proposed letter modifiers, but make them canonically equivalent to a space holder followed by the normal combining vowel? This ensures the identity of the two approaches, and explicitly says that the two kinds of vowels are in fact unified (through normalization)...
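
As an illustration of what canonical equivalence through normalization means in practice, here is a minimal sketch using Python's standard `unicodedata` module. It uses an existing canonically equivalent pair (U+00E9 versus U+0065 U+0301) as precedent, and NNBSP + Hebrew HIRIQ as an assumed stand-in for a "space holder + combining vowel" sequence; no Samaritan characters or decompositions are assumed.

```python
import unicodedata

# Existing precedent: a precomposed letter and its canonical decomposition.
precomposed = "\u00E9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

nfd_of_precomposed = unicodedata.normalize("NFD", precomposed)
nfc_of_decomposed = unicodedata.normalize("NFC", decomposed)

# Stand-in for a "space holder + combining vowel" sequence.
holder_sequence = "\u202F\u05B4"  # NNBSP + HEBREW POINT HIRIQ

# Canonical normalization leaves the sequence alone: nothing composes with it.
nfc_of_holder = unicodedata.normalize("NFC", holder_sequence)

# Compatibility normalization folds NNBSP to U+0020 SPACE, because NNBSP has
# a <noBreak> compatibility decomposition.
nfkc_of_holder = unicodedata.normalize("NFKC", holder_sequence)
```

Canonically equivalent spellings compare equal after NFC or NFD, which is the unification being asked for. The last line shows a side effect worth keeping in mind for any space-based holder: processes that apply NFKC would turn the holder into an ordinary SPACE.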
From verdy_p@wanadoo.fr Wed Jul 11 23:27:50 2007
Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:21:29 -0500 (CDT)
Received: from smtp28.orange.fr (smtp28.orange.fr [80.12.242.101]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C4Rofe025404 for ; Wed, 11 Jul 2007 23:27:50 -0500
Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2819.orange.fr (SMTP Server) with ESMTP id 57159700008C for ; Thu, 12 Jul 2007 06:27:44 +0200 (CEST)
Received: from HARNON (APoitiers-156-1-94-116.w86-221.abo.wanadoo.fr [86.221.101.116]) by mwinf2819.orange.fr (SMTP Server) with ESMTP id E6A717000083; Thu, 12 Jul 2007 06:27:43 +0200 (CEST)
X-ME-UUID: 20070712042743944.E6A717000083@mwinf2819.orange.fr
Reply-To:
From: "Philippe Verdy"
To: , "'Kenneth Whistler'" ,
Cc:
References: <200707120118.l6C1IxW06378@birdie.sybase.com> <000001c7c436$9a864f80$0a01a8c0@rodage.dyndns.org>
Subject: [hebrew] Re: Draft of Samaritan proposal
Date: Thu, 12 Jul 2007 06:27:36 +0200
Organization: Ordinateur Personnel
Message-ID: <000601c7c43c$f9a4afb0$0a01a8c0@rodage.dyndns.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
X-Mailer: Microsoft Office Outlook 11
In-Reply-To: <000001c7c436$9a864f80$0a01a8c0@rodage.dyndns.org>
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138
Thread-Index: AcfENRK8WiCAtbbHSYW56AbyY3fSFQAATifAAAD9lhA=
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l6C4Rofe025404
X-archive-position: 3035
X-Approved-By: cowan@ccil.org
X-ecartis-version: Ecartis v1.0.0
Sender: hebrew-bounce@unicode.org
Errors-to: hebrew-bounce@unicode.org
X-original-sender: verdy_p@wanadoo.fr
Precedence: bulk
X-list: hebrew

I wrote:
> Kenneth Whistler wrote:
> > Sent: Thursday, 12 July 2007 03:19
> > To: cowan@ccil.org
> > Cc: hebrew@unicode.org; kenw@sybase.com
> > Subject: [hebrew] Re: Draft of Samaritan proposal
> >
> > John Cowan started off this discussion with:
> >
> > > But what does concern me is the
double encoding of vowels.
> >
> > Me, too.
> > (...)
> > I would like someone to attempt to offer a convincing argument
> > against Plan B1b (use NNBSP) -- that is simply encoding the
> > vowel signs once and specifying a convention of using the
> > existing NNBSP when you need to spell a Samaritan word with
> > an initial i- or a- vowelling.
>
> So why not accept the two proposed letter modifiers, but make them
> canonically equivalent to a space holder followed by the normal combining
> vowel? This ensures the identity of the two approaches, and explicitly
> says that the two kinds of vowels are in fact unified (through
> normalization)...

After some thought, I considered the case where a Samaritan text encoded with explicit vowels needs to be converted into text without the optional vowels. In that case, the initial vowel should disappear, but the base holder should not impact word breaking (for example when the initial vowel follows the definite article): is the space or even the narrow space appropriate?

Shouldn't it be instead a zero-width non-breaking space (ZWNBSP) or word joiner (WJ), so that:

- "inshem" may be represented completely as or processed as if we remove vowels, and whose first character is ignorable and then displayed correctly as .

- "aa-inshem" may be represented completely as or processed as if we remove vowels, and whose first character is ignorable and then displayed correctly as

The interesting thing with ZWNBSP here (or WORD JOINER if we want to avoid ZWNBSP due to its possible ambiguity at the beginning of text, which is likely to occur here because we are speaking of initial vowels) is that it is already ignorable with the default collation, and it does not allow a word break to occur between the definite article prefix and the following word. It is the perfect choice for the invisible null consonant, including in other scripts.
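
The vowel-stripping scenario described above can be sketched as follows, assuming Python's standard `unicodedata` module. The function names are hypothetical, and Hebrew HIRIQ plus the letters NUN, SHIN and FINAL MEM stand in for a vowelled Samaritan word such as "inshem"; the point is only to show what each candidate holder leaves behind once the marks are removed.

```python
import unicodedata

# Candidate invisible bases discussed in the thread.
CANDIDATE_BASES = {
    "NBSP": "\u00A0",
    "NNBSP": "\u202F",
    "WJ": "\u2060",
    "ZWNBSP": "\uFEFF",
}
HIRIQ = "\u05B4"                    # stand-in initial vowel (category Mn)
CONSONANTS = "\u05E0\u05E9\u05DD"   # stand-in consonants: NUN, SHIN, FINAL MEM

base_categories = {
    label: unicodedata.category(ch) for label, ch in CANDIDATE_BASES.items()
}


def strip_vowels(text):
    """Remove nonspacing marks only; any holder character is left in place."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


def strip_vowels_and_format(text):
    """Remove nonspacing marks and format (Cf) characters such as WJ.

    Dropping every Cf character is a deliberately crude assumption: it would
    also remove joiners and bidi controls that a real process may need to keep.
    """
    return "".join(
        ch for ch in text if unicodedata.category(ch) not in ("Mn", "Cf")
    )


with_nnbsp = CANDIDATE_BASES["NNBSP"] + HIRIQ + CONSONANTS
with_wj = CANDIDATE_BASES["WJ"] + HIRIQ + CONSONANTS

# NNBSP is a space separator (Zs), so it survives as a stray leading space.
stripped_nnbsp = strip_vowels(with_nnbsp)
# WJ is a format character (Cf), so it can be dropped together with the marks.
stripped_wj = strip_vowels_and_format(with_wj)
```

This only reflects general categories. It says nothing about whether a combining mark following WJ forms a well-formed sequence with a usable base, which is the separate question raised earlier in the thread.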
From john@tiro.ca Wed Jul 11 23:51:22 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:21:42 -0500 (CDT) Received: from pd4mo2so.prod.shaw.ca (shawidc-mo1.cg.shawcable.net [24.71.223.10]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C4pLMx018380 for ; Wed, 11 Jul 2007 23:51:21 -0500 Received: from pd2mr2so.prod.shaw.ca (pd2mr2so-qfe3.prod.shaw.ca [10.0.141.109]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0JL100DF5UTHPL00@l-daemon> for hebrew@unicode.org; Wed, 11 Jul 2007 22:51:17 -0600 (MDT) Received: from pn2ml8so.prod.shaw.ca ([10.0.121.152]) by pd2mr2so.prod.shaw.ca (Sun Java System Messaging Server 6.2-7.05 (built Sep 5 2006)) with ESMTP id <0JL100B97UTGARH0@pd2mr2so.prod.shaw.ca> for hebrew@unicode.org; Wed, 11 Jul 2007 22:51:17 -0600 (MDT) Received: from [192.168.1.101] ([70.66.9.120]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0JL100JRVUTFSCG0@l-daemon> for hebrew@unicode.org; Wed, 11 Jul 2007 22:51:16 -0600 (MDT) Date: Wed, 11 Jul 2007 21:51:13 -0700 From: John Hudson Subject: [hebrew] Re: Draft of Samaritan proposal In-reply-to: <46956164.1020302@kli.org> To: "Mark E. Shoulson" Cc: John Cowan , Michael Everson , hebrew@unicode.org Message-id: <4695B341.3090402@tiro.ca> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed Content-transfer-encoding: 7bit References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> <46956164.1020302@kli.org> User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) X-archive-position: 3036 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: john@tiro.ca Precedence: bulk X-list: hebrew Mark E. Shoulson wrote: > Welll... 
They seem to be zero-width, but are written decidedly on the
> left "shoulder" of the letters (usually). And when there are several of
> them, they line up side by side and actually push the next letter away,
> thus acting like they are spacing. So maybe they're spacing but only
> very very little, or zero (and they kern between themselves?)

Ouch. So you have vowels that you are proposing to encode as combining marks but which may not actually behave like combining marks. They may be zero-width or they may be of a 'very very little' width (which from a layout perspective is like being a little bit pregnant), and in other situations they are decidedly not zero-width and need to kern to each other and also to following letters.

Based on this description, I think I would be inclined to encode the vowels only once and as non-combining, spacing modifiers, and to rely on font layout to provide appropriate spacing and positioning relative to letters and other marks. It is easier to collapse the advance width of a spacing glyph in layout (or to substitute zero-width variants contextually, which can be classed as combining marks at the glyph level) than it is to give a width to a combining mark character and have it behave like a combining mark. Indeed, there are already layout engines that enforce a zero-width on combining mark characters for some scripts.

This would also avoid the spoofing concern.

But all that is based on Mark's description of the vowel behaviour and, in particular, the admission that they might be 'spacing but only very very little'. A character is either spacing or non-spacing, and if it is spacing it shouldn't be encoded as a combining mark.

John Hudson

--

Tiro Typeworks        www.tiro.com
Gulf Islands, BC      tiro@tiro.com

We say our understanding measures how things are, and likewise our perception, since that is how we find our way around, but in fact these do not measure. They are measured.
-- Aristotle, Metaphysics From john@tiro.ca Thu Jul 12 00:19:43 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:22:02 -0500 (CDT) Received: from pd4mo2so.prod.shaw.ca (shawidc-mo1.cg.shawcable.net [24.71.223.10]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C5Jhs0002526 for ; Thu, 12 Jul 2007 00:19:43 -0500 Received: from pd4mr3so.prod.shaw.ca (pd4mr3so-qfe3.prod.shaw.ca [10.0.141.214]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0JL100DNWW3XPM20@l-daemon> for hebrew@unicode.org; Wed, 11 Jul 2007 23:19:09 -0600 (MDT) Received: from pn2ml1so.prod.shaw.ca ([10.0.121.145]) by pd4mr3so.prod.shaw.ca (Sun Java System Messaging Server 6.2-7.05 (built Sep 5 2006)) with ESMTP id <0JL100M97W3WTK00@pd4mr3so.prod.shaw.ca> for hebrew@unicode.org; Wed, 11 Jul 2007 23:19:09 -0600 (MDT) Received: from [192.168.1.101] ([70.66.9.120]) by l-daemon (Sun ONE Messaging Server 6.0 HotFix 1.01 (built Mar 15 2004)) with ESMTP id <0JL100GPFW3WIDB0@l-daemon> for hebrew@unicode.org; Wed, 11 Jul 2007 23:19:08 -0600 (MDT) Date: Wed, 11 Jul 2007 22:19:06 -0700 From: John Hudson Subject: [hebrew] Re: Draft of Samaritan proposal In-reply-to: <021101c7c413$6adbc940$0a01a8c0@rodage.dyndns.org> To: verdy_p@wanadoo.fr Cc: "'John Cowan'" , "'Michael Everson'" , hebrew@unicode.org Message-id: <4695B9CA.8040903@tiro.ca> MIME-version: 1.0 Content-type: text/plain; charset=UTF-8; format=flowed Content-transfer-encoding: 7bit References: <20070711152333.GF28262@mercury.ccil.org> <01d501c7c3d3$c0df3d80$0a01a8c0@rodage.dyndns.org> <20070711185549.GA24331@mercury.ccil.org> <021101c7c413$6adbc940$0a01a8c0@rodage.dyndns.org> User-Agent: Thunderbird 1.5.0.12 (Windows/20070509) X-archive-position: 3037 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: john@tiro.ca Precedence: bulk X-list: hebrew Philippe Verdy wrote: > What may be criticized 
in the proposal, is that the samples are dated from > the date of publication of the book that contain it, not the date when the > artistic work or text was actually created before being reproduced in that > publication. For the three sizes of Samaritan type from the Imprimerie Nationale shown in Figure 6: The corps 18 was cut in Rome for the Propaganda Fides in 1636. The corps 11 was cut in Paris in 1860, and the corps 13, 'une augmentation du corps 11', in 1867. Michael, Mark, if you are interested, I can provide specimens of the English Samaritan from Oxford (first used in 1685), which may have been cut by Peter de Walpergen although this is uncertain. I also have a small specimen of Caslon's Samaritan type, (possibly of Dutch origin) from the late 18th century. John Hudson -- Tiro Typeworks www.tiro.com Gulf Islands, BC tiro@tiro.com We say our understanding measures how things are, and likewise our perception, since that is how we find our way around, but in fact these do not measure. They are measured. 
-- Aristotle, Metaphysics From cowan@ccil.org Thu Jul 12 00:29:49 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:29:49 -0500 (CDT) Received: from earth.ccil.org (earth.ccil.org [192.190.237.11]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C5TmHY008303 for ; Thu, 12 Jul 2007 00:29:48 -0500 Received: from cowan by earth.ccil.org with local (Exim 4.63) (envelope-from ) id 1I8rFJ-0004fW-CE; Thu, 12 Jul 2007 01:29:45 -0400 Date: Thu, 12 Jul 2007 01:29:45 -0400 To: Philippe Verdy Cc: "'John Cowan'" , "'Dean Snyder'" , "'Hebrew List'" Subject: [hebrew] Re: Draft of Samaritan proposal Message-ID: <20070712052945.GE8912@mercury.ccil.org> References: <20070711172132.510376227@smtp.johnshopkins.edu> <20070711200358.GG24331@mercury.ccil.org> <021301c7c421$539f1d50$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <021301c7c421$539f1d50$0a01a8c0@rodage.dyndns.org> User-Agent: Mutt/1.5.13 (2006-08-11) From: John Cowan X-archive-position: 3038 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: cowan@ccil.org Precedence: bulk X-list: hebrew Philippe Verdy scripsit: > The actual difference between combining characters and modifier letters is > that combining letters cannot be used reliably and meaningfully isolately > without a base letter. Quite so. > On the opposite, modifier letters, despite they are > generally used to alter some other nearby letter or grapheme cluster, Not necessarily. For example, 02B9 and 02BA, MODIFIER LETTER PRIME and DOUBLE PRIME, are used as ordinary alphabetic characters in Nenets writing. The term "modifier" should not be interpreted literally. > may be > seen isolately, and the associated letter are not modified significantly > (they are still keeping their identity phonologically, even if there's a > minor phonetic variation). 
Quite so, except that there may not be an associated letter. Essentially, "modifier" just means "small and caseless". > Using the term "modifier letter" for two vowels may seem a bit abusive, > because they are in fact not modifying the consonant at all; in the leading > position, they do not modify anything and stand for their own in the samples > provided. As I say, that is normal. I'll put you down as supporting plan B2. -- That you can cover for the plentiful John Cowan and often gaping errors, misconstruals, http://www.ccil.org/~cowan and disinformation in your posts cowan@ccil.org through sheer volume -- that is another misconception. --Mike to Peter From cowan@ccil.org Thu Jul 12 00:38:55 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:38:55 -0500 (CDT) Received: from earth.ccil.org (earth.ccil.org [192.190.237.11]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C5ct31012207 for ; Thu, 12 Jul 2007 00:38:55 -0500 Received: from cowan by earth.ccil.org with local (Exim 4.63) (envelope-from ) id 1I8rOB-0005Iz-BM; Thu, 12 Jul 2007 01:38:55 -0400 Date: Thu, 12 Jul 2007 01:38:55 -0400 To: Dean Snyder Cc: Hebrew List Subject: [hebrew] Re: Draft of Samaritan proposal Message-ID: <20070712053855.GF8912@mercury.ccil.org> References: <20070711172132.510376227@smtp.johnshopkins.edu> <20070711200358.GG24331@mercury.ccil.org> <20070712031024.1779461035@smtp.johnshopkins.edu> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20070712031024.1779461035@smtp.johnshopkins.edu> User-Agent: Mutt/1.5.13 (2006-08-11) From: John Cowan X-archive-position: 3039 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: cowan@ccil.org Precedence: bulk X-list: hebrew Dean Snyder scripsit: > >This is the same as B2: > >modifier letters are not combining characters. > > My Plan C is not the same as your Plan B2. 
Modifier letters are free-
> standing spacing characters, something one cannot claim for the
> Samaritan vowels.

I do claim it, though sometimes the space disappears at the glyph level.

> In Unicode parlance, Samaritan vowels are no more modifier letters than
> a, e, i, o, & u are, and therefore should not be encoded as such.

The difference between modifier letters (Lm) and plain caseless letters (Lo) is rarely significant. Basically, modifier letters are small, but they are spacing.

> are, like Hebrew vowels, simply letters that are usually, but not
> always, non-spacing, and their Unicode properties should reflect their
> real-world properties.

Unfortunately, the Unicode architecture can't cope with letters that are sometimes spacing and sometimes not (at the character level). At the glyph level, as John Hudson points out, it's easier to make spacing characters non-spacing than vice versa.

--
Her he asked if O'Hare Doctor tidings sent from far     John Cowan
coast and she with grameful sigh him answered that      http://ccil.org/~cowan
O'Hare Doctor in heaven was. Sad was the man that word  cowan@ccil.org
to hear that him so heavied in bowels ruthful. All
she there told him, ruing death for friend so young,    James Joyce, Ulysses
algate sore unwilling God's rightwiseness to withsay.
"Oxen of the Sun" From cowan@ccil.org Thu Jul 12 00:40:26 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:40:26 -0500 (CDT) Received: from earth.ccil.org (earth.ccil.org [192.190.237.11]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C5ePg9012740 for ; Thu, 12 Jul 2007 00:40:25 -0500 Received: from cowan by earth.ccil.org with local (Exim 4.63) (envelope-from ) id 1I8rPa-0005Vp-Fy; Thu, 12 Jul 2007 01:40:22 -0400 Date: Thu, 12 Jul 2007 01:40:22 -0400 To: Philippe Verdy Cc: "'Kenneth Whistler'" , cowan@ccil.org, hebrew@unicode.org Subject: [hebrew] Re: Draft of Samaritan proposal Message-ID: <20070712054022.GG8912@mercury.ccil.org> References: <200707120118.l6C1IxW06378@birdie.sybase.com> <000001c7c436$9a864f80$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000001c7c436$9a864f80$0a01a8c0@rodage.dyndns.org> User-Agent: Mutt/1.5.13 (2006-08-11) From: John Cowan X-archive-position: 3040 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: cowan@ccil.org Precedence: bulk X-list: hebrew Philippe Verdy scripsit: > So why not accepting the two proposed letter modifiers, but making them > canonically equivalent to a space holder followed by the normal combining > vowel. This ensures the identity of the two approaches, and explicitly says > that the two kinds of vowels are in fact unified (through normalization)... I think it unlikely in the extreme that any new characters with canonical decompositions will be added, though it would not actually violate any stability policy (because the canonically equivalent characters are themselves newly added). -- Cash registers don't really add and subtract; John Cowan they only grind their gears. cowan@ccil.org But then they don't really grind their gears, either; they only obey the laws of physics. 
--Unknown From cowan@ccil.org Thu Jul 12 00:48:48 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 00:48:48 -0500 (CDT) Received: from earth.ccil.org (earth.ccil.org [192.190.237.11]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C5mm8k018958 for ; Thu, 12 Jul 2007 00:48:48 -0500 Received: from cowan by earth.ccil.org with local (Exim 4.63) (envelope-from ) id 1I8rXg-0006Jb-7P; Thu, 12 Jul 2007 01:48:44 -0400 Date: Thu, 12 Jul 2007 01:48:44 -0400 To: Kenneth Whistler Cc: cowan@ccil.org, hebrew@unicode.org Subject: [hebrew] Re: Draft of Samaritan proposal Message-ID: <20070712054844.GH8912@mercury.ccil.org> References: <200707120118.l6C1IxW06378@birdie.sybase.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200707120118.l6C1IxW06378@birdie.sybase.com> User-Agent: Mutt/1.5.13 (2006-08-11) From: John Cowan X-archive-position: 3041 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: cowan@ccil.org Precedence: bulk X-list: hebrew Kenneth Whistler scripsit: > SAMARITAN BLANK BASE, since its glyph would be blank, and its > function would be to serve as a base character, not as a consonant. That's fine. > Note that both U+00A0 NBSP and U+202F NNBSP are Grapheme_Base=True > precisely for use in this kind of combining mark display. However, they aren't letters. So for example if one double-clicks to select a word of Samaritan and then moves or copies it elsewhere, the NBSP or NNBSP would be left behind, not being part of the word. Bad Things then happen, probably involving the vowel sitting on a word-separating SPACE. That's why I believe a letter (category Lo) is needed. > NBSP (or NNBSP) followed by a combining mark *is* a well-formed > combining character sequence. NNBSP might be the better choice, > because these initial vowelings don't appear to require > abnormally wide spaces to sit on. 
I assumed that NBSP followed by a normal combining character would shrink to the width of the combining character; no? > The advantage of going with NNBSP, besides it simply displaying > correctly immediately in a properly constructed implementation, > is that it keeps the i and a vowels as a single character, > not requiring special equivalencing in the collation algorithm > or other matching algorithms, and it doesn't require arguing > for any *new* oddball character functionality -- since the > character is already encoded. Quite so. > I agree with John Hudson's general critique of heading this > direction. Functionally and historically these are > combining marks, and their display is also handled most > generically, I think, within the context of the kind of > behavior already dealt with in fonts for combining marks. John H. has backed away from this position since you posted this, at least partially. -- All Gaul is divided into three parts: the part John Cowan that cooks with lard and goose fat, the part http://ccil.org/~cowan that cooks with olive oil, and the part that cowan@ccil.org cooks with butter. 
-- David Chessler From rosennej@qsm.co.il Thu Jul 12 01:19:35 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:09:24 -0500 (CDT) Received: from mx-out.daemonmail.net (mx-out.daemonmail.net [216.104.160.39]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6C6JYAa005901 for ; Thu, 12 Jul 2007 01:19:35 -0500 Received: from localhost.daemonmail.net (localhost.daemonmail.net [127.0.0.1]) by mx-out.daemonmail.net (8.13.1/8.12.9) with SMTP id l6C6JhBE083878 for ; Wed, 11 Jul 2007 23:19:43 -0700 (PDT) (envelope-from rosennej@qsm.co.il) Received: from [89.138.11.12] (via account 11756) by mx-out.daemonmail.net with ESMTP id loL0HtR2 authenticated by POP; Wed, 11 Jul 2007 23:19:41 -0700 (PDT) From: "Jonathan Rosenne" To: Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 09:19:30 +0300 Message-ID: <000001c7c44c$9c53e190$6502a8c0@QSM8> MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook, Build 10.0.6822 In-Reply-To: Importance: Normal Thread-Index: AcfEAXMxLJvTEHcsQWuaqG/e649vmQASqE0g X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 X-archive-position: 3042 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: rosennej@qsm.co.il Precedence: bulk X-list: hebrew > -----Original Message----- > From: hebrew-bounce@unicode.org > [mailto:hebrew-bounce@unicode.org] On Behalf Of Michael Everson > Sent: Thursday, July 12, 2007 12:08 AM > To: hebrew@unicode.org > Subject: [hebrew] Re: Draft of Samaritan proposal > > > At 18:49 +0300 2007-07-11, Jonathan Rosenne wrote: > > >Why is this on the Hebrew list? If you wish to claim that > the Samaritan > >alphabet is a distinct alphabet it should not be on this list. > > Jony, really. They live in Israel and the West Bank. They revere the > Pentateuch, even. Irrelevant. 
> And their script is closely related to Hebrew So what? Cyrillic is much closer to Greek. > and > the people who know and care about these things are on this list. And also on other Unicode lists. Jony > -- > Michael Everson * http://www.evertype.com > > From mark@kli.org Thu Jul 12 06:37:22 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:17:46 -0500 (CDT) Received: from pi.meson.org (pi.meson.org [66.134.26.207]) by unicode.org (8.13.4/8.12.11) with SMTP id l6CBbKp6015485 for ; Thu, 12 Jul 2007 06:37:21 -0500 Received: (qmail 1097 invoked from network); 12 Jul 2007 11:37:17 -0000 Received: from nagas.meson.org (HELO ?192.168.1.101?) (1000@192.168.1.101) by pi.meson.org with SMTP; 12 Jul 2007 11:37:17 -0000 Message-ID: <4696126C.4070107@kli.org> Date: Thu, 12 Jul 2007 07:37:16 -0400 From: "Mark E. Shoulson" User-Agent: Thunderbird 1.5.0.12 (X11/20070509) MIME-Version: 1.0 To: John Hudson CC: John Cowan , Michael Everson , hebrew@unicode.org Subject: [hebrew] Re: Draft of Samaritan proposal References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> <46956164.1020302@kli.org> <4695B341.3090402@tiro.ca> In-Reply-To: <4695B341.3090402@tiro.ca> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-archive-position: 3043 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: mark@kli.org Precedence: bulk X-list: hebrew John Hudson wrote: > Mark E. Shoulson wrote: > >> Welll... They seem to be zero-width, but are written decidedly on the >> left "shoulder" of the letters (usually). And when there are several >> of them, they line up side by side and actually push the next letter >> away, thus acting like they are spacing. So maybe they're spacing >> but only very very little, or zero (and they kern between themselves?) > > Ouch. 
So you have vowels that you are proposing to encode as combining > marks but which may not actually behave like combining marks. They may > be zero-width or they maybe on a 'very very little' width (which from > a layout perspective is like being a little bit pregnant), and in > other situations they are decidedly not zero-width and need to kern to > each other and also to following letters. Part of the problem is that spacing-ness isn't as meaningful for Samaritan lettering as it is in other writing systems. Because Samaritan separates words with points and not spaces, letters may be freely spaced out within words, and frequently are (there's a whole tradition of adjusting the spaces so that similar letters or words wind up on top of each other in columns down the page, or to draw designs with the whitespace). So spacing out the word isn't as obtrusive and unnatural as it would be in English or something. ~mark From everson@evertype.com Thu Jul 12 06:59:45 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:18:09 -0500 (CDT) Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6CBxjbA026561 for ; Thu, 12 Jul 2007 06:59:45 -0500 Received: from [88.81.100.235] (helo=[192.168.1.134]) by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from ) id 1I8xKd-0007aQ-F9 for hebrew@unicode.org; Thu, 12 Jul 2007 12:59:40 +0100 Mime-Version: 1.0 Message-Id: In-Reply-To: <4696126C.4070107@kli.org> References: <20070711152333.GF28262@mercury.ccil.org> <469504BD.908@tiro.ca> <20070711202709.GH24331@mercury.ccil.org> <46954B81.2010605@tiro.ca> <46956164.1020302@kli.org> <4695B341.3090402@tiro.ca> <4696126C.4070107@kli.org> Date: Thu, 12 Jul 2007 12:54:04 +0100 To: Hebrew Discussion From: Michael Everson Subject: [hebrew] Re: Draft of Samaritan proposal Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-AntiAbuse: This header was added to track abuse, please include it with any 
abuse report X-AntiAbuse: Primary Hostname - white.dnsireland.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - evertype.com X-archive-position: 3044 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 07:37 -0400 2007-07-12, Mark E. Shoulson wrote: >>Ouch. So you have vowels that you are proposing to encode as >>combining marks but which may not actually behave like combining >>marks. They may be zero-width or they maybe on a 'very very little' >>width (which from a layout perspective is like being a little bit >>pregnant), and in other situations they are decidedly not >>zero-width and need to kern to each other and also to following >>letters. I'd better show a scan of the modern Pentateuch so you can see what they are doing these days. >Part of the problem is that spacing-ness isn't as meaningful for >Samaritan lettering as it is in other writing systems. Because >Samaritan separates words with points and not spaces, There are Samaritan texts which use SPACE between words, though the WORD SEPARATION POINT is used more frequently. 
-- Michael Everson * http://www.evertype.com From verdy_p@wanadoo.fr Thu Jul 12 07:04:44 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:19:07 -0500 (CDT) Received: from smtp28.orange.fr (smtp28.orange.fr [80.12.242.101]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6CC4hAp028982 for ; Thu, 12 Jul 2007 07:04:43 -0500 Received: from me-wanadoo.net (localhost [127.0.0.1]) by mwinf2819.orange.fr (SMTP Server) with ESMTP id E4DC37000098 for ; Thu, 12 Jul 2007 14:04:37 +0200 (CEST) Received: from HARNON (APoitiers-156-1-45-176.w86-213.abo.wanadoo.fr [86.213.92.176]) by mwinf2819.orange.fr (SMTP Server) with ESMTP id 931A67000090; Thu, 12 Jul 2007 14:04:37 +0200 (CEST) X-ME-UUID: 20070712120437602.931A67000090@mwinf2819.orange.fr Reply-To: From: "Philippe Verdy" To: "'John Cowan'" , "'Kenneth Whistler'" Cc: References: <200707120118.l6C1IxW06378@birdie.sybase.com> <20070712054844.GH8912@mercury.ccil.org> Subject: [hebrew] Re: Draft of Samaritan proposal Date: Thu, 12 Jul 2007 14:04:29 +0200 Organization: Ordinateur Personnel Message-ID: <000d01c7c47c$cd239b00$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" X-Mailer: Microsoft Office Outlook 11 In-Reply-To: <20070712054844.GH8912@mercury.ccil.org> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.3138 Thread-Index: AcfESmVUUOAAQbTiSciNOFFFQGwhZQAL5g5Q Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by unicode.org id l6CC4hAp028982 X-archive-position: 3045 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: verdy_p@wanadoo.fr Precedence: bulk X-list: hebrew > -----Message d'origine----- > De : hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] De la > part de John Cowan > Envoyé : jeudi 12 juillet 2007 07:49 > À : Kenneth Whistler > Cc : cowan@ccil.org; hebrew@unicode.org > Objet : [hebrew] Re: Draft of 
Samaritan proposal > > Kenneth Whistler scripsit: > > > SAMARITAN BLANK BASE, since its glyph would be blank, and its > > function would be to serve as a base character, not as a consonant. > > That's fine. > > > Note that both U+00A0 NBSP and U+202F NNBSP are Grapheme_Base=True > > precisely for use in this kind of combining mark display. > > However, they aren't letters. So for example if one double-clicks > to select a word of Samaritan and then moves or copies it elsewhere, > the NBSP or NNBSP would be left behind, not being part of the word. > Bad Things then happen, probably involving the vowel sitting on a > word-separating SPACE. That's why I believe a letter (category Lo) > is needed. You're wrong here. A word selection must select full grapheme clusters, so it cannot leave the space holder behind. And don't forget that word selections need not only select letters. Just remember the case of apostrophes, the Catalan middle-dot. Don't forget also the case of other spacing accents that are used in other languages and that have decompositions with a whitespace holder. Anyway, in Unicode we currently have decompositions using SPACE, although the standard prefers NBSP, and although the best holder should have been a zero-width non-breaking space (ZWNBSP), which has been deprecated (for use only as a BOM) in favour of WORD JOINER (WJ), the latter being explicitly included as NOT breaking a word (but forgetting the implied requirement of being zero-width, something which was wrong even for ZWNBSP). Finally don't forget that other format controls are used in other languages, including the combining grapheme joiner within sequences of combining characters. So, there's no requirement to use only a "L" general category character. The L category is just the "simplest" part of the characters that may make a word. Reread the specifications about word breakers and (unbreakable) grapheme clusters. Don't infer that I am supporting any of the proposed plans. 
The only thing that I support is that we should avoid disunifying vowels that are perceived as identical. The solution adopted will work if the Samaritan vowels are encoded either as diacritics, i.e. combining characters (requiring some space holder if used as initials), or as letter modifiers (that are not necessarily spacing, given that they will be kerned most of the time). It just seems that, for keeping the tradition, they should be encoded as diacritics. Keeping these two vowels unified, but with a proposed solution for handling the two special cases of initial vowels, would also offer a model for some tricky cases where some author would still need to use, exceptionally, other vowels in initial positions (for transliterating imported foreign words), without requiring additional encoding: they will simply reuse the existing vowels. From everson@evertype.com Thu Jul 12 08:19:51 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:20:28 -0500 (CDT) Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6CDJpFi025830 for ; Thu, 12 Jul 2007 08:19:51 -0500 Received: from [88.81.100.235] (helo=[192.168.1.134]) by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from ) id 1I8ya9-0005hu-MR for hebrew@unicode.org; Thu, 12 Jul 2007 14:19:46 +0100 Mime-Version: 1.0 Message-Id: In-Reply-To: <000001c7c44c$9c53e190$6502a8c0@QSM8> References: <000001c7c44c$9c53e190$6502a8c0@QSM8> Date: Thu, 12 Jul 2007 14:16:35 +0100 To: From: Michael Everson Subject: [hebrew] Re: Draft of Samaritan proposal Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - white.dnsireland.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - evertype.com X-archive-position: 3046 X-Approved-By: 
cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 09:19 +0300 2007-07-12, Jonathan Rosenne wrote: > > Jony, really. They live in Israel and the West Bank. They revere the >> Pentateuch, even. > >Irrelevant. Good for you, Jony. You can ignore the discussion then. -- Michael Everson * http://www.evertype.com From everson@evertype.com Thu Jul 12 06:34:01 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:22:19 -0500 (CDT) Received: from white.dnsireland.com (white.dnsireland.com [67.15.182.33]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6CBY1Jt013720 for ; Thu, 12 Jul 2007 06:34:01 -0500 Received: from [88.81.100.235] (helo=[192.168.1.134]) by white.dnsireland.com with esmtpa (Exim 4.66) (envelope-from ) id 1I8wvi-0001KI-TC for hebrew@unicode.org; Thu, 12 Jul 2007 12:33:55 +0100 Mime-Version: 1.0 Message-Id: In-Reply-To: <200707120118.l6C1IxW06378@birdie.sybase.com> References: <200707120118.l6C1IxW06378@birdie.sybase.com> Date: Thu, 12 Jul 2007 12:32:20 +0100 To: hebrew@unicode.org From: Michael Everson Subject: [hebrew] Re: Draft of Samaritan proposal Content-Type: text/plain; charset="us-ascii" ; format="flowed" X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - white.dnsireland.com X-AntiAbuse: Original Domain - unicode.org X-AntiAbuse: Originator/Caller UID/GID - [0 0] / [47 12] X-AntiAbuse: Sender Address Domain - evertype.com X-archive-position: 3047 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: everson@evertype.com Precedence: bulk X-list: hebrew At 18:18 -0700 2007-07-11, Kenneth Whistler wrote: >John Cowan started off this discussion with: > >> But what does concern me is the double encoding of vowels. > >Me, too. 
Oh, great. This is going to be another struggle. >>This is not a situation like Indic, where the initial vowels are >>nothing like the vowel marks: the initial vowels are glyphically >>identical with the vowel marks, but encoded separately because >>Unicode combining marks must have a base. > >Exactly. So Plan A is an encoding hack, which doesn't follow from >the logic of the Samaritan script per se, I protest. This is not a "hack". It is based on an analysis of what is actually happening -- of the logic of the script. Samaritan uses non-spacing marks. It also uses some in initial position. The logic of the script is that they write in logical order. SHORT-A + NUN + SUKUN + GAMAN + E + DALAT = "anged". >but follows from the structure of the Unicode encoding, given a >determination that the vowel signs are combining marks (which seems >justified, given their general behavior and their relationship to >Hebrew vowelling), added to the need for Unicode combining marks to >have a base in order to result in well-formed combining character >sequences. For goodness' sake! I have been in this business for a while now, and absolutely the most frustrating thing about it is that members of the UTC change their minds completely arbitrarily about what is and is not acceptable. How many times have you and I discussed encoding matters where we have agreed that introducing something "new" in terms of the encoding model is a bad thing, because non-linguists in the UTC are skittish about that sort of thing? Here we have a perfectly straightforward example of a script which uses combining marks which *anyone* can see are genuinely non-spacing marks. Naturally those should be encoded as combining marks. But there's a problem with two of them, which can occur word-initially. These could be handled with a different combining mark. We might have COMBINING-SHORT-A which rests atop the left side of a letter. 
And we might have a second COMBINING-INITIAL-SHORT-A which rests atop the right side of a letter, indicating that it precedes the consonant in reading. Thus: DALAT + INITIAL-SHORT-A + SHORT-A = "ada" NUN + INITIAL-SHORT-A + SUKUN + GAMAN + E + DALAT = "anged". But this breaks the *ordinary* logic of the standard, which is that things are encoded in phonetic order. Is there an advantage to Samaritan to have COMBINING-INITIAL-SHORT-A follow the DALAT? I don't think so. It will not be logical for input, certainly, and I don't see an advantage to Samaritans to have to have complex inputting software written for them. And it doesn't deal with the question of what to do if someone types SHORT-A and INITIAL-SHORT-A in the wrong order. (This may not be a big deal of course, but it is something to consider, since more than one vowel sign can occur with a base character in Samaritan -- normally in phonetic order.) I would consider the COMBINING-INITIAL-SHORT-A to be a "hack". But what Mark and I proposed is not one. What we proposed is in accordance with the practice of the standard already. The standard also has spacing marks which look like non-spacing ones. And not just legacy clones of ASCII. We have U+02BC MODIFIER LETTER APOSTROPHE right alongside U+0313 COMBINING COMMA ABOVE. Nothing prevents those from being used together in an orthography. We've got U+1D43 MODIFIER LETTER SMALL A and U+0363 COMBINING LATIN SMALL LETTER A. Both of those can be used together in an orthography. And they look just the same. So since we have precedent for spacing marks which look and function like non-spacing ones, Mark and I proposed to solve the problem simply: MODIFIER-SHORT-A + DALAT + AA = "ada". MODIFIER-SHORT-A + NUN + SUKUN + GAMAN + E + DALAT = "anged". This is logical. It is in accordance with other things in the standard. It doesn't invent new invisible letters and impose it on users for ordinary text. (The INVISIBLE LETTER itself was not for ordinary text. 
It was for special paedagogical purposes.) It doesn't interfere with inputting. So what is the problem? Is it the security issue and IDN? There are 750 Samaritans. I don't believe there is a need to have Samaritan script in IDN. >>Calling the current plan Plan A, I propose two alternative plans: >> >>Plan B1: Use combining marks only, and add a SAMARITAN ZERO-WIDTH >>CONSONANT as a new base character for use before an initial vowel. >>This would lengthen texts slightly, but would be a regular and >>familiar situation. > >This would be neither ZERO-WIDTH nor a CONSONANT, so calling it that >would be a bit of a misnomer. If something like this were to be >added to the encoding, I would suggest instead: SAMARITAN BLANK >BASE, since its glyph would be blank, and its function would be to >serve as a base character, not as a consonant. That's just invention. I think Mark and I have done a better job by not inventing something like this. >As Michael pointed out, this is getting very close again to the >concept of a generic INVISIBLE LETTER, which hasn't passed muster >yet in the UTC, although it hasn't actually been rejected as a >concept yet, either. So the problem with proposing a SAMARITAN BLANK >BASE character would be that it would immediately raise all the >issues about a generic character for this functionality. The >UTC is unlikely to want to encode another blank base character each >time this kind of display behavior shows up in a script. And I don't believe we should impose such a thing on the Samaritans. >And if advocating for Plan B1, one needs to first analyze and either >turn a thumbs up or down on the following options as well: > >Plan B1a: Use NBSP as the base. > >Plan B1b: Use NNBSP as the base. I think this is just unprecedented. You are saying that in the orthography of this language words beginning with vowels should begin with these special control characters. *That* is a hack. 
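A minimal sketch for readers who want to check the properties this argument turns on (it is an illustration added to the thread, not part of any poster's message, and it assumes Python's standard unicodedata module; the helper names describe and starts_with_mark are hypothetical). It looks up the general category of the two space characters named in Plans B1a/B1b and of the modifier-letter/combining-mark pairs cited as precedent, using only characters that were already encoded at the time of the discussion.

```python
import unicodedata

# Characters named in the thread; all were already encoded in 2007.
CHARS = [
    "\u00A0",  # NBSP, proposed base in Plan B1a
    "\u202F",  # NNBSP, proposed base in Plan B1b
    "\u02BC",  # MODIFIER LETTER APOSTROPHE (spacing)
    "\u0313",  # COMBINING COMMA ABOVE (non-spacing)
    "\u1D43",  # MODIFIER LETTER SMALL A (spacing)
    "\u0363",  # COMBINING LATIN SMALL LETTER A (non-spacing)
]


def describe(ch):
    """Return (code point, character name, general category) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def starts_with_mark(text):
    """True if the string begins with a combining mark, i.e. a mark with no base before it."""
    return bool(text) and unicodedata.category(text[0]).startswith("M")


for ch in CHARS:
    print(*describe(ch))
```

The spaces report category Zs, the modifier letters Lm, and the combining marks Mn, which is the distinction behind the objection that a space is not a letter. A word that begins with a modifier letter does not begin with a baseless mark, whereas a word that begins with a bare combining mark does; the NNBSP plan avoids the latter only by putting a space character at the start of the word.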
>Note that both U+00A0 NBSP and U+202F NNBSP are Grapheme_Base=True >precisely for use in this kind of combining mark display. This is completely wrong, in my view. The idea was never to make use of this in natural orthography. It was for displaying combining characters in isolation, as in schoolbooks. >NBSP (or NNBSP) followed by a combining mark *is* a well-formed >combining character sequence. NNBSP might be the better choice, >because these initial vowelings don't appear to require abnormally >wide spaces to sit on. I really can't believe that you are suggesting that words in Samaritan should begin with space characters. That is not what Samaritan does. >Of course, U+0020 SPACE is also a Grapheme_Base, but it has all the >wrong breaking properties for this kind of functionality. But NBSP >and NNBSP are lb=GL, which is o.k. for linebreaking. Wordbreaking >wouldn't be correct by default, but it would be straightforward to >fix for Samaritan: you would want NNBSP + NSM --> ALetter, for the >purposes of wordbreaking. I am fundamentally opposed to this idea the more I think of it. I believe Mark and I did the better analysis of the script. >The advantage of going with NNBSP, besides it simply displaying >correctly immediately in a properly constructed implementation, is >that it keeps the i and a vowels as a single character, not >requiring special equivalencing in the collation algorithm or other >matching algorithms, and it doesn't require arguing for any *new* >oddball character functionality -- since the >character is already encoded. I see no advantage at all. I'm not trying to be obstinate, either. >>Plan B2: Use modifier letters only, relying on font kerning to >>move the vowel letters slightly to the right when preceded by a >>consonant. This solution is less artificial, but more unusual; on >>the other hand, it might be more legible in environments like >>Windows, where initially there would be no support for Samaritan >>combining characters in Uniscribe. 
> >I agree with John Hudson's general critique of heading this >direction. Functionally and historically these are combining marks, >and their display is also handled most generically, I think, within >the context of the kind of >behavior already dealt with in fonts for combining marks. I agree as well. Samaritan vowels are quite certainly combining marks. The only problem is what to do when they occur initially. I think the only reasonable models are: MODIFIER-SHORT-A + DALAT + COMBINING-SHORT-A = "ada" or DALAT + COMBINING-INITIAL-SHORT-A + COMBINING-SHORT-A = "ada" But I believe that the latter is bad for data storage, input, and linguistic analysis alike. I do not believe that NNBSP + COMBINING-SHORT-A + DALAT + COMBINING-SHORT-A = "ada" is defensible. No orthography has words which begin with such characters. >The existence of two of these vowel marks in Samaritan being >displayed ahead of the consonants, with no visible base, is not a >strong enough reason, in my reckoning, to introduce yet another >paradigm, as it were, for dealing with >combining marks whose display departs a little from regularly >stacking nonspacing marks (directly) above base letters. We haven't introduced a new paradigm. MODIFIER LETTER APOSTROPHE can easily sit at the beginning of a word -- just as the proposed SAMARITAN MODIFIER LETTER SHORT A can. A word beginning with MODIFIER LETTER APOSTROPHE may also have another letter on which COMBINING COMMA ABOVE may occur -- just as the proposed SAMARITAN VOWEL SIGN SHORT A can. >Also, from all the evidence of the figures in the proposal, >Samaritan also follows the conventions of Hebrew in normally being >written without vowel signs at all. I have two numbers of the newsletter "A.B.". Some articles are entirely unpointed, some are partially pointed, and some are fully pointed. 
>I wouldn't want what is clearly a secondary tier of rendering >behavior for the writing system to turn what is nominally a very >straightforward RtL script into something which we treat as overly >complicated and model-breaking for the standard. There is nothing "model-breaking" about what Mark and I proposed. Combining vowel signs are used but in those words which begin with short-a and i a spacing vowel is used. The fact that it kerns closely to the letter is not even controversial; we give rendering advice all the time. The model that Mark and I propose does not require any special rendering (apart from some side-by-side behaviour for multiple vowels following a base consonant which is familiar from Greek). And whether a text is pointed or not is also irrelevant to our model. You either use the points or you don't. In fact an operation to strip out points could cause trouble with the NNBSP model since it's invisible and might be forgotten by the user (who will not consider it in the least bit natural). >I would like someone to attempt to offer a convincing argument >against Plan B1b (use NNBSP) -- that is simply encoding the vowel >signs once and specifying a convention of using the existing NNBSP >when you need to spell a Samaritan word with an initial i- or a- >vowelling. You haven't got anywhere near a convincing argument that Samaritan orthography should make use of trickery like this. Sorry. I don't like disagreeing with you. And I know I get passionate. But I stand by our original proposal. I think our encoding model best serves the behaviour of the Samaritan script and the expectations of the users. -- Michael Everson * http://www.evertype.com PS. The Markah Shameri font has three COMBINING SHORT A glyphs to the right of the zero-width boundary. These will all be used like our COMBINING SHORT A, following a base consonant in the string (there are three because they will be used with wide/medium/narrow base letters). 
And it has one narrowly spacing glyph to the left of the same boundary. This will be used like our MODIFIER LETTER SHORT A, preceding a base consonant in the string. From cowan@ccil.org Thu Jul 12 08:42:00 2007 Received: with ECARTIS (v1.0.0; list hebrew); Thu, 12 Jul 2007 08:42:00 -0500 (CDT) Received: from earth.ccil.org (earth.ccil.org [192.190.237.11]) by unicode.org (8.13.4/8.12.11) with ESMTP id l6CDfxFM006031 for ; Thu, 12 Jul 2007 08:42:00 -0500 Received: from cowan by earth.ccil.org with local (Exim 4.63) (envelope-from ) id 1I8yvc-0006ys-K9; Thu, 12 Jul 2007 09:41:56 -0400 Date: Thu, 12 Jul 2007 09:41:56 -0400 To: Philippe Verdy Cc: "'John Cowan'" , "'Kenneth Whistler'" , hebrew@unicode.org Subject: [hebrew] Re: Draft of Samaritan proposal Message-ID: <20070712134156.GG18978@mercury.ccil.org> References: <200707120118.l6C1IxW06378@birdie.sybase.com> <20070712054844.GH8912@mercury.ccil.org> <000d01c7c47c$cd239b00$0a01a8c0@rodage.dyndns.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <000d01c7c47c$cd239b00$0a01a8c0@rodage.dyndns.org> User-Agent: Mutt/1.5.13 (2006-08-11) From: John Cowan X-archive-position: 3048 X-Approved-By: cowan@ccil.org X-ecartis-version: Ecartis v1.0.0 Sender: hebrew-bounce@unicode.org Errors-to: hebrew-bounce@unicode.org X-original-sender: cowan@ccil.org Precedence: bulk X-list: hebrew Philippe Verdy scripsit: > You're wrong here. A word selection must select full grapheme clusters, > so it cannot leave the space holder behind. > > And don't forget that word selections need not only select letters. Just > remember the case of apostrophes, the Catalan middle-dot. Good points. However, the fewer such special cases there are, the better, especially for such a rare and obscure script. > Don't forget also