From peterkirk@qaya.org Thu Aug 2 10:56:12 2007
Message-ID: <46B1FE4B.8090406@qaya.org>
Date: Thu, 02 Aug 2007 16:54:51 +0100
From: Peter Kirk
To: Michael Everson
CC: hebrew@unicode.org
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan

On 31/07/2007 09:24, Michael Everson wrote:
> At 19:31 +0200 2007-07-30, Philippe Verdy wrote:
>
>> And still encoding a few vowels as letter modifiers, despite the case of
>> epenthetic yut is still not addressed correctly and left with hypothesis not
>> supported by proof.
>
> Try reading the text of the proposal before being rude about "unsupported hypotheses". Notice that it states that there are two issues under study and discussion.

I have to agree with Philippe here insofar as you are making a proposal based on "unsupported hypotheses". The fact that you admit this does not make the proposal any more suitable for encoding.

The unsupported hypothesis I object to is:

> We have seen that, typically, when more than one mark co-occurs with a base consonant, one of them is centred above the base letter and the second takes its place centred between the two letters.

Actually, what we have seen is that certain marks are always centred above the base letter (which may be a space) and certain other marks always take their place centred between the two letters. No exceptions! Except, that is, for your proposed rewriting of miyya˘sfåˆ riy, ’e-umer and hå-insˇem. After this misleading point you write:

> If the analysis here is correct ...

and continue on that assumption, for example when you repeat your speculative rewriting following "As shown above ...", when in fact you have shown nothing, only hypothesised. But this analysis is not only an unsupported hypothesis; it also conflicts with the evidence presented.

It seems to me, from the evidence you presented, that if you prefer the approach of spacing variants of combining marks to the (N)NBSP + combining mark approach, you really do need a spacing variant of epenthetic yod, just for your miyya˘sfåˆ riy case and anything similar which might occur. I would accept that there is no need for that only if you can show that your rewriting of miyya˘sfåˆ riy etc. is acceptable to the Samaritan community.

-- 
Peter Kirk
E-mail:  peter@qaya.org
Blog:    http://www.qaya.org/blog/
Website: http://www.qaya.org/

From peterkirk@qaya.org Thu Aug 2 11:48:01 2007
Message-ID: <46B20AD3.3050105@qaya.org>
Date: Thu, 02 Aug 2007 17:48:19 +0100
From: Peter Kirk
To: Michael Everson
CC: hebrew@unicode.org
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan
In-Reply-To: <46B1FE4B.8090406@qaya.org>

On 02/08/2007 16:54, Peter Kirk wrote:
> ...
>
> The unsupported hypothesis I object to is:
>
>> We have seen that, typically, when more than one mark co-occurs with a base consonant, one of them is centred above the base letter and the second takes its place centred between the two letters.

Correction: the above is the incorrect observation from which the unsupported hypothesis was derived. The unsupported hypothesis is that it is possible to centre vowel marks, and to place marks like epenthetic yut, above the spaces between letters, and that the rewriting of miyya˘sfåˆ riy is acceptable.

-- 
Peter Kirk
E-mail:  peter@qaya.org
Blog:    http://www.qaya.org/blog/
Website: http://www.qaya.org/

From verdy_p@wanadoo.fr Tue Aug 7 14:00:20 2007
Message-ID: <04ca01c7d924$e245e530$0a01a8c0@rodage.dyndns.org>
Date: Tue, 7 Aug 2007 20:58:03 +0200
From: Philippe Verdy
To: Peter Kirk, Michael Everson
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan
In-Reply-To: <46B20AD3.3050105@qaya.org>

Exactly! And I see other reasons why the approach with an empty base letter would be useful too:

  • For collation: the spacing vowels should be ignored at the primary level, like all other vowels.
  • For conversion from texts written with vowels to texts written without them: in that case, the centered spacing epenthetic yut and the leading vowels would disappear as well. The implicit empty base letter would have a null collation weight, completely ignorable at all levels (except the final implicit level that compares code points).
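The second point can be illustrated with a minimal Python sketch. It assumes hypothetical Private Use code points for the holder, the vowel signs and two consonants, since no real code points are assigned in this thread; removing vocalization then means dropping the vowel marks together with the holder, so an initial vowel vanishes along with its empty base.

```python
# Hypothetical stand-ins (Private Use Area); not real Samaritan assignments.
HOLDER = "\uE000"          # proposed empty base letter / vowel holder
BIT = "\uE011"             # consonant
MIM = "\uE01C"             # consonant
VOWEL_A = "\uE020"         # combining vowel sign
VOWEL_I = "\uE021"         # combining vowel sign
EPENTHETIC_YUT = "\uE022"  # combining epenthetic yut

# Everything that disappears when the vocalization is removed.
VOCALIZATION = {HOLDER, VOWEL_A, VOWEL_I, EPENTHETIC_YUT}


def strip_vowels(text: str) -> str:
    """Return the unpointed form: drop vowel marks and vowel holders."""
    return "".join(ch for ch in text if ch not in VOCALIZATION)


pointed = HOLDER + VOWEL_A + BIT + VOWEL_I + EPENTHETIC_YUT + MIM
unpointed = strip_vowels(pointed)
```

A fuller version would keep a holder that still carries a non-vowel mark.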

The Samaritan empty base letter approach (in Hebrew it would be a visible Alef) is interesting: it should be a special letter (Lo), with right-to-left direction, and ignorable for collation purposes.

If this makes sense for Samaritan, it could be assigned the same primary collation weight as Alef, with a secondary difference marking the fact that it is not itself rendered, if the initial vowels must sort before or with Alef. Depending on the application, collation could be tailored this way, or tailored to make it ignorable so that words sort according to the base letter following the initial vowels.
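A deliberately simplified two-level sort key (not the Unicode Collation Algorithm itself) shows the two options just described. The code points, weights and the `holder_as_alaf` switch are hypothetical: by default the holder is ignorable at both levels, so a word with an initial vowel sorts with the consonant that follows it; with the tailoring, the holder takes Alaf's primary weight plus a secondary difference.

```python
# Hypothetical stand-ins (Private Use Area) and illustrative weights.
HOLDER = "\uE000"
ALAF, BIT, GAMAN = "\uE010", "\uE011", "\uE012"
VOWEL_A, VOWEL_I = "\uE020", "\uE021"

PRIMARY = {ALAF: 1, BIT: 2, GAMAN: 3}   # consonants carry primary weights
SECONDARY = {VOWEL_A: 1, VOWEL_I: 2}    # vowels carry secondary weights only
HOLDER_SECONDARY = 3                    # marks "Alaf that is not rendered"


def sort_key(word: str, holder_as_alaf: bool = False) -> tuple:
    """Two-level key: (primary weights, secondary weights).

    Default: the holder contributes nothing at either level.
    Tailored: the holder gets Alaf's primary weight and a secondary weight.
    """
    primary, secondary = [], []
    for ch in word:
        if ch in PRIMARY:
            primary.append(PRIMARY[ch])
        elif ch in SECONDARY:
            secondary.append(SECONDARY[ch])
        elif ch == HOLDER and holder_as_alaf:
            primary.append(PRIMARY[ALAF])
            secondary.append(HOLDER_SECONDARY)
    return (tuple(primary), tuple(secondary))


w_alaf_bit = ALAF + BIT
w_bit_gaman = BIT + VOWEL_A + GAMAN
w_initial_i = HOLDER + VOWEL_I + BIT + GAMAN
words = [w_initial_i, w_bit_gaman, w_alaf_bit]
default_order = sorted(words, key=sort_key)
tailored_order = sorted(words, key=lambda w: sort_key(w, holder_as_alaf=True))
```

With these words, the default key places the initial-vowel word right after the word with the same consonants, while the tailored key moves it next to the Alaf word.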

It can be a valid holder for any Samaritan diacritic (including the proposed ones). If you still want to encode the spacing vowels, you need to encode the spacing epenthetic yut as well, and all of them can be given a canonical equivalence with the Samaritan empty base letter followed by the normal vowel diacritic. And when forming compound words, initial vowels stay associated with the empty base letter.
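The proposed canonical equivalence can be sketched with a toy decomposition table, assuming hypothetical Private Use code points for the holder, two combining marks and their spacing counterparts. `decompose` and `compose` only mimic NFD and NFC for this one table (no canonical reordering or blocking rules); the last two lines show the same relationship for a real character through Python's `unicodedata.normalize`.

```python
import unicodedata

# Hypothetical stand-ins (Private Use Area); not real Samaritan assignments.
HOLDER = "\uE000"                   # empty base letter
VOWEL_A = "\uE020"                  # combining vowel sign
EPENTHETIC_YUT = "\uE022"           # combining epenthetic yut
SPACING_A = "\uE030"                # proposed spacing (initial) vowel
SPACING_EPENTHETIC_YUT = "\uE032"   # proposed spacing epenthetic yut

# Toy canonical decompositions: spacing form -> holder + combining mark.
DECOMPOSITIONS = {
    SPACING_A: HOLDER + VOWEL_A,
    SPACING_EPENTHETIC_YUT: HOLDER + EPENTHETIC_YUT,
}
# Not excluded from composition, so every pair recomposes.
COMPOSITIONS = {pair: ch for ch, pair in DECOMPOSITIONS.items()}


def decompose(text: str) -> str:
    """NFD-like step for the toy table."""
    return "".join(DECOMPOSITIONS.get(ch, ch) for ch in text)


def compose(text: str) -> str:
    """NFC-like step: decompose, then recompose holder + mark pairs."""
    text = decompose(text)
    out, i = [], 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in COMPOSITIONS:
            out.append(COMPOSITIONS[pair])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def canonically_equivalent(a: str, b: str) -> bool:
    return decompose(a) == decompose(b)


# The same relationship already holds for real characters, e.g. U+00E9:
assert unicodedata.normalize("NFD", "\u00e9") == "e\u0301"
assert unicodedata.normalize("NFC", "e\u0301") == "\u00e9"
```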

In other words, everything happens as if we did not really need any of the spacing vowels. If they are provided, it’s only to help implementation and use with simple keyboard drivers. But I expect that a driver would assign, for example, one key to a vowel diacritic, and pressing the same key with AltGr would preferably generate either the canonical equivalent (i.e. the precomposed spacing vowel) or the sequence empty base letter + vowel; the empty base letter alone (or alternate Alef) could be typed as AltGr+A (i.e. on the key assigned to Alef). For users of text editors, typing the empty base letter alone could display a glyph for Alef in a dotted square if the Alef interpretation is kept.
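The keyboard behaviour described above might look like this in a toy driver. The key assignments and code points are hypothetical; the point is that AltGr on a vowel key can emit either the precomposed spacing vowel or holder + vowel, two outputs meant to be canonically equivalent.

```python
# Hypothetical stand-ins (Private Use Area) and a hypothetical key layout.
HOLDER = "\uE000"
ALAF, BIT = "\uE010", "\uE011"
VOWEL_A, VOWEL_I = "\uE020", "\uE021"
SPACING_A, SPACING_I = "\uE030", "\uE031"

CONSONANT_KEYS = {"a": ALAF, "b": BIT}
VOWEL_KEYS = {"q": VOWEL_A, "w": VOWEL_I}
PRECOMPOSED = {VOWEL_A: SPACING_A, VOWEL_I: SPACING_I}


def keystroke(key: str, altgr: bool = False, precomposed: bool = True) -> str:
    """Return the text this toy driver emits for one key press."""
    if key in VOWEL_KEYS:
        vowel = VOWEL_KEYS[key]
        if not altgr:
            return vowel  # plain key: combining vowel
        # AltGr: spacing vowel, either precomposed or as holder + vowel.
        return PRECOMPOSED[vowel] if precomposed else HOLDER + vowel
    if key in CONSONANT_KEYS:
        # AltGr on the Alaf key gives the bare holder ("alternate Alef").
        if altgr and key == "a":
            return HOLDER
        return CONSONANT_KEYS[key]
    raise KeyError(f"unmapped key: {key!r}")
```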

Keyboard designs are then more logical and elegant this way, and all vowels are supported.

If the interpretation as an alternate Alef is wrong, the empty base letter would become a format control with an optional rendering within text editors (similar to the Khmer “coeng” sign U+17D2…), but this will not change the fact that the encoded spacing letters would be canonically equivalent to this format control plus the normal diacritic. The format control would naturally have a null collation weight by default (but still tailorable…).

I don’t really like the approach with (N)NBSP + vowel, because (N)NBSP on its own is spacing and not ignorable. But a separate encoding as a format control (collation-ignorable, and normally non-spacing except in editors) would respect the default grapheme cluster boundaries (as with the coeng sign in Khmer, which is entered and encoded before the letter it modifies).
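The properties being contrasted can be checked against the Unicode Character Database with Python's standard `unicodedata` module. This sketch prints the General_Category, canonical combining class and Bidi_Class of NBSP, NNBSP, two existing format controls (ZERO WIDTH JOINER, WORD JOINER) and the Khmer coeng. Note that the UCD classifies U+17D2 KHMER SIGN COENG as a nonspacing mark (Mn, combining class 9) rather than as a format control (Cf).

```python
import unicodedata

CODE_POINTS = [0x00A0, 0x202F, 0x200D, 0x2060, 0x17D2]


def describe(cp: int) -> str:
    """One-line summary of selected UCD properties for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X} {unicodedata.name(ch)}: "
            f"gc={unicodedata.category(ch)} "
            f"ccc={unicodedata.combining(ch)} "
            f"bidi={unicodedata.bidirectional(ch)}")


for cp in CODE_POINTS:
    print(describe(cp))
```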

All these ideas show that the Samaritan script could be supported easily and cleanly this way, without creating an artificial difference between initial vowels and other vowel signs. The canonical equivalence would enforce this interpretation (and no door is immediately closed for possible future IDN implementations, if that is desired later)…

The mapping to Hebrew (for transliteration purposes) would be simplified as well (the empty base letter or format control would become a Hebrew Alef, or another suitable base letter depending on the Samaritan vowel sign encoded with it).
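A sketch of that transliteration step, assuming hypothetical Private Use code points on the Samaritan side and an illustrative vowel-to-point pairing (the Hebrew letters and points are real code points). A holder becomes Alef unless a per-vowel override, also hypothetical, selects another carrier letter.

```python
from typing import Dict, Optional

# Hypothetical Samaritan stand-ins (Private Use Area).
HOLDER = "\uE000"
BIT = "\uE011"
VOWEL_A, VOWEL_I = "\uE020", "\uE021"

HEBREW_ALEF = "\u05D0"   # HEBREW LETTER ALEF
HEBREW_BET = "\u05D1"    # HEBREW LETTER BET
HEBREW_PATAH = "\u05B7"  # HEBREW POINT PATAH
HEBREW_HIRIQ = "\u05B4"  # HEBREW POINT HIRIQ

CONSONANTS = {BIT: HEBREW_BET}
VOWELS = {VOWEL_A: HEBREW_PATAH, VOWEL_I: HEBREW_HIRIQ}  # illustrative pairing


def to_hebrew(text: str, carrier_overrides: Optional[Dict[str, str]] = None) -> str:
    """Transliterate; a holder becomes Alef unless its vowel selects another carrier."""
    overrides = carrier_overrides or {}
    out = []
    for i, ch in enumerate(text):
        if ch == HOLDER:
            following = text[i + 1] if i + 1 < len(text) else ""
            out.append(overrides.get(following, HEBREW_ALEF))
        elif ch in CONSONANTS:
            out.append(CONSONANTS[ch])
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        else:
            out.append(ch)
    return "".join(out)
```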


From verdy_p@wanadoo.fr Tue Aug 7 17:35:06 2007
Message-ID: <04db01c7d942$e2d81770$0a01a8c0@rodage.dyndns.org>
Date: Wed, 8 Aug 2007 00:32:49 +0200
From: Philippe Verdy
To: Peter Kirk, Michael Everson
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan
In-Reply-To: <04ca01c7d924$e245e530$0a01a8c0@rodage.dyndns.org>

I see other applications for this representation as a combination of an empty base letter plus a vowel diacritic, and here again this is related to collation.

According to the reference documents shown in the proposal, initial vowels that are not marked with spacing vowels are replaced by base consonants. Here again, collation needs a decomposition to expose the effective vowel and match it with an approximant base consonant (such as alef, yod, het…), and this will be useful when looking for matches with the Hebrew or Arabic scripts, or with transliterations of Samaritan into those scripts.

If initial vowels are ignored at the primary collation level when sorting, but are used at the secondary level, I have the feeling that they should be treated as if they were diacritics of the previous letter. This is possible only if the spacing vowels are considered to be composed of an ignorable empty base letter and a combining vowel diacritic. Depending on the kind of application, the collation may make the empty base letter fully ignorable, so that a spacing vowel following another grapheme cluster with a base consonant is treated as part of a single cluster sorted with that previous base letter, but after all compositions of that previous base letter with a possible combining vowel. This will respect the boundaries between the original clusters, particularly in the case of syllable initials in the middle of a compound word.
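The matching described in the second paragraph can be sketched as a key function. Code points and the vowel-to-consonant pairing are hypothetical (Alaf for a, Yut for i, chosen only for illustration): an initial vowel written as holder + vowel maps to its approximant consonant and all other vowel signs are dropped, so the pointed and unpointed spellings produce the same key.

```python
# Hypothetical stand-ins (Private Use Area); the vowel-to-consonant
# pairing is illustrative only, not taken from the proposal.
HOLDER = "\uE000"
ALAF, BIT, YUT = "\uE010", "\uE011", "\uE019"
VOWEL_A, VOWEL_I = "\uE020", "\uE021"

APPROXIMANT = {VOWEL_A: ALAF, VOWEL_I: YUT}


def match_key(text: str) -> str:
    """Key for matching pointed text against unpointed text.

    Holder + vowel becomes the vowel's approximant consonant;
    every other vowel sign (and any bare holder) is dropped.
    """
    out, i = [], 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == HOLDER and nxt in APPROXIMANT:
            out.append(APPROXIMANT[nxt])
            i += 2
            continue
        if ch != HOLDER and ch not in APPROXIMANT:
            out.append(ch)
        i += 1
    return "".join(out)
```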
One may argue that this could be done just as well with spacing vowels encoded only as letter modifiers, but given the various usages in tailored collations, it seems that the decomposition will be present in almost all cases: whether the vowels sort with the following consonant and are treated as its diacritics during collation, or are treated as special forms of another implicit base consonant that varies with the vowel diacritic, or are treated as diacritics of the base letter before them.

So it would be interesting to see exactly how Samaritans sort their dictionaries when there are spacing initial vowels (I include here the case of the interletter epenthetic yut, centered not above a base consonant but between two of them, which I also consider a case of a regular spacing vowel, like the proposed initial vowels).

The considerations in the proposal about letter modifiers having been used in other scripts are irrelevant here, because they do not correlate directly with Samaritan. And we have other options for handling initial vowels.

In fact, whatever was done in Hebrew could just as well have followed my analysis, and would have become much more logical if the tricky cases had been solved like this (but I won’t reform the Hebrew script and its numerous caveats, which come from round-trip compatibility with legacy encodings).

For me, the existing encoding of the Khmer “coeng” sign is proof enough that what is proposed here is nothing new in Unicode.

It could have been done as well for the Khmer lunar date symbols (in fact a base symbol was encoded for them as a format control, but its use was discouraged when the lunar date symbols were finally encoded later without any decomposition, although a decomposition could have been given to preserve the equivalences). But even though this was discouraged and no decomposition was explicitly given, Khmer collation needs to treat the lunar date symbols as if they were composed when creating collation keys… If Unicode had been logical, the lunar date symbols would have been given canonical decompositions (not excluded from composition, so that the precomposed symbols remain the normal representation in most cases).

What I am suggesting here is that there is nothing new in this proposal to consider the initial vowels and the interletter epenthetic yut as effectively composed of a base letter modified by a regular combining diacritic. I admit that the case of the Khmer coeng is a bit different, because it does not combine with a combining diacritic after it, but with a base letter after it. But the important thing is that the combination still creates a larger, unbreakable default grapheme cluster, treated as such in collation too!

My concept is different from the use of (N)NBSP with a vowel, which will remain only as a way to display isolated vowel diacritics without any semantic meaning. We know that this form is a trick needed only for rendering; it does not create a new word, and (N)NBSP has the wrong properties, which cause various complications.

What I really propose is to encode the empty base letter as a format control (like the Khmer coeng), which can be shown as spacing in isolation within text editors (using, for example, an alef within a dotted square) before the actual combining vowel is added, but the pair then combines graphically into a single spacing glyph (like the Khmer coeng) where the dotted square disappears.

Then we make the proposed initial vowels canonically equivalent to this base format control plus the combining vowel (with a canonical decomposition NOT excluded from composition).
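A simplified segmentation illustrates the cluster behaviour argued for here: the holder counts as a base, so holder + vowel forms one cluster exactly like consonant + vowel. Code points are hypothetical. One caveat: as UAX #29 is currently written, a character with General_Category Cf receives Grapheme_Cluster_Break=Control unless it is explicitly given another value, so a Cf holder would need such an exception to behave as this sketch assumes.

```python
from typing import List

# Hypothetical stand-ins (Private Use Area); not real Samaritan assignments.
HOLDER = "\uE000"
BIT, MIM = "\uE011", "\uE01C"
VOWEL_A, VOWEL_I = "\uE020", "\uE021"
EPENTHETIC_YUT = "\uE022"

COMBINING = {VOWEL_A, VOWEL_I, EPENTHETIC_YUT}


def clusters(text: str) -> List[str]:
    """Split text into base + following combining marks.

    The holder is treated as a base like any consonant. A mark with
    no preceding base forms its own (defective) cluster.
    """
    result: List[str] = []
    for ch in text:
        if ch in COMBINING and result:
            result[-1] += ch
        else:
            result.append(ch)
    return result
```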
Then, processes that ignore vowel diacritics will also ignore this format control, so only the base letter remains (without any (N)NBSP in the middle); or, depending on the application, the grapheme cluster made of this format control and the initial vowel will be mappable to a base consonant. This will be useful for correct collation between texts encoded with vowels and texts encoded without vowels in which some vowels have been converted to consonants (as seen in the samples in the proposal).

Samaritan linguists who want the precision of vowels will be able to treat combining vowels exactly like initial vowels, and to sort them correctly using several collation schemes.

For renderers, it will not be a complication (the fact that this empty base character is a format control and not a regular letter does not change anything: they will handle it like all other base letters, place the diacritics correctly on it, adjust the display width of the cluster accordingly, and choose to map it to a visible special glyph for editors when it appears in isolation, using the existing substitution rules).

So my proposal would be:
* Most probably, encode the empty base before the base letters within the Samaritan block (moving them one position later), so that it sorts correctly at the code point level.
* Name it something like “SAMARITAN VOWEL HOLDER MARK”.
* Move the currently proposed letter modifiers in the chart to just after it, but before the base consonants. Encode the combining vowels after the consonants.
* Encode your currently proposed initial vowels (modifier letters) as canonically equivalent to this base character plus the normal combining vowel, and don’t exclude them from composition.
* Possibly add a syllable-initial epenthetic yut (as a modifier letter too), and treat it like the other proposed initial vowels.
* Give the Samaritan vowel holder mark the GENERAL CATEGORY “Cf” (format control), like the Khmer coeng sign.
* In charts, show a special “arbitrary” REPRESENTATIVE GLYPH (as for the Khmer coeng), for example a dotted square containing a small Samaritan alef.
* Give it the Samaritan SCRIPT PROPERTY (to avoid script breaks).
* Give it a null COMBINING CLASS (like other format controls); a sequence made of it followed by a combining mark will then not be defective.
* Give it an RTL direction property like letters (to avoid BiDi caveats and complications).
* Give it a null first-level COLLATION weight in the DUCET (let tailored collations decide whether to map this holder mark + combining vowel to another base consonant for compatibility with Hebrew, unless there are arguments that such tailoring would almost always be needed, in which case another default should be chosen: ask Samaritan users how they conventionally sort words with initial vowels, and how they handle syllable initials in the middle of words and epenthetic yut between two syllables).
* Give it a null second-level COLLATION weight (or third level; but Samaritan is unicameral, so that level would be all zeroes here), unlike the combining vowels, which will have non-null weights at this level (the same weights as the corresponding proposed initial vowels).
* For line-breaking, word-breaking and cluster-boundary properties, make it non-breaking between it and a Samaritan letter (I think the current breaking algorithms already specify this behaviour between a format control and another letter); it will still break between it and a whitespace or another explicit line break.
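Several of the properties listed above are fields of UnicodeData.txt (General_Category, Canonical_Combining_Class, Bidi_Class, decomposition mapping); script, collation and break behaviour are specified elsewhere (Scripts.txt, the DUCET, LineBreak.txt and the UAX #29 property files), and whether a decomposition recomposes is governed by CompositionExclusions.txt. The sketch below formats what the two UnicodeData.txt entries might look like. The code points are Private Use placeholders and the spacing-vowel name is invented for illustration; neither is a real assignment.

```python
# Hypothetical placeholders (Private Use Area); not real assignments.
HOLDER_CP = 0xE000
VOWEL_A_CP = 0xE020
SPACING_A_CP = 0xE030


def unicode_data_line(cp, name, gc, ccc, bidi, decomposition=""):
    """Format one entry in the 15-field layout of UnicodeData.txt."""
    fields = [
        f"{cp:04X}",     # 0 code point
        name,            # 1 character name
        gc,              # 2 General_Category
        str(ccc),        # 3 Canonical_Combining_Class
        bidi,            # 4 Bidi_Class
        decomposition,   # 5 decomposition mapping (canonical if untagged)
        "", "", "",      # 6-8 numeric fields
        "N",             # 9 Bidi_Mirrored
        "", "",          # 10-11 obsolete name and ISO comment fields
        "", "", "",      # 12-14 simple case mappings (Samaritan is caseless)
    ]
    return ";".join(fields)


holder = unicode_data_line(HOLDER_CP, "SAMARITAN VOWEL HOLDER MARK", "Cf", 0, "R")
spacing_a = unicode_data_line(
    SPACING_A_CP, "SAMARITAN MODIFIER LETTER A", "Lm", 0, "R",
    f"{HOLDER_CP:04X} {VOWEL_A_CP:04X}")
print(holder)
print(spacing_a)
```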

With the encoded precomposed initial vowels, it will still be easy to create a suitable keyboard: AltGr + the key assigned to the combining vowel will generate either the precomposed initial vowel or the holder mark plus the normal vowel, and both sequences will be treated as canonically equivalent. Some text editors will allow decomposing it to remove the vowel while keeping the base holder mark, which will then suddenly become visible in the absence of a suitable Samaritan combining diacritic. This will not happen if other, non-vowel combining diacritics are present for special cases in epigraphic texts. In browsers, an isolated base holder mark could be visible or invisible (rendered like a zero-width non-breaking space).

Every Samaritan combining diacritic will be able to combine with the holder mark, not just the combining vowels.

The holder mark may have a zero advance width in some cases, when it appears as the initial of a syllable in the middle of a word where the previous syllable does not have its own vowel diacritic. By default, however, renderers will give it the width of the special glyph used when it appears in isolation. This width will be adjusted when it combines with other diacritics, and may become zero where the vowel mark can fit centered in the gap between two base letters.

Renderers will be able to render the encoded precomposed initial vowels in exactly the same way: they will start from the default width of the diacritic and reduce it if the diacritic can be kerned into the gap between base letters. But for Unicode encoding purposes, this implies nothing about whether the character is visible or zero-width.
Conforming renderers will not be required to show the arbitrary special glyph when it appears in isolation, thanks to its encoding as a control rather than a letter.

_____

From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On behalf of Philippe Verdy
Sent: Tuesday, 7 August 2007 20:58
To: 'Peter Kirk'; 'Michael Everson'
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan

Exactly! And I see other reasons why the approach with an empty base letter would be useful too:

* for collation: the spacing vowels should be ignored at the primary level, like all other vowels.
* for conversion from texts written with vowels to texts written without them. In that case, the centered spacing epenthetic yut and leading vowels would disappear as well. The implicit empty base letter would have a null collation, completely ignorable at all levels (except the last implicit level comparing code points).

_____

From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On behalf of Peter Kirk
Sent: Thursday, 2 August 2007 18:48
To: Michael Everson
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan

On 02/08/2007 16:54, Peter Kirk wrote:
...
The unsupported hypothesis I object to is:

We have seen that, typically, when more than one mark co-occurs with a base consonant, one of them is centred above the base letter and the second takes its place centred between the two letters.

Correction: the above is the incorrect observation from which the unsupported hypothesis was derived. The unsupported hypothesis is that it is possible to centre vowel marks and place marks like epenthetic yut above the spaces between letters, and that the rewriting of miyya˘sfåˆ riy is acceptable.

-- 
Peter Kirk
E-mail: peter@qaya.org
Blog: http://www.qaya.org/blog/
Website: http://www.qaya.org/

------=_NextPart_000_04DC_01C7D953.A660E770
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

------=_NextPart_000_04DC_01C7D953.A660E770--

From verdy_p@wanadoo.fr Tue Aug 7 18:02:06 2007
From: "Philippe Verdy"
To: "'Peter Kirk'", "'Michael Everson'"
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan
Date: Wed, 8 Aug 2007 00:59:50 +0200
Message-ID: <04e301c7d946$a8eed130$0a01a8c0@rodage.dyndns.org>
References: <036a01c7d2cf$7ff319d0$0a01a8c0@rodage.dyndns.org> <46B1FE4B.8090406@qaya.org> <46B20AD3.3050105@qaya.org> <04ca01c7d924$e245e530$0a01a8c0@rodage.dyndns.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="----=_NextPart_000_04E4_01C7D957.6C77A130"

This is a multi-part message in MIME format.

------=_NextPart_000_04E4_01C7D957.6C77A130
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

If you don’t like the idea of making this proposed holder a format control, you may still encode it as a special letter (general category “Lo”) with all the other properties set out in my previous message.

But in that case, the representative glyph would be a narrow dotted rectangle whose height matches that of the other letters. This representative glyph suggests its use.

But my proposal has added two new characters (the vowel holder mark and the epenthetic yut modifier letter), which nearly fills the allocated Samaritan block, leaving only one free position for a possible addition. Since encoding the initial vowels and the other letter modifiers becomes unnecessary (there is no need to round-trip legacy encodings, nor for canonical decompositions), these letter modifiers could all be dropped from the proposal. Only the vowel holder mark would be kept, encoded before all the base letters, which are then followed directly by the combining vowels.

Suddenly, there are no more canonical equivalences to handle, and the script becomes even simpler to manage. And we get new unused positions for future Samaritan extensions (probably numerals, or some other combining marks for epigraphic texts).

Keyboard drivers, however, will need to generate sequences made of the vowel holder mark and a combining vowel. Alternatively, they can simply assign a single key to the vowel holder mark, typed before the key for the regular combining vowel. Keyboard drivers MAY avoid generating the vowel holder in their output in most cases by making it a dead key. If it is followed by anything other than a combining vowel key, they could beep and ignore the sequence, while the vowel holder mark could still be forced by typing the spacebar after the dead key. OR they would do nothing at all, and no dead key would be set up.
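One possible reading of the dead-key behaviour just described can be sketched as a small state machine. Everything here is hypothetical. The key names and the Private Use Area code points stand in for the holder mark, vowels and consonants. The beep is modelled as a counter, and both keystrokes of a rejected sequence are dropped.

```python
HOLDER = "\uE000"  # hypothetical vowel holder mark
VOWEL_KEYS = {"a": "\uE010", "i": "\uE011"}      # hypothetical combining vowels
CONSONANT_KEYS = {"b": "\uE030", "r": "\uE031"}  # hypothetical consonants


class DeadKeyLayout:
    """Treats the (hypothetical) 'holder' key as a dead key."""

    def __init__(self) -> None:
        self.pending = False  # True right after the dead key was pressed
        self.beeps = 0        # number of rejected dead-key sequences

    def press(self, key: str) -> str:
        """Return the text emitted for one keystroke (possibly empty)."""
        if self.pending:
            self.pending = False
            if key in VOWEL_KEYS:
                return HOLDER + VOWEL_KEYS[key]  # holder + combining vowel
            if key == "space":
                return HOLDER                    # spacebar forces a bare holder
            self.beeps += 1                      # anything else: beep, drop both keys
            return ""
        if key == "holder":
            self.pending = True
            return ""
        if key in VOWEL_KEYS:
            return VOWEL_KEYS[key]
        if key in CONSONANT_KEYS:
            return CONSONANT_KEYS[key]
        if key == "space":
            return " "
        raise KeyError(f"unmapped key: {key}")


def type_keys(keys: list[str]) -> tuple[str, int]:
    """Feed a key sequence through a fresh layout; return (text, beeps)."""
    layout = DeadKeyLayout()
    text = "".join(layout.press(k) for k in keys)
    return text, layout.beeps
```

With this design, the driver emits a bare holder only when the user explicitly forces it with the spacebar. In all other typed text, the holder appears only in front of a combining vowel.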

It will then be up to the text renderer to show a spacing vowel for the combining sequence instead of the default dotted rectangle. It will also be up to the renderer to decide when this sequence, which forms a default grapheme cluster, can fit between two other base letters in the middle of a word as a medial syllable initial. (This is what renderers already do for Latin, when creating ligatures or contextual forms for Arabic or Indic scripts, or when rendering Khmer grapheme clusters starting with a coeng followed by a Khmer base letter.)

_____

From: Philippe Verdy [mailto:verdy_p@wanadoo.fr]
Sent: Wednesday, 8 August 2007 00:33
To: 'verdy_p@wanadoo.fr'; 'Peter Kirk'; 'Michael Everson'
Cc: 'hebrew@unicode.org'
Subject: RE: [hebrew] Re: Revised Preliminary proposal for Samaritan

So my proposal would be:

* (…)
* In charts, show a special “arbitrary” REPRESENTATIVE GLYPH (as for Khmer coeng), for example a dotted square containing a small Samaritan alef.
* (…)

_____

From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On behalf of Philippe Verdy
Sent: Tuesday, 7 August 2007 20:58
To: 'Peter Kirk'; 'Michael Everson'
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan

Exactly! And I see other reasons why the approach with an empty base letter would be useful too:

* for collation: the spacing vowels should be ignored at the primary level, like all other vowels.
* for conversion from texts written with vowels to texts written without them. In that case, the centered spacing epenthetic yut and leading vowels would disappear as well.
  The implicit empty base letter would have a null collation, completely ignorable at all levels (except the last implicit level comparing code points).

_____

From: hebrew-bounce@unicode.org [mailto:hebrew-bounce@unicode.org] On behalf of Peter Kirk
Sent: Thursday, 2 August 2007 18:48
To: Michael Everson
Cc: hebrew@unicode.org
Subject: [hebrew] Re: Revised Preliminary proposal for Samaritan

On 02/08/2007 16:54, Peter Kirk wrote:
...
The unsupported hypothesis I object to is:

We have seen that, typically, when more than one mark co-occurs with a base consonant, one of them is centred above the base letter and the second takes its place centred between the two letters.

Correction: the above is the incorrect observation from which the unsupported hypothesis was derived. The unsupported hypothesis is that it is possible to centre vowel marks and place marks like epenthetic yut above the spaces between letters, and that the rewriting of miyya˘sfåˆ riy is acceptable.

-- 
Peter Kirk
E-mail: peter@qaya.org
Blog: http://www.qaya.org/blog/
Website: http://www.qaya.org/

------=_NextPart_000_04E4_01C7D957.6C77A130
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

------=_NextPart_000_04E4_01C7D957.6C77A130--

From rick@unicode.org Thu Aug 16 21:41:49 2007
Message-Id: <200708170241.l7H2fjtC018722@unicode.org>
To: unicode@unicode.org
Subject: [hebrew] New Corrigendum to The Unicode Standard
Date: Thu, 16 Aug 2007 19:41:45 -0700
From: Rick McGowan

The Unicode Consortium has issued a new Corrigendum to The Unicode Standard Version 5.0.0. For details on this corrigendum, see:

http://www.unicode.org/versions/corrigendum6.html

For general information on corrigenda to The Unicode Standard, see:

http://www.unicode.org/versions/corrigenda.html

In brief, this corrigendum corrects the Bidi_Mirrored property for several characters.

Regards,
Rick McGowan
Unicode, Inc.

From rick@unicode.org Wed Aug 29 09:31:27 2007
Message-Id: <200708291431.l7TEVNQ1001250@unicode.org>
To: unicode@unicode.org
Subject: [hebrew] New Public Review Issue: #109 Proposed Draft UTR #42: An XML representation of the UCD
Date: Wed, 29 Aug 2007 07:31:21 -0700
From: Rick McGowan

The Unicode Technical Committee has posted a new issue for public review and comment. Details are on the following web page:

http://www.unicode.org/review/

The review period for the new item closes on October 10, 2007.

Please see the page for links to discussion and relevant documents. Briefly, the new issue is:

Proposed Draft Technical Report #42 describes an XML representation of the Unicode Character Database, and is available for public review and comment. Please see the separate background document for details of this review and how to obtain data files.

If you have comments for official UTC consideration, please post them by submitting your comments through our feedback & reporting page:

http://www.unicode.org/reporting.html

If you wish to discuss issues on the Unicode mail list, please use the following link to subscribe (if necessary). Please be aware that discussion comments on the Unicode mail list are not automatically recorded as input to the UTC. You must use the reporting link above to generate comments for UTC consideration.

http://www.unicode.org/consortium/distlist.html

Regards,
Rick McGowan
Unicode, Inc.