Re: Folding algorithm and canonical equivalence

From: Peter Kirk (peterkirk@qaya.org)
Date: Sat Jul 17 2004 - 18:59:05 CDT

Next message: E. Keown: "Request for 'Hebrew Extended' block in BMP"

Previous message: Asmus Freytag: "Re: Folding algorithm and canonical equivalence"
In reply to: Asmus Freytag: "Re: Folding algorithm and canonical equivalence"
Next in thread: John Cowan: "Re: Folding algorithm and canonical equivalence"
Reply: John Cowan: "Re: Folding algorithm and canonical equivalence"

On 18/07/2004 00:46, Asmus Freytag wrote:

> Thank you for reviewing this.
>
> DiacriticFolding (unlike AccentFolding) is selective about which
> combining marks it removes for which base character. I wonder whether
> that's truly intended, or whether it could be replaced by a
> combination of
>
> AccentFolding
> OtherDiacriticFolding
>
> where AccentFolding removes *all* nonspacing marks following Latin,
> Greek or Cyrillic letters and we would remove from DiacriticFolding
> all cases that are already handled by accent folding.
>
> That still doesn't take care of Hebrew, so we would need to decide how
> to handle that. Perhaps you would like to put forth a proposal as to
> what accents or diacritics should be folded for Hebrew, and in what
> context. Is it just Dagesh?

No, Dagesh is actually the *least* likely combining mark to be stripped
as it is the most closely bound to the base character (and for this
reason ended up in legacy precomposed characters and thence into the
draft table). But I think the best thing to do is to drop *all* Hebrew
combining marks; the result of this is valid unpointed Hebrew. This
corresponds to the implicit folding already defined by SII and described
in the quotation from SI 4281 in
http://www.qsm.co.il/Hebrew/Responses%20to%20Several%20Hebrew%20Items.pdf
= L2/04-213. But Jony Rosenne needs to provide input on this.
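
A minimal sketch of "drop all Hebrew combining marks", in Python. It is only an illustration, not the folding defined by SII or by the TR30 draft: the function name strip_hebrew_marks is hypothetical, and the rule "general category Mn within U+0591..U+05C7" is my assumption about what counts as a Hebrew combining mark. The text is decomposed to NFD first so that the legacy precomposed characters (e.g. U+FB31, bet with dagesh) lose their marks as well.

```python
import unicodedata


def strip_hebrew_marks(text: str) -> str:
    """Remove Hebrew points and accents, leaving unpointed text.

    Assumption: a "Hebrew combining mark" is any character in the range
    U+0591..U+05C7 whose general category is Mn. Punctuation in that
    range (maqaf, paseq, sof pasuq) is not Mn and is left alone.
    """
    # NFD splits precomposed presentation forms into base + mark.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch
        for ch in decomposed
        if not (0x0591 <= ord(ch) <= 0x05C7 and unicodedata.category(ch) == "Mn")
    )


# Pointed "shalom": shin, shin dot, qamats, lamed, vav, holam, final mem.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
unpointed = strip_hebrew_marks(pointed)
```

Note that the result is left in NFD, so any non-Hebrew precomposed characters in the input come back decomposed (canonically equivalent, but not byte-identical).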

>
> The other alternative would be to limit the nonspacing marks to those
> that actually occur with Latin / Greek / Cyrillic letters as ordinary
> diacritics (i.e. all the diacritics that show up in
> DiacriticFolding.txt), but then remove them if they follow *any* base
> character from that set, not just in certain fixed combinations.

Are there actually cases where these marks follow any other base
characters and they should *not* be removed? That is what confuses me.
It would be much simpler just to delete them independent of context.
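
To make the two alternatives concrete, here is a rough Python sketch. Everything in it is an assumption for illustration: MARKS is a hypothetical three-mark subset (grave, acute, diaeresis), not the contents of DiacriticFolding.txt, and the script test uses the first word of the character name because Python's unicodedata module does not expose the Script property.

```python
import unicodedata

# Hypothetical subset of diacritics; the real list would come from the data file.
MARKS = frozenset("\u0300\u0301\u0308")
SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def script_hint(ch: str) -> str:
    """Crude script guess: first word of the Unicode character name."""
    return unicodedata.name(ch, "").split(" ")[0]


def fold_any_context(text: str) -> str:
    """Delete the listed marks wherever they occur."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text) if ch not in MARKS
    )


def fold_after_lgc(text: str) -> str:
    """Delete the listed marks only after a Latin/Greek/Cyrillic base."""
    out = []
    base_ok = False
    for ch in unicodedata.normalize("NFD", text):
        if not unicodedata.category(ch).startswith("M"):
            # A new base character: remember whether it is in scope.
            base_ok = script_hint(ch) in SCRIPTS
        elif ch in MARKS and base_ok:
            continue  # drop the mark
        out.append(ch)
    return "".join(out)
```

The two functions differ only for sequences such as U+0915 U+0301 (Devanagari KA followed by an acute): the first removes the acute, the second keeps it. Whether any such sequence needs to be preserved is exactly the open question.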

>
> Rather than list the mappings in a file, we would simply list the
> conditions, similar to AccentFolding (see
> http://www.unicode.org/reports/tr30/Foldings.txt) and reduce the data
> file to those cases where there are no mappings (o with stroke -> o,
> combining stroke overlay, etc.).

I think you mean
http://www.unicode.org/reports/tr30/datafiles/Foldings.txt. This seems
sensible to me.

>
> John, you proposed the initial set. Do you have any suggestion here?
>
> A./

-- 
Peter Kirk
peter@qaya.org (personal)
peterkirk@qaya.org (work)
http://www.qaya.org/


This archive was generated by hypermail 2.1.5 : Sat Jul 17 2004 - 19:00:51 CDT