Re: Latin ligatures and Unicode

From: Mark E. Davis (markdavis@ispchannel.com)
Date: Wed Dec 22 1999 - 15:26:25 EST


In V2, look on p. 6-132.

John Cowan wrote:

> "Reynolds, Gregg" wrote:
>
> > If the standard says, as
> > Mark just noted in a message, that they are to be ignored for the purposes
> > of join analysis, then I stand corrected; but I haven't been able to find
> > anything (admittedly I'm looking at v. 2) that says this.
>
> I think this was a post-2.0 clarification. Probably on the Web site
> someplace.
>
> > Well, sort of, but "word" isn't sufficient. You'd still want to be able to
> > distinguish between distinct lexemes packaged as a single lexigraphic word
> > and lexigraphic words - ordinarily whitespace delimited, but then again,
> > because Arabic encodes word boundaries in the letterforms themselves, one
> > could also remove SP word boundaries and use ZWSP (I think ;).
>
> Probably. If you need to be more refined than that, you are looking at
> tagging text rather than writing it, and you need either Private Zone codes
> or higher-level markup.
>
> > Still, decomposing such a form into its constituent root
> > (k,t,b) and theme (ma-prefix, internal shape) is utterly elementary for
> > anybody with a little Arabic. That's how it would be entered and looked up
> > in dictionaries, for example.
>
> What that shows is that brute-force string search isn't very useful for
> Arabic, but we knew that anyhow.
>
> > Seems a shame; a little formal semantics would go a long way.
>
> Feel free to contribute it.
>
> > But the "li-" in "li-<<-al-..." must be lam-initial, so I think ZWJ would be
> > the thing for it. Otherwise wouldn't the guillemets send it into isolate
> > form?
>
> Quite right, my mistake.
>
> > In some cases one may want to place diacritics over some whitespace or a
> > tatweel stroke, within a word.
>
> Ah, in the category of "whitespace" you include the whitespace between
> non-ligated characters, for example the space between "a" and "b" in
> this example: "ab". Whereas when I talk of whitespace, I mean whitespace
> that is wider than normal inter-letter spacing (absent ties between
> letters), as in "a b". Does this difference simply make no sense in
> Arabic script?
>
> --
>
> Schlingt dreifach einen Kreis vom dies! || John Cowan <jcowan@reutershealth.com>
> Schliesst euer Aug vor heiliger Schau, || http://www.reutershealth.com
> Denn er genoss vom Honig-Tau, || http://www.ccil.org/~cowan
> Und trank die Milch vom Paradies. -- Coleridge (tr. Politzer)
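
For readers following the ZWJ and ZWSP points in the thread, here is a minimal Python sketch of the code point sequences being discussed. It only builds and splits strings; whether lam is actually drawn in its initial form depends on the font and shaping engine. The helper names (lam_before_quote, split_on_zwsp) are hypothetical, and using U+00AB as the guillemet is an assumption made for illustration.

```python
import unicodedata

LAM = "\u0644"        # ARABIC LETTER LAM
ZWJ = "\u200D"        # ZERO WIDTH JOINER
ZWSP = "\u200B"       # ZERO WIDTH SPACE
GUILLEMET = "\u00AB"  # assumed guillemet for this illustration


def lam_before_quote(rest):
    """Build lam + ZWJ + guillemet + rest (logical order).

    The ZWJ asks the renderer to give lam a joining form even though
    the next visible character, the guillemet, does not join.
    """
    return LAM + ZWJ + GUILLEMET + rest


def split_on_zwsp(text):
    """Split text at ZWSP, treating it as an invisible word boundary."""
    return text.split(ZWSP)


if __name__ == "__main__":
    s = lam_before_quote("...")
    # List the character names to confirm what was built.
    for ch in s[:3]:
        print("U+%04X %s" % (ord(ch), unicodedata.name(ch)))
    print(split_on_zwsp("ab" + ZWSP + "cd"))
```

Note that Python's plain str.split() with no argument does not treat ZWSP as whitespace, so the separator has to be passed explicitly as above.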


