Re: FW: Private Use Area - Building Combining Classes

From: Peter_Constable@sil.org
Date: Tue Oct 30 2001 - 13:13:56 EST

Previous message: Michael Everson: "YO, ho ho, and a bottle of vodka"
Maybe in reply to: Magda Danish (Unicode): "FW: Private Use Area - Building Combining Classes"
Next in thread: James Kass: "Re: FW: Private Use Area - Building Combining Classes"
Reply: James Kass: "Re: FW: Private Use Area - Building Combining Classes"

James:

>Uniscribe will provide glyph substitutions for glyphs encoded in the PUA
>under certain conditions. Uniscribe looks for glyphs in a certain range
>based on the script, so if the first glyph ID in the GSUB or GPOS table is
>not within that script's range (i.e., it's in PUA or not mapped), Uniscribe
>will not look it up.
>
>But if the first glyph ID is in the target script range, it will trigger the
>look-up.
>
>I've recently completed testing GSUB and GPOS for Unicode Bengali.
>
>For the conjunct form K-T-BA, the first step is to make the K-TA with
>U+0995 U+09CD U+09A4. We'll call the resulting glyph U+E001. The next
>step is to make the K-T-BA. U+E001 U+09CD U+09AC are the characters
>needed, but Uniscribe has already performed a re-ordering, so in the
>look-up table this string must appear as U+E001 U+09AC U+09CD. This
>seems to work just fine. Once the first substitution has occurred, the
>resulting substitutions apparently consider that any new glyph ID
>produced by a previous substitution *is* part of the target script range,
>and further substitutions will work.

It's not entirely clear to me what you are conveying, but I think you are
talking in terms of OpenType lookups, in which case these all operate on
glyph IDs, not character codes. Uniscribe must first operate on character
codes. In the process you're describing (if I have understood it),
Uniscribe and the OT layout engine may look at a glyph with an ID of
0xE001, but they never look at character U+E001: at the point you are
referring to, the transformation to glyph space has already occurred. The
glyph might also happen to be encoded in the cmap as U+E001, but that is
irrelevant, since at this point it doesn't matter whether it is encoded in
the cmap at all.
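A minimal toy model may make the distinction concrete. It is a sketch under stated assumptions, not how Uniscribe is implemented: the glyph IDs, the cmap contents, and the helper names (CMAP, to_glyphs, apply_ligatures) are all hypothetical, and real GSUB ligature lookups involve coverage tables and lookup flags that are ignored here. The input characters are given in the already-reordered order you report (BA before the virama for the second lookup).

```python
# Toy model of character space vs glyph space (all data HYPOTHETICAL).

# cmap: character code -> glyph ID. Only encoded characters appear here.
CMAP = {
    0x0995: 10,  # BENGALI LETTER KA
    0x09A4: 11,  # BENGALI LETTER TA
    0x09AC: 12,  # BENGALI LETTER BA
    0x09CD: 13,  # BENGALI SIGN VIRAMA
}

# Ligature glyphs created by GSUB; they need no cmap entry.
GID_K_TA = 0xE001    # hypothetical glyph ID, mirroring the example above
GID_K_T_BA = 0xE002  # hypothetical glyph ID

# Ligature lookups, applied in order: component glyphs -> ligature glyph.
LIGATURES = [
    ((10, 13, 11), GID_K_TA),          # KA + VIRAMA + TA -> K-TA
    ((GID_K_TA, 12, 13), GID_K_T_BA),  # K-TA + BA + VIRAMA (reordered) -> K-T-BA
]


def to_glyphs(chars):
    """The cmap step: map each (already reordered) character to a glyph ID."""
    return [CMAP[c] for c in chars]


def apply_ligatures(glyphs):
    """Run each ligature lookup over the whole glyph sequence in turn.

    Everything here operates on glyph IDs; character codes are gone.
    """
    for components, ligature in LIGATURES:
        n = len(components)
        out, i = [], 0
        while i < len(glyphs):
            if tuple(glyphs[i:i + n]) == components:
                out.append(ligature)  # replace the matched components
                i += n
            else:
                out.append(glyphs[i])
                i += 1
        glyphs = out
    return glyphs


# KA VIRAMA TA BA VIRAMA, in the order reported after reordering.
chars = [0x0995, 0x09CD, 0x09A4, 0x09AC, 0x09CD]
print(apply_ligatures(to_glyphs(chars)))  # [57346]
```

In this model 0xE001 is never a key in CMAP: the intermediate ligature exists only in glyph space, so no character U+E001 is involved at any point, even though the second lookup matches on glyph 0xE001.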

If we start with data containing < 0995, 09CD, 09A4 >, Uniscribe will
reorder where needed, do the cmap lookup to get the initial glyph IDs and
then apply certain feature tags. From there it will start processing
lookups on the tagged string of glyph IDs. For the situation that was
asked about, though, you need to consider a scenario in which you start
with data containing something like < E001, 09CD, 09A4 >. In that
situation, I believe Uniscribe will not operate on U+E001.
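The scenario with < E001, 09CD, 09A4 > can be sketched the same way. This toy model assumes what I described above, which I have not verified against Uniscribe itself: that text is split into runs by script before any OpenType lookups are applied (the real itemizer, ScriptItemize, is considerably more involved), and that a PUA character is not classified as Bengali. The classification, font data, and function names are hypothetical.

```python
# Toy model of script itemization before shaping (all data HYPOTHETICAL).

BENGALI = range(0x0980, 0x0A00)  # the Bengali block, U+0980..U+09FF


def script_of(cp):
    """Hypothetical two-way classification: Bengali block vs everything else."""
    return "beng" if cp in BENGALI else "other"


def itemize(chars):
    """Split characters into maximal runs that share a script."""
    runs = []
    for cp in chars:
        script = script_of(cp)
        if runs and runs[-1][0] == script:
            runs[-1][1].append(cp)
        else:
            runs.append((script, [cp]))
    return runs


# Hypothetical font: it happens to encode PUA U+E001 as glyph 300 ...
CMAP = {0xE001: 300, 0x09A4: 11, 0x09CD: 13}
# ... and has a Bengali ligature keyed on that glyph: 300 + VIRAMA + TA -> 301.
BENGALI_LIGATURES = {(300, 13, 11): 301}


def shape_run(script, chars):
    """cmap lookup, then Bengali ligatures only if the run is Bengali."""
    glyphs = [CMAP[c] for c in chars]
    if script != "beng":
        return glyphs  # no Bengali lookups for non-Bengali runs
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in BENGALI_LIGATURES.items():
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:  # no ligature matched at position i
            out.append(glyphs[i])
            i += 1
    return out


def shape(chars):
    """Itemize first, then shape each run independently."""
    result = []
    for script, run in itemize(chars):
        result.extend(shape_run(script, run))
    return result


text = [0xE001, 0x09CD, 0x09A4]
print(itemize(text))  # the PUA character ends up in its own run
print(shape(text))    # [300, 13, 11]: the ligature never fires
```

By contrast, shape_run("beng", text) returns [301]: the same glyphs would ligate if they were in one Bengali run, so in this model it is the itemization step, not the lookup itself, that keeps U+E001 from being processed.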

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>

This archive was generated by hypermail 2.1.2 : Tue Oct 30 2001 - 13:59:41 EST