Arabic Presentation Forms-A

From: Philippe Verdy (verdy_p@wanadoo.fr)
Date: Wed Dec 17 2003 - 10:08:57 EST

Next message: jcowan@reutershealth.com: "Re: [OT] CJK -> CJC (Re: Corea?)"

Previous message: Kent Karlsson: "RE: Case mapping of dotless lowercase letters"
Next in thread: Marco Cimarosti: "RE: Arabic Presentation Forms-A"
Maybe reply: Marco Cimarosti: "RE: Arabic Presentation Forms-A"
Maybe reply: Kenneth Whistler: "Re: Arabic Presentation Forms-A"

I was validating some internal processing of strings, and I found these
intriguing decompositions for Arabic Presentation Forms-A. I was surprised
to see that they are compatibility decomposed into (isolated) rows ordered
from bottom to top, which differs from the normal top-to-bottom order of
rows in Arabic, but of course with the same right-to-left reading order
within each row:

#code;cc;nfd;nfkdFolded; # CHAR?; NFD?; NFKDFOLDED?;
# RIAL SIGN
fdfc;;;<isolated> 0631 06cc 0627 0644; # ??; ?; ?????;

The "Arial Unicode MS" font does not have a glyph for the Rial currency sign,
so I won't comment much on it, even though it is a special ligature of its
component letters:
- the medial form of U+06CC ARABIC LETTER FARSI YEH (ی) is shown on the
charts only as two dots (and not with its "Arabic letter alef maksura" base
form, as the comment in the Arabic chart suggests for Arabic letter yeh),
located below and to the left of the medial form of U+0627 (ا),
- and the initial form of U+0631 (ر) kerns below its next two characters
(sometimes with an additional kashida below its next three characters).
However, the general layout is still one row, so the decomposition seems
quite reasonable; it's just regrettable that the sign is not found in Arial
Unicode MS (unless this Rial sign is traditional and no longer in actual use
today).

I'm not sure that the compatibility decomposition gives the accurate form
for rendering the traditional glyph coded for the currency symbol...
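
For anyone who wants to check the decomposition quoted above independently of
my own tables, here is a minimal sketch using Python's standard `unicodedata`
module. The helper name `compat_decomposition` is my own, for illustration
only, and the result depends on the Unicode version bundled with the Python
build in use.

```python
import unicodedata


def compat_decomposition(ch):
    """Split the UnicodeData decomposition field into (tag, code points).

    Hypothetical helper: the tag is e.g. "<isolated>", or "" when the
    mapping is canonical or absent.
    """
    parts = unicodedata.decomposition(ch).split()
    if parts and parts[0].startswith("<"):
        return parts[0], parts[1:]
    return "", parts


# U+FDFC RIAL SIGN, as discussed above.
tag, codes = compat_decomposition("\uFDFC")
print(tag, codes)

# NFKD applies the same mapping to running text.
nfkd = unicodedata.normalize("NFKD", "\uFDFC")
print([f"{ord(c):04X}" for c in nfkd])
```

This only confirms what the character database says; it tells nothing about
how a given font draws the sign.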

------------------

Now I have this one:

#code;name;cc;
# nfd;nfkdFolded;
# #CHAR?; NFD?; NFKDFOLDED?;
FDFA;ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM;0;
FDFA;<isolated> 0635 0644 0649 0020 0627 0644 0644 0647 0020 0639
0644 064a 0647 0020 0648 0633 0644 0645;
# ??; ??; ??? ???? ???? ?????;

#code;name;cc;
# nfd;nfkdFolded;
# #CHAR?; NFD?; NFKDFOLDED?;
FDFB;ARABIC LIGATURE JALLAJALALOUHOU;0;
FDFB;<isolated> 062c 0644 0020 062c 0644 0627 0644 0647;
# ??; ??; ?? ??????;

I note that the Unicode charts show them in their complex and highly
ligated form, which corresponds to the Arabic tradition in the Quran. This
is apparently not implemented in Microsoft fonts, which render the first two
on only 2 bottom-to-top rows.
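
To make the word structure of these two decompositions explicit, a small
sketch in Python (standard `unicodedata` module only) can split the NFKD
result on U+0020. The function name `decomposed_words` is an assumption of
mine, not something defined by Unicode.

```python
import unicodedata


def decomposed_words(ligature):
    """Return the space-separated words of the NFKD form, in logical order."""
    return unicodedata.normalize("NFKD", ligature).split(" ")


for cp in (0xFDFA, 0xFDFB):
    words = decomposed_words(chr(cp))
    # Print code points rather than glyphs so the output survives any mail
    # gateway that mangles Arabic text.
    print(f"U+{cp:04X}:", [" ".join(f"{ord(c):04X}" for c in w) for w in words])
```

The words come out in logical (reading) order; how they are stacked into rows
is entirely up to the renderer.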

The compatibility decomposition creates 4 space-separated words WORD1,
WORD2, WORD3, WORD4 that would be rendered normally either in one row as:
        WORD4 WORD3 WORD2 WORD1
i.e.
        ??? ???? ???? ?????
or on multiple narrow rows as:
        WORD1    or    WORD2 WORD1
        WORD2          WORD4 WORD3
        WORD3
        WORD4
i.e.
        ???      or    ??? ????
        ????           ???? ?????
        ????
        ?????
using the top-to-bottom normal layout of plain-text rows in Arabic.
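
The plain-text layouts above can be sketched as follows. This is only an
illustration with placeholder labels, assuming a naive renderer that fills
rows from top to bottom and draws each row right to left; the function names
are mine and no real bidi algorithm is involved.

```python
def rows_in_logical_order(words, per_row):
    """Break the word list into rows of at most `per_row` words."""
    return [words[i:i + per_row] for i in range(0, len(words), per_row)]


def visual_rows(words, per_row):
    """Rows from top to bottom; inside a row the first word is rightmost."""
    return [" ".join(reversed(row)) for row in rows_in_logical_order(words, per_row)]


words = ["WORD1", "WORD2", "WORD3", "WORD4"]
for per_row in (4, 2, 1):
    print(f"{per_row} word(s) per row:")
    for line in visual_rows(words, per_row):
        print("        " + line)
```

A bottom-to-top stacking would simply reverse the list returned by
`visual_rows`.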

I can understand that it's difficult to make them fit more ideally like this
(with kashidas noted by underscores):
        WORD2
        _______WORD1
        W_______ORD3
        W___OR____D4
i.e. actually this order:
        ????
                ???
        ????
        ?????

to better match the actual glyph in the charts, which also uses kashidas,
given the height constraints in fonts and the difficulty of creating the
traditional complex kerning between rows, but the current presentation of
the alternate glyph chosen in Arial Unicode MS does not seem intuitive.
Isn't there some requirement in Unicode not to change the common layout,
which is part of the character identity and structural for the script? No
such interpretation problem occurs in the presentation of U+FDFB (which
also has two rows in the representative glyph of the Arabic Presentation
Forms-A charts). Is there an error here?

---------------------------

Now with this one:

#code;name;cc;
# nfd;nfkdFolded;
# #CHAR?; NFD?; NFKDFOLDED?;
FDFB;ARABIC LIGATURE JALLAJALALOUHOU;0;
FDFB;<isolated> 062c 0644 0020 062c 0644 0627 0644 0647;
# ??; ??; ?? ??????;

The decomposition into WORD1 WORD2 follows the same principles but is less
complex, and it uses this layout:
        WORD2 WORD1
or:
        WORD1
        WORD2
The second layout is used in Arial Unicode MS to render the ligature.

---------------------------

Now I don't know why the last, very complex but marvelous ligature U+FDFD in
Unicode does not have a compatibility decomposition. In fact I can't clearly
decipher which Arabic letters the ligature corresponds to (this is not
documented in Unicode, except through its English name, which is probably
too far from the Arabic name to allow this identification).
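
The absence of a decomposition for U+FDFD can be checked mechanically. The
sketch below again relies only on Python's standard `unicodedata` module; the
helper name is hypothetical, and the answer reflects whatever Unicode version
that Python build ships.

```python
import unicodedata


def has_compat_decomposition(ch):
    """True when the character database lists any decomposition mapping."""
    return unicodedata.decomposition(ch) != ""


for cp in range(0xFDFA, 0xFDFE):
    ch = chr(cp)
    print(f"U+{cp:04X}",
          unicodedata.name(ch, "<unnamed>"),
          has_compat_decomposition(ch))

# With no mapping, NFKD leaves the ligature as a single code point.
print(unicodedata.normalize("NFKD", "\uFDFD") == "\uFDFD")
```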

More generally, my question concerns the modification of layouts for
ligature glyphs in fonts: are such modifications allowed, and how could the
ligatures be acceptably represented when the plain-text character is not
compatibility-decomposed but rendered with a single glyph?



This archive was generated by hypermail 2.1.5 : Wed Dec 17 2003 - 11:00:36 EST