Re: Questions on the Unicode BiDirectional (BIDI) Algorithm from Philippe Verdy on 2014-07-08 (Unicode Mail List Archive)

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Tue, 8 Jul 2014 10:58:22 +0200

I do agree, but the user was already confused by the EN->AN "substitution",
which is not a character substitution but just a change of Bidi class
during the resolution steps needed to order things correctly.
The final goal of the UBA is just to compute the "correct" visual
reordering, and my sentence was already saying that (the "unless" part was
effectively incomplete, as it was not clear that this was not about the UBA
itself).
Indeed, the replacement of digits by "national digits" is described in the
standard even if it is now deprecated (but it is maintained because we also
have some deprecated / non-recommended formatting controls to define this
behavior for the few pieces of software that still need it).
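
To make the EN->AN point concrete, here is a minimal sketch in Python of the
class change I am talking about (rule W2 of UAX #9: a European number whose
nearest preceding strong type is AL is treated as an Arabic number). It is
deliberately simplified: it ignores embedding levels, isolating run
sequences and all the other rules, and the function name resolve_w2 is only
mine for this illustration. The point is that only the class changes; the
characters stay the same.

```python
import unicodedata

def resolve_w2(text):
    """Return (char, bidi_class) pairs after a simplified rule W2.

    Simplification (assumption): the whole string is one run and the
    start-of-sequence type is treated as "not AL".
    """
    resolved = []
    last_strong = None
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls in ("L", "R", "AL"):
            last_strong = cls          # remember the nearest strong type
        elif cls == "EN" and last_strong == "AL":
            cls = "AN"                 # class change only, same character
        resolved.append((ch, cls))
    return resolved

# Arabic letters (class AL) followed by ASCII digits (class EN).
sample = "\u0633\u0646\u0629 2014"
result = resolve_w2(sample)

# The digits are now classified AN, but the text itself is untouched.
unchanged = "".join(ch for ch, _ in result) == sample
```

After a Latin letter the same digits keep their EN class, which is why this
is a matter of ordering context and not of replacing anything.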

And the UBA also includes some minimal support for these formatting
controls, so this comment is not completely off topic (but even in this
case the UBA does not perform the substitutions itself). These controls are
not needed when composing Arabic plain text directly. This is just an old
compatibility facility for cases where software formats numbers
using ASCII digits (e.g. with printf("%d %f"...) in C) without knowing
which other set of digits it should use instead (no support for a more
precise locale).
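
For illustration only, this is roughly what such a substitution looks like
when an application does it itself, outside of the UBA. It is a sketch in
Python, not what any particular software does: the helper name
to_arabic_indic is hypothetical, and the choice of the Arabic-Indic set
(U+0660..U+0669) is an assumption that a real application would have to
derive from the locale.

```python
# Map ASCII digits 0-9 to ARABIC-INDIC DIGIT ZERO..NINE (U+0660..U+0669).
ARABIC_INDIC = str.maketrans(
    "0123456789",
    "".join(chr(0x0660 + i) for i in range(10)),
)

def to_arabic_indic(text):
    """Hypothetical helper: replace every ASCII digit, whatever its role."""
    return text.translate(ARABIC_INDIC)

# The number is first formatted with ASCII digits, printf-style...
formatted = "%d %.2f" % (42, 3.5)
# ...and only afterwards are the digit characters replaced.
substituted = to_arabic_indic(formatted)
```

Note that this is a real character replacement performed by the
application, which is exactly what the UBA never does.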

These substitutions are problematic anyway because they are almost blind to
the context of use (are the digits used for numbers expressing quantities,
or are they codes and identifiers like phone numbers, social security
numbers, car registration numbers, or postal codes in a foreign
country?). They could also generate problems if they cause a change of
interpretation in dates and times or in currency amounts (because such
substitutions are lossy), and in fact more problems than keeping the digits
untouched (even if they are not the preferred ones for a given locale).
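
A small sketch of what I mean by "lossy" and "blind", again in Python and
with made-up sample text: once a blind substitution has been applied to a
string that mixes an identifier in ASCII digits with a quantity already
written in Arabic-Indic digits, mapping back cannot tell which digits were
originally which.

```python
# Blind mappings between ASCII digits and Arabic-Indic digits (U+0660..U+0669).
to_national = {ord("0") + i: chr(0x0660 + i) for i in range(10)}
to_ascii = {0x0660 + i: chr(ord("0") + i) for i in range(10)}

# Made-up sample: a phone-like code in ASCII digits, then a quantity that
# the author deliberately wrote with Arabic-Indic digits.
original = "Tel 555-0100, \u0661\u0662 items"

# The substitution cannot know that "555-0100" is an identifier.
substituted = original.translate(to_national)

# Reversing it converts *every* Arabic-Indic digit, including the two
# that were not ASCII in the first place.
restored = substituted.translate(to_ascii)

round_trips = restored == original
```

The round trip fails because the distinction that was present in the
original text has been destroyed by the substitution.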

What is really important is that substitution of ASCII digits is not
possible purely at the character encoding level, where the UBA operates,
because it requires some other knowledge about the language (or the style,
for the Arabic script). Typically such substitutions are handled in the
context of a specific font used by the renderer, which will need such
formatting controls in order to know whether it can substitute *glyphs*
(not characters), and the UBA does not work at the glyph level either. It
is possible at the font level. It is even more difficult to do that for the
Arabic script than for the Indic scripts, because the Arabic script has two
distinct sets of "national" digits.
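
The two sets can be inspected with Python's unicodedata module. This is only
a sketch to show that they are distinct characters with distinct Bidi
classes; the labels in DIGIT_SETS and the helper describe are names I chose
for the example. The Extended Arabic-Indic set is the one typically used for
Persian and Urdu, so choosing between the two needs knowledge of the
language that the character codes alone do not carry.

```python
import unicodedata

# Code point of the digit zero in each set (labels are mine).
DIGIT_SETS = {
    "ASCII": 0x0030,
    "Arabic-Indic": 0x0660,
    "Extended Arabic-Indic": 0x06F0,
}

def describe(base):
    """Return (character name, Bidi class) for the zero of a digit set."""
    zero = chr(base)
    return unicodedata.name(zero), unicodedata.bidirectional(zero)

summary = {label: describe(base) for label, base in DIGIT_SETS.items()}

# All ten digits of each set share the class of their zero.
uniform = all(
    unicodedata.bidirectional(chr(base + i)) == summary[label][1]
    for label, base in DIGIT_SETS.items()
    for i in range(10)
)
```

The Arabic-Indic digits have Bidi class AN, while the Extended Arabic-Indic
digits have class EN like the ASCII digits, so the two "national" sets do
not even behave identically in the UBA.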

Digits are important enough in critical cases that performing
automatic substitution of them without good knowledge of their context of
use will cause severe security problems. The UBA is used so broadly by
default that it is certainly not the algorithm in which such substitutions
will occur (and I think this is for the same reason that the use of digit
formatting controls is also strongly discouraged).

2014-07-07 19:36 GMT+02:00 Doug Ewell <doug_at_ewellic.org>:

> Philippe Verdy <verdy underscore p at wanadoo dot fr> wrote:
>
> >>> It just changes the direction behavior, it does not replace
> >>> characters, except in the preference mode using "national digits".
> >>
> >> It does not replace characters, period. There is no "preference mode"
> >> in the UBA that replaces digits. The terms "preference mode" and
> >> "national digits" do not appear in UAX #9.
> >
> > Yes, my sentence was incomplete. But such a preference mode is
> > implemented in common software (and without even needing UBA itself).
>
> William asked specifically about the UBA, and cited a passage from the
> UBA, and the Subject line of this thread refers to the UBA. To avoid
> confusing him with unrelated information, my responses at least are
> intentionally confined to the UBA.
>
> --
> Doug Ewell | Thornton, CO, USA
> http://ewellic.org | @DougEwell
>
>

Received on Tue Jul 08 2014 - 04:00:11 CDT
