Re: Default bidi ranges

From: Asmus Freytag <asmusf_at_ix.netcom.com>
Date: Wed, 09 Nov 2011 09:30:49 -0800

On 11/9/2011 1:18 AM, "Martin J. Dürst" wrote:
> I tried to find something like a normative description of the default
> bidi class of unassigned code points.
>
> In UTR #9, it says
> (http://www.unicode.org/reports/tr9/tr9-23.html#Bidirectional_Character_Types):
>
> Unassigned characters are given strong types in the algorithm. This is
> an explicit exception to the general Unicode conformance requirements
> with respect to unassigned characters. As characters become assigned
> in the future, these bidirectional types may change. For assignments
> to character types, see DerivedBidiClass.txt [DerivedBIDI] in the [UCD].
>
> The DerivedBidiClass.txt file, as far as I understand, is mainly a
> condensation of bidi classes into character ranges (rather than giving
> them for each codepoint independently as in UnicodeData.txt). I.e. it
> can at any moment be derived automatically from UnicodeData.txt, and
> is as such not normative.
>
> Why is it then that the default class assignments are only given in
> this file (unless I have overlooked something)? And why is it that
> they are only given in comments?

Because the UnicodeData.txt file has no header (for historical
compatibility).

Because, like the practice of putting <style> in HTML inside comments,
these things (@missing) are in comments to protect older parsers.

> I'm trying to create a program that takes all the bidi assignments
> (including default ones) and creates the data part of a bidi algorithm
> implementation, but I don't feel confident to code against stuff
> that's in comments. Any advice? Is it possible that this could be
> fixed (making it more normative, and putting it in a form that's
> easier to process automatically)?

I've confidently parsed these comments for years now.

The one thing that's worse than parsing these comments is to move to an
incompatible scheme.
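
For illustration, here is a minimal sketch of reading the @missing comments
out of a DerivedBidiClass.txt-style file. It assumes each default is written
as "# @missing: START..END; Value" and that a later, narrower line overrides
an earlier one; both are assumptions of this sketch and should be checked
against the actual file. The names parse_missing_lines, default_value and
SAMPLE are made up for the example, and SAMPLE is not real file content.

```python
import re

# Assumed line shape: "# @missing: 0000..10FFFF; Left_To_Right"
MISSING_RE = re.compile(
    r"^#\s*@missing:\s*([0-9A-Fa-f]{4,6})\.\.([0-9A-Fa-f]{4,6})\s*;\s*(\S+)\s*$"
)


def parse_missing_lines(text):
    """Return (start, end, value) tuples for every @missing comment line."""
    defaults = []
    for line in text.splitlines():
        m = MISSING_RE.match(line.strip())
        if m:
            start = int(m.group(1), 16)
            end = int(m.group(2), 16)
            defaults.append((start, end, m.group(3)))
    return defaults


def default_value(cp, defaults):
    """Look up the default for a code point.

    Assumption of this sketch: when ranges overlap, the line that
    appears later in the file wins.
    """
    result = None
    for start, end, value in defaults:
        if start <= cp <= end:
            result = value
    return result


# Illustrative sample only, not copied from a real data file.
SAMPLE = """\
# Sample in the style of DerivedBidiClass.txt
# @missing: 0000..10FFFF; Left_To_Right
# @missing: 0590..05FF; Right_To_Left
0041..005A    ; L # ordinary data line, ignored by this parser
"""

if __name__ == "__main__":
    table = parse_missing_lines(SAMPLE)
    for cp in (0x0041, 0x05D0):
        print(f"U+{cp:04X}: {default_value(cp, table)}")
```

Ordinary comment lines and data lines simply fail the pattern and are
skipped, so a parser like this keeps working if unrelated comments change.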

That said, apparently, for some properties the default information is
contained in the PropertyValueAliases.txt file, where it is
inconveniently located for people who want to parse just one property,
but conveniently located for those who want to assemble the whole database.
(And, worse, where it adds a code-point dependency to the information in
that file that wasn't there from the beginning - but at least the
@missing syntax hasn't changed too much).
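
To show what the extra dependency looks like in practice, here is a hedged
sketch that pulls the defaults for a single property out of a
PropertyValueAliases.txt-style file. It assumes the lines there carry an
additional property-name field, as in "# @missing: START..END; Property;
Value". The function name missing_defaults_for and the sample text are
invented for this example and are not real file content.

```python
PREFIX = "# @missing:"


def missing_defaults_for(text, prop):
    """Collect (start, end, value) defaults for one property name.

    Assumed line shape: "# @missing: 0000..10FFFF; Property; Value".
    Lines with any other number of fields are skipped.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        fields = [f.strip() for f in line[len(PREFIX):].split(";")]
        if len(fields) != 3:
            continue
        rng, name, value = fields
        if name != prop:
            continue
        # A single code point without ".." is treated as a one-item range.
        start, _, end = rng.partition("..")
        out.append((int(start, 16), int(end or start, 16), value))
    return out


# Illustrative sample only, not copied from a real data file.
ALIASES_SAMPLE = """\
# @missing: 0000..10FFFF; Age; Unassigned
# @missing: 0000..10FFFF; Bidi_Class; Left_To_Right
bc ; AL ; Arabic_Letter
"""

if __name__ == "__main__":
    print(missing_defaults_for(ALIASES_SAMPLE, "Bidi_Class"))
```

Filtering on the property field is what lets someone who wants just one
property ignore the rest of the file.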

A./
Received on Wed Nov 09 2011 - 11:39:38 CST