Re: Designing a format for research use of the PUA in a RTL mode (from Re: RTL PUA?)

From: Asmus Freytag <>
Date: Tue, 23 Aug 2011 11:02:31 -0700

On 8/23/2011 7:22 AM, Doug Ewell wrote:
> Of all applications, a word processor or DTP application would want to
> know more about the properties of characters than just whether they are
> RTL. Line breaking, word breaking, and case mapping come to mind.
> I would think the format used by standard UCD files, or the XML
> equivalent, would be preferable to making one up:

The right answer would follow the XML format of the UCD.

That's the only format that allows all the necessary information to be
contained in one file, and it would leverage any effort that users of the
main UCD have already made in parsing the XML format.

An XML format should also be flexible in that you can add/remove not just
characters, but properties as needed.
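
As a rough illustration of that flexibility, here is a minimal Python sketch
that reads a small PUA property file written in the style of the UCD XML
format. The sample data is entirely hypothetical (the code points and the
property values assigned to them are made up for the example). The element
and attribute names (char, cp, first-cp, last-cp, bc, gc, lb) and the
namespace are what I understand UAX #42 to define, so check them against the
specification before relying on them.

```python
import xml.etree.ElementTree as ET

# Namespace used by the UCD XML format (assumed from UAX #42).
UCD_NS = "http://www.unicode.org/ns/2003/ucd/1.0"

# Hypothetical PUA property file: one single character and one range.
# Only the properties a given application needs are listed.
SAMPLE = """<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <description>Hypothetical PUA properties for an RTL script</description>
  <repertoire>
    <char cp="E000" gc="Lo" bc="R" lb="AL"/>
    <char first-cp="E100" last-cp="E1FF" gc="Lo" bc="AL" lb="AL"/>
  </repertoire>
</ucd>
"""


def load_pua_properties(xml_text):
    """Return {code point: {property alias: value}} for every <char>."""
    props = {}
    root = ET.fromstring(xml_text)
    for el in root.iter("{%s}char" % UCD_NS):
        attrs = dict(el.attrib)
        if "cp" in attrs:
            # Single code point.
            first = last = int(attrs.pop("cp"), 16)
        else:
            # Range of code points sharing the same properties.
            first = int(attrs.pop("first-cp"), 16)
            last = int(attrs.pop("last-cp"), 16)
        for cp in range(first, last + 1):
            props[cp] = dict(attrs)
    return props


PUA_PROPS = load_pua_properties(SAMPLE)
```

Adding a property here means adding one attribute to the relevant char
elements; the loader does not change, because it copies whatever attributes
it finds.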

The worst thing to do, other than designing something from scratch,
would be to replicate the UnicodeData.txt layout with its random, but
fixed collection of properties and insanely many semi-colons. None of
the existing UCD txt files carries all the needed data in a single file.
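
For comparison, a minimal Python sketch of what reading a UnicodeData.txt
style line involves. The field names are my own labels for the 15 positional
fields, and the sample line is a hypothetical PUA entry, not a line from the
real file. Note that there is no slot for line breaking or other properties
that live in separate UCD files.

```python
# Labels (my own, for illustration) for the 15 positional fields of a
# UnicodeData.txt line, in file order.
FIELD_NAMES = (
    "code_point", "name", "general_category", "combining_class",
    "bidi_class", "decomposition", "decimal_value", "digit_value",
    "numeric_value", "bidi_mirrored", "unicode_1_name", "iso_comment",
    "uppercase_mapping", "lowercase_mapping", "titlecase_mapping",
)

# Hypothetical PUA entry written in that layout.
SAMPLE_LINE = "E000;<private use>;Lo;0;R;;;;;N;;;;;"


def parse_unicodedata_line(line):
    """Split one semicolon-separated line into a dict keyed by field label."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != len(FIELD_NAMES):
        raise ValueError("expected %d fields, got %d"
                         % (len(FIELD_NAMES), len(fields)))
    return dict(zip(FIELD_NAMES, fields))


RECORD = parse_unicodedata_line(SAMPLE_LINE)
```

The set of fields is fixed by position, so carrying an extra property means
either a second file or a change to the layout that every parser has to
learn.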

Received on Tue Aug 23 2011 - 13:05:57 CDT