Re: CLDR and ICU

From: Richard Wordingham
Date: Fri, 27 Jul 2012 21:24:42 +0100

On Fri, 27 Jul 2012 09:01:13 -0700
Mark Davis ☕ wrote:

> The key term is 'open interchange'.

XML documents are textual objects. It is therefore reasonable to look
at them using tools for displaying textual objects. However,
> "<snip> noncharacters are <snip>
> permanently reserved (unassigned) and have no interpretation
> whatsoever outside of their possible application-internal private
> uses."

> For CLDR collation data - *not open interchange, but specific to use
> in CLDR collation data* - these characters have specified use as
> sentinel characters, marking the boundaries for CJK 'buckets' for use
> in indexes.

I hope you're addressing a complaint I haven't made. I haven't
complained about tailoring involving non-characters, though it
does strike me as the least evil. Are you perhaps arguing that I become
part of some CLDR application when I read CLDR XML files?
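
For anyone who wants to see where non-characters turn up in a file they
are reading as text, here is a minimal sketch in Python. It is not taken
from CLDR or ICU, and the function names are made up for illustration.
It relies only on the Standard's definition of the 66 non-characters:
U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.

```python
def is_noncharacter(cp: int) -> bool:
    """Return True if cp is one of the 66 Unicode noncharacters."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    # U+FDD0..U+FDEF: a contiguous run of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane (U+nFFFE and U+nFFFF).
    return (cp & 0xFFFE) == 0xFFFE


def find_noncharacters(text: str):
    """Yield (offset, 'U+XXXX') for each noncharacter found in text.

    Hypothetical helper: offsets are indexes into the decoded string,
    not byte offsets into the file.
    """
    for i, ch in enumerate(text):
        if is_noncharacter(ord(ch)):
            yield i, f"U+{ord(ch):04X}"
```

Running find_noncharacters over the decoded text of a file would list
each position a viewer has to cope with; it says nothing about what the
non-character means to the application that wrote it.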

> This is described in
> The
> noncharacters are chosen specifically so that they do not overlap
> with publicly interchanged private use characters. Of course,
> implementations of LDML can tailor the collations to remove them, or
> replace by other mechanisms.

I was going to ask when the LDML element suppress_contractions took
effect. At least I now have some idea of the answer.

> Unfortunately, some restrictions that were perfectly reasonable for
> use in document interchange become annoying flaws in a general
> structured data interchange format. The inability to interchange all
> Unicode scalar values is one.

The restrictions improve legibility. As it is, many of the
character-level elements in CLDR XML files tend to be unreadable. It
would be better for them not to require genuinely complex text
rendering. In a related matter, it was very inconvenient to have to
treat collation test files as binary data because they could not be DOS
text files - ctrl/Z in the comments cut the files short.
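
The truncation is easy to check for ahead of time. A minimal sketch in
Python, assuming a reader that stops at the first Ctrl-Z byte (0x1A),
which is the behaviour described above; the function names are
hypothetical, and the data has to be read in binary mode so that nothing
is cut before the check runs.

```python
from typing import Optional

CTRL_Z = 0x1A  # treated as an end-of-file marker by DOS text-mode readers


def dos_text_cutoff(data: bytes) -> Optional[int]:
    """Return the offset of the first Ctrl-Z byte, or None if there is none."""
    idx = data.find(bytes([CTRL_Z]))
    return None if idx < 0 else idx


def visible_to_dos_text_reader(data: bytes) -> bytes:
    """Return the part of data that precedes the first Ctrl-Z.

    Assumption: the reader discards everything from the first 0x1A onward.
    """
    cut = dos_text_cutoff(data)
    return data if cut is None else data[:cut]
```

For a file on disk, open(path, "rb").read() gives the bytes to pass in;
a result other than None from dos_text_cutoff means the file is not safe
to handle as a DOS text file.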

Received on Fri Jul 27 2012 - 15:29:23 CDT
