Parsers for the UnicodeSet notation?

From: Eric Muller <emuller_at_adobe.com>
Date: Wed, 23 Jul 2014 15:23:46 -0700

I would like to work with the exemplarCharacters data in the CLDR, which
uses the UnicodeSet notation. Is there a parser for that notation
somewhere that would return just the list of characters in the set?
Something like the UnicodeSet utility at
<http://unicode.org/cldr/utility/list-unicodeset.jsp>, but usable from
apps or the shell.

I suspect that the exemplarCharacters use a restricted form of the
UnicodeSet notation (e.g. one that does not use property values). Is
that correct, and if so, what is the subset?
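
To make the question concrete, here is a minimal Python sketch of a
parser for one possible restricted form. The subset it accepts is only a
guess and is not confirmed for the CLDR data: a single outer [...],
literal characters, x-y ranges, \uXXXX and \UXXXXXXXX escapes, a
backslash before punctuation, and {multi-character strings}. The name
parse_simple_unicode_set is hypothetical, and everything outside that
subset (properties, negation, nested sets) is rejected rather than
interpreted.

```python
def parse_simple_unicode_set(pattern):
    r"""Expand a restricted UnicodeSet pattern into a sorted list of strings.

    Assumed subset (a guess, not confirmed for CLDR): one outer [...],
    literal characters, x-y ranges, \uXXXX and \UXXXXXXXX escapes, a
    backslash before punctuation, and {multi-character strings}.
    Whitespace is ignored. Anything else raises ValueError.
    """
    text = pattern.strip()
    if len(text) < 2 or text[0] != "[" or text[-1] != "]":
        raise ValueError("expected a pattern of the form [...]")
    body = text[1:-1]

    def read_char(i):
        """Return (character, next index), resolving backslash escapes."""
        ch = body[i]
        if ch != "\\":
            return ch, i + 1
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        width = {"u": 4, "U": 8}.get(nxt)
        if width is not None:
            digits = body[i + 2:i + 2 + width]
            if len(digits) != width:
                raise ValueError("truncated hex escape")
            return chr(int(digits, 16)), i + 2 + width
        if nxt.isalnum():
            # Covers \p{...}, \N{...}, \x.. and similar: outside this sketch.
            raise ValueError("unsupported escape: \\" + nxt)
        return nxt, i + 2

    items = set()
    prev = None            # last single character seen, a possible range start
    pending_range = False  # True after an unescaped '-' that follows prev
    i = 0
    while i < len(body):
        ch = body[i]
        if ch.isspace():
            i += 1
            continue
        if ch == "{":
            # Multi-character string such as {ch}.
            i += 1
            chars = []
            while i < len(body) and body[i] != "}":
                if body[i].isspace():
                    i += 1
                    continue
                c, i = read_char(i)
                chars.append(c)
            if i >= len(body):
                raise ValueError("unterminated {")
            i += 1  # skip the closing brace
            items.add("".join(chars))
            prev = None
            continue
        if ch == "-" and prev is not None and not pending_range:
            pending_range = True
            i += 1
            continue
        if ch in "[]^&$":
            # Conservatively rejected when unescaped.
            raise ValueError("unsupported syntax: " + ch)
        c, i = read_char(i)
        if pending_range:
            if ord(c) < ord(prev):
                raise ValueError("range end precedes range start")
            items.update(chr(cp) for cp in range(ord(prev), ord(c) + 1))
            prev = None
            pending_range = False
        else:
            items.add(c)
            prev = c
    if pending_range:
        raise ValueError("range has no end")
    return sorted(items)


if __name__ == "__main__":
    for item in parse_simple_unicode_set(r"[a-c {ch} \- \u05BE]"):
        print(item)
```

Note that in this sketch an unescaped "-" between two characters always
forms a range, so a pattern whose backslashes were lost in copy/paste
would expand to many more code points than intended.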

Incidentally, I copy/pasted the punctuation exemplar characters for
he.xml into the utility, and it reported that the set contains 8,130
code points, including the ASCII letters. Somehow, that seems incorrect.
What did I do wrong?

Thanks,
Eric.

Received on Wed Jul 23 2014 - 17:24:13 CDT

This archive was generated by hypermail 2.2.0 : Wed Jul 23 2014 - 17:24:13 CDT