RE: Are there Unicode processors?

From: Phillips, Addison <addison_at_lab126.com>
Date: Mon, 7 Jan 2013 16:00:47 -0800

"Unicode processor"??

If what you're looking for is code that breaks text into grapheme clusters/words/lines/etc., that's called "text segmentation" and is described in:

   http://www.unicode.org/reports/tr29/
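
As a rough illustration of what segmentation means, here is a minimal Python sketch that groups each base character with the combining marks that follow it. This is a deliberate simplification and not the UAX #29 algorithm: the real rules also handle Hangul syllables, emoji sequences, CR LF and more. The function name rough_clusters is made up for this example; only the standard unicodedata module is assumed.

```python
import unicodedata

def rough_clusters(text):
    """Approximate user-perceived characters (hypothetical helper).

    Simplification: only attaches combining marks to the preceding
    character. It does NOT implement the full UAX #29 rules.
    """
    clusters = []
    for ch in text:
        # General categories Mn, Mc and Me are the combining marks.
        if clusters and unicodedata.category(ch).startswith("M"):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# "e" followed by U+0301 COMBINING ACUTE ACCENT stays together as one unit.
print(rough_clusters("e\u0301a"))
```

For production use, a library that implements UAX #29 (ICU's break iterators, for instance) is the safer choice.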

But you go on to talk about characters and their properties... If you're looking for APIs that provide access to Unicode character properties, most programming languages and libraries (Java, Perl, Python, ICU, ...) provide them, each in its own way. See, for example:

http://docs.oracle.com/javase/7/docs/api/java/lang/Character.html

Or:

http://perldoc.perl.org/5.14.0/perlunicode.html#Unicode-Character-Properties

Or:

http://userguide.icu-project.org/strings/properties
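
To make this concrete, here is a small Python sketch using the standard unicodedata module. It walks a string code point by code point and reports each one's name and general category, which is roughly the "this part is the Latin Capital Letter T" information asked about. The helper name describe is invented for this example.

```python
import unicodedata

def describe(text):
    """Return (code point, character name, general category) per character.

    Hypothetical helper for illustration; unicodedata.name() and
    unicodedata.category() are the standard-library calls doing the work.
    """
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.category(ch))
        for ch in text
    ]

for code_point, name, category in describe("To"):
    print(code_point, name, category)
# U+0054 LATIN CAPITAL LETTER T Lu
# U+006F LATIN SMALL LETTER O Ll
```

Note that this iterates over code points, not user-perceived characters; splitting text into those is the segmentation problem covered by UAX #29.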

Addison

Addison Phillips
Globalization Architect (Lab126)
Chair (W3C I18N WG)

Internationalization is not a feature.
It is an architecture.

> -----Original Message-----
> From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On
> Behalf Of Costello, Roger L.
> Sent: Monday, January 07, 2013 2:35 PM
> To: unicode_at_unicode.org
> Subject: Are there Unicode processors?
>
> Hi Folks,
>
> An "XML processor" breaks up an XML document into its parts -- here's a start
> tag, here's element content, here's an end tag, etc. -- and then makes those
> parts (along with information about each part such as "this part is a start tag"
> and "this part is element content") available to XML applications via an API.
>
> Are there "Unicode processors"?
>
> That is, are there processors that break up Unicode text into its parts -- here's a
> character, here's another character, here's still another character, etc. -- and
> then makes those parts (along with information about each part such as "this
> part is the Latin Capital Letter T" and "this part is the Latin Small Letter o")
> available to Unicode applications (such as XML processors) via an API?
>
> I did a Google search for "Unicode processor" and came up empty so I am
> guessing the answer is that there are no Unicode processors. Or perhaps they
> go by a different name? If there are no Unicode processors, why not?
>
> /Roger
>
Received on Mon Jan 07 2013 - 18:05:21 CST