Re: Internal Representation of Unicode

From: Doug Ewell ([email protected])
Date: Fri Sep 26 2003 - 01:44:40 EDT

Next message: John Hudson: "RE: About that alphabetician..."

Previous message: Doug Ewell: "Unicode 4.0 book (was: Re: About that alphabetician...)"
In reply to: [email protected]: "Re: Internal Representation of Unicode"
Next in thread: Peter Kirk: "Re: Internal Representation of Unicode"

Johann <myrkraverk at users dot sourceforge dot net> wrote:

> That does not have to be a problem, as long as there are no more than
> 255 accents and combinations of them. As for Vietnamese, I just don't
> know how many there are, or how many characters they use.

You'll need UTF-8 and a fairly comprehensive font to read the following.

For Vietnamese, you should count on supporting the following vowels:

a à ả ã á ạ ă ằ ẳ ẵ ắ ặ â ầ ẩ ẫ ấ ậ e è ẻ ẽ é ẹ ê ề ể ễ ế ệ i ì ỉ ĩ í ị
o ò ỏ õ ó ọ ô ồ ổ ỗ ố ộ ơ ờ ở ỡ ớ ợ u ù ủ ũ ú ụ ư ừ ử ữ ứ ự y ỳ ỷ ỹ ý ỵ

the following consonant (in addition to most other English consonants):

đ

and this currency sign:

₫

For purposes of your mechanism, you can think of each vowel as having up
to 2 accents: one from (upper, right-attached, or none) plus one from
(upper, lower, or none). The way Vietnamese think of it is that the
circumflex, breve, and horn are part of the base letter (making a total
of 12 base vowels), whereas the grave, hook above, tilde, acute, and dot
below are considered diacritics. Counting the unmarked form, that gives
6 variants of each base vowel (6 × 12 = 72 total vowels). All
combinations are possible.
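
The count above can be checked mechanically. The following is only a
sketch, assuming Python 3.7+ and its standard unicodedata module; the
names BASE_VOWELS, TONE_MARKS, and vietnamese_vowels are made up for
illustration and are not part of anyone's mechanism. It attaches each
tone mark to each base vowel as a combining character and lets NFC
normalization pick the precomposed letter:

```python
import unicodedata

# The 12 base vowels as counted above: circumflex, breve, and horn are
# treated as part of the letter itself. (Illustrative name.)
BASE_VOWELS = "aăâeêioôơuưy"

# The tone marks as Unicode combining characters; the empty string stands
# for the unmarked form. Order follows the vowel list in this message.
TONE_MARKS = {
    "none": "",
    "grave": "\u0300",
    "hook above": "\u0309",
    "tilde": "\u0303",
    "acute": "\u0301",
    "dot below": "\u0323",
}

def vietnamese_vowels():
    """Return every base vowel combined with every tone mark, in NFC."""
    return [
        unicodedata.normalize("NFC", base + mark)
        for base in unicodedata.normalize("NFC", BASE_VOWELS)
        for mark in TONE_MARKS.values()
    ]

vowels = vietnamese_vowels()
print(len(vowels))                       # 72
print(all(len(v) == 1 for v in vowels))  # True: each has a precomposed form
print(" ".join(vowels[:6]))              # the six forms of "a"
```

Every one of the 72 combinations normalizes to a single code point, so
a representation built on (base letter, tone mark) pairs can be mapped
to and from precomposed Unicode without loss.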

Of course, all of the letters (not the dong sign) come in both uppercase
and lowercase.
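
A quick way to see the case behaviour, again only as a sketch assuming
Python's built-in str.upper() (the sample list is just illustrative:
u with horn and dot below, d with stroke, and the dong sign):

```python
# Sample characters given as escapes so the source stays ASCII-safe.
samples = {
    "u with horn and dot below": "\u1ef1",
    "d with stroke": "\u0111",
    "dong sign": "\u20ab",
}

# Map each sample to its uppercase form using the Unicode case mappings
# built into Python strings.
uppercased = {name: ch.upper() for name, ch in samples.items()}

for name, ch in samples.items():
    changed = uppercased[name] != ch
    print(f"{name}: U+{ord(ch):04X} -> U+{ord(uppercased[name]):04X}"
          f" (has uppercase: {changed})")
```

The two letters map to distinct uppercase code points, while the dong
sign is a currency symbol and comes back unchanged.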

-Doug Ewell
Fullerton, California
http://users.adelphia.net/~dewell/

