Re: Internal Representation of Unicode

From: Doug Ewell (dewell@adelphia.net)
Date: Fri Sep 26 2003 - 01:44:40 EDT


    Johann <myrkraverk at users dot sourceforge dot net> wrote:

    > That does not have to be a problem, as long as there are no more than
    > 255 accents and combinations of them. As for Vietnamese, I just don't
    > know how many there are, or how many characters they use.

    You'll need UTF-8 and a fairly comprehensive font to read the following.

    For Vietnamese, you should count on supporting the following vowels:

    a à ả ã á ạ ă ằ ẳ ẵ ắ ặ â ầ ẩ ẫ ấ ậ e è ẻ ẽ é ẹ ê ề ể ễ ế ệ i ì ỉ ĩ í ị
    o ò ỏ õ ó ọ ô ồ ổ ỗ ố ộ ơ ờ ở ỡ ớ ợ u ù ủ ũ ú ụ ư ừ ử ữ ứ ự y ỳ ỷ ỹ ý ỵ

    the following consonant (in addition to most of the English consonants):

    đ

    and this currency sign:

    ₫

    For purposes of your mechanism, you can think of each vowel as having up
    to 2 accents: (upper, right-attached, or none) plus (upper, lower, or
    none). The way Vietnamese think of it is that the circumflex, breve,
    and horn are part of the base letter (making a total of 12 base vowels),
    whereas the grave, hook above, tilde, acute, and dot below are
    considered diacritics. Each base vowel takes one of these five marks
    or none, which gives 6 × 12 = 72 total vowels. All combinations are
    possible.
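
    To check that count, here is a rough Python sketch. It is only an
    illustration, not anything from your mechanism: the names BASES, TONES,
    and vietnamese_vowels are made up for the example, and it assumes the
    standard unicodedata module. It attaches each tone mark to each base
    vowel as a combining character and lets NFC normalization compose the
    result.

```python
import unicodedata

# The 12 base vowels: a, a-breve, a-circumflex, e, e-circumflex, i,
# o, o-circumflex, o-horn, u, u-horn, y (escapes keep the source ASCII-safe).
BASES = ["a", "\u0103", "\u00e2", "e", "\u00ea", "i",
         "o", "\u00f4", "\u01a1", "u", "\u01b0", "y"]

# The 6 tone states: none, then combining grave, hook above, tilde,
# acute, and dot below.
TONES = ["", "\u0300", "\u0309", "\u0303", "\u0301", "\u0323"]


def vietnamese_vowels():
    """Return every base vowel combined with every tone state, in NFC."""
    return [unicodedata.normalize("NFC", base + tone)
            for base in BASES
            for tone in TONES]


vowels = vietnamese_vowels()
print(len(vowels))       # 72
print(" ".join(vowels))  # the lowercase vowels, in the order listed above
```

    Every one of the 72 results composes to a single precomposed code
    point, so the count holds whether you store the letters precomposed or
    as a base letter plus combining marks.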

    Of course, all of the letters (not the dong sign) come in both uppercase
    and lowercase.

    -Doug Ewell
     Fullerton, California
     http://users.adelphia.net/~dewell/


