Re: Tamil

From: Doug Ewell (doug@ewellic.org)
Date: Mon Feb 14 2011 - 10:00:39 CST

    <anbu at peoplestring dot com> wrote:

    > Tamil letters ஙா(0B99+0BBE), ஙி(0B99+0BBF), ஙீ(0B99+0BC0),
    > ஙு(0B99+0BC1), ஙூ(0B99+0BC2), ஙெ(0B99+0BC6), ஙே(0B99+0BC7),
    > ஙை(0B99+0BC8), ஙொ(0B99+0BCA), ஙோ(0B99+0BCB), ஙௌ(0B99+0BCC),
    > ஞி(0B9E+0BBF), ஞீ(0B9E+0BC0), ஞு(0B9E+0BC1), ஞூ(0B9E+0BC2),
    > ஞெ(0B9E+0BC6), ஞே(0B9E+0BC7), ஞை(0B9E+0BC8), ஞொ(0B9E+0BCA),
    > ஞோ(0B9E+0BCB), ஞௌ(0B9E+0BCC) are almost unused and most Tamil symbols
    > less used. We can assign them to more bits instead of the 16 bits they
    > are assigned to, as they are occupying space with almost no use.

    I'm not sure how these combinations of two characters can be considered
    "assigned to... 16 bits" unless one is using SCSU or some other encoding
    that can represent a Tamil character as an 8-bit byte. Each Unicode code
    point in the Tamil range takes 16 bits in UTF-16 and 24 bits in UTF-8,
    so each two-character combination listed above takes 32 bits in UTF-16
    and 48 bits in UTF-8.
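
    For anyone who wants to check the sizes, here is a minimal Python
    sketch. The helper name encoded_bits is only illustrative, and the
    sample syllable is the first one from the quoted list; "utf-16-be" is
    used so that no byte order mark is counted.

```python
# Illustrative sketch: count the bits a Tamil sequence occupies
# in UTF-16 and UTF-8. encoded_bits is a hypothetical helper name.

def encoded_bits(text: str, encoding: str) -> int:
    """Return the number of bits `text` occupies in the given encoding."""
    return len(text.encode(encoding)) * 8

nga = "\u0B99"         # TAMIL LETTER NGA, a single code point
ngaa = "\u0B99\u0BBE"  # NGA + TAMIL VOWEL SIGN AA, two code points

for label, s in (("U+0B99", nga), ("U+0B99 U+0BBE", ngaa)):
    # "utf-16-be" avoids the byte order mark that plain "utf-16" prepends
    print(label,
          "UTF-16:", encoded_bits(s, "utf-16-be"), "bits,",
          "UTF-8:", encoded_bits(s, "utf-8"), "bits")
```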

    If these are "almost unused," then it's probably a good thing they were
    not assigned to a single code point each. (There are some people who
    want every Tamil syllable to have its own code point.)

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    

