Re: minimizing size (was Re: allocation of Georgian letters)

From: Sinnathurai Srivas (sisrivas@blueyonder.co.uk)
Date: Sat Feb 09 2008 - 12:21:18 CST


    >>>
    The current Tamil Unicode need not be deprecated. The current Tamil Unicode
    can become the true Tamil Unicode, without CTL, yes the current without CTL
    in say 100 years time, as the current is in sync with ancient scientific
    Grammar (without CTL). This is what I meant by FORWARD COMPATIBLE, not even
    backward compatible.

    What we need now is making canonical forms, making canonical forms as
    secondary while keeping Current as primary. ie, both will work in harmony.
    Canonical forms will make it possible for contemporary Tamil to work without
    CTL.

    Sinnathurai

    >>
    From: "Bala" <bala@cse.mrt.ac.lk> Wrote

    However Unicode were very clear in the Chennai meeting that dual encoding is
    not possible and present encoding cannot be deprecated as well.

    Sinnathurai

    ----- Original Message -----
    From: "Sinnathurai Srivas" <sisrivas@blueyonder.co.uk>
    To: "Sinnathurai Srivas" <sisrivas@blueyonder.co.uk>; "Unicode Discussion"
    <unicode@unicode.org>
    Sent: 09 February 2008 16:09
    Subject: Re: minimizing size (was Re: allocation of Georgian letters)

    >>>>>>
    > A backward compatible solution would include allowing the current (CTL)
    > encoding to work while canonical forms would allow non-CTL encoding.
    > Both can work seamlessly.
    >
    > Tamil already has a canonical form defined (though wrongly). We
    > need to expand on providing canonical forms in a logical manner.
    >
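The remark that Tamil already has canonical forms defined can be checked against the Unicode character database. The sketch below uses Python's standard unicodedata module to show the canonical decompositions of the two-part Tamil vowel signs and the letter AU. The helper name decompose is only illustrative, and this shows what the standard defines today, not the expanded set of canonical forms proposed in this thread.

```python
import unicodedata

# Tamil characters that carry a canonical decomposition in the
# Unicode character database.
COMPOSED = ["\u0b94", "\u0bca", "\u0bcb", "\u0bcc"]


def decompose(ch):
    """Return the NFD (canonical decomposition) of ch as U+XXXX labels."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", ch)]


if __name__ == "__main__":
    for ch in COMPOSED:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {decompose(ch)}")
```

NFC maps each decomposed pair back to the single code point, so both spellings compare equal after normalization.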
    > Forward Compatible:
    > One more thing: the criteria used for making Devanagari or Bengali CTL
    > encodings may or may not be correct. However, the criteria used to make
    > Tamil a CTL script are not correct. Of course backward compatibility
    > (even forward compatibility can be considered) must be maintained. Is a
    > non-CTL solution that is made forward-compatible with CTL a possibility?
    > Can I expect a solution for rigidly fixedwidth requirements using
    > canonical forms? Can I expect an IMMEDIATE solution to use Unicode in
    > publishing, utilising canonical forms? The existing encoding will stay
    > primary, while canonical forms assist in special circumstances?
    >
    > Sinnathurai
    >
    >>>>>
    > On Feb 8, 2008 4:12 PM, Sinnathurai Srivas <sisrivas@blueyonder.co.uk>
    > wrote:
    >> Again what is the criteria for stopping Tamil using workable solution and
    >> what is the criteria for enforcing non-working solution?
    >
    > Unicode will change its encoding of Tamil in a non-backward compatible
    > way when hell freezes over. This system may be suboptimal, but it is
    > the same system as used for Devanagari and Bengali, and does work, if
    > not as well or in as many systems as you may hope.
    >
    >
    >
    > ----- Original Message -----
    > From: "Sinnathurai Srivas" <sisrivas@blueyonder.co.uk>
    > To: "Unicode Discussion" <unicode@unicode.org>
    > Sent: 08 February 2008 22:02
    > Subject: Re: minimizing size (was Re: allocation of Georgian letters)
    >
    >
    >>
    > John H. Jenkins wrote in a mail,
    > "
    > Even if Unicode had used an encoding model for South Asian scripts
    > that didn't require complex rendering, the current problem would exist
    > because then text would display correctly but, for example, databases
    > would have to be substantially rewritten to convert the glyph stream
    > back into a series of letters for the operations that they typically
    > support.
    > "
    >>>
    > John, could you expand on the above,
    > (Additionally, please include what effect canonical forms would have on
    > databases.)
    > My initial thinking is non-CTL Tamil would work in databases without
    > additional interventions.
    >
    > Sinnathurai
    >
    >>>>
    >
    > I'm not sufficiently familiar with Tamil to comment intelligently on
    > it. My responses were aimed at the more general issue of why complex
    > scripts are required and to try to clarify the reasons why some
    > scripts require complex rendering and others don't. If your questions
    > have to do with Tamil specifically, it would be better for them to be
    > answered by someone more familiar with the script.
    >
    > =====
    > John H. Jenkins
    > jenkins@apple.com
    > ----- Original Message -----
    > From: "Sinnathurai Srivas" <sisrivas@blueyonder.co.uk>
    > To: "Unicode Discussion" <unicode@unicode.org>
    > Sent: 08 February 2008 21:12
    > Subject: Re: minimizing size (was Re: allocation of Georgian letters)
    >
    >
    > 1/
    > My question was what is the criteria used to class a language as
    > a/ That requires complex rendering
    > b/ That does not require complex rendering.
    >
    > Tamil need not be a CTL script. It can work 100% and work better than CTL
    > enabled Tamil. Why then is Tamil classed as CTL script? What is the
    > criteria?
    >
    > For example Tamil could easily be implemented without the need for any
    > complex rendering
    > However, Tamil is currently implemented using complex rendering.
    > This was one of the main discussions and I have not seen a viable answer
    > that categorically states for such and such TECHNICAL reasons Tamil was
    > made one that requires Complex rendering.
    >
    > as for fixedwidth,
    >
    > For example, in Tamil, the one-cell fixedwidth font is currently achieved
    > using 8-bit encoding. It cannot be obtained using Unicode as it stands,
    > though introducing canonical forms can resolve this by enabling single
    > width for all necessary glyphs.
    > Besides terminal emulators, many electronic devices, such as set-top
    > boxes, use RIGID fixedwidth. I do not know how CJK handles set-top
    > boxes, etc. Anyway, why move into unnecessary complexities, when
    > Tamil can work perfectly well as a non-CTL script or alternatively by
    > defining canonical forms to do away with complex rendering requirements!!
    >
    > I think rigid fixedwidth for Tamil in its rendered form is NOT
    > ACHIEVABLE.
    >
    > As for publishing, attempts to use Unicode Tamil fail. If it is
    > achievable, when will it be ready?
    >
    > Again what is the criteria for stopping Tamil using workable solution and
    > what is the criteria for enforcing non-working solution?
    >
    > I think we can at least move fast, if we introduce all necessary canonical
    > forms now, most of the publishing s/w may work with canonical forms.
    >
    > Sinnathurai
    >
    >
    > ----- Original Message -----
    > From: "Ed Trager" <ed.trager@gmail.com>
    > To: "Unicode Discussion" <unicode@unicode.org>
    > Sent: 08 February 2008 20:02
    > Subject: Re: minimizing size (was Re: allocation of Georgian letters)
    >
    >
    > Hi, everyone,
    >
    > Just a few brief comments on this thread:
    >
    >>
    >> Having flown halfway around the world to talk to people who for whatever
    >> reasons, both valid and invalid (and not really distinguishing which is
    >> which on their list of concerns), are unhappy with a language encoding
    >> that in their view doubles or worse the amount of bytes used to store
    >> their language in Unicode, I can tell you that this is a very real
    >> concern on some people's minds.
    >>
    >> True or false, it is on their minds. They can all add and multiply, and
    >> it certainly looks like a 2x or 3x situation to them.
    >>
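The "2x or 3x" arithmetic is easy to reproduce. The sketch below counts code points and encoded bytes for one Tamil word, spelled with escapes so the code points are explicit. The comparison baseline of one byte per code point for a legacy 8-bit encoding is an assumption for illustration only. Real 8-bit Tamil font encodings are glyph based and may need a different number of bytes for the same word.

```python
# The Tamil word for "Tamil": TA, MA, VOWEL SIGN I, LLLA, VIRAMA (pulli).
WORD = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"


def storage_report(text):
    """Return code point count and byte counts under common encodings."""
    return {
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
        "utf16_bytes": len(text.encode("utf-16-le")),
        # Assumption: a hypothetical 8-bit encoding, one byte per code point.
        "assumed_8bit_bytes": len(text),
    }


if __name__ == "__main__":
    report = storage_report(WORD)
    for key, value in report.items():
        print(f"{key}: {value}")
    print("UTF-8 ratio:", report["utf8_bytes"] / report["assumed_8bit_bytes"])
    print("UTF-16 ratio:", report["utf16_bytes"] / report["assumed_8bit_bytes"])
```

Every code point in the Tamil block (U+0B80 to U+0BFF) takes three bytes in UTF-8 and two in UTF-16, which is where the 3x and 2x figures come from under this assumption.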
    >
    > Of course it is on their minds! Judging from the titles of emails in
    > my spam box, size really does matter. But apparently what humanity
    > really wants to do is MAXIMIZE the size, not minimize it. So a 2x or
    > 3x situation should be good. :-)
    >
    > On Feb 8, 2008 5:52 AM, Sinnathurai Srivas <sisrivas@blueyonder.co.uk>
    > wrote:
    >> 2/
    >> My question was, most proper publishing software does not yet support
    >> complex rendering. How many years is it since Unicode came into being?
    >> When is this going to be resolved, or do we plan on choosing an
    >> alternative encoding, as Unicode is not working?
    >>
    >
    > Unicode does in fact work very well. Implementing good Unicode
    > support for complex text layout (CTL) scripts like Tamil is
    > achievable. Not sure what "proper publishing software" includes --
    > For example, would that include http://ta.wikipedia.org/ ?
    >
    > From an economic perspective, when the markets in South and Southeast
    > Asia that require complex text layout look enticing enough to the
    > software vendors, then the problem will be solved. Is it possible
    > that rampant piracy of commercial software throughout Asia actually
    > contributes to the problem of poor support for many Asian scripts in
    > heavy-weight commercial software like Adobe InDesign? This question
    > might be a great topic of some student's research paper.
    >
    > Clearly the commercial players like Adobe InDesign and Quark XPress
    > and the non-commercial players like Scribus (http://www.scribus.net/)
    > are all working on providing support for CTL scripts. In this arena,
    > the Open Source players are influenced by a different set of driving
    > criteria than the commercial vendors: Does being Open Source encourage
    > faster development of non-Latin script support? This question might
    > be a great topic for some other student's research paper.
    >
    > In any case, the transparency of development in the Open Source world
    > allows one to find out exactly how things stand. For example, here is
    > the link to Scribus' "Support for Non-Latin Languages" meta-bug page:
    >
    > http://bugs.scribus.net/view.php?id=3965
    >
    > And in the case of Scribus, for example, one is welcome to contribute
    > well-documented test cases (sample Unicode text along with references
    > to fonts that are known to work correctly in other software) which the
    > developers can use for testing the software.
    >
    >> 3/
    >> As for bitmap, I meant the "Rigidly-fixed-width-character" requirements.
    >> At present, the complex rendering (which is not working yet in these
    >> systems) will produce extremely wide glyphs which will not be
    >> accommodated by "rigidly-fixedwidth" requirements. What is the plan to
    >> resolve this?
    >>
    >
    > The only place where "rigidly fixed width" characters are normally
    > required that I can think of is in terminal emulators. Once upon a time I
    > investigated the idea of creating a terminal emulator --along with a
    > bitmap font-- that would support scripts like Myanmar (Burmese),
    > Tamil, etc. (Actually, from time to time, I still return to this
    > idea).
    >
    > In existing terminal emulators, Latin glyphs take up one character
    > cell each, while CJK glyphs are "double-width" and take up 2 character
    > cells each. The GNU Unifont BMP bitmap font originally designed by
    > Roman Czyborra (http://en.wikipedia.org/wiki/GNU_Unifont) provides a
    > good example of how this works: most of the glyphs are 8 pixels wide
    > by 16 pixels high, but the CJK glyphs are 16 pixels wide by 16 pixels
    > high.
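The one-cell versus two-cell rule described above can be sketched with Python's standard unicodedata module. The 8-pixel cell width follows the Unifont figures given in the message. The function names are illustrative, and treating only East Asian Wide and Fullwidth characters as double-width is a simplification: real terminal emulators also handle combining marks and ambiguous-width characters, which this sketch ignores.

```python
import unicodedata

CELL_WIDTH_PX = 8  # Unifont: narrow glyphs are 8 pixels wide, wide ones 16


def cells(ch):
    """Return the number of character cells a single character occupies."""
    # "W" (Wide) and "F" (Fullwidth) are the double-width classes.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def line_cells(text):
    """Total cells for a string, one code point at a time (no combining)."""
    return sum(cells(ch) for ch in text)


def line_pixels(text):
    """Pixel width of a string on a grid of CELL_WIDTH_PX-wide cells."""
    return line_cells(text) * CELL_WIDTH_PX


if __name__ == "__main__":
    sample = "ab\u4e2d"  # two Latin letters and one CJK ideograph
    print(line_cells(sample), "cells,", line_pixels(sample), "pixels")
```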
    >
    > In the hypothetical system as I had envisioned it, glyphs other than
    > CJK glyphs could also be double-width. And, in fact, why limit
    > ourselves to widths of 1 and 2 character cells? When I was
    > investigating Myanmar, I thought that it actually would be *better* to
    > allow some glyphs to stretch across 3 or even 4 character cells.
    >
    > We can think of this hypothetical terminal emulator as having a
    > cartesian grid and glyphs of all scripts need to fit into discrete
    > "quantum" cells : 1, 2, 3, or 4. (Maybe one could even make an
    > argument for some glyph using up 5 quantum cells?)
    >
    > An experienced font designer (or team of designers) would then take up
    > the challenge of creating a font to use with this terminal emulator.
    > The font need not be a bitmap font -- it could just as easily be a
    > vector font. For the sake of argument, let's say we allow this
    > hypothetical terminal to use vector fonts (i.e., we could just make a
    > special kind of OpenType font which could even have embedded bitmaps
    > if desired).
    >
    > So for the various Latin blocks of Unicode we could start out with a
    > suitable "monospaced" font. In a Latin monospaced font, all letters
    > fit into fixed-width cells so that the advance distances on all glyphs
    > are the same. This obviously requires some special aesthetic
    > compromises, especially on the wide Latin letters like "m" and "w".
    >
    > To this originally "monospaced" font, we would now add additional
    > blocks of Unicode. We could pretty much continue working within our
    > "monospaced" design mantra through many blocks of Unicode -- until, of
    > course, we hit scripts like Devanagari, Tamil, Myanmar, Khmer, and so
    > on. Arabic too. At this point, our originally "monospaced" font
    > becomes no longer "monospaced". Let's give it a new name -- how about
    > "quantized font" or "quantum spaced font"? Or simply "quantum font" ?
    >
    > In this new quantum font, whenever an individual glyph became too
    > horribly "squished" to fit inside one quantum character cell, then we
    > would automatically try a 2-cell approach, and if even that did not
    > work, then go for a 3-or 4-cell approach.
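The fallback rule above (try one cell, then two, then three or four) can be written as a small function. This is only a sketch of the idea in the message, not an existing font tool. The name quantum_cells, the max_squish threshold, and the choice to measure "too squished" as the ratio of a glyph's natural width to the space available are all assumptions made for illustration.

```python
def quantum_cells(natural_width, cell_width, max_cells=4, max_squish=1.25):
    """Pick the smallest cell count that does not squish the glyph too much.

    natural_width: width the designer would give the glyph if unconstrained.
    cell_width:    width of one quantum cell, in the same units.
    max_squish:    largest tolerated ratio natural_width / available_width
                   (hypothetical threshold; 1.0 would allow no compression).
    """
    if natural_width <= 0 or cell_width <= 0:
        raise ValueError("widths must be positive")
    for n in range(1, max_cells + 1):
        if natural_width / (n * cell_width) <= max_squish:
            return n
    # Nothing fits within tolerance: use the widest allowed cell count.
    return max_cells


if __name__ == "__main__":
    for width in (8, 14, 22, 100):
        print(width, "->", quantum_cells(width, cell_width=8), "cell(s)")
```

With an 8-unit cell, a glyph of natural width 8 stays in one cell, 14 moves to two, 22 moves to three, and anything far wider is capped at four.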
    >
    > As a quick and familiar example, let's use Arabic script. On Linux,
    > the mlterm folks (http://mlterm.sourceforge.net/) have actually
    > produced a "multilingual" terminal that even handles RTL Arabic. This
    > is pretty cool. Mlterm uses GNU unifont for its Arabic glyphs.
    > Arabic in mlterm is readable, which is nice, but it is really ugly.
    > For example, terminal ARABIC LETTER SHEEN ش looks almost unbearably
    > *squished*. Clearly, wide Arabic letters like isolated or terminal
    > ARABIC LETTER SHEEN ش or ARABIC LETTER SAD ص would probably end up
    > looking *much* nicer if we just allowed them to occupy 2 character
    > cells. So, in this quantum font, most Arabic letters would still
    > occupy just one character cell, but a few would occupy up to 2
    > character cells.
    >
    > A similar principle would apply for the creation of the necessary
    > glyphs for scripts like Myanmar and Tamil -- except in these cases
    > there would be some glyphs that would necessarily take up 3 or even 4
    > character cells.
    >
    > Well that's my idea, for what it is worth. I even tried my hand at
    > creating a set of bitmap glyphs for Myanmar which could be added to
    > GNU Unifont. But after wasting a lot of time on this, I realized I
    > did not know how to write a terminal emulator. So, maybe someday I
    > will return to this outlandish project. After I have learned how to
    > write a terminal emulator.
    >
    > - Ed Trager


