RE: [indic] Re: Feedback on PR-104

From: Peter Constable
Date: Sun Aug 19 2007 - 08:31:04 CDT



    You’ve been writing a number of messages with statements like “Unicode breaks Tamil Grammar and linguistic science” or “Unicode does not understand pulli… or virama either”. I believe you have a mistaken expectation.

    In your message below, you say that a spectrum analysis will reveal that there is no such thing as a conjunct and that /ksh/ is two consonants. Spectrum analysis and Unicode have no connection whatsoever. Spectrum analysis is applied to wave phenomena in nature; in this case, you are applying it to speech utterances. Unicode does not deal with speech sounds. Rather, Unicode pertains to the realm of graphic characters: the visual realm, not the audio realm.

    The notion of conjunct may not be relevant to phonetics and phonology, but it certainly is relevant in descriptions of writing, and even more relevant in descriptions of computer implementations of text.

    I will not debate the assertion that /ksh/ in Tamil phonology is a single phoneme. But note that Unicode does not encode phonemes. It encodes abstract text units that can be used to provide an encoded representation of text. In Unicode, the entity of Tamil *writing* “க்ஷ” is given an encoded representation as a sequence of three encoding elements: 0B95 + 0BCD + 0BB7. That encoded representation is not intended to be a commentary on Tamil grammar or phonology. It is nothing more than an encoded representation of the writing element for use in electronic text.
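
    To make the three-element sequence concrete, here is a minimal Python sketch that lists the encoding elements of that written unit. The string literal and the helper name `describe` are illustrative assumptions, not part of any standard API; the sequence assumed is KA, then VIRAMA (the pulli, U+0BCD), then SSA.

```python
import unicodedata

# Assumed code point sequence for the written unit discussed above:
# KA + VIRAMA (pulli) + SSA.
KSSA = "\u0B95\u0BCD\u0BB7"

def describe(text):
    """Return (code point, character name) pairs, one per encoding element."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]

# Print one line per encoding element in the sequence.
for code_point, name in describe(KSSA):
    print(code_point, name)
```

    The point of the sketch is only that one written unit maps to three encoding elements; it says nothing about how many phonemes that unit represents.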

    In this case, the elements in Unicode’s encoded representation for an entity of Tamil writing, “க்ஷ”, do not correspond in a one-to-one manner with the Tamil phoneme represented by that written entity. But there is no requirement that they do so. The only requirement on Unicode is that it be able to represent that written entity and to distinguish it from other written entities.

    Now, Unicode *could* have used a different encoded representation in which there is a one-to-one relationship; but because of pre-existing standards, it doesn’t. That doesn’t mean that Unicode is wrong, or that it breaks linguistic science. It is what it is, and it is perfectly adequate for its intended purposes in relation to representation of Tamil *text*.

    As long as you expect Unicode to reflect Tamil phonology or grammar, you will be disappointed, and you will have misunderstood its purpose. That expectation is mistaken.


    From: [] On Behalf Of Sinnathurai Srivas
    Sent: Thursday, August 16, 2007 5:02 PM
    To: N. Ganesan
    Cc: Unicode Mailing List;
    Subject: [indic] Re: Feedback on PR-104

    There is no conjunct in Tamil. There is no conjunct in science. Run a spectrum analysis and it will reveal that there is no such thing as a conjunct. ksh is two consonants, which will naturally occur with vowels. x is another one. So Tamil can write what Islam or English or Sanskrit tries to write.

    There is no conjunct in Tamil. Unicode breaks Tamil Grammar and linguistic science.

    "N. Ganesan" <> wrote:
    > There is no conjunct in Tamil

    The Grantha conjunct exists in Tamil, as in all other Indian scripts.
    It is made up of two consonants, KA & SSA, hence a conjunct.
    We use ZWNJ to break the conjunct for Islamic and English loan words
    in Tamil.
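
    As a rough illustration of the ZWNJ convention just described, the Python sketch below builds both sequences and shows that they are distinct encoded representations. The variable and function names are hypothetical, and whether the two strings actually render differently depends on the font and shaping engine in use.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER

# KA + VIRAMA + SSA: the sequence that may be rendered as the conjunct.
CONJUNCT = "\u0B95\u0BCD\u0BB7"
# KA + VIRAMA + ZWNJ + SSA: the same letters with the conjunct broken.
BROKEN = "\u0B95\u0BCD" + ZWNJ + "\u0BB7"

def names(text):
    """Return the Unicode character name of each element in the string."""
    return [unicodedata.name(ch) for ch in text]

# The two strings differ only by the inserted ZWNJ.
print(names(CONJUNCT))
print(names(BROKEN))
print(CONJUNCT == BROKEN)
```

    Because the two sequences compare as different strings, software can tell the conjunct form apart from the deliberately separated form.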

    Please refer to TUS (The Unicode Standard) for the conjunct in India's scripts.

    N. Ganesan


    This archive was generated by hypermail 2.1.5 : Sun Aug 19 2007 - 08:37:18 CDT