RE: Coloured diacritics (Was: Transcoding Tamil in the presence of markup)

From: jon@hackcraft.net
Date: Tue Dec 09 2003 - 11:04:54 EST


    > > You might as well say that C code is not plain text because it too is
    > > subject to special canons of interpretation.
    >
    > C, C++ and Java source files are not plain text as well (they have their own

    C, C++ and Java source files are plain text.

    > "text/*" MIME type, which is NOT "text/plain" notably because of the rules

    I've seen text/cpp and text/java, but really there are no such types. I've also
    seen text/x-source-code which is at least legal, if of little value to
    interoperability.

    The correct MIME type for C and C++ source files is text/plain. I'd be prepared
    to give good odds that that is the case with Java source files as well.

    > associated with end-of-lines, notably in presence of comments).

    As source files (that is, at the stage in processing at which a human user can
    see the source and edit it) the only handling required for end-of-lines is
    conversion of new line function characters, the same as for any other use of
    plain text.
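
    To illustrate the kind of end-of-line handling meant here, the following is
    a minimal sketch in Python, not anything specific to C or Java tooling. It
    assumes the new line functions of interest are CRLF, CR, LF and NEL
    (U+0085); the function name normalize_newlines and the default target are
    made up for the example.

```python
import re

# New line functions treated as equivalent in this sketch (an assumption):
# CRLF, CR, LF and NEL (U+0085). CRLF comes first in the alternation so it
# is matched as a single unit rather than as CR followed by LF.
_NLF = re.compile("\r\n|\r|\n|\u0085")


def normalize_newlines(text: str, target: str = "\n") -> str:
    """Replace every new line function in `text` with `target`."""
    # A function is used as the replacement so that `target` is inserted
    # literally, without any backslash-escape processing by `re`.
    return _NLF.sub(lambda _match: target, text)
```

    The same call works whether the text is a shopping list or a C++ source
    file, which is the point: nothing about the conversion depends on what the
    text will later be used for.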

    The treatment of end-of-lines as significant when processed (for example
    following one-line // comments) is a matter of what an application chooses to do
    with a particular character. This is no different from an indexer deciding that
    a plain text file contains a particular word, or for that matter from my putting
    coffee filters into my basket if I see "coffee filters" written on my shopping
    list.
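
    As a rough sketch of an application choosing what an end-of-line means, the
    hypothetical Python function below treats each LF as the end of a one-line
    // comment. It is deliberately naive (it assumes lines are separated by LF
    and does not recognise string literals or /* */ comments), and the name
    strip_line_comments is invented for the example.

```python
def strip_line_comments(source: str) -> str:
    """Drop everything from '//' to the end of each line.

    Naive on purpose: a '//' inside a string literal is also treated as the
    start of a comment, and block comments are not handled at all.
    """
    kept = []
    for line in source.split("\n"):
        # partition() splits at the first '//' only; if there is none,
        # `code` is the whole line and the other two parts are empty.
        code, _sep, _comment = line.partition("//")
        kept.append(code.rstrip())
    return "\n".join(kept)
```

    The input is still just a string of characters; the significance of the
    line break comes entirely from what this particular function decides to do
    with it.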

    > > But both XML/HTML/SGML and the various programming languages are plain
    > text.
    >
    > See "text/xml", "text/html" and "text/sgml" MIME types. They also aren't
    > "text/plain" so they have their own interpretation of Unicode characters
    > which is not the one found in the Unicode standard.

    They have their own interpretation of the Unicode characters which is *in
    addition to* the one found in the Unicode standard, as do all but the simplest
    applications that use Unicode (as interesting as many of them are, characters
    are of little use on their own).


