Re: 32'nd bit & UTF-8

From: Hans Aberg (haberg@math.su.se)
Date: Thu Jan 20 2005 - 14:46:57 CST

    On 2005/01/20 16:15, Arcane Jill at arcanejill@ramonsky.com wrote:

    >> As a standard, Unicode will have to fight for recognition.
    >
    > Mebbe, but it doesn't have a lot of competition.

    It is dangerous for Unicode to assume that it does not have any competition,
    and to arrogantly ignore the issues that users put forth. UNIXes will
    strip out the BOM anyway, that seems clear, because it does not fit into their
    file and stream model.

    >> Just as I, and others will, oppose the UTF-8 BOM requirement for good
    >> reasons.
    >
    > Are we all clear about what the BOM requirement actually /is/, by the way?
    > Unicode does NOT require that all UTF-8 text files must begin with a BOM; it
    > only requires that conformant processes can recognize and handle the BOM
    > character /if/ it should be found.

    So UNIX processes are not, and will not be, conformant Unicode UTF-8
    processes as long as the BOM requirement remains.

    >> You are drawing this analogue too far, because it is fairly easy to fix the
    >> \r\n problem, whereas the BOM problem runs deeper. The latter changes the
    >> very paradigm for file representation.
    >
    > I don't see why. What is the difference between discarding U+000Ds and
    > discarding U+FEFFs ?

    This has been widely discussed in other posts. In short, it runs a great
    deal deeper into the UNIX OS. See the posts by Marcin 'Qrczak' Kowalczyk, or
    <http://www.cl.cam.ac.uk/~mgk25/unicode.html>.
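
    As a rough illustration of the difference being argued here, the sketch
    below treats files as plain byte streams, the way tools such as cat do.
    The file contents and the helper names strip_leading_bom and
    strip_carriage_returns are hypothetical, made up for this example. It
    assumes a consumer that only checks for a BOM at the start of its input:
    \r\n can be fixed wherever it occurs, while a BOM that ends up in the
    middle of a concatenated stream survives.

```python
import codecs

# The UTF-8 encoding of U+FEFF: b'\xef\xbb\xbf'
BOM = codecs.BOM_UTF8


def strip_leading_bom(data: bytes) -> bytes:
    """Drop one UTF-8 BOM at the very start of the input, if present."""
    return data[len(BOM):] if data.startswith(BOM) else data


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert \\r\\n line endings to \\n anywhere in the input."""
    return data.replace(b"\r\n", b"\n")


# Two hypothetical files, each saved with a leading BOM and \r\n endings.
file_a = BOM + b"first\r\n"
file_b = BOM + b"second\r\n"

# Byte-level concatenation, as `cat a b` would do it.
joined = file_a + file_b

# The leading BOM and every \r are gone, but the BOM that started
# file_b is now in the middle of the stream and is left in place.
cleaned = strip_carriage_returns(strip_leading_bom(joined))
print(cleaned)
```

    Removing every U+FEFF instead is possible, but then the tool has to
    interpret the bytes as UTF-8 text rather than pass them through
    unchanged, which is the change of model under discussion.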

      Hans Aberg


