Re: folding UTF-8

From: Doug Ewell (dewell@adelphia.net)
Date: Fri Aug 25 2006 - 00:49:38 CDT

Next message: Rick McGowan: "Unicode FAQ pages updated"

Previous message: Oliver Block: "folding UTF-8"
In reply to: Oliver Block: "folding UTF-8"
Next in thread: Kenneth Whistler: "Re: folding UTF-8"

Oliver Block <lists at block dash online dot eu> wrote:

> Definition C12a of the Unicode Standard, Version 4.0, mentions so-called
> "mangled" text caused by folding (last paragraph of C12a).
>
> Having the definition in mind (italic text at the top of C12a) I
> understand mangled text as ill-formed text, that is not according to
> table 3-6. Would you agree/disagree?

It is ill-formed text of a special type: it would have been well-formed
if not for an easily recognized, external process or layer -- the
example mentions inserting a CR/LF pair every 80 bytes -- that can
easily and unequivocally be reversed.
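
As a rough illustration of the 80-byte example, here is a minimal Python
sketch. Nothing in it comes from the standard: the helper names fold,
unfold and is_well_formed are made up for this message, and unfold
assumes the original data contained no CR/LF pairs of its own.

```python
def fold(data: bytes, width: int = 80) -> bytes:
    """Insert CR LF between every `width`-byte chunk, ignoring character boundaries."""
    chunks = [data[i:i + width] for i in range(0, len(data), width)]
    return b"\r\n".join(chunks)


def unfold(data: bytes) -> bytes:
    """Reverse fold(); assumes the original data had no CR LF of its own."""
    return data.replace(b"\r\n", b"")


def is_well_formed(data: bytes) -> bool:
    """Report whether `data` decodes as UTF-8 under Python's strict decoder."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# One ASCII byte followed by two-byte characters, so that the 80-byte
# boundary falls between the lead byte and trail byte of one character.
original = ("a" + "\u00e9" * 60).encode("utf-8")
folded = fold(original)

assert is_well_formed(original)
assert not is_well_formed(folded)      # CR LF landed inside a character
assert unfold(folded) == original      # the damage is fully reversible
```

The folded bytes are ill-formed only because of the inserted CR/LF pair;
removing it restores the original, well-formed sequence.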

Definition C12a states that a process may interpret such data, but goes
on to say, "However, such repair of mangled data is a special case, and
it must not be used in circumstances where it would cause security
problems." I think it is clear that the intent of C12a is not to allow
a conformant process to interpret just any old random junk as if it were
well-formed UTF-8.

> Further, what about combining character sequences? Inserting a CRLF
> between a base character and a combining character, or between two of
> the combining characters, would not produce an ill-formed
> byte-sequence. Would you agree/disagree?

I would agree, but I have the feeling this was intended to be relevant
to the "mangled text" question above and I don't see the connection.

> (As every specification that requires folding does also require
> unfolding, this would probably be more a semantic issue.)

I do not agree that every specification that requires folding also
requires unfolding.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/

This archive was generated by hypermail 2.1.5 : Fri Aug 25 2006 - 01:00:02 CDT