Re: UTF-8 Corrigendum, new Glossary

From: G. Adam Stanislav (adam@whizkidtech.net)
Date: Tue Dec 05 2000 - 10:47:38 EST


On Thu, Nov 30, 2000 at 05:28:51PM -0800, David Starner wrote:
>Is that your rule in all cases, to try and guess what they meant and do
>that?

Not in all cases. But this particular Ister interpreter is designed
to run CGI scripts. When it comes to CGI languages, I have the philosophy
of graceful degradation: If I can interpret it, I will. Otherwise,
the user (the person who is browsing, not the webmaster) might be confused.

> It'll be hell on anyone who has to try and interpret Ister if
>there's a large chunk of code that follows no standards, but was read by
>the original interpreter.

If it follows no standards, my interpreter will throw up its hands. But if
the source code says it is in UTF-8, and I can decode it, I will. If it
says it is in Latin1 (or some other encoding), I will convert it to UTF-8.
In either case, my output will always be legal UTF-8.
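
As a rough illustration of that policy (a sketch in Python, not the
interpreter's actual code): the helper names `decode_lenient_utf8` and
`to_utf8` are hypothetical, and the assumption that "if I can decode it,
I will" extends to over-long UTF-8 forms is a reading of this thread, not
something confirmed here. Whatever comes in, the result is re-encoded as
shortest-form UTF-8.

```python
def decode_lenient_utf8(raw: bytes) -> str:
    """Decode UTF-8 bytes, also accepting over-long forms (assumed policy)."""
    out = []
    i = 0
    n = len(raw)
    while i < n:
        b = raw[i]
        # Work out the payload bits of the lead byte and how many
        # continuation bytes must follow it.
        if b < 0x80:
            cp, extra = b, 0
        elif 0xC0 <= b < 0xE0:
            cp, extra = b & 0x1F, 1
        elif 0xE0 <= b < 0xF0:
            cp, extra = b & 0x0F, 2
        elif 0xF0 <= b < 0xF8:
            cp, extra = b & 0x07, 3
        else:
            raise ValueError(f"invalid lead byte 0x{b:02X} at offset {i}")
        if i + extra >= n:
            raise ValueError(f"truncated sequence at offset {i}")
        for k in range(1, extra + 1):
            c = raw[i + k]
            if (c & 0xC0) != 0x80:
                raise ValueError(f"bad continuation byte at offset {i + k}")
            cp = (cp << 6) | (c & 0x3F)
        # Surrogates and values beyond U+10FFFF are still refused.
        if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: 0x{cp:X}")
        out.append(chr(cp))
        i += extra + 1
    return "".join(out)


def to_utf8(raw: bytes, declared: str) -> bytes:
    """Return shortest-form UTF-8 for source bytes in the declared encoding."""
    if declared.lower().replace("_", "-") in ("utf-8", "utf8"):
        text = decode_lenient_utf8(raw)
    else:
        # Any other declared encoding goes through Python's own codecs.
        text = raw.decode(declared)
    return text.encode("utf-8")
```

For example, `to_utf8(b"caf\xe9", "latin-1")` gives `b"caf\xc3\xa9"`, and
the over-long pair `b"\xc0\xaf"` declared as UTF-8 comes out as `b"/"`.
A strictly conformant decoder would reject that second input instead.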

>(Or even later versions of that interpreter -
>I've hung around the gcc lists long enough to know that people don't
>like "that's no longer supported" or even "that was never officially
>supported.")

I have been programming since 1965, and I have never said that. I have
always gone to great pains to make sure later versions of my software
could handle the data expected by older versions. Or, in some cases,
I supplied conversion software, so old files could be converted to a
new format.

>Even if it works fine in the case of your interpreter, it'll come to
>problems when it gets fed through a UTF-8 conformant (or non-multi-byte
>aware) text tool that won't interpret over-long sequences. Especially
>non-multi-byte aware tools, since they will seem to work and silently
>get stuff wrong. It seems better just to refuse it, and force the buggy
>software to get fixed, than have a bunch of obscure bugs show up later.

Well, the worst bug this particular language will produce is HTML with
the wrong text or tags. Presumably, any webmaster worth his keep will check
the output of his code before posting it on the web, and will fix his
source code.

All it does is convert from one mark-up language (Ister) to another
(HTML/SGML/XML), e.g., it will convert:

^p (Hello, World!^/br ^b (Here I come!))

to:

<p>Hello, World!<br /><b>Here I come!</b></p>
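
The mapping in that one example can be sketched as a toy converter. This
is a guess at the rules from the single line above, not the Ister grammar:
it assumes `^name (` opens an element that the matching `)` closes, and
that `^/name` is an empty element which swallows trailing whitespace. The
function name `ister_to_html` is hypothetical, and plain parentheses in
running text are not handled.

```python
import re

# Assumed token shapes, inferred from the single example only.
EMPTY = re.compile(r"\^/([A-Za-z][A-Za-z0-9]*)\s*")   # ^/br   -> <br />
OPEN = re.compile(r"\^([A-Za-z][A-Za-z0-9]*)\s*\(")   # ^p (   -> <p>


def ister_to_html(src: str) -> str:
    """Convert the tiny Ister subset described above to HTML."""
    out = []
    stack = []  # names of elements still waiting for their ")"
    i = 0
    while i < len(src):
        m = EMPTY.match(src, i)
        if m:
            out.append(f"<{m.group(1)} />")
            i = m.end()
            continue
        m = OPEN.match(src, i)
        if m:
            name = m.group(1)
            out.append(f"<{name}>")
            stack.append(name)
            i = m.end()
            continue
        ch = src[i]
        if ch == ")" and stack:
            # A ")" closes the most recently opened element.
            out.append(f"</{stack.pop()}>")
        else:
            out.append(ch)
        i += 1
    if stack:
        raise ValueError(f"unclosed element: {stack[-1]}")
    return "".join(out)


print(ister_to_html("^p (Hello, World!^/br ^b (Here I come!))"))
```

Run on the Ister line above, this prints the HTML line shown.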

No big security issues at stake here. :)

Cheers,
Adam

-- 
When two do the same, it's not the same
		-- Slovak proverb


