From: Mark Davis (email@example.com)
Date: Tue Jan 06 2009 - 16:23:22 CST
I'll second Asmus.
We cannot forget that when we set out to design Unicode, pragmatic
interoperability was a key goal. We never had a "pure" standard, divorced
from such concerns.
(I hear various comments about "industrial Goliaths" on these threads -- but
without the support and active involvement of industry and governmental
organizations, Unicode would have been an interesting academic exercise, but
no more than an academic exercise. And if anyone on this list is interested
in academic exercises, they are free to start their own.)
Instead, our goal was to produce a standard that would allow us to have as
consistent an architecture as possible -- to enable effective and efficient
implementations -- *and* to interoperate well with a host of standards and
practices in widespread use: national, international, and vendor. The
Unicode Standard has added miscellaneous symbols many times before: Dingbats,
Miscellaneous Technical symbols, ARIB symbols, etc., for just that reason.
Of course the scope of Unicode changed over time. Initially, for example, we
were not really aiming at encoding archaic scripts. I think at one time we
had excluded encoding Braille as well. But it continues to be driven by
pragmatic concerns of interoperability. And having the emoji symbols encoded
will be far more useful to many more people than, say, the Phaistos disk.
On Tue, Jan 6, 2009 at 08:42, Asmus Freytag <firstname.lastname@example.org> wrote:
> On 1/5/2009 9:57 PM, James Kass wrote:
>> Peter Constable wrote,
>>> But you're blurring the lines between plain text and markup: what you're
>>> suggesting *is* markup, but you're just calling it plain text.
>> HTML source files are plain-text files. The mark-up
>> contained within consists of plain-text strings. Using
>> plain-text strings as mark-up blurs no lines, it's done
>> every day.
> On the source level, HTML can be edited as plain text. But to be HTML, it
> has to be recognized as such. Files are tagged with htm(l) extensions, or
> with other means of externally identifying their type, and internally, they
> contain announcers of their HTML nature and how strictly they observe the
> protocol definition.
> That's different from having data that should be 'raw' plain text contain
> unannounced markup sequences.
>> On the other hand, encoding icons as text crosses the
>> line between plain-text and rich-text.
> That's a matter of opinion on which there simply isn't the iron-clad
> consensus in each instance that you are proclaiming in this discussion.
> The support of HEART, SMILING FACE etc. in plain text data streams precedes
> Unicode and goes back nearly thirty years (to the original IBM PC character
> set for one). Extending the existing list of "icons" in Unicode is not the
> dramatic step that you and others have made it out to be. Yes, doing so uses
> code space, and yes, some of the proposed symbols have novel renderings in
> some contexts, all good reasons to have a discussion and proceed with
> deliberation, but it's not the sea-change that it's made out to be.
> Adding 40,000 rare ideographs and spelling out 11,172 Hangul syllables, on
> the other hand, *was* an example of a game changer: it required going from
> UCS-2 to UTF-16 or UTF-32. Yet the current discussion is dragged out to the
> degree that it's beginning to rival the discussions of that earlier, far
> more significant step.
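That UCS-2 to UTF-16 step is easy to see concretely: every code point beyond U+FFFF must be represented in UTF-16 as a surrogate pair of two 16-bit units, while BMP symbols like HEART still fit in a single unit, exactly as they did under UCS-2. A minimal Python sketch (the helper function is illustrative, not from any standard library):

```python
def utf16_code_units(s):
    """Return the 16-bit UTF-16 code units (big-endian, no BOM) for a string."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# BLACK HEART SUIT (U+2665): a BMP symbol, one code unit -- representable in UCS-2.
print([hex(u) for u in utf16_code_units("\u2665")])

# A CJK Extension B ideograph (U+20000): beyond the BMP, so UTF-16 needs a
# surrogate pair (high surrogate 0xD840, low surrogate 0xDC00).
print([hex(u) for u in utf16_code_units("\U00020000")])
```

Running this shows `['0x2665']` for the heart but `['0xd840', '0xdc00']` for the ideograph, which is why the rare-ideograph and Hangul additions forced implementations off fixed-width 16-bit storage.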
This archive was generated by hypermail 2.1.5 : Tue Jan 06 2009 - 16:25:42 CST