Re: Tag characters and in-line graphics (from Tag characters) from Martin J. Dürst on 2015-06-02 (Unicode Mail List Archive)

From: Martin J. Dürst <duerst_at_it.aoyama.ac.jp>
Date: Wed, 3 Jun 2015 10:09:09 +0900

On 2015/06/03 07:55, Chris wrote:

> As you point out, "The UCS will not encode characters without a demonstrated usage." But there are use cases for characters that don’t meet UCS’s criteria for a worldwide standard, but are necessary for more specific use cases, like specialised regional, business, or domain specific situations.

Unicode contains *a lot* of characters for specialized regional,
business, or domain specific situations.

> My question is, given that Unicode can’t realistically (and doesn’t aim to) encode every possible symbol in the world, why shouldn’t there be an EXTENSIBLE method for encoding, so that people don’t have to totally rearchitect their computing universe because they want ONE non-standard character in their documents?

As has been explained, there are technologies that allow you to do (more
or less) that. Information technology, like many other technologies,
works best when it addresses common cases shared by many people. Let's
look at some examples:

Character encodings work best when they are used widely and uniformly. I
don't know anybody who actually uses all the characters in Unicode
(except the guys that work on the standard itself). So for each
individual, a smaller set would be okay. And there were (and are)
smaller sets, not for individuals, but for countries, regions, scripts,
and so on. Originally (when memory was very limited), these legacy
encodings were more efficient overall, but that's no longer the case. So
everything is moving towards Unicode.

Most Website creators don't use all the features in HTML5. So having
different subsets for different use cases may seem to be convenient. But
overall, it's much more efficient to have one Hypertext Markup Language,
so that's where everybody is converging.

From your viewpoint, it looks like having something in between
character encodings and HTML is what you want. It would only contain the
features you need, and nothing more, and would work in all the places
you wanted it to work. Asmus's "inline" text may be something similar.

The problem is that such an intermediate technology only makes sense if
it covers the needs of lots and lots of people. It would add a third
technology level (between plain text and marked-up text), which would
divert energy from the current two levels and make things more complicated.

Up to now, such a third level hasn't emerged, among other reasons
because both existing technologies were good at absorbing the most
important use cases from the middle. Unicode continues to encode
whatever symbols gain reasonable popularity, so every time somebody has
a "real good use case" for the middle layer with a symbol that isn't yet
in Unicode, that use case gets taken away. HTML (or Web technology in
general) also worked to improve the situation, with technologies such as
SVG and Web Fonts.

No technology is perfect, and so there are still some gaps between
character encoding and markup, some of which may be filled in due time,
but I don't think a third layer in the middle will emerge soon.

Regards, Martin.
Received on Tue Jun 02 2015 - 20:09:54 CDT