Properties

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue May 17 2011 - 14:01:18 CDT

Next message: fantasai: "Re: which scripts are written vertically"

Previous message: Michael Everson: "Re: which scripts are written vertically"
In reply to: Ken Whistler: "Re: which scripts are written vertically"
Next in thread: fantasai: "Re: Properties"
Reply: fantasai: "Re: Properties"

On 5/16/2011 4:44 PM, Ken Whistler wrote:
> On 5/16/2011 3:33 PM, fantasai wrote:
>> .... If we had such a property it would need to state for
>> each script:
>
> First of all, the Unicode Standard does not have any formal notion of
> properties for *scripts*. The properties in the Unicode Character
> Database are properties for *characters* (and/or code points).

In many cases, what seem to be properties of scripts (like being written
RTL) turn out to apply differently to different characters. Where that
is the case, enumerating the property on a per-character basis is not
only useful, but necessary.

For properties that (if they apply) apply to the whole script (equally
for all characters), an enumeration by character would seem to have the
disadvantage of hiding that fact. Hence my suggestion, picked up by
fantasai, to phrase the request in terms of scripts rather than
characters.

>
> That doesn't mean that there isn't a need for such, or that people don't
> in practice talk about scripts as if they had enumerated properties,
> both in the text of the standard (i.e., talking about alphabets versus
> abjads versus abugidas versus logosyllabaries, etc., talking about
> complex scripts versus simple scripts, talking about script layout
> directionality, talking about "Indic scripts") and in informal discussion
> about the standard and its implementation.

Some such "properties" of scripts are merely derived properties in the
sense that an "RTL script" would be "any script containing at least one
RTL character".
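
As a rough illustration of such a derived property, here is a minimal
Python sketch. It assumes that "RTL character" means a character whose
Bidi_Class is R or AL, and it uses a small hand-picked sample of
characters per script (the SAMPLE_SCRIPTS table is hypothetical; a real
table would be built from the script assignments in the UCD).

```python
import unicodedata

# Assumption: a character counts as "RTL" if its Bidi_Class is R or AL.
RTL_BIDI_CLASSES = {"R", "AL"}

def is_rtl_script(chars):
    """Derived property: True if at least one character is RTL."""
    return any(unicodedata.bidirectional(c) in RTL_BIDI_CLASSES for c in chars)

# Hypothetical sample data: a few characters per script, for illustration only.
SAMPLE_SCRIPTS = {
    "Latin": "AZaz",
    "Greek": "\u03B1\u03B2\u03C9",
    "Hebrew": "\u05D0\u05D1\u05EA",
    "Arabic": "\u0627\u0628\u064A",
}

# The script-level classification falls out of the character property.
rtl_scripts = {name for name, chars in SAMPLE_SCRIPTS.items()
               if is_rtl_script(chars)}
```

With the sample data above, rtl_scripts contains only the Hebrew and
Arabic entries.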

From a purely implementation point of view, it depends on whether
script runs are determined at all and whether decisions can be made
knowing only the script run and nothing else. For bidi, to give an
example, the script run could be sufficient for deciding whether the run
needs to be analyzed for bidi layout - an optimization technique that
would benefit from such a derived property.
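
The optimization could look roughly like the following Python sketch. It
assumes that script runs have already been determined elsewhere and are
passed in as (script name, text) pairs; the RTL_SCRIPTS set is an
illustrative, incomplete stand-in for the derived property, and the
function names are hypothetical. The sketch also ignores characters
that belong to no particular script, such as explicit directional
controls, which a real implementation would have to check separately.

```python
# Illustrative, incomplete stand-in for a derived "RTL script" property.
RTL_SCRIPTS = {"Arabic", "Hebrew", "Syriac", "Thaana", "Nko"}

def run_needs_bidi(script):
    """Decide from the script of a run alone whether bidi analysis is needed."""
    return script in RTL_SCRIPTS

def text_needs_bidi(script_runs):
    """script_runs: iterable of (script_name, text) pairs, computed elsewhere."""
    return any(run_needs_bidi(script) for script, _text in script_runs)

# Fast path: no run is in an RTL script, so bidi layout analysis is skipped.
ltr_only = [("Latin", "Hello"), ("Greek", "\u03BA\u03CC\u03C3\u03BC\u03B5")]
# Slow path: at least one run is in an RTL script.
mixed = [("Latin", "abc"), ("Hebrew", "\u05E9\u05DC\u05D5\u05DD")]
```

Here text_needs_bidi(ltr_only) is false and text_needs_bidi(mixed) is
true; only the second text would be handed to the full bidi algorithm.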

Many other classifications of scripts may not have the same
implementation relevance as character properties do. That is, they may
rarely be evaluated at run-time, even though they may be reflected in
the design of implementations geared towards certain scripts.

In any case, it's trivial to create derived script classifications if an
underlying character property exists.

>
> But the pushback you are encountering in this thread stems in part
> from the fact that the Unicode Standard is *not* a writing system
> or orthographical standard, and does not attempt to standardize issues
> of text layout, beyond the minimum required for plain text layout
> legibility. Hence the systematic lack of properties relating to
> any issues of vertical layout.

I thought the responses on this thread were, in some instances, less
than helpful, given that the request comes from an organization that
has the mandate to standardize these very issues, while Unicode
represents the necessary expertise of knowing *what* was encoded in it.
This includes general information about each of the scripts.

> ...
>
>> - If it has vertical directionality, how the text is transformed from
>> horizontal to vertical, i.e. are grapheme clusters rotated (laid
>> sideways wrt horizontal) or translated (kept upright like CJK).
>
> I don't think that is sufficient. You also need to answer the question
> of how scripts which on their own are always laid out in horizontal
> lines behave when mixed laid out in vertical text as text inclusions.
> This is the problem of which way to rotate Latin (or other) text
> when incorporated in Japanese (or Chinese) text laid out vertically.

The important aspect that was missing in the original request is what
effect the status of a script is expected to have in the context of CSS.
One thing that Unicode has discovered over the years is that it's nearly
impossible to define any character properties that are truly independent
of the algorithm(s) that plan to use them. To make a property useful,
certain assumptions must be made about how it is used, in some cases
by publishing a single algorithm.

Ken's question highlights one possible use of this information. A more
detailed presentation of the requirements and how CSS would apply any
such classification would help in getting more useful answers.

A./

