Properties

From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Tue May 17 2011 - 14:01:18 CDT

    On 5/16/2011 4:44 PM, Ken Whistler wrote:
    > On 5/16/2011 3:33 PM, fantasai wrote:
    >> .... If we had such a property it would need to state for
    >> each script:
    >
    > First of all, the Unicode Standard does not have any formal notion of
    > properties for *scripts*. The properties in the Unicode Character
    > Database
    > are properties for *characters* (and/or code points).

    In many cases, what seem to be properties of scripts (like being written
    RTL) turn out to apply differently to different characters. Where that
    is the case, enumerating the property on a per-character basis is not
    only useful, but necessary.

    For properties that (if they apply) apply to the whole script (equally
    for all characters), an enumeration by character would seem to have the
    disadvantage of hiding that fact. Hence my suggestion, picked up by
    fantasai, to phrase the request in terms of scripts rather than
    characters.

    >
    > That doesn't mean that there isn't a need for such, or that people don't
    > in practice talk about scripts as if they had enumerated properties,
    > both in the text of the standard (i.e., talking about alphabets versus
    > abjads versus abugidas versus logosyllabaries, etc., talking about
    > complex scripts versus simple scripts, talking about script layout
    > directionality, talking about "Indic scripts") and in informal discussion
    > about the standard and its implementation.

    Some such "properties" of scripts are merely derived properties in the
    sense that an "RTL script" would be "any script containing at least one
    RTL character".
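
    A minimal sketch of such a derived property, assuming a per-character
    script assignment is already available. The mapping SAMPLE_SCRIPTS and
    the function name derive_rtl_scripts are hypothetical and only for
    illustration; the bidi class lookup uses Python's standard unicodedata
    module, and "strongly RTL" is taken here to mean bidi class R or AL.

```python
import unicodedata

# Bidi classes treated here as "strongly right-to-left" (an assumption
# for this sketch; other definitions of "RTL character" are possible).
RTL_BIDI_CLASSES = {"R", "AL"}


def derive_rtl_scripts(script_of_char):
    """Return the set of scripts containing at least one RTL character.

    script_of_char is a hypothetical {character: script name} mapping
    supplied by the caller; Python's standard library does not expose
    the Unicode Script property.
    """
    rtl_scripts = set()
    for ch, script in script_of_char.items():
        if unicodedata.bidirectional(ch) in RTL_BIDI_CLASSES:
            rtl_scripts.add(script)
    return rtl_scripts


# Tiny hand-written sample, not real Scripts.txt data.
SAMPLE_SCRIPTS = {
    "A": "Latin",
    "\u05D0": "Hebrew",  # HEBREW LETTER ALEF, bidi class R
    "\u0627": "Arabic",  # ARABIC LETTER ALEF, bidi class AL
    "\u0661": "Arabic",  # ARABIC-INDIC DIGIT ONE, bidi class AN
    "\u4E2D": "Han",
}

rtl = derive_rtl_scripts(SAMPLE_SCRIPTS)
```

    Note that the Arabic digit alone would not classify the script as RTL
    under this definition; it is the letter that does.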

    From a purely implementation point of view, it depends on whether
    there's any determination of script runs that exists and whether
    decisions can be made knowing only the script run and nothing else. For
    bidi, to give an example, the script run could be sufficient in deciding
    whether the run needs to be analyzed for bidi layout - an optimization
    technique that would benefit from creating such a derived property.
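
    The optimization could look roughly like the sketch below. It assumes
    the text has already been segmented into (script, text) runs and that
    a derived set of RTL scripts exists; RTL_SCRIPTS, needs_bidi_analysis
    and the sample runs are hypothetical names, not part of any Unicode
    data file or library.

```python
# Assumed derived classification: scripts with at least one RTL character.
RTL_SCRIPTS = frozenset({"Arabic", "Hebrew"})


def needs_bidi_analysis(script, rtl_scripts=RTL_SCRIPTS):
    """Decide from the script of a run alone whether to analyze it."""
    return script in rtl_scripts


def runs_to_analyze(script_runs, rtl_scripts=RTL_SCRIPTS):
    """Return only the runs that need per-character bidi analysis.

    script_runs is a list of (script, text) pairs, assumed to come from
    an earlier script-run segmentation step.
    """
    return [
        (script, text)
        for script, text in script_runs
        if needs_bidi_analysis(script, rtl_scripts)
    ]


sample_runs = [
    ("Latin", "abc"),
    ("Hebrew", "\u05D0\u05D1"),
    ("Han", "\u4E2D\u6587"),
]
selected = runs_to_analyze(sample_runs)
```

    This only captures the fast-path idea. A run in a non-RTL script may
    still need attention when it is embedded in an RTL paragraph or
    carries explicit directional controls, so a real implementation would
    need more context than the script name.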

    Many other classifications of scripts may not have the same
    implementation relevance as character properties do. That is, they may
    rarely be evaluated at run-time, even though they may be reflected in
    the design of implementations geared towards certain scripts.

    In any case, it's trivial to create derived script classifications if an
    underlying character property exists.

    >
    > But the pushback you are encountering in this thread stems in part
    > from the fact that the Unicode Standard is *not* a writing system
    > or orthographical standard, and does not attempt to standardize issues
    > of text layout, beyond the minimum required for plain text layout
    > legibility. Hence the systematic lack of properties relating to
    > any issues of vertical layout.

    I thought the responses on this thread were, in some instances, less
    than helpful, given that the request comes from an organization that
    has the mandate to standardize these very issues, while Unicode
    represents the necessary expertise of knowing *what* was encoded in it.
    This includes general information about each of the scripts.

    > ...
    >
    >> - If it has vertical directionality, how the text is transformed from
    >> horizontal to vertical, i.e. are grapheme clusters rotated (laid
    >> sideways wrt horizontal) or translated (kept upright like CJK).
    >
    > I don't think that is sufficient. You also need to answer the question
    > of how scripts which on their own are always laid out in horizontal
    > lines behave when mixed laid out in vertical text as text inclusions.
    > This is the problem of which way to rotate Latin (or other) text
    > when incorporated in Japanese (or Chinese) text laid out vertically.

    The important aspect that was missing in the original request is what
    effect the status of a script is expected to have in the context of CSS.
    One thing that Unicode has discovered over the years is that it's nearly
    impossible to define any character properties that are truly independent
    of the algorithm(s) that plan to use them. To make a property useful,
    certain assumptions must be made about how it is used, in some cases
    by publishing a single algorithm.

    Ken's question highlights one possible use of this information. A more
    detailed presentation of the requirements and how CSS would apply any
    such classification would help in getting more useful answers.

    A./


