The Unicode Consortium Discussion Forum

All times are UTC - 6 hours [ DST ]





 Post subject: Feedback on Unicode Technical Report #50
Posted: Wed Apr 04, 2012 9:01 am

Joined: Thu Feb 11, 2010 4:58 am
Posts: 4
A. General comments

"The Default Vertical Orientation (short name dvo) property is intended to be used for vertical lines in those parts of the world where characters are mostly upright."

I do not know what "mostly upright" means here. Could this be defined more precisely, and some examples provided of what these parts of the world are?

"A number of scripts, such as Mongolian or Phags-pa, are used primarily in vertical lines, and have not developed a tradition of usage in horizontal lines. Similarly, some characters such as U+3031 VERTICAL KANA REPEAT MARK or the characters of the Vertical Forms block are intended for use primarily in vertical lines. For those scripts and characters, the Unicode code charts show the characters in the orientation and shape they have in vertical lines. It is beyond the scope of this report to describe how those scripts and characters are displayed in horizontal lines (for example, in discursive texts)."

I think that this should be within the scope of this report, as it is very simple to specify compared with vertical layout and would provide valuable information for implementers. I believe that Mongolian and Phags-pa are the only two primarily vertical scripts whose glyphs are shown in the code charts oriented for vertical layout. For Mongolian and Phags-pa, the orientation for horizontal layout would therefore be expressed by a new property denoting a 90-degree counterclockwise rotation with respect to the code charts, while for all other characters (including U+3031 and the characters of the Vertical Forms block) the orientation for horizontal layout would be "U".

B. Yi syllables and radicals

A000..A48F ; S ; S ; S
A490..A4CF ; S ; S ; S

My experience is that the modern standardised Liangshan Yi script encoded in Unicode is written vertically with no glyph rotation, and so the properties should be:

A000..A48F ; U ; U ; U
A490..A4CF ; U ; U ; U
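The range lines in this post use the semicolon-separated layout of the draft data: a hexadecimal code point range followed by three orientation values. As a minimal sketch, assuming the three values can be treated as an opaque tuple (the post does not say what each column denotes), an implementation might load and query such lines as follows; `parse_ranges` and `lookup` are hypothetical names, not part of any published API.

```python
def parse_ranges(lines):
    """Parse lines like 'A000..A48F ; U ; U ; U' into (start, end, values).

    Assumption: the first field is a hex code point or range and the
    remaining fields are orientation values kept as an opaque tuple.
    """
    table = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = [f.strip() for f in line.split(";")]
        first, values = fields[0], tuple(fields[1:])
        if ".." in first:
            start, end = (int(x, 16) for x in first.split(".."))
        else:
            start = end = int(first, 16)
        table.append((start, end, values))
    return table


def lookup(table, cp, default=None):
    """Return the value tuple for code point cp, or default if unlisted."""
    for start, end, values in table:
        if start <= cp <= end:
            return values
    return default


# The values proposed above for Yi syllables and radicals.
proposed_yi = parse_ranges([
    "A000..A48F ; U ; U ; U",
    "A490..A4CF ; U ; U ; U",
])
print(lookup(proposed_yi, 0xA000))  # ('U', 'U', 'U')
print(lookup(proposed_yi, 0xA4D0))  # None
```

A linear scan is enough for a handful of ranges; a real data file would more likely be sorted and searched with `bisect`.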

In my opinion the orientation of Old Yi is not relevant to the orientation of Unicode Yi as they are different scripts. See this thread on the Unicode mailing list for detailed discussion:

<http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/0102.html>

C. Mongolian

1800..18AF ; U ; U ; U

These properties are correct, and should not be changed.

D. Phags-pa

A840..A87F ; U ; S ; S

The S property is incorrect and would result in an upside-down glyph orientation compared with the normal horizontal glyph orientation (which is rotated 90 degrees counterclockwise with respect to the Unicode charts). Phags-pa follows Mongolian layout and orientation and should have the same properties as Mongolian, namely:

A840..A87F ; U ; U ; U

E. Ogham

1680..169F ; S ; S ; S

I guess this is correct, but I'm not sure. When Ogham is written vertically, it is normally written bottom-to-top with glyphs rotated 90 degrees counterclockwise with respect to the Unicode code charts. The S property would give the correct orientation for the less common top-to-bottom vertical layout, but may be confusing (and in many cases ambiguous) for readers expecting bottom-to-top orientation. On the other hand, defining a "rotate 90 degrees counterclockwise" property in place of "S" would not work, as the whole text run has to be rotated counterclockwise, not the individual glyphs.

Andrew


 Post subject: Re: Feedback on Unicode Technical Report #50
Posted: Tue May 01, 2012 8:49 am

Joined: Mon Apr 30, 2012 9:53 am
Posts: 4
As Andrew already pointed out as a general issue, the expected use case
of the default value should be clarified. I suspect that preferences
vary across cases such as:

A) The material is easy to rotate, so that the writing direction and
eye movement can match those of natively laid-out text, and the amount
of text to be read is small.
Examples: maps, engineering drawings, circuit diagrams.

B) The material is difficult to rotate, but the amount of text to be
read is small.

B-1) Quick recognition is required.
Examples: road signs.

B-2) Quick recognition is not required.
Examples: book spines.

C) The material is difficult to rotate, and the amount of text to be
read is large.
Examples: the main text of books.

In the current draft, figures 1 and 2 may come from different contexts,
and it is questionable whether figure 1 is a good example of the use of
the default value. Vertical Latin text like figure 1 is indeed common,
but we should not suggest that English books whose entire text is
typeset vertically, as in figure 1, are common. The situations in which
vertical text like figure 1 is preferred should be clarified.

Stability of UTR #50
----------------------
The current draft of UTR #50 does not state a stability policy. It
should state which values are stable, which are not yet stable, and
which are just placeholders.

As the Yi case shows (see my comments below), the values in the current
UTR #50 have different origins. Some values were discussed with experts
who use the scripts, while others were assumed without concrete
evidence. Sometimes the evidence was sampled selectively and some of it
was rejected. In addition, we should consider the possibility that the
preferred glyph rotation is not settled within the user communities of
some scripts, or that the preference may change as the membership of a
user community changes.

Values determined through discussion and concrete evidence are worth
keeping compatible in future versions, but guessed or unsettled values
may change in the future, so they should be excluded from any stability
guarantee.

Digraphs and Alphabetic ligatures
--------------------------------------
It seems that the current draft (50-3) has no special rule for digraphs
and alphabetic ligatures: U+01C4..U+01CC, U+01F1..U+01F3, and
U+FB00..U+FB06.

I'm not sure these characters should be rendered with the values
"U; S; S;". For the alphabetic ligatures in particular, they may need to
be rendered as two separate letter glyphs in vertical mode. I doubt that
the convention of upright Latin letters in vertical Japanese text was
established with digraphs, alphabetic ligatures, and the like in mind.

In fact, there was a recent discussion on the OpenType and ISO/IEC
14496-22 mailing list about the differing requirements for alphabetic
ligatures in horizontal and vertical writing modes, raised by
http://lists.w3.org/Archives/Public/www ... /0650.html
It was stated that the "fi" ligature should not appear in vertical
writing mode for the Japanese market.

Hiragana / Katakana voiced sound mark
---------------------------------------------

The spacing ("uncombining") voiced and semi-voiced sound marks U+309B
and U+309C are given the values "U; U; U;". Modern Japanese orthography
places the voiced/semi-voiced mark at the upper-right corner of the base
character. In horizontal writing mode the base character is rendered
first and the mark follows, so the glyphs of U+309B and U+309C designed
for horizontal writing usually place the mark at the upper-left corner
of their own cell. If such horizontal glyphs are used in vertical
writing mode, the result looks like "a voiced/semi-voiced mark attached
to the lower-left corner of the preceding character". I don't think
that is the expected result.

In fact, the Adobe-Japan1 character collection has separate vertical
glyphs for U+309B and U+309C: CID+8171 and CID+8172. Thus, the expected
value would be "T; T; T;".

In addition, glyph reordering or ligature formation is needed to get
the expected result in vertical writing mode. The voiced/semi-voiced
mark is expected at the UPPER-right corner of the base character, so in
top-to-bottom writing mode the mark is expected to be rendered BEFORE
the base character.

However, the regular kana with voiced/semi-voiced marks are already
encoded as precomposed characters, so I'm not sure what the most common
use case of the spacing voiced/semi-voiced marks is. Some linguists may
use them to express sounds that cannot be written with standard kana;
others may use irregular combinations just as a joke.
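The reordering described here can be sketched in Python, assuming a renderer that works on a plain sequence of characters. `vertical_display_order` is a hypothetical helper; it only changes the order, so the vertical glyph substitution (for example the CID+8171/CID+8172 glyphs) would still be needed, and a real renderer might instead form a ligature.

```python
# U+309B KATAKANA-HIRAGANA VOICED SOUND MARK,
# U+309C KATAKANA-HIRAGANA SEMI-VOICED SOUND MARK (spacing forms)
SPACING_VOICED_MARKS = {"\u309B", "\u309C"}


def vertical_display_order(text):
    """Hypothetical helper: return characters in top-to-bottom display order,
    moving each spacing voiced/semi-voiced mark ahead of whatever character
    precedes it, so it can be drawn at that character's upper-right corner."""
    out = []
    for ch in text:
        if ch in SPACING_VOICED_MARKS and out:
            out.insert(len(out) - 1, ch)  # place the mark before its base
        else:
            out.append(ch)
    return out


# HIRAGANA LETTER U followed by the spacing voiced sound mark
print([hex(ord(c)) for c in vertical_display_order("\u3046\u309B")])
# ['0x309b', '0x3046']
```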

N'Ko
-----
Judging from the rotated N'Ko strings in these maps:
http://catalogingafricana.files.wordpre ... apada1.jpg
http://catalogingafricana.files.wordpre ... cript1.pdf
it is questionable whether the values "U; S; S;" for N'Ko are
appropriate. "S; S; S;" might be preferred.

Yi
---
About UCS Yi, I support Andrew's comment. Scanned images and photos of
standardized Liangshan syllabic Yi written vertically without glyph
rotation have been provided, but no firm evidence of vertical text with
glyph rotation has been given. If the samples provided by Andrew and me
are to be excluded, the criteria for selecting samples should be
clarified.

Hanunoo
----------

Hanunoo is often described as having a special writing mode: it is
written from bottom to top, with the glyphs always oriented along the
writing direction.
http://scriptsource.org/cms/scripts/pag ... Hano&_sc=1
http://www.aa.tufs.ac.jp/i-moji/tenji/s ... mg/B03.jpg
If these descriptions are accurate (and still valid for modern users),
the values for Hanunoo should be "S; S; S;" instead of "U; S; S;".
However, I wonder whether the script is still used with the same
preferences as before.

Arabic
--------

It is unclear whether glyph shaping for Arabic-script text is needed
when rendering in the default "upright" mode. If each glyph is
disconnected from the preceding and following letters, should it be
rendered in its isolated form? If so, how should the code points in the
Arabic presentation forms blocks be handled?

I also note that I could not find any book spines with Arabic text set
in upright mode.
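One conceivable answer to the presentation-forms question, sketched in Python purely as an assumption (UTR #50 does not specify this): fold presentation-form code points to their nominal letters using their compatibility decompositions (NFKC), then treat each resulting letter as its own upright unit to be shown in isolated form. `upright_arabic_units` is a hypothetical name. Note that this splits the lam-alef ligature into two letters.

```python
import unicodedata

# Arabic Presentation Forms-A and Arabic Presentation Forms-B blocks.
PRESENTATION_FORM_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]


def is_presentation_form(ch):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in PRESENTATION_FORM_RANGES)


def upright_arabic_units(text):
    """Hypothetical policy: replace each presentation-form character with its
    NFKC (compatibility) equivalent, then return one unit per character,
    each to be rendered upright in its isolated form."""
    units = []
    for ch in text:
        if is_presentation_form(ch):
            units.extend(unicodedata.normalize("NFKC", ch))
        else:
            units.append(ch)
    return units


# U+FEFB ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print([hex(ord(c)) for c in upright_arabic_units("\uFEFB")])  # ['0x644', '0x627']
```

Restricting normalization to the presentation-form blocks avoids NFKC side effects on other text, such as fullwidth Latin letters in the same string.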

Brahmic scripts (Indic, South-East Asian)
----------------------------------------------

It is unclear whether clustering is needed when rendering in the
default "upright" mode. If clustering is required, how are graphemes
expected to be defined? A consonant and its combining or enclosing
vowel signs should not be separated, but may spacing signs and marks be
separated? For example, the tone marks of Tai Le are spacing characters.
Are they expected to be placed to the right of the preceding letter, or
can they be treated as separate, isolated characters?
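To make the question concrete, here is a minimal Python sketch of one possible unit definition based only on Unicode general categories: nonspacing and enclosing marks (Mn, Me) always stay with the preceding unit, and spacing marks (Mc) stay with it only when a flag is set. This is an illustrative assumption, not the UAX #29 grapheme cluster algorithm; `orientation_units` is a hypothetical name. Tai Le tone letters (U+1970..U+1974) have general category Lo, so neither setting attaches them.

```python
import unicodedata


def orientation_units(text, keep_spacing_marks=True):
    """Split text into units that would each be oriented as a whole.

    Assumption (illustrative only): Mn and Me characters always attach to
    the preceding unit; Mc characters attach only if keep_spacing_marks.
    """
    attach = {"Mn", "Me"} | ({"Mc"} if keep_spacing_marks else set())
    units = []
    for ch in text:
        if units and unicodedata.category(ch) in attach:
            units[-1] += ch
        else:
            units.append(ch)
    return units


# Devanagari KA + VIRAMA (Mn) + SSA + VOWEL SIGN I (Mc)
word = "\u0915\u094D\u0937\u093F"
print(len(orientation_units(word)))                            # 2
print(len(orientation_units(word, keep_spacing_marks=False)))  # 3
```

Even with spacing marks attached, the conjunct KSSA is split at the virama, which suggests that an akshara-level unit would need more than general categories.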


 Post subject: Re: Feedback on Unicode Technical Report #50
Posted: Fri May 04, 2012 4:14 am

Joined: Mon Apr 30, 2012 9:53 am
Posts: 4
My previous post was about revision 3, but most of it also applies to revision 4.
Revision 4 may have an inconsistency in its value descriptions.
Table 1 lists U, R, T, Tu, and Tr, but section 5.2 has a note that uses a value from older revisions:
"with the value SB interpreted as equivalent to S."


 Post subject: Re: Feedback on Unicode Technical Report #50
Posted: Fri May 18, 2012 9:33 am
Unicode Guru

Joined: Fri Dec 04, 2009 9:25 pm
Posts: 76
BabelStone wrote:
A. General comments

"The Default Vertical Orientation (short name dvo) property is intended to be used for vertical lines in those parts of the world where characters are mostly upright."

I do not know what "mostly upright" means here. Could this be defined more precisely, and some examples provided of what these parts of the world are?


I think the introduction addresses that, with figure 2.

BabelStone wrote:
"A number of scripts, such as Mongolian or Phags-pa, are used primarily in vertical lines, and have not developed a tradition of usage in horizontal lines. Similarly, some characters such as U+3031 VERTICAL KANA REPEAT MARK or the characters of the Vertical Forms block are intended for use primarily in vertical lines. For those scripts and characters, the Unicode code charts show the characters in the orientation and shape they have in vertical lines. It is beyond the scope of this report to describe how those scripts and characters are displayed in horizontal lines (for example, in discursive texts)."

I think that this should be within the scope of this report, as it is very simple to specify compared with vertical layout and would provide valuable information for implementers. I believe that Mongolian and Phags-pa are the only two primarily vertical scripts whose glyphs are shown in the code charts oriented for vertical layout. For Mongolian and Phags-pa, the orientation for horizontal layout would therefore be expressed by a new property denoting a 90-degree counterclockwise rotation with respect to the code charts, while for all other characters (including U+3031 and the characters of the Vertical Forms block) the orientation for horizontal layout would be "U".


Done in rev. 5, with the Horizontal Orientation property.


BabelStone wrote:
B. Yi syllables and radicals

A000..A48F ; S ; S ; S
A490..A4CF ; S ; S ; S

My experience is that the modern standardised Liangshan Yi script encoded in Unicode is written vertically with no glyph rotation, and so the properties should be:

A000..A48F ; U ; U ; U
A490..A4CF ; U ; U ; U

In my opinion the orientation of Old Yi is not relevant to the orientation of Unicode Yi as they are different scripts. See this thread on the Unicode mailing list for detailed discussion:

<http://www.unicode.org/mail-arch/unicode-ml/y2012-m03/0102.html>


Done in revision 5.

BabelStone wrote:
D. Phags-pa

A840..A87F ; U ; S ; S

The S property is incorrect and would result in an upside-down glyph orientation compared with the normal horizontal glyph orientation (which is rotated 90 degrees counterclockwise with respect to the Unicode charts). Phags-pa follows Mongolian layout and orientation and should have the same properties as Mongolian, namely:

A840..A87F ; U ; U ; U


Noted in the consolidated feedback. Sorry I missed this in rev 5.

BabelStone wrote:

E. Ogham

1680..169F ; S ; S ; S

I guess this is correct, but I'm not sure. When Ogham is written vertically, it is normally written bottom-to-top with glyphs rotated 90 degrees counterclockwise with respect to the Unicode code charts. The S property would give the correct orientation for the less common top-to-bottom vertical layout, but may be confusing (and in many cases ambiguous) for readers expecting bottom-to-top orientation. On the other hand, defining a "rotate 90 degrees counterclockwise" property in place of "S" would not work, as the whole text run has to be rotated counterclockwise, not the individual glyphs.

Andrew


I read this as: "when Ogham is written vertically, the lines are formed as if they were horizontal and the whole line is rotated". Is that correct?

Thanks,
Eric.


 Post subject: Re: Feedback on Unicode Technical Report #50
Posted: Fri May 18, 2012 11:00 am
Unicode Guru

Joined: Fri Dec 04, 2009 9:25 pm
Posts: 76
mpsuzuki wrote:
In the current draft, figures 1 and 2 may come from different contexts,
and it is questionable whether figure 1 is a good example of the use of
the default value. Vertical Latin text like figure 1 is indeed common,
but we should not suggest that English books whose entire text is
typeset vertically, as in figure 1, are common. The situations in which
vertical text like figure 1 is preferred should be clarified.


I think that the renaming of "default vertical orientation" to "stacked vertical orientation" should help clarify the situation.

There is another thing I have not yet been able to put into words: there is on the one hand the "mode of formation" of a line, which is mostly about how characters are arranged inside the line, and on the other hand the orientation of the line as a whole on whatever page it appears. The properties are really about the "mode of formation". In my opinion, the examples of Canadian Syllabics that have been shown on unicode@unicode.org are really "horizontally formed lines" that happen to be arranged on the page so that their main axis is vertical. The same holds for most maps. With that in mind, figure 1 (now figure 2 in rev. 5) is a good example of the "stacked vertical orientation".


mpsuzuki wrote:
Stability of UTR #50
----------------------
The current draft of UTR #50 does not state a stability policy. It
should state which values are stable, which are not yet stable, and
which are just placeholders.


There is a clear statement today: "The properties and algorithms presented in this report are informative." Of course, we can decide to do something else.


mpsuzuki wrote:
In fact, there was a recent discussion on the OpenType and ISO/IEC
14496-22 mailing list about the differing requirements for alphabetic
ligatures in horizontal and vertical writing modes, raised by
http://lists.w3.org/Archives/Public/www ... /0650.html
It was stated that the "fi" ligature should not appear in vertical
writing mode for the Japanese market.


The discussion in the context of OpenType is a bit different. The issue is that the "mode of formation" needs to be accessible to the font, but currently is not.

mpsuzuki wrote:
N'Ko
-----
Judging from the rotated N'Ko strings in these maps:
http://catalogingafricana.files.wordpre ... apada1.jpg
http://catalogingafricana.files.wordpre ... cript1.pdf
it is questionable whether the values "U; S; S;" for N'Ko are
appropriate. "S; S; S;" might be preferred.


I interpret those cases as "horizontally formed lines" which are then oriented in a meaningful way for the context.



mpsuzuki wrote:
Brahmic scripts (Indic, South-East Asian)
----------------------------------------------

It is unclear whether clustering is needed when rendering in the
default "upright" mode. If clustering is required, how are graphemes
expected to be defined? A consonant and its combining or enclosing
vowel signs should not be separated, but may spacing signs and marks be
separated? For example, the tone marks of Tai Le are spacing characters.
Are they expected to be placed to the right of the preceding letter, or
can they be treated as separate, isolated characters?



The properties of UTR #50 differ from other Unicode properties in that they don't come with a built-in notion of grapheme cluster (e.g., the CM, JL, JV, and JT values of the Line_Break property). This is on purpose. In Indic scripts, the proper unit for orientation is likely to be of the order of an akshara, but there is no such definition in Unicode today.

Eric

