The Unicode Consortium Discussion Forum


All times are UTC - 6 hours [ DST ]





 Post subject: Some additional UTR #50 comments
Posted: Mon Oct 31, 2011 10:53 am

Joined: Wed Feb 10, 2010 5:00 pm
Posts: 17
Location: San Jose, CA, USA, Earth
I jotted down a few miscellaneous thoughts about UTR #50 over the weekend. Some of these have been posted in other threads. This thread includes the remaining thoughts.

1) When declaring that a glyph is to be rotated 90 degrees clockwise for vertical writing (Categories S and SB), the operation may not be so simple. Depending on which coordinate is used as the pivot point, and on the relative baselines of the scripts covered by the font, some X-axis shifting may be necessary after rotation. In other words, applications may need to dig into the font to figure out the parameters to use for that shifting. This is one reason why fonts include vertical variants of glyphs that appear to have been mechanically rotated. It is also one of the reasons why the 'vrt2' GSUB feature was defined in the first place.

2) There is an expectation that the same text can be purposed for both writing directions, which is another reason why substitutions are used. In some cases the substitutions may seem abusive, in that the result can be considered a completely different character. If it is unknown whether the text will be set in horizontal or vertical orientation, this makes sense. The most abusive vertical substitutions are for Chinese, as described in GB 15834-1995: U+2018, U+2019, U+201C, and U+201D become rotated versions of U+300C, U+300D, U+300E, and U+300F, respectively.

3) In cl-05, I think that U+FF1A should be changed from U to S, at least for Japanese. JIS X 0213 defines that the vertical version of this character is rotated.

4) In cl-06, I think that U+FF0E should be changed from U to T. Likewise, in cl-07, I think that U+FF0C should be changed from U to T.

5) There are some regional differences. In China, for example, U+FF01 and U+FF1F (cl-04), and U+FF1A and U+FF1B (cl-05) should be shifted up and to the right, similar to small kana in Japanese, and should thus be changed from U to T, at least for Chinese.

6) Should there be a way to indicate region-specific differences, such as #5 above? Add a column for region in the datafiles?

7) Should U+3000 (cl-14) be S instead of U? I suggest this because in fonts whose default glyphs for kana, hangul, or ideographs are proportional or otherwise not full-width, the glyph for U+3000 is likely not to be full-width either, and will have a width that works well with the glyphs from those scripts. To preserve that glyph width in vertical text, the glyph should be rotated rather than set upright.

I hope this helps...


 
 Post subject: Re: Some additional UTR #50 comments
Posted: Mon Oct 31, 2011 11:14 pm
Unicode Guru

Joined: Fri Dec 04, 2009 9:25 pm
Posts: 76
lunde wrote:
1) When declaring that a glyph is to be rotated 90 degrees clockwise for vertical writing (Categories S and SB), the operation may not be so simple.


There are many adjustments that are beyond the scope of Unicode. For example, the use of baselines, even in horizontal text, is not in scope, so I don't think we need to address that wrt. orientation. So far, I believe that the T orientation captures all the cases where some shifting is needed in plain text (or equivalently, in a piece of text with a single, somewhat simple, style); example: the small kana. Do we need more cases?

lunde wrote:
It is also one of the reasons why the 'vrt2' GSUB feature was defined in the first place.


My memory is that 'vrt2' was primarily motivated by the @fonts in Windows, and by the desire not to have any table such as that of TR #50 in ATM when it was extended to cover OpenType fonts. I am also not aware of any implementation that uses 'vrt2', other than ATM on Windows.

lunde wrote:
2) There is an expectation that the same text can be purposed for both writing directions


JLREQ gives clear cases of different text content between horizontal and vertical; e.g. section 3.2.1, Note 2:

Quote:
In vertical writing mode, symbols for units are usually described with katakana (cl-16), such as センチメートル (centimeter) or センチ(abbreviation of centimeter in katakana, "senchi"). In horizontal writing mode, the International System of Units (SI) is usually used, such as "cm".


It's clear that some machinery is needed in documents which are intended to let the reader choose the display orientation. For more algorithmic transformations, such as the U+2018 -> U+300C substitution you describe, CSS could incorporate the transformation via text-transform. In any case, I don't think that such transformations belong in rendering engines.
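
As a rough illustration of the kind of algorithmic transformation mentioned above, here is a minimal Python sketch. It assumes the four pairs listed in point 2 of the original post, and it only swaps code points; rotating the resulting brackets is left to the layout step. The names `VERTICAL_QUOTES` and `to_vertical_quotes` are hypothetical, and this is not how CSS text-transform is specified.

```python
# Pairs as listed in point 2 of the original post (attributed there to
# GB 15834-1995). The table and function names are illustrative only.
VERTICAL_QUOTES = str.maketrans({
    "\u2018": "\u300C",  # left single quotation mark  -> left corner bracket
    "\u2019": "\u300D",  # right single quotation mark -> right corner bracket
    "\u201C": "\u300E",  # left double quotation mark  -> left white corner bracket
    "\u201D": "\u300F",  # right double quotation mark -> right white corner bracket
})


def to_vertical_quotes(text: str) -> str:
    """Swap Western-style quotes for corner brackets; other characters pass through."""
    return text.translate(VERTICAL_QUOTES)
```

Because the stored text is left untouched and the swap is applied only when the vertical orientation is chosen, the same source text can still serve both writing directions.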

lunde wrote:
3) In cl-05, I think that U+FF1A should be changed from U to S, at least for Japanese. JIS X 0213 defines that the vertical version of this character is rotated.


JIS does not account for the fullwidth characters. The approach taken in the current draft, for those characters which exist as a pair "regular"/fullwidth, is to use U for the fullwidth and S for the other. AFAICT, this is consistent with the implementations which recognize both kinds of characters, such as InDesign.
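
The pair rule described above can be sketched as follows. This is only an approximation under stated assumptions: it treats a character as "fullwidth" when its Unicode compatibility decomposition is tagged `<wide>`, and it looks for "regular" counterparts only in the range U+0021..U+007E. The function names are hypothetical and the result is not the actual UTR #50 data.

```python
import unicodedata
from typing import Optional

FULLWIDTH_OFFSET = 0xFEE0  # U+FF01..U+FF5E mirror U+0021..U+007E


def is_fullwidth_form(ch: str) -> bool:
    """True if the character has a <wide> compatibility decomposition."""
    return unicodedata.decomposition(ch).startswith("<wide>")


def has_fullwidth_counterpart(ch: str) -> bool:
    """True if an ASCII character has a fullwidth twin that decomposes to it."""
    cp = ord(ch)
    if not 0x21 <= cp <= 0x7E:
        return False
    expected = "<wide> %04X" % cp
    return unicodedata.decomposition(chr(cp + FULLWIDTH_OFFSET)) == expected


def draft_pair_orientation(ch: str) -> Optional[str]:
    """U for the fullwidth member of a pair, S for the regular one, else None."""
    if is_fullwidth_form(ch):
        return "U"
    if has_fullwidth_counterpart(ch):
        return "S"
    return None
```

Under this sketch, U+FF1A comes out as U and U+003A as S, which is exactly the assignment being questioned for Japanese in point 3.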

lunde wrote:
4) In cl-06, I think that U+FF0E should be changed from U to T. Likewise, in cl-07, I think that U+FF0C should be changed from U to T.


Same situation as above.

lunde wrote:
5) There are some regional differences. In China, for example, U+FF01 and U+FF1F (cl-04), and U+FF1A and U+FF1B (cl-05) should be shifted up and to the right, similar to small kana in Japanese, and should thus be changed from U to T, at least for Chinese.


Noted.


lunde wrote:
6) Should there be a way to indicate region-specific differences, such as #5 above? Add a column for region in the datafiles?


Definitely a possibility, but I am not sure we have good data at this point.
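
To make the "column for region" idea concrete, here is a minimal Python sketch. The line format (code point ; orientation ; optional region tag) is purely hypothetical and is not the format of the UTR #50 data files. The CN rows only encode the suggestion made in point 5; they are not published data.

```python
from typing import Dict, Optional, Tuple

# Hypothetical format: code point ; orientation ; optional region tag.
# Rows without a region act as the default for that code point.
SAMPLE = """\
FF01 ; U
FF01 ; T ; CN   # suggestion from point 5, not actual data
FF1F ; U
FF1F ; T ; CN   # suggestion from point 5, not actual data
"""

Table = Dict[Tuple[int, Optional[str]], str]


def parse(data: str) -> Table:
    """Parse the hypothetical format into {(code point, region): orientation}."""
    table: Table = {}
    for line in data.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = [field.strip() for field in line.split(";")]
        region = fields[2] if len(fields) > 2 else None
        table[(int(fields[0], 16), region)] = fields[1]
    return table


def orientation(table: Table, cp: int, region: Optional[str] = None) -> Optional[str]:
    """Prefer a region-specific row, then fall back to the default row."""
    return table.get((cp, region), table.get((cp, None)))
```

With this layout, a lookup for U+FF01 with region "CN" would return T, while any other region falls back to the default U.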

lunde wrote:
7) Should U+3000 (cl-14) be S instead of U?


Noted.


 
 Post subject: Re: Some additional UTR #50 comments
Posted: Tue Nov 01, 2011 11:09 pm

Joined: Wed Feb 10, 2010 5:00 pm
Posts: 17
Location: San Jose, CA, USA, Earth
emuller wrote:
lunde wrote:
1) When declaring that a glyph is to be rotated 90 degrees clockwise for vertical writing (Categories S and SB), the operation may not be so simple.


There are many adjustments that are beyond the scope of Unicode. For example, the use of baselines, even in horizontal text, is not in scope, so I don't think we need to address that wrt. orientation. So far, I believe that the T orientation captures all the cases where some shifting is needed in plain text (or equivalently, in a piece of text with a single, somewhat simple, style); example: the small kana. Do we need more cases?

My point is that while these characters clearly rotate 90 degrees clockwise for vertical writing, some additional shifting is needed for the resulting (rotated) glyphs to be positioned correctly relative to the other characters in the vertical run, and that shifting is mechanical and font-dependent. It is different from the T cases, where the designer must correctly position the vertical glyph.
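
A minimal sketch of why the pivot matters, assuming a y-up font coordinate system, a glyph reduced to its bounding box, and a vertical line centered at x = 0. The `Box` type, the function names, and the sample metrics are illustrative only and do not come from any font or from UTR #50.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def rotate_cw(box: Box, pivot: Tuple[float, float]) -> Box:
    """Rotate a bounding box 90 degrees clockwise about pivot (y-up axes)."""
    px, py = pivot
    # A clockwise quarter turn in a y-up system maps (dx, dy) to (dy, -dx).
    corners = [
        (px + (y - py), py - (x - px))
        for x in (box.x_min, box.x_max)
        for y in (box.y_min, box.y_max)
    ]
    xs = [corner[0] for corner in corners]
    ys = [corner[1] for corner in corners]
    return Box(min(xs), min(ys), max(xs), max(ys))


def x_shift_to_center(box: Box, line_center_x: float = 0.0) -> float:
    """Horizontal shift that centers the rotated box on the vertical line."""
    return line_center_x - (box.x_min + box.x_max) / 2


# Made-up Latin-like glyph: descender at -200, ascender at 700.
glyph = Box(50, -200, 550, 700)
shift_origin_pivot = x_shift_to_center(rotate_cw(glyph, (0, 0)))      # -250.0
shift_midline_pivot = x_shift_to_center(rotate_cw(glyph, (0, 250)))   # 0.0
```

The same glyph needs a shift of -250 units when rotated about the origin but none when rotated about the midpoint between descender and ascender. Those values depend on the font's metrics, which is why an application may have to read them from the font.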

emuller wrote:
lunde wrote:
It is also one of the reasons why the 'vrt2' GSUB feature was defined in the first place.


My memory is that 'vrt2' was primarily motivated by the @fonts in Windows, and by the desire not to have any table such as that of TR #50 in ATM when it was extended to cover OpenType fonts. I am also not aware of any implementation that uses 'vrt2', other than ATM on Windows.

I believe that you are correct.

emuller wrote:
lunde wrote:
2) There is an expectation that the same text can be purposed for both writing directions


JLREQ gives clear cases of different text content between horizontal and vertical; e.g. section 3.2.1, Note 2:

Quote:
In vertical writing mode, symbols for units are usually described with katakana (cl-16), such as センチメートル (centimeter) or センチ(abbreviation of centimeter in katakana, "senchi"). In horizontal writing mode, the International System of Units (SI) is usually used, such as "cm".


It's clear that some machinery is needed in documents which are intended to let the reader choose the display orientation. For more algorithmic transformations, such as the U+2018 -> U+300C substitution you describe, CSS could incorporate the transformation via text-transform. In any case, I don't think that such transformations belong in rendering engines.

Still, I believe that there are use cases in which the same text is expected to be purposed for both writing directions. It is worth raising this as a possible requirement, at least for some implementations or use cases. If the appropriate method is to handle such cases via text-transform, it seems prudent to at least mention that in this UTR so that the proper implementation is made clearer.

emuller wrote:
lunde wrote:
3) In cl-05, I think that U+FF1A should be changed from U to S, at least for Japanese. JIS X 0213 defines that the vertical version of this character is rotated.


JIS does not account for the fullwidth characters. The approach taken in the current draft, for those characters which exist as a pair "regular"/fullwidth, is to use U for the fullwidth and S for the other. AFAICT, this is consistent with the implementations which recognize both kinds of characters, such as InDesign.

All of the Japanese implementations of which I am aware rotate U+FF1A for vertical writing. It is worth raising this as a possible issue that requires further clarification.

emuller wrote:
lunde wrote:
4) In cl-06, I think that U+FF0E should be changed from U to T. Likewise, in cl-07, I think that U+FF0C should be changed from U to T.


Same situation as above.

These two characters are used in Japanese text somewhat rarely, but if they do occur, and if the writing mode is vertical, the expected positioning suggests category T, not U. Much of this is due to their full-width property.


 
 Post subject: Re: Some additional UTR #50 comments
Posted: Wed Nov 02, 2011 3:12 pm

Joined: Wed Feb 10, 2010 5:00 pm
Posts: 17
Location: San Jose, CA, USA, Earth
About the following, I meant to recommend Category SB, not S:
Quote:
3) In cl-05, I think that U+FF1A should be changed from U to S, at least for Japanese. JIS X 0213 defines that the vertical version of this character is rotated.

Modern and mainstream Japanese fonts include a glyph for the rotated version of U+FF1A, and the 'vert' GSUB feature provides coverage for the substitution.


 
 Post subject: Re: Some additional UTR #50 comments
Posted: Sun Nov 20, 2011 3:57 am

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 77
emuller wrote:
lunde wrote:
6) Should there be a way to indicate region-specific differences, such as #5 above? Add a column for region in the datafiles?


Definitely a possibility, but I am not sure we have good data at this point.


I definitely don't have a lot of background here, but is there an architectural/scope-of-standard reason why this shouldn't be handled by CLDR? Just a thought.


 
 Post subject: Re: Some additional UTR #50 comments
Posted: Sat Jan 07, 2012 3:24 am

Joined: Mon Feb 01, 2010 6:18 pm
Posts: 77
Some quick thoughts on PUA codepoints and their East Asian Orientation property.

I am wondering if PUA characters shouldn't be cl-19.3 instead of cl-19.1. They are both eao=U, but 19.3 seems to be a more neutral classification.

My other, completely contradictory, thought is that there may be an advantage to assigning PUA codepoints eao=T, so that an implementation can define the behaviour of the characters for itself according to the particular private use agreement. I assume that if an alternate glyph is not specified, the horizontal glyph would be used for an eao=T codepoint. That would allow supplemental ideographs to simply be assigned to the PUA without comment, while letting MUFI implementations define an alternate mediaeval Latin glyph that is, in essence, an eao=S but implemented as an eao=T, and still letting a CSUR implementation define the alternate vertical forms of Tengwar and Klingon however they want.
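
The fallback assumed above (use the horizontal glyph unless the private use agreement supplies a vertical alternate) could look roughly like this. The function, the `agreement` mapping, and the glyph names are all hypothetical; this is a sketch of the idea, not a description of any existing implementation.

```python
import unicodedata
from typing import Dict


def vertical_glyph(ch: str, horizontal_glyph: str, agreement: Dict[str, str]) -> str:
    """Pick the glyph for a PUA character in vertical text.

    `agreement` maps PUA characters to alternate vertical glyph names as
    defined by some private use agreement (hypothetical data).
    """
    if unicodedata.category(ch) != "Co":
        raise ValueError("not a private-use code point")
    # No alternate defined: fall back to the horizontal glyph, set upright.
    return agreement.get(ch, horizontal_glyph)
```

A PUA ideograph with no entry in `agreement` would simply reuse its horizontal glyph, while an agreement such as MUFI or CSUR could register whatever vertical form it wants.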


 