The Unicode Consortium Discussion Forum


All times are UTC - 6 hours [ DST ]





 Post subject: Per cent, per mille, per ten-thousand
Posted: Mon Jun 11, 2012 1:26 pm

Joined: Tue Oct 11, 2011 4:34 pm
Posts: 19
The per cent sign (%) appears in ASCII and also has a fullwidth variant, so U+0025 has MVO=R (rotated) and U+FF05 has MVO=U (upright). However, the per mille (U+2030 ‰) and per ten-thousand (U+2031 ‱) signs are encoded only once. Currently they have MVO=U, but Murakami-san argues they should be MVO=R:

Quote:
"Used in East Asian codepages" is not a good reason to make them MVO=U. ‰ is used as in 「最大勾配は480‰(25.6°)。」 ("The maximum gradient is 480‰ (25.6°)."); sideways digits with an upright ‰ look strange.


Should these be changed to MVO=R?
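A minimal Python sketch of the question, assuming the draft values quoted above (U+0025=R, U+FF05=U, U+2030 and U+2031=U) and a rough stand-in for every other character: wide/fullwidth characters (by East Asian Width) are treated as upright and everything else as rotated. The tables and the `mvo` helper are hypothetical, not UTR#50 data. It shows the mixed result for "480‰" that the quote objects to, and what the proposed change would do.

```python
import unicodedata

# Hypothetical tables: draft MVO values as described in this thread (not UTR#50 data).
DRAFT_MVO = {"\u0025": "R", "\uff05": "U", "\u2030": "U", "\u2031": "U"}
# Murakami-san's proposal: per mille and per ten-thousand become R (rotated).
PROPOSED_MVO = {**DRAFT_MVO, "\u2030": "R", "\u2031": "R"}


def mvo(ch: str, table: dict) -> str:
    """Return 'U' (upright) or 'R' (rotated) for one character."""
    if ch in table:
        return table[ch]
    # Rough assumption: wide/fullwidth (CJK) characters are upright,
    # everything else (Latin letters, ASCII digits, ...) is rotated.
    return "U" if unicodedata.east_asian_width(ch) in ("W", "F") else "R"


def orientations(text: str, table: dict) -> str:
    """One orientation letter per character of text."""
    return "".join(mvo(ch, table) for ch in text)


print(orientations("480\u2030", DRAFT_MVO))     # RRRU: sideways digits, upright ‰
print(orientations("480\u2030", PROPOSED_MVO))  # RRRR: the quantity rotates as a unit
```

Under the draft table the digits rotate while ‰ stays upright; under the proposal the whole quantity rotates together.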


 Post subject: Re: Per cent, per mille, per ten-thousand
Posted: Mon Jun 11, 2012 3:16 pm

Joined: Wed Dec 07, 2011 3:01 am
Posts: 71
I prefer keeping them as U.

As you pointed out, these code points are ambiguous, so the only options we have are:
  1. Introduce an Ambiguous model, just as EAW (UAX#11) and UAX#14 do, and let applications decide how to resolve the ambiguity.
  2. Make a trade-off and pick one value or the other.
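For comparison, this is roughly how option 1 works for East Asian Width, using Python's standard `unicodedata.east_asian_width`: ‰ is Ambiguous ("A"), and the application resolves it from context. UAX#11 leaves that resolution to the application; treating "A" as wide in an East Asian context and narrow otherwise is the usual choice and is an assumption in this sketch. An orientation property with an "A" value would follow the same pattern.

```python
import unicodedata


def resolved_width(ch: str, east_asian_context: bool) -> str:
    """Resolve East Asian Width to 'wide' or 'narrow' for layout."""
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):
        return "wide"
    if eaw == "A":
        # Ambiguous: the application decides, here from a context flag.
        return "wide" if east_asian_context else "narrow"
    return "narrow"  # "Na", "H" and "N"


print(unicodedata.east_asian_width("\u2030"))  # A
print(resolved_width("\u2030", True))   # wide
print(resolved_width("\u2030", False))  # narrow
```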

I'm OK with #1 if everyone agrees. But if we go with #2 and make a trade-off for ambiguous code points, I would like UTR#50 to define values that work well for East Asian documents, where most vertical text occurs.

Since we're making a trade-off, bad examples can be found whichever value we choose. The real question is for whom, and for what kind of documents, UTR#50 is designed.


 Post subject: Re: Per cent, per mille, per ten-thousand
Posted: Mon Jun 11, 2012 10:14 pm

Joined: Sat Jan 14, 2012 4:10 am
Posts: 29
Note that any character can appear upright in vertical East Asian documents. A single letter, digit, or symbol between Kanji or Kana text is usually set upright. As a result, only foreign words or phrases, math expressions, and the like are set sideways, and these are relatively rare.
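The practice described above can be approximated with a simple run-based heuristic. This is a hypothetical sketch, not something UTR#50 specifies: wide/fullwidth characters stand in for Kanji and Kana, a lone non-CJK character between them is kept upright, and longer non-CJK runs (words, phrases, expressions) are set sideways.

```python
import unicodedata
from itertools import groupby


def is_cjk_like(ch: str) -> bool:
    # Assumption: wide/fullwidth characters stand in for Kanji and Kana.
    return unicodedata.east_asian_width(ch) in ("W", "F")


def heuristic_orientation(text: str) -> str:
    """'U' or 'R' per character, using a hypothetical run-length rule."""
    out = []
    for cjk, run in groupby(text, key=is_cjk_like):
        n = len(list(run))
        # CJK runs and single isolated letters/digits/symbols stay upright;
        # longer non-CJK runs are set sideways.
        out.append(("U" if cjk or n == 1 else "R") * n)
    return "".join(out)


print(heuristic_orientation("AとBの"))        # UUUU
print(heuristic_orientation("はUnicodeです"))  # URRRRRRRUU
```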

Many Japanese books, magazines, and newspapers in vertical layout use essentially only upright letters and numbers. For such contexts, the Stacked Vertical Orientation is preferable.

On the other hand, the Mixed Vertical Orientation should be defined for the context in which Western letters and digits are set sideways. 「最大勾配は480‰(25.6°)。」 ("The maximum gradient is 480‰ (25.6°).") is a good example.
UTR#50's MVO cannot be compatible with the vertical orientation of the legacy Shift JIS era; Greek and Cyrillic letters, for instance, are now sideways. If we define some ambiguous characters as U and others as R, it will cause confusion, because it becomes difficult to know which characters are U or R. Why is ‰ upright? Why is μ sideways? Why is § upright? I want a simpler policy: ambiguous characters that are often used with MVO=R letters or digits should also be MVO=R. Then it is easy to understand that they are set sideways by default, and when upright orientation has to be specified explicitly.


Shinyu Murakami
Antenna House


 Post subject: Re: Per cent, per mille, per ten-thousand
Posted: Mon Jun 11, 2012 10:55 pm

Joined: Wed Dec 07, 2011 3:01 am
Posts: 71
Ambiguities cannot be resolved without compromise, so UTR#50 has to either make a choice or introduce an ambiguous model, as I posted in another thread. If we make a choice, it should be good for the majority: East Asian usage.

I've told this story to several people already, so sorry if you're tired of reading it again: Word 2.0 tried to resolve this issue with a code-point-based model, and all ambiguities were resolved to U, because it aimed to be a good East Asian word processor.

To be a truly good multilingual word processor, Word 6.0 introduced an ambiguous model by adding a flag that indicates whether a run of text is East Asian. You can still find this "hint" flag in the OOXML/ECMA-376 spec today; it is an enum of three values: "cs" (complex script), "eastAsia", and "default". I don't think the ambiguity issue can be resolved without such a mechanism, so I'm not trying to resolve every single case in this revision of UTR#50.
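A sketch of such a run-level model, assuming a hypothetical orientation value "A" for ‰ and ‱. The three hint strings are the ones listed above; in OOXML the hint steers font selection for ambiguous characters, so reusing it to resolve orientation is only an analogy, not an existing feature. Unlisted characters default to R here, which is adequate only for the digits in this example.

```python
# Values of the OOXML/ECMA-376 run-level "hint" listed above.
HINTS = {"default", "eastAsia", "cs"}

# Hypothetical orientation table with an ambiguous value "A".
ORIENTATION = {"\u2030": "A", "\u2031": "A", "\uff05": "U", "%": "R"}


def resolve(ch: str, hint: str) -> str:
    """Resolve one character to 'U' or 'R' using the run's hint."""
    if hint not in HINTS:
        raise ValueError(f"unknown hint: {hint!r}")
    value = ORIENTATION.get(ch, "R")  # assumption: unlisted characters are R
    if value != "A":
        return value
    # Only ambiguous characters depend on the run-level hint.
    return "U" if hint == "eastAsia" else "R"


def orient_run(text: str, hint: str) -> str:
    return "".join(resolve(ch, hint) for ch in text)


print(orient_run("480\u2030", "eastAsia"))  # RRRU
print(orient_run("480\u2030", "default"))   # RRRR
```

Unambiguous characters keep their value regardless of the hint, so adding such a flag later would not change existing results for them.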

Until we have a complete solution, we have to make a trade-off, and I'd like that trade-off to favor the majority use: East Asian usage.

We can add an ambiguous model later on; it won't be a breaking change. Both CSS Writing Modes and UTR#50 are still in their first versions, so we can improve them to better support multilingual scenarios and make more people happy. What I'm asking is to make the majority happy first.


 Post subject: Re: Per cent, per mille, per ten-thousand
Posted: Wed Jun 13, 2012 11:15 am

Joined: Sat Oct 29, 2011 12:06 pm
Posts: 35
"The per cent sign (%) appears in ASCII, and also has a fullwidth variant. So U+0025 has MVO=R and U+FF05 has MVO=U." This is correct in the sense that it is compatible with the current usages of the encoded characters.

But I doubt that this simply means Upright is the correct, most typical East Asian posture for the per cent character, because no character set standard defines how characters of this type are oriented.

The only thing that is clear is that, because of the double (full-width and proportional) encoding, there can be an upright full-width glyph used in both horizontal and vertical lines, as well as a proportional glyph that is rotated in vertical lines. For the per cent character, that's fine: both shapes work.

However, for characters that lack this double encoding, such as "per mille", I think the only thing that is clear is that they are Western characters.
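Whether a character has the full-width/proportional pair described above can be checked from the Unicode Character Database with Python's `unicodedata.decomposition`: a full-width form carries a `<wide>` compatibility decomposition to its proportional counterpart. The sketch below only scans the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF); that limit is an assumption which is sufficient for % but would miss wide forms elsewhere, such as U+3000 IDEOGRAPHIC SPACE.

```python
import unicodedata
from typing import Optional


def fullwidth_variant(ch: str) -> Optional[str]:
    """Return the full-width form whose <wide> decomposition is ch, if any.

    Only the Halfwidth and Fullwidth Forms block (U+FF00..U+FFEF) is scanned.
    """
    target = f"<wide> {ord(ch):04X}"
    for cp in range(0xFF00, 0xFFF0):
        if unicodedata.decomposition(chr(cp)) == target:
            return chr(cp)
    return None


for ch in ("%", "\u2030", "\u2031"):
    variant = fullwidth_variant(ch)
    print(f"U+{ord(ch):04X}", "->", f"U+{ord(variant):04X}" if variant else "no full-width form")
```

Per cent finds U+FF05; per mille and per ten-thousand have no such counterpart, which is exactly the situation the post describes.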


 Post subject: Re: Per cent, per mille, per ten-thousand
Posted: Thu Jun 14, 2012 6:25 pm

Joined: Sat Oct 29, 2011 12:06 pm
Posts: 35
I already discussed unit symbols under the topic "Letterlike Symbols", so I will repeat the argument here.

Let's take the "per mille" character as an example. I pointed out (1) that katakana, rather than unit symbols, should be used when Chinese numerals are written in vertical lines, and (2) that the "per mille" glyph may be wider than the em box of the line when set in the UPRIGHT posture.

There are various possible styles. If you set the text horizontally, you can use Arabic numerals with the unit symbol. If the number has fewer than three digits, you may use Tate-Chu-Yoko (horizontal-in-vertical) with Arabic numerals plus the unit symbol. Very rarely, you may use Chinese numerals with the unit symbol. However, in Japanese typography, the most widely accepted convention for composing numbers with a unit name in vertical lines is to write the number in Chinese numerals and spell out the unit name in katakana according to its pronunciation. This is the most typical usage, as shown in the following example.

「二十パーミル勾配というのは二パーセント勾配と同じですね」
("A twenty-per-mille gradient is the same as a two-percent gradient, isn't it?")

I think this applies to every unit symbol that originates from a Western word.
So Western unit symbols should NOT be UPRIGHT by default in vertical lines, because the UPRIGHT posture is not genuinely "East Asian". (I agree that many people may use them in the UPRIGHT posture. This may have been influenced by the ambiguity in handling JIS full-width characters. That is possible, but the style is NOT the most typically "East Asian", for the reasons mentioned above.)

The same discussion applies to the per cent character, but it has both full-width and proportional code points, so it is free from this question. We don't need to discuss it at all.
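The typical convention in the example sentence (Chinese numerals followed by the katakana reading of the unit) can be sketched as follows. The `to_kanji` helper handles only 0..9999 without 万 grouping, and the unit-name table holds just the two readings from the example (パーセント, パーミル); both are illustrative assumptions, and other styles such as Tate-Chu-Yoko are not modeled.

```python
DIGITS = "〇一二三四五六七八九"
PLACES = ((1000, "千"), (100, "百"), (10, "十"))

# Katakana unit names taken from the example sentence; illustrative only.
KATAKANA_UNIT = {"%": "パーセント", "\uff05": "パーセント", "\u2030": "パーミル"}


def to_kanji(n: int) -> str:
    """Chinese numerals for 0..9999, e.g. 20 -> 二十, 480 -> 四百八十."""
    if not 0 <= n <= 9999:
        raise ValueError("this sketch only handles 0..9999")
    if n == 0:
        return DIGITS[0]
    parts = []
    for value, mark in PLACES:
        d, n = divmod(n, value)
        if d:
            # A leading 1 is omitted before 千, 百 and 十 (十 rather than 一十).
            parts.append(("" if d == 1 else DIGITS[d]) + mark)
    if n:
        parts.append(DIGITS[n])
    return "".join(parts)


def vertical_quantity(n: int, unit: str) -> str:
    """Number plus unit in the typical vertical style: kanji + katakana."""
    return to_kanji(n) + KATAKANA_UNIT[unit]


print(vertical_quantity(20, "\u2030"))  # 二十パーミル
print(vertical_quantity(2, "%"))        # 二パーセント
```

Because the result contains only Kanji and katakana, every character can be set upright in vertical lines, so no orientation decision is needed for the unit symbol at all.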

