Re: RTL PUA?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 25 Aug 2011 09:37:18 +0200

2011/8/25 Peter Constable <petercon_at_microsoft.com>:
> From: verdyp_at_gmail.com [mailto:verdyp_at_gmail.com] On Behalf Of Philippe Verdy
>
>>2011/8/22 Joó Ádám <adam_at_jooadam.hu>:
>>> Speaking of actual implementation, I'm convinced that this format
>>> should be the same as it is for encoded characters ...
>
>> As well, the small properties files can be embedded, in a very compact
>> form, in the PUA font.
>
> In one sense having data regarding PUA character properties embedded within a font could make sense since the interpretation of instances of those PUA characters will be tied to particular fonts.
>
> However, I don't see this as really being workable: rendering implementations will typically do certain types of processes without access to any font data.

Remove the future "will" in your sentence... you're assuming how
future implementations will work.

And the "certain types of processes" element is extremely fuzzy. Those
who want to use PUA characters as RTL characters will never be
satisfied with that: they want access to property data beyond what the
UCD provides.

But you're right about one thing: the font is not expected to contain
all those properties. I am still convinced that it is the best place
for Bidi_Class (BC) property values, which are tied to the font for
rendering purposes. Only the properties of PUA characters that have
absolutely no use in rendering should stay out of fonts (for example
collation weights, case mappings, or custom character name aliases if
one wants them).
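
To make the idea concrete, here is a minimal Python sketch of a
Bidi_Class lookup that consults font-supplied data for PUA characters
and falls back to the UCD for everything else. The table
FONT_BIDI_OVERRIDES and the function bidi_class are hypothetical names
invented for this illustration; no existing font format is assumed to
define such a table.

```python
import unicodedata

# Hypothetical per-font data: PUA code point -> Bidi_Class value.
# The assignments below are made up for the example.
FONT_BIDI_OVERRIDES = {0xE000: "R", 0xE001: "R", 0xE002: "AL"}


def bidi_class(ch, overrides=FONT_BIDI_OVERRIDES):
    """Return the Bidi_Class to use for ch when rendering with this font."""
    cp = ord(ch)
    # Only private-use characters (General_Category Co) may be overridden;
    # encoded characters always keep their UCD value.
    if unicodedata.category(ch) == "Co" and cp in overrides:
        return overrides[cp]
    return unicodedata.bidirectional(ch)


if __name__ == "__main__":
    for ch in ("\ue000", "\ue002", "\ue100", "\u05d0"):
        print(f"U+{ord(ch):04X} -> {bidi_class(ch)}")
```

A PUA character listed in the table gets the font's value, while one
that is not listed keeps whatever the UCD data gives for it.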

Some other properties may be needed for rendering purposes, notably
text segmentation data for handling line breaks. Many PUA characters
are currently used for custom sinograms in the Han script, which
allows a line break to occur before and after each of them, but this
behavior would not be perceived as correct for most scripts.
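
The difference can be sketched in a few lines of Python. This is a
deliberately simplified model, not the UAX #14 algorithm: it only
allows a break after a space, or next to an ideograph-like character.
The set FONT_IDEOGRAPHIC_PUA and both function names are hypothetical,
standing in for whatever per-font data would say which PUA characters
behave like sinograms.

```python
# Hypothetical per-font data: PUA code points that should break like
# ideographs. The values are made up for the example.
FONT_IDEOGRAPHIC_PUA = {0xE000, 0xE001}


def is_ideograph_like(ch, ideographic_pua=FONT_IDEOGRAPHIC_PUA):
    cp = ord(ch)
    # The CJK Unified Ideographs block, plus PUA code points the font declares.
    return 0x4E00 <= cp <= 0x9FFF or cp in ideographic_pua


def break_opportunities(text, ideographic_pua=FONT_IDEOGRAPHIC_PUA):
    """Return each index i where a soft break may fall before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        prev, cur = text[i - 1], text[i]
        if prev == " " and cur != " ":
            # Ordinary break after a run of spaces.
            breaks.append(i)
        elif prev != " " and cur != " " and (
            is_ideograph_like(prev, ideographic_pua)
            or is_ideograph_like(cur, ideographic_pua)
        ):
            # Break allowed before and after an ideograph-like character.
            breaks.append(i)
    return breaks
```

For the text "ab \ue000\ue001cd", the sketch allows breaks around both
PUA characters when the font declares them ideographic, and only after
the space when it does not.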

However, I don't think that line breaking property data fits very well
in fonts, because such segmentation is not needed only for rendering.
Still, for most of those non-rendering purposes (e.g. plain-text
search), we generally don't want the search result to depend on soft
line breaks. Soft line breaks are only meant for rendering, so this
breakability may also come under the control of the font.

By contrast, hard line breaks are controlled by existing non-PUA
control characters, so they are not a problem and don't need to be
overridden. Those hard line breaks are very often expected to be
searchable, unlike soft line breaks, which should remain invisible in
plain-text searches as they are only the result of some rendering
process.
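
A small Python sketch of that distinction, assuming the search runs on
the stored plain text and never on the laid-out lines. The functions
render and search are hypothetical names for this illustration, and
textwrap stands in for a real layout engine.

```python
import textwrap


def render(text, width):
    """Lay out text: hard breaks split paragraphs, soft breaks come from wrapping."""
    lines = []
    for paragraph in text.split("\n"):
        # textwrap.wrap returns [] for an empty paragraph; keep it as a blank line.
        lines.extend(textwrap.wrap(paragraph, width) or [""])
    return lines


def search(text, needle):
    """Search the stored plain text, so soft breaks cannot affect the result."""
    return text.find(needle)


if __name__ == "__main__":
    stored = "alpha beta gamma\ndelta"
    print(render(stored, 10))
    print(search(stored, "beta gamma"))    # spans a soft break in the layout
    print(search(stored, "gamma\ndelta"))  # the hard break is searchable
```

The match for "beta gamma" is found even though the layout puts a soft
break between the two words, while the hard break stays part of the
searchable text.
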
Received on Thu Aug 25 2011 - 02:40:35 CDT