Re: VS characters, default ignorable property and text search and collation

From: Mark Davis ☕ (mark@macchiato.com)
Date: Mon Jul 26 2010 - 14:41:36 CDT

Next message: Asmus Freytag: "Re: ? Reasonable to propose stability policy on numeric type = decimal"

Previous message: Kenneth Whistler: "Re: VS characters, default ignorable property and text search and collation"
In reply to: Shriramana Sharma: "VS characters, default ignorable property and text search and collation"

Mark

*— Il meglio è l’inimico del bene —*

On Mon, Jul 26, 2010 at 09:40, Shriramana Sharma <samjnaa@gmail.com> wrote:

> Hello list.
>
> I have a question about VS characters and the default ignorable property.
>
> TUS 5.2 ch 16.4 clearly states that VS characters are default ignorable. Ch
> 5.21 states that default ignorable characters are to be ignored in rendering
> (except in specialized modes which show hidden characters).
>

That is incorrect. What it actually says is (my bold):

"Default ignorable code points are those that should be ignored by default
in rendering *unless explicitly supported.*"

Or to put it in other terms:

If your rendering system doesn't explicitly support character X, it should
be ignored by default (as if it hadn't been in the string to be rendered).

So if you *do* support a given variation sequence, then this clause doesn't
apply. In fact, supporting it means that it is not ignored: it has a
visible impact on the rendering.
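To make the "unless explicitly supported" reading concrete, here is a minimal Python sketch of a renderer's pre-pass. It is illustrative only, not any real renderer's logic. The range table is a hand-picked subset of Default_Ignorable_Code_Point (a real implementation would load DerivedCoreProperties.txt), and `supported_sequences` is a hypothetical stand-in for whatever variation sequences a particular font or renderer actually supports. U+2229 U+FE00 (INTERSECTION with serifs) is used as the example sequence.

```python
# Hand-picked SUBSET of Default_Ignorable_Code_Point (assumption: a real
# renderer would load the full list from DerivedCoreProperties.txt).
DEFAULT_IGNORABLE_SUBSET = [
    (0x200B, 0x200F),    # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x2060, 0x206F),    # WORD JOINER and other format controls
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1..16
    (0xFEFF, 0xFEFF),    # ZERO WIDTH NO-BREAK SPACE
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17..256
]
VARIATION_SELECTORS = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def visible_sequence(text, supported_sequences):
    """Return the characters that should have a visible effect when rendered.

    supported_sequences is a hypothetical set of base+VS strings that this
    renderer explicitly supports; any other default-ignorable character is
    treated as if it were absent from the string.
    """
    out = []
    for i, ch in enumerate(text):
        if (_in_ranges(ch, VARIATION_SELECTORS) and i > 0
                and text[i - 1] + ch in supported_sequences):
            out.append(ch)   # explicitly supported: the VS selects a glyph
        elif _in_ranges(ch, DEFAULT_IGNORABLE_SUBSET):
            continue         # not supported: ignore by default
        else:
            out.append(ch)
    return "".join(out)
```

With an empty support set, `visible_sequence("\u2229\ufe00", set())` yields just `"\u2229"`; a renderer that lists `"\u2229\ufe00"` as supported keeps the selector, because for that renderer it is no longer ignorable.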

>
> The paragraph on p. 171 on default ignorable characters under ch 5.3 states
> that "these characters are also ignored except with respect to specific,
> defined processes; for example, zero width non-joiner is ignored by default
> in collation."
>
> This seems to suggest to me that despite ch 5.21 speaking only about
> rendering, the default ignorable property also has or at least can have a
> part in other processes such as collation. I would however like to have a
> confirmation on this:
>
> Are all default ignorable characters ignored not only in rendering

Incorrect assumption; see above.

> but in other processes also?
>

Yes, in that in processing they should be ignored unless they are relevant
to the kind of processing involved. Note that other characters may also be
ignored, depending on the processing. So there is not a hard-and-fast rule.

   - For example, in collation any of the characters in
   http://unicode.org/Public/UCA/6.0.0/allkeys-6.0.0d1.txt with weights
   starting "[.0000.0000.0000." are ignorable by default, and these include
   characters that are not default-ignorable.
   - For word segmentation, Extend and Format characters are ignored (except
   for edge cases): see
   http://unicode.org/reports/tr29/#Default_Word_Boundaries. Those include
   many more characters than just the default-ignorables, and exclude 5
   default-ignorable characters (Hangul fillers and ZWSP). See also
   http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[:Word_Break:Format:][:Word_Break:Extend:]&g=di.
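The collation criterion in the first bullet can be checked mechanically. The following Python sketch assumes the allkeys line layout `code points ; [.pppp.ssss.tttt.qqqq]... # name` and flags a line as completely ignorable when every collation element has zero primary, secondary, and tertiary weights. The sample lines in the usage note are illustrative: the non-zero weights are made up, not copied from the 6.0.0 file.

```python
import re

# One collation element: "[" + variable marker ("." or "*") followed by
# dot-separated hex weights; only the first three levels are captured.
CE_RE = re.compile(r"\[[.*]([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)\.([0-9A-Fa-f]+)")


def is_completely_ignorable(line):
    """True if an allkeys-style line maps only to [.0000.0000.0000...] elements."""
    data = line.split("#", 1)[0]        # drop the trailing comment/name
    if ";" not in data or data.lstrip().startswith("@"):
        return False                    # blank line, comment, or @version header
    _, weights = data.split(";", 1)
    elements = CE_RE.findall(weights)
    # Every captured weight of every element must be zero.
    return bool(elements) and all(int(w, 16) == 0 for ce in elements for w in ce)
```

For instance, `FE00 ; [.0000.0000.0000.0000] # VARIATION SELECTOR-1` and the control character line `0001 ; [.0000.0000.0000.0000] # <START OF HEADING>` both qualify, although U+0001 is not default-ignorable. A combining mark such as `0301 ; [.0000.0024.0002.0301]` does not, because its secondary weight is non-zero.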

In other words, default-ignorables should usually be ignored by
non-rendering processes, but there will be exceptions. And other characters
may also be ignored, depending on the process.
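For the text-search case in the original question, one way to follow "usually ignored, but there will be exceptions" is to strip default-ignorables from both strings before matching, with an opt-out for processes where variation selectors are relevant. The Python sketch below is a hypothetical illustration, not a description of any real search engine: the range table is the same hand-picked subset of Default_Ignorable_Code_Point (not the full property), `keep_variation_selectors` is an invented parameter, and mapping match offsets back to the original text is not handled.

```python
# Hand-picked SUBSET of Default_Ignorable_Code_Point (assumption: a real
# implementation would load the full list from DerivedCoreProperties.txt).
DEFAULT_IGNORABLE_SUBSET = [
    (0x200B, 0x200F),    # ZWSP, ZWNJ, ZWJ, LRM, RLM
    (0x2060, 0x206F),    # WORD JOINER and other format controls
    (0xFE00, 0xFE0F),    # VARIATION SELECTOR-1..16
    (0xFEFF, 0xFEFF),    # ZERO WIDTH NO-BREAK SPACE
    (0xE0100, 0xE01EF),  # VARIATION SELECTOR-17..256
]
VARIATION_SELECTORS = [(0xFE00, 0xFE0F), (0xE0100, 0xE01EF)]


def _in_ranges(ch, ranges):
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in ranges)


def search_key(text, keep_variation_selectors=False):
    """Drop default-ignorables, optionally keeping VS when they matter."""
    out = []
    for ch in text:
        if keep_variation_selectors and _in_ranges(ch, VARIATION_SELECTORS):
            out.append(ch)          # this process treats VS as significant
        elif not _in_ranges(ch, DEFAULT_IGNORABLE_SUBSET):
            out.append(ch)
    return "".join(out)


def contains(haystack, needle, keep_variation_selectors=False):
    """Substring test on the normalized keys (hypothetical helper)."""
    return (search_key(needle, keep_variation_selectors)
            in search_key(haystack, keep_variation_selectors))
```

By default, searching for U+2229 finds U+2229 U+FE00, and vice versa. With `keep_variation_selectors=True`, a search for the variation sequence no longer matches a bare U+2229, which is the kind of process-specific exception described above.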

> Or is it that they are ignored by default in rendering and whether they are
> ignored in other processes or not is variable?
>
> Specifically, are VS characters ignored in rendering only (i.e. rendering
> them, not the characters they apply to of course) or are they ignored even
> in other processes such as text search and collation?
>
> --
> Shriramana Sharma
>
>


This archive was generated by hypermail 2.1.5 : Mon Jul 26 2010 - 14:43:22 CDT