Page 1 of 1
|
[ 5 posts ] |
|
yellowantphil
|
Post subject: Unicode property utility Posted: Tue Feb 01, 2011 3:02 pm |
|
Joined: Tue Jan 11, 2011 9:29 pm Posts: 5
|
|
Searching in the Unicode property utility for [:General_Category=Unassigned:][:General_Category=Control:] returns what looks like a random set of characters at the top of the results: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=[%3AGeneral_Category%3DUnassigned%3A][%3AGeneral_Category%3DControl%3A]&g=
Below that the results appear to be what I expected: all of the unassigned and control characters. Is there a bug in the way the initial list of code points is generated?
|
asmus
|
Post subject: Re: Unicode property utility Posted: Tue Feb 01, 2011 3:13 pm |
|
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
The list of characters at the top should all be square boxes, because fonts should not have glyphs for unassigned characters. But, as you will see, many fonts do place glyphs at those locations. That's why the visual list of characters seems to be full of random entries.
In an attempt to "show" you some characters, the browser may go out of its way to locate fonts that have some (non-default) glyph at these locations, so in a way, you get the worst-behaved fonts.
|
yellowantphil
|
Post subject: Re: Unicode property utility Posted: Tue Feb 01, 2011 4:37 pm |
|
Joined: Tue Jan 11, 2011 9:29 pm Posts: 5
|
|
I can use wget to download the URL I listed above, and I have a text editor that will identify the code point of the character under the cursor. The list of characters at the top does contain characters that are not in the set I searched for. Copying and pasting characters from the list in my browser into the character properties utility also shows that they are characters that I didn't search for. One random example is U+0B30 ର which is in general category Other_Letter.
I should mention that I don't have this problem when searching only for [:General_Category=Unassigned:] or only for [:General_Category=Control:]. I get the unexpected characters only when I search for both categories together. Searching for the unassigned general category alone prints a long string of replacement characters �, and when I check their code points they really are all U+FFFD, rather than unassigned code points that my browser happens to display with a � symbol.
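For anyone who wants to repeat this check without a text editor, here is a minimal sketch using Python's standard `unicodedata` module. The helper names `describe` and `in_searched_set` are made up for illustration, and the categories reported depend on the Unicode version bundled with the Python interpreter, which may differ from the one the online utility uses.

```python
import unicodedata

# General categories searched for in the post: Cn (Unassigned) and Cc (Control).
SEARCHED = {"Cn", "Cc"}

def describe(ch: str) -> str:
    """Return the code point, general category, and name of one character."""
    name = unicodedata.name(ch, "<no name>")
    return f"U+{ord(ch):04X} {unicodedata.category(ch)} {name}"

def in_searched_set(ch: str) -> bool:
    """True if the character is unassigned or a control character."""
    return unicodedata.category(ch) in SEARCHED

# U+0B30 is the example character from the post; U+FFFD is the replacement character.
for ch in ("\u0B30", "\uFFFD", "\u0007"):
    print(describe(ch), "| in searched set:", in_searched_set(ch))
```

U+0B30 reports category Lo (Other_Letter), so it is not a member of the set that was searched for, which matches what the character properties utility says.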
|
mark
|
Post subject: Re: Unicode property utility Posted: Tue Feb 01, 2011 7:33 pm |
|
| Forum Admin |
Joined: Fri Dec 04, 2009 9:13 pm Posts: 32
|
|
The list of characters at the top is a UnicodeSet pattern, as used in regular expressions. The first character is a ^, indicating that the set consists of all the characters that are _not_ listed:
[^\ -~ -ͷͺ-;΄-ΊΌΎ-ΡΣ-ԧ Ա-Ֆՙ-՟ա...
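To see why a set of unassigned and control characters ends up printed as a list of ordinary, assigned characters, here is a rough Python sketch that builds the same kind of negated pattern. It is only an illustration, not the utility's actual code: the function names and the `limit` parameter are invented, and the ranges depend on the Unicode version that ships with Python.

```python
import unicodedata

def complement_ranges(categories, limit=0x0600):
    """Ranges of code points below `limit` whose category is NOT in `categories`."""
    ranges = []
    start = None
    for cp in range(limit):
        listed = unicodedata.category(chr(cp)) not in categories
        if listed and start is None:
            start = cp                      # a run of listed characters begins
        elif not listed and start is not None:
            ranges.append((start, cp - 1))  # the run ended at the previous code point
            start = None
    if start is not None:
        ranges.append((start, limit - 1))
    return ranges

def render_negated(ranges):
    """Render the ranges as a negated set, using \\uXXXX escapes for readability."""
    parts = []
    for lo, hi in ranges:
        parts.append(f"\\u{lo:04X}" if lo == hi else f"\\u{lo:04X}-\\u{hi:04X}")
    return "[^" + "".join(parts) + "]"

# Unassigned (Cn) plus Control (Cc), written as "everything except these ranges".
ranges = complement_ranges({"Cn", "Cc"})
print(render_negated(ranges))
```

The first two ranges come out as U+0020 to U+007E and U+00A0 to U+0377, which line up with the `\ -~` and ` -ͷ` at the start of the pattern quoted above. Every character visible in the pattern is therefore one that is excluded from the result, U+0B30 among them.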
|
yellowantphil
|
Post subject: Re: Unicode property utility Posted: Tue Feb 01, 2011 11:57 pm |
|
Joined: Tue Jan 11, 2011 9:29 pm Posts: 5
|
|
OK, that makes sense. Thanks.