Re: Other Question, Problem, or Feedback

From: Richard Wordingham (richard.wordingham@ntlworld.com)
Date: Mon Jun 12 2006 - 21:15:27 CDT

Next message: Richard Wordingham: "Re: triple diacritic (sch with ligature tie in a German dialect writing document)"

Previous message: James Kass: "Re: PDFs of Unicode Standard Annex"
In reply to: Magda Danish (Unicode): "FW: Other Question, Problem, or Feedback"
Next in thread: Dean Harding: "RE: Other Question, Problem, or Feedback"
Reply: Dean Harding: "RE: Other Question, Problem, or Feedback"

----- Original Message -----
From: "Magda Danish (Unicode)" <v-magdad@microsoft.com>
To: <unicode@unicode.org>
Cc: <wikoh@msn.com>
Sent: Monday, June 12, 2006 8:04 PM
Subject: FW: Other Question, Problem, or Feedback

>
> -----Original Message-----
> Date/Time: Sat Jun 10 14:54:43 CDT 2006
> Contact: wikoh@msn.com
> Name:
> Report Type: UTF-16 & UTF-32
>
> I haven't been able to find an answer in the FAQ or googling the site to
> these questions...
>
> 1. Is it true that there are many ways of encoding the same character in
> UTF-16?

No. There is exactly one way of encoding each character in UTF-16. See TUS
4.0 Section 2.5 'Encoding Forms', especially p29.
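
A minimal sketch, in Python, of why the mapping is unique: every code point
either fits in one 16-bit code unit or is split deterministically into one
surrogate pair. The function name utf16_code_units is hypothetical and chosen
only for this illustration; it covers the encoding form alone, not byte order.

```python
def utf16_code_units(cp):
    """Return the list of UTF-16 code units for one Unicode scalar value."""
    # Surrogate code points (U+D800..U+DFFF) are not scalar values and
    # cannot be encoded on their own.
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %r" % (cp,))
    if cp < 0x10000:
        # Basic Multilingual Plane: a single code unit equal to the code point.
        return [cp]
    # Supplementary planes: subtract 0x10000, then split the 20-bit
    # remainder into a high (lead) and a low (trail) surrogate.
    v = cp - 0x10000
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


print([hex(u) for u in utf16_code_units(0x0041)])   # LATIN CAPITAL LETTER A
print([hex(u) for u in utf16_code_units(0x1D11E)])  # MUSICAL SYMBOL G CLEF
```

There is no branch in which a choice between two outputs exists, which is the
sense in which each character has exactly one UTF-16 form.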

> Do you know if common regular expression search functions like those of
> .NET or Perl will find a character regardless of in what fashion it was
> encoded?

This problem therefore does not arise.

> 2. Why is there now UTF-32?

Binarism. A 27-bit word is perfectly capable of representing any valid
codepoint. Anything that can be validly done with UTF-32 can be done with
any word size from 21 bits upwards. (Anyone contemplating using a
non-binary representation should consult the final part of TUS 4.0 Section
2.4 for the implications on Unicode data tables. :-)
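
To illustrate the point about word size, here is a rough Python sketch that
stores code points in 21-bit fields and reads them back unchanged. The names
BITS, pack and unpack are assumptions made for this example; nothing like this
layout is defined by the standard.

```python
BITS = 21                 # smallest word size that holds 0x10FFFF
MASK = (1 << BITS) - 1    # 0x1FFFFF


def pack(code_points):
    """Pack code points into one integer, 21 bits per code point."""
    word = 0
    for i, cp in enumerate(code_points):
        if not 0 <= cp <= 0x10FFFF:
            raise ValueError("outside the Unicode codespace: %r" % (cp,))
        word |= cp << (i * BITS)
    return word


def unpack(word, count):
    """Recover `count` code points from an integer built by pack()."""
    return [(word >> (i * BITS)) & MASK for i in range(count)]


sample = [0x0041, 0x1D11E, 0x10FFFF]
assert unpack(pack(sample), len(sample)) == sample
```

The round trip loses nothing, so the extra 11 bits in a 32-bit word carry no
information about the code point itself.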

> Are there even that many characters in the world that they need 32-bit
> representation?

If everyone invented a character and it were accepted, despite the alleged
rule on not encoding novel or idiosyncratic characters ('Note, however, that
the Unicode Standard does not encode idiosyncratic, personal, novel, or
private-use characters, nor does it encode logos or graphics.' - TUS 4.0
Section 1.1 Paragraph 3), 32 bits would not be enough. However, it is
currently strenuously maintained that 21 bits will suffice. The range of
values is 0 to 0x10FFFF (TUS 4.0 Section 2.4 Paragraph 3).
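
The arithmetic behind the 21-bit figure can be checked directly. The following
Python lines are only a sketch of that check; the variable names are made up
for the illustration.

```python
MAX_CODE_POINT = 0x10FFFF

# Number of code points in the range 0..0x10FFFF (17 planes of 65,536 each).
code_point_count = MAX_CODE_POINT + 1

# Number of bits needed to represent the largest code point.
bits_needed = MAX_CODE_POINT.bit_length()

print(code_point_count)   # 1114112
print(bits_needed)        # 21
print(2 ** 32)            # 4294967296 values available in a 32-bit word
```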

This archive was generated by hypermail 2.1.5 : Mon Jun 12 2006 - 21:30:11 CDT