RE: japanese xml

From: Peter_Constable@sil.org
Date: Fri Aug 31 2001 - 00:55:24 EDT


On 08/30/2001 12:14:49 PM Marco Cimarosti wrote:

>Yes, yes. XML documents can represent characters in at least two ways:
>
>2) By representing them with numeric references in the form "&#1234;" etc.
>The numeric references themselves are sequences of characters ("&" + "#" +
>one or more of "0".."9" + ";") expressed in the underlying plain text
>encoding. The meaning of a numeric reference for an XML parser is the single
>Unicode character whose code is written between "&#" and ";".
>
>In the context of Unicode and, more generally, plain-text encoding "to
>encode" means only point 1 above, and "&1234;" is just a six-character
>string...
>
>Point 2, in Unicode speech, is defined a "higher level protocol", and it is
>considered out of the scope of the standard.

I haven't followed the entire thread closely, but I'd agree with this.
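
The two levels can be illustrated with a minimal Python sketch. It assumes only decimal references of the kind described in the quoted text; the helper name expand_numeric_refs is hypothetical, and a real XML parser also handles hexadecimal references and rejects invalid code points.

```python
import re

# Decimal numeric character references only: "&#" + one or more digits + ";".
# Hexadecimal references ("&#x...;") are deliberately not handled in this sketch.
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def expand_numeric_refs(text: str) -> str:
    """Replace each decimal numeric reference with the character it denotes."""
    return NUMERIC_REF.sub(lambda m: chr(int(m.group(1))), text)


raw = "&#1234;"                      # what is stored in the plain text
expanded = expand_numeric_refs(raw)  # what the higher-level protocol means

print(raw.encode("ascii"))           # the reference is itself plain ASCII text
print(len(expanded))                 # a single character after expansion
print(hex(ord(expanded)))            # its Unicode code point
print(expanded.encode("utf-8"))      # that character encoded directly
```

At the plain-text level the reference is only a run of ASCII characters; it becomes a single Unicode character once something that understands the XML convention interprets it.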

So, it comes down to a question of how we define "encode", and of the
usage context that determines our definition. Marco was assuming a
definition as it would be used internal to Unicode. Misha apparently
was using a broader definition that is valid in other contexts, though
not internally to Unicode.

So, they were both right in relation to the assumptions they were
making. The question, though, is what definition or context Viranga
was assuming when the question was asked.

If you use the term "encoding"

- Peter

---------------------------------------------------------------------------
Peter Constable

Non-Roman Script Initiative, SIL International
7500 W. Camp Wisdom Rd., Dallas, TX 75236, USA
Tel: +1 972 708 7485
E-mail: <peter_constable@sil.org>



