Subject: Comments on Unicode 3.0 Draft dated 1998-Dec-15

From: Edwin Hart

Comment Affecting Multiple Chapters

Problem: In at least Chapter 2 and Appendix C, the book is inconsistent in how it refers to "the second edition of ISO/IEC 10646-1". In some places it is called "republished", and in others, "the second edition". Personally, I prefer "the second edition of ISO/IEC 10646-1" and dislike "republished". In some places the year of publication is given as 1999 and in others as 2000. Until the publication year of the second edition of 10646-1 is settled, the book should use a variable that can be changed globally to the correct year.

Specific Comments

Chapter 1

  1. P. 6, 1.5 The Unicode Consortium, paragraph 2
    1. Sentence 1: Correct the description of WG 2 and add one for SC 2.
      SC 2: "the subcommittee within ISO responsible for computer character sets and coding"
      WG 2: "the SC 2 working group responsible for ISO/IEC 10646"
    2. Sentence 2: Add "US" before "National Committee for". Add a comma after the NCITS committee name and before "Technical Committee", because the two names run together and it is unclear that L2 is under NCITS.

Chapter 2

  1. P. 8, Text Processes and Encoding, 2nd bullet, last sentence
    Replace "both encodings" with "both approaches" or "both encoding approaches".
  2. P. 9, Text Processes and Encoding, Last paragraph, first sentence
    Add "In summary," to the beginning.
  3. P.9, 2.2 Unicode Design Principles, first sentence
    Replace the text after "reflects the" with "ten fundamental principles in Table 2-1."
  4. P. 10, Sixteen-Bit Character Codes, paragraph 2.
    The words "where 8-bit codes are needed" grossly fail to describe the real reasons UTF-8 was developed. I suggest wording like: "UTF-8 is intended for use where existing text processing systems were designed only for 8-bit characters and/or are sensitive to certain 8-bit code values."
  5. P. 10, Characters, Not Glyphs, first sentence
    Replace "that have semantic value" with "with semantic value". The word "that" references "components" rather than "language".
  6. P. 14, Unification, paragraph 2, last sentence
    Replace "duplicating" with "rather it duplicates" or "rather Unicode duplicates" because the reference for "duplicating" is "Unicode" rather than the preceding word, "language".
  7. P. 15, top paragraph, last sentence
    Replace sentence with "These mappings would be better used to provide the correct equivalences for searching and sorting rather than for transcoding."
  8. P. 18, Figure 2-5, Unicode Allocations
    Add the area for the CJK Extension A.
  9. P. 19, Code Space Assignment for Graphic Characters,
    1. bullet 2
      Add "IRV" after "ISO/IEC 646".
    2. bullet 4
      Add "depending on script" at the end.
  10. P. 20, Non-Graphic Characters, Reserved and Unassigned Codes, bullet 4
    The last sentence is valid but should be moved somewhere else; it is unnecessary here.
  11. P. 20, footnote
    Add "Hangul" between "Korean" and "characters".
  12. P. 24, Byte Order Mark, first paragraph, first sentence
    For transmission across a network, the order of the bytes in a multiple-byte entity is already defined as Network Byte Order, and Network Byte Order is big-endian.
    Remove "or transferring across a network".
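
The technical points behind comments 4 and 12 can be checked with a short sketch. The Python below is illustrative only, not proposed text for the book, and the function names are hypothetical. It shows that every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so byte values that some 8-bit systems treat specially (for example NUL at 0x00 or "/" at 0x2F) only ever stand for themselves, and that UTF-16 serialized in network byte order is big-endian, which a leading byte order mark (U+FEFF) can identify.

```python
import codecs


def utf16_network_order(text: str) -> bytes:
    """Serialize text as UTF-16 in network byte order (big-endian), no BOM."""
    return text.encode("utf-16-be")


def detect_utf16_byte_order(data: bytes) -> str:
    """Report the byte order signalled by a leading BOM (U+FEFF), if any."""
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "big-endian"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "little-endian"
    return "unknown (no BOM)"


def multibyte_utf8_avoids_ascii(text: str) -> bool:
    """Check that no byte of a multi-byte UTF-8 sequence falls in 0x00..0x7F.

    This property lets byte-oriented software that reacts to values such as
    NUL (0x00) or '/' (0x2F) pass UTF-8 text through unharmed.
    """
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return False
    return True


print(utf16_network_order("A"))                                 # b'\x00A'
print(detect_utf16_byte_order("\ufeffA".encode("utf-16-le")))  # little-endian
print(multibyte_utf8_avoids_ascii("A/\u00e9\u4e2d\U00010000"))  # True
```

The check in multibyte_utf8_avoids_ascii holds for any valid UTF-8, which is the design property the suggested wording for comment 4 alludes to.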

Appendix C

  1. P. 293, paragraph 1, sentence 1
    Replace "during October of 1991" with "during the summer and fall of 1991" or "during 1991" because we met in May in San Francisco (ad hoc), in August in Geneva (WG 2), and finally in October in Paris (WG 2).
  2. P. 294, C.1 Timeline,
    1. Change the title from "Timeline" to "Comparison of Versions of Unicode and ISO/IEC 10646" or "Versions of Unicode and ISO/IEC 10646". Then add the "Timeline" title to the table.
    2. Table
      The last two references to 10646 are confusing and may or may not be correct (see the first comment in this document). My understanding is that ISO will publish the second edition of 10646-1 to include the amendments approved at the time of publication. What is the difference between the "republication" of 10646 (in 1999) and the "second edition" (in 2000)? Also, do you want to include Part 2 of 10646?
    3. paragraph 1
      This paragraph is confusing and reads like gobbledygook to me.
    4. Paragraph 3, and paragraphs 1 and 2 on page 295
      Choose either "republished version" or "second edition" and use it consistently.
      Specify either "1999" or "2000" as the year of publication of the second edition of 10646.
  3. P. 295, C.2 Structure of ISO/IEC 10646, paragraph 1
    This paragraph does not describe the structure and should be deleted. It duplicates material from section C.1.
  4. P. 295, C.2 Structure of ISO/IEC 10646, paragraph 2 to the end
    These paragraphs do not describe the structure and should be moved to section C.1.
  5. P. 295, C.2 Structure of ISO/IEC 10646, Table C-1
    I strongly agree with Joan that the column should be widened to keep all 32 bits of the UCS-4 character on the same line.
  6. P. 296, UTF-8, last paragraph
    Replace "ISO/IEC 10646 AM2" with "ISO/IEC 10646-1 Amendment 2" or "ISO/IEC 10646-1 AMD[??] 2"
  7. P. 297, UTF-16
    Make the notation for ranges consistent. In some cases three periods are used; in others, two.
  8. P. 297, UTF-16, last paragraph
    "planes 1 to 16" reads better than "planes 1..16". This text discusses a range of planes rather than a range of code positions, for which ".." or "..." is a reasonable notation.
  9. P. 297, C.4 The Unicode Standard and ISO/IEC 10646, title
    Would "Compliance between the Unicode Standard and ISO/IEC 10646" be a better title?
  10. P. 297, C.5 The Unicode Standard as a Profile of 10646, paragraph 1, sentence 2
    The term "profile" has a special meaning within the ISO/IEC standards community, and this concept is not defined here. As I recall, ISO defines a profile as a specification of which options are implemented. I'm not sure whether the Unicode book needs to provide such a definition, or whether adding something like "(a specification of the 10646 parameters)" after "a profile" in the second sentence would resolve my concern.
  11. P. 298, paragraph 1
    ISO/IEC 10646-1 also defines an ISO/IEC 2022 control sequence to introduce the use of 10646.
  12. P. 298, C.6 Character Names, last paragraph, last sentence
    Add "on the CD-ROM" to the end of the sentence.
  13. P. 298, C.7 Character Functional Specifications, paragraph 1, last sentence
    Replace "as much information as possible" with "the necessary information".
  14. P. 298, footnote
    Replace "even if other language versions are published by ISO" with "rather than versions in other languages".
  15. P. 299, C.7 Character Functional Specifications, last paragraph,
    1. sentence 2
      Replace "makes it implementable" with "guide implementations".
    2. sentence 3
      The sentence is awkward. Replace "Also necessary to a complete standard is" with "The Unicode Standard also adds".
    3. Last sentence
      Replace the sentence with "Although compliant implementations of the Unicode Standard will also be compliant with ISO/IEC 10646 at Level 3, compliant implementations of ISO/IEC 10646 may not necessarily be compliant with the Unicode Standard."
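
For comment 8 on UTF-16, the arithmetic below shows why surrogate pairs cover exactly planes 1 to 16: a pair carries a 20-bit offset above U+10000, which is 16 planes of 65,536 code positions each. This Python sketch is illustrative only; the function names are hypothetical and not taken from the draft.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Map a code point in planes 1 to 16 to a UTF-16 high/low surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only planes 1 to 16 (U+10000..U+10FFFF) use surrogates")
    offset = code_point - 0x10000    # 20-bit value: 16 planes x 65,536 positions
    high = 0xD800 + (offset >> 10)   # top 10 bits -> D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> DC00..DFFF
    return high, low


def plane_of(code_point: int) -> int:
    """Return the plane number (0 to 16) of a code point."""
    return code_point >> 16


for cp in (0x10000, 0x1D11E, 0x10FFFF):
    high, low = to_surrogate_pair(cp)
    print(f"U+{cp:06X} plane {plane_of(cp)}: {high:04X} {low:04X}")
```

Run as written, this prints the pairs D800 DC00 (plane 1), D834 DD1E (plane 1) and DBFF DFFF (plane 16), so the first and last positions of the surrogate ranges correspond to the first and last planes.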