Hello,

this has grown rather long, hence I'll add an abstract:
- "German-Library collation" (aka RAK) isn't a "collation",
- RAK vs. DIN 5007,
- how both RAK and DIN 5007 treat non-alphabetic characters,
- sources for both RAK and DIN 5007,
- common misconceptions about DIN 5007.

On Mar 6, 14:22, Tex Texin wrote:
> [...] a sort ordering known as German-Library Collation [...]
> Of course if I could actually get a copy that would be outstanding.

You are probably referring to "Regeln für die alphabetische Katalogisierung : RAK", a 9-volume work covering all aspects of German library catalogues in great detail. Volume 1 is "Regeln für wissenschaftliche Bibliotheken : RAK-WB", ISBN
  3-88226-165-X (hard-cover);
  3-88226-166-8 (bound);
  3-87068-471-2 (hypertext on diskettes);
  3-87068-436-4, 3-87068-474-7, 3-87068-494-1 (unbound sheets with update service).

You can find related works by entering, in the "Title" field of the WWW form found under , the term "Titelaufnahme" and, in a second search, "Regeln Katalogisierung".

Alphabetic ordering of the catalogue is covered by RAK-WB §801 through §822; however, these sections cannot be understood without referring to several hundred other sections throughout the tome.

On Mar 6, 14:22, Tex Texin wrote:
> The questions are about non-alphabetic characters such as space, double
> quote, and other punctuation marks and whether the standard specifies an
> ordering for them.

The ordering is highly specific to library catalogues (it could be used to order a bibliography as well, but virtually no other sort of data). It is definitely not an ordering for arbitrary character strings, as the question apparently presupposes; it is defined only in the context of bibliographic material, and it requires interpretation and judgement, as discussed below.

The ordering is based on fields, such as the author or the title of a book.
In some (specific) cases, the meaning of a field is used to determine the ordering, e. g. a book by Caesar precedes a book about Caesar (when the book's title starts with Caesar's name). Thus, the ordering is based not on characters alone, but also on some interpretation of their meaning and context.

Within a field, ordering is by words; hence word-boundaries are significant. Shorter words go before longer ones whenever the shorter word is the beginning of the longer one, e. g. "Wie" < "Wien" < "Wiener" < "Wienerwald" < "Wienerwald-Gaststätte"; thus, word-boundaries effectively act as low-values (in the sense used in COBOL). RAK treats all word-boundaries equally, no matter which punctuation marks them. On the other hand, punctuation not marking a word-boundary is treated as insignificant.

Every word can have a secondary field attached, termed "Ordnungshilfe"; these are used to order otherwise identical fields. E. g. "New York" (Ala.) < "New York" (NY) < "New York" (State), where the Ordnungshilfen (in brackets) do not belong to the fields proper (e. g. book titles). This is another example of the fact that you need information beyond the mere character sequence to establish its ordering value.

Word ordering is based on Latin characters only; all other characters and marks (if significant at all) are transliterated or transcribed, as appropriate, before ordering. The transliteration scheme is language-specific (e. g. the letter ghe, U+0433, is transcribed as "g" from Russian but as "h" from Belarusian). In many (specific) cases, even numbers are transcribed into words (of the pertinent language) before ordering. This is a 3rd example of the fact that you need information beyond the mere character sequence to establish its ordering value. According to the 1983 edition of the rules that I have in front of me, even German Umlaute (such as "ä") are substituted (e. g. with "ae") before ordering.
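To make the word-wise comparison concrete, here is a minimal Python sketch of only the mechanical part of these rules: comparing word by word, using the Ordnungshilfe as a tie-breaker, and substituting Umlaute as in the 1983 edition. It assumes that every run of non-alphanumeric characters marks a word-boundary. The names `words` and `rak_like_key` are hypothetical. The interpretive rules (by vs. about Caesar, numbers spelled out as words, transliteration) need the cataloguer's judgement and are not modelled.

```python
import re

# Assumption: the 1983 rule mentioned above, Umlaut -> base letter + "e".
UMLAUTE = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue",
                         "Ä": "Ae", "Ö": "Oe", "Ü": "Ue"})

# Assumption: every run of non-alphanumeric characters marks a word-boundary.
# RAK leaves it to judgement which punctuation does so; this sketch does not.
BOUNDARY = re.compile(r"[\W_]+")

def words(text: str) -> tuple:
    """Split a field into lower-cased words after Umlaut substitution."""
    return tuple(w.lower() for w in BOUNDARY.split(text.translate(UMLAUTE)) if w)

def rak_like_key(field: str, ordnungshilfe: str = "") -> tuple:
    """Compare word by word. A shorter word tuple that is a prefix of a longer
    one sorts first, so word-boundaries act like low-values. The Ordnungshilfe
    only breaks ties between otherwise identical fields."""
    return (words(field), words(ordnungshilfe))

titles = ["Wienerwald-Gaststätte", "Wiener", "Wie", "Wienerwald", "Wien"]
print(sorted(titles, key=rak_like_key))
# ['Wie', 'Wien', 'Wiener', 'Wienerwald', 'Wienerwald-Gaststätte']

places = [("New York", "State"), ("New York", "NY"), ("New York", "Ala.")]
print(sorted(places, key=lambda p: rak_like_key(*p)))
# [('New York', 'Ala.'), ('New York', 'NY'), ('New York', 'State')]
```

Keeping the words as a tuple, rather than one joined string, is what makes the boundary behave as a low-value: "Wiener Wald" would sort before "Wienerwald" because "wiener" is a prefix of "wienerwald".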
(I hope this rule has changed, or will change soon, as other orderings, including most dictionaries and encyclopaedias, treat Umlaute as their respective base letters, e. g. "a", in first approximation.) Note that, in RAK ordering, a diaeresis is ignored, in contrast to an Umlaut!

In other (specific) cases, sequences of digits are treated as numbers and ordered by their numerical values. This even applies to Roman numerals (or Greek ones, for that matter), which look very much like letters.

On Mar 6, 15:16, Nelson H. F. Beebe wrote:
> I believe the ordering you are referring to is German Industrial
> Standard DIN 5007.

DIN 5007 (as of April 1991) specifies rules for ordering arbitrary character sequences. It has several options, so it does not establish one single "German ordering rule" but rather a set of variants. The most significant single option is how to treat an Umlaut: "ä" may be treated either as "a" (the normal rule) or as "ae" (the special rule for "lists of proper names"), in first approximation.

Other variants pertain to the treatment of punctuation marks. The standard distinguishes three groups of punctuation marks: spaces (which always have the lowest ordering value), pronounced marks (such as "-" in "10 - 15 Paar Socken", where "-" means "to"), and mute marks (such as "-" in "10-mm-Gabelschlüssel", where the hyphen merely separates the constituents of the compound word). It is up to the application to decide whether to ignore either of the latter two groups in the ordering. If the application takes such marks into account, then all pronounced marks are treated as equivalent, and so are all mute ones. If both groups of marks are used in the ordering, it is up to the application to decide whether the pronounced ones go before the mute ones, or vice versa.

DIN 5007 allows for character sequences from different scripts, the general rule being that non-Latin characters come after Latin ones.
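The punctuation options just described can be sketched in Python as follows. This is a hedged illustration, not the standard's wording. Because only the meaning tells whether a "-" is pronounced or mute, the sketch expects the caller to supply tokens that are already classified. The token kinds, the parameter names, and the decision to rank both groups of marks below letters and digits are all assumptions made for this example.

```python
# Hypothetical token kinds, chosen for this sketch only.
WORD, SPACE, PRONOUNCED, MUTE = "word", "space", "pronounced", "mute"

def din5007_marks_key(tokens, ignore_pronounced=False, ignore_mute=False,
                      pronounced_first=True):
    """tokens: list of (kind, text) pairs, classified by the caller."""
    first, second = (PRONOUNCED, MUTE) if pronounced_first else (MUTE, PRONOUNCED)
    # Spaces always rank lowest (per DIN 5007); ranking both groups of marks
    # below letters and digits is an assumption of this sketch.
    rank = {SPACE: 0, first: 1, second: 2}
    key = []
    for kind, text in tokens:
        if kind == WORD:
            key.extend((3, ch.lower()) for ch in text)
        elif (kind == PRONOUNCED and ignore_pronounced) or (kind == MUTE and ignore_mute):
            continue  # the application chose to treat this group as insignificant
        else:
            key.append((rank[kind], ""))  # all marks of one group are equivalent
    return tuple(key)

to_range = [(WORD, "10"), (PRONOUNCED, "-"), (WORD, "15")]  # "10 to 15"
compound = [(WORD, "10"), (MUTE, "-"), (WORD, "15")]        # hyphenated compound
print(din5007_marks_key(to_range) < din5007_marks_key(compound))  # True
```

Each application-level choice from the standard becomes one parameter here. Two pronounced marks such as "-" and "/" produce identical keys, which matches the rule that all marks of one group are equivalent.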
On Mar 6, 15:16, Nelson H. F. Beebe wrote:
> We have implemented this in the makeindex program, [...]
> Here is the relevant portion of the manual page description:
> The sequence in German word ordering is: symbols, lowercase letters,
> uppercase letters, numbers. [...] Additionally, this option
> enables recognition of the German TeX commands {"a, "o, "u and "s}
> as {ae, oe, ue and ss} during the sorting of the entries,

This is only a 1st approximation, see below. Also, it is valid only for sorting "proper names", see above.

On Mar 6, 15:16, Nelson H. F. Beebe wrote:
> [...] case is ignored when grouping letters, so the ordering is [...]
> a A b B c C [...], not A B C ... a b c ... or a b c ... A B C ...

Either the makeindex program does *not* implement DIN 5007, or the man page does *not* aptly describe its effects. There is no single collating sequence that could be assigned to characters so as to yield a DIN-5007-conforming ordering. Rather, DIN 5007 mandates a 3-level ordering process (which can, of course, be implemented as a conventional binary ordering of suitably constructed 3-part ordering keys):

- In the 1st level, only the base character of any letter is considered, i. e. all of "A", "a", "À", "à", "Á", "á", "Â", "â", "Ã", "ã", "Å", and "å" are treated as equivalent, in 1st approximation; "Æ" and "æ" are treated as "ae" in this level; "ä" and "Ä" are treated either as "a" or as "ae", depending on the option chosen.
- Only if two entire keys are equivalent in level 1 is the 2nd level applied: in this level, the diacritical marks are considered. For these, DIN 5007 mandates a particular order.
- Only if two entire keys are equivalent in both levels 1 and 2 does the 3rd level apply; here case is considered.

On Mar 6, 15:16, Nelson H. F. Beebe wrote:
> A search on the Web at http://altavista.digital.com/ for "DIN 5007"
> turned up [...]:

This is a catalogue of copies of DIN standards available at a particular professorial chair in Germany.
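Returning to the 3-level process described above, the "suitably constructed 3-part ordering key" can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not an implementation of the standard. The post does not give DIN 5007's order of diacritical marks, so the sketch uses Unicode code-point order of the combining marks as a placeholder. Putting lowercase before uppercase in level 3 is also an assumption. Punctuation and non-Latin scripts are not handled. The name `din5007_key` and its `names_variant` flag are hypothetical.

```python
import unicodedata

LIGATURES = {"æ": "ae", "Æ": "AE"}   # from the post: "Æ"/"æ" count as "ae" in level 1
DIAERESIS = "\u0308"                  # combining diaeresis, as produced by NFD for ä/ö/ü

def din5007_key(s: str, names_variant: bool = False) -> tuple:
    """Return (level1, level2, level3). Python compares tuples left to right,
    so a later level only matters if all earlier levels tie for the whole key."""
    level1, level2, level3 = [], [], []
    for ch in unicodedata.normalize("NFC", s):
        if ch in LIGATURES:
            units = [(c, "") for c in LIGATURES[ch]]
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            base, marks = decomposed[0], decomposed[1:]
            if names_variant and base in "aouAOU" and marks == DIAERESIS:
                # Proper-names variant: "ä" counts as "ae" in level 1; keeping the
                # mark in level 2 (an assumption) still separates "Mueller"/"Müller".
                units = [(base, DIAERESIS), ("E" if base.isupper() else "e", "")]
            else:
                units = [(base, marks)]
        for base, marks in units:
            level1.append(base.lower())                  # level 1: base letters only
            level2.append(marks)                         # level 2: diacritics (placeholder order)
            level3.append(0 if base.islower() else 1)    # level 3: case, lowercase first (assumed)
    return (tuple(level1), tuple(level2), tuple(level3))

names = ["Muller", "Müller", "Mueller"]
print(sorted(names, key=din5007_key))
# ['Mueller', 'Muller', 'Müller']
print(sorted(names, key=lambda w: din5007_key(w, names_variant=True)))
# ['Mueller', 'Müller', 'Muller']
```

Because the key is a single tuple, an ordinary sort performs the whole 3-level comparison in one pass. The "a A b B" pattern quoted from the makeindex man page only appears here as a level-3 tie-break between otherwise identical keys.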
A better source for DIN standards is , with its search capability under .

On Mar 6, 15:16, Nelson H. F. Beebe wrote:
> http://os.inf.tu-dresden.de/L4/refman/node4.html [in German]

This is a rather lengthy opus. The relevant section under exhibits the same error as the makeindex description above: the order described is just (one possible variant of) DIN 5007's 1st level; the other levels are not mentioned.

This widespread misconception about DIN 5007 probably stems from the introductory remarks in German phone and zip-code directories. These state boldly that the entries are "ordered according to DIN 5007, i. e., 'Ä' is treated as 'ae', and so forth" (paraphrased from memory). For phone directories, this is only part of the truth. For zip-code directories, it is a blatant lie: city and street names are not proper names of persons, so the normal variant of DIN 5007 applies. (Still, the zip-code directories are sorted according to the proper-names variant, contrary to DIN 5007.)

On Mar 6, 15:36, Joan Aliprand wrote:
> You may be looking for:
> Regeln fur die alphabetische Katalogisierung ... by the Verein Deutscher
> Bibliothekare, Kommission fur Alphabetische Katalogisierung. Munchen : Der
> Verein, 1975.

This is an obsolete edition. As said above, the copy I borrowed from our library is from 1983, and even this copy contains numerous handwritten amendments. So I recommend that whoever is earnestly interested in the current rules obtain one of the unbound-sheet or diskette editions cited above.

Best wishes,
Otto Stolz