Re: Subfield mark in MARC records

From: Misha Wolf (misha.wolf@reuters.com)
Date: Thu Mar 26 1998 - 07:39:55 EST

Next message: Roman Czyborra: "Re: Naughty NL_ORAPOST"
Previous message: Kolbjørn Aambø : "RE: Subfield mark in MARC records"
Maybe in reply to: Chris White: "Subfield mark in MARC records"

UTF-8 has no endianness problem, as it is specified as a stream of octets
rather than of 16-bit units.

To answer your specific question, each character in the range U+0000 to
U+007F is represented in UTF-8 using the single corresponding octet in the
range 00 to 7F.
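
For anyone who wants to check this, here is a minimal Python sketch. It is
only an illustration, not part of any standard; the helper name `encodings`
is made up for this example. It encodes the four separator controls
U+001C to U+001F in UTF-8 and in both byte orders of UTF-16.

```python
# Illustrative sketch: compare octet sequences for the C0 separator controls.
# `encodings` is a hypothetical helper name chosen for this example.

def encodings(code_point: int) -> dict:
    """Return the hex octets of one code point in three encoding schemes."""
    ch = chr(code_point)
    return {
        "utf-8": ch.encode("utf-8").hex(),
        "utf-16-be": ch.encode("utf-16-be").hex(),
        "utf-16-le": ch.encode("utf-16-le").hex(),
    }

for cp in range(0x1C, 0x20):
    enc = encodings(cp)
    print(f"U+{cp:04X}  utf-8={enc['utf-8']}  "
          f"utf-16-be={enc['utf-16-be']}  utf-16-le={enc['utf-16-le']}")
```

The UTF-8 column is a single octet (1c to 1f) whatever the machine, while
the two UTF-16 columns differ only in the order of the same two octets, for
example 00 1F versus 1F 00 for U+001F.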

----------------------------------------------------------------------------
  Misha Wolf          Email: misha.wolf@reuters.com    85 Fleet Street
  Standards Manager   Voice: +44 171 542 6722          London EC4P 4AJ
  Reuters Limited     Fax  : +44 171 542 8314          UK
----------------------------------------------------------------------------
12th International Unicode Conference, 8-10 Apr 1998, Tokyo, www.unicode.org
   7th World Wide Web Conference, 14-18 Apr 1998, Brisbane, www7.conf.au

> All this can be inferred from earlier standards without deciding anything
> in addition... but there is a catch: endianness.
>
> Some relevant control codes:
>
> Name              decimal  hex, big-endian    hex, little-endian
>                            (BOM reads FEFF)   (BOM reads FFFE)
> -----------------------------------------------------------------
> File Separator      28     001C               1C00
> Group Separator     29     001D               1D00
> Record Separator    30     001E               1E00
> Unit Separator      31     001F               1F00
>
> Intel x86 is little-endian.
> PowerPC is mostly big-endian but has a little-endian mode.
> The rest of the processors sold today are big-endian.
>
> I assume, but am not sure, that UTF-8 represents the ISO/IEC 6429 control
> codes using one octet each, so that no endianness problem arises.
>
> Can anyone say something about this?
>
> Sorry, I mistyped ISO/IEC 6429 as 6420 in my previous post.
>
>
> >Chris White wrote:
> >> Those of you who work with bibliographic records, especially MARC
> >> records, will know that in the good old ASCII days the code point used
> >> for subfield mark was Hex 1F, and when a visual representation was
> >> needed, the dollar character, $, was used, (at least in the UK).
> >>
> >> I am now endeavouring to ascertain if there is an emerging de facto
> >> standard among UNICODE users on what code point and glyph to use for
> >> the subfield mark.
> >>
> >> Any news of such a developing standard would be most welcome.
> >>
> >Subcommittees of the American Library Association's MARBI Committee are
> >working on such a standard. The mappings established to date can be
> >viewed at http://lcweb.loc.gov/marc/marc2ucs.html. Work is currently
> >underway on the mapping of CJK characters.
> >
> >In particular, the subfield delimiter has been assigned to U+001F, the
> >field terminator to U+001E, and the record terminator to U+001D. So far
> >as I know, no standard for visual representation has been proposed.
> >
> >Gary L. Smith
> >Senior Consulting Analyst
> >Database & Offline Products Development
> >OCLC
> >smithg@oclc.org
> >
>

------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.

This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT