Re: Subfield mark in MARC records

From: Misha Wolf (misha.wolf@reuters.com)
Date: Thu Mar 26 1998 - 07:39:55 EST


UTF-8 has no endianness problem, as it is specified as a stream of octets.

To answer your specific question, each character in the range U+0000 to
U+007F is represented in UTF-8 using the corresponding octet in the range
00 to 7F.
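A quick way to check this is to encode the four information separators from the quoted table in UTF-8 and in both UTF-16 byte orders. This is a minimal Python 3.9+ sketch; the `SEPARATORS` table and the `encodings` helper are illustrative names, not part of any standard API.

```python
# Minimal sketch: compare how the ISO/IEC 6429 information separators
# (U+001C..U+001F) are serialized in UTF-8 and in the two UTF-16 byte orders.
# SEPARATORS and encodings() are illustrative names, not a standard API.
SEPARATORS = {
    "FS (file separator)": "\u001c",
    "GS (group separator)": "\u001d",
    "RS (record separator)": "\u001e",
    "US (unit separator)": "\u001f",
}


def encodings(ch: str) -> dict[str, str]:
    """Return the hex byte sequence of ch in UTF-8, UTF-16BE and UTF-16LE."""
    return {
        enc: ch.encode(enc).hex(" ")
        for enc in ("utf-8", "utf-16-be", "utf-16-le")
    }


# Every code point U+0000..U+007F becomes the single octet with the same
# value in UTF-8, so there is no byte order to get wrong.
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(0x80))

for name, ch in SEPARATORS.items():
    print(f"{name:24} {encodings(ch)}")
```

Only the UTF-16 forms depend on byte order; the UTF-8 form of each separator is the same single octet on any machine.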

----------------------------------------------------------------------------
  Misha Wolf Email: misha.wolf@reuters.com 85 Fleet Street
  Standards Manager Voice: +44 171 542 6722 London EC4P 4AJ
  Reuters Limited Fax : +44 171 542 8314 UK
----------------------------------------------------------------------------
12th International Unicode Conference, 8-10 Apr 1998, Tokyo, www.unicode.org
   7th World Wide Web Conference, 14-18 Apr 1998, Brisbane, www7.conf.au

> All this can be inferred from earlier standards without deciding anything
> in addition... but there is a catch: endianness.
>
> Some relevant control codes:
>
> Name               Decimal  Hex   Big-endian bytes  Little-endian bytes
>                                   (BOM: FE FF)      (BOM: FF FE)
> ----------------------------------------------------------------------
> File Separator     28       001C  00 1C             1C 00
> Group Separator    29       001D  00 1D             1D 00
> Record Separator   30       001E  00 1E             1E 00
> Unit Separator     31       001F  00 1F             1F 00
>
> Intel x86 is little-endian.
> PowerPC is mostly big-endian but has a little-endian mode.
> Many of the other processors sold today are big-endian.
>
> I assume, but am not sure, that UTF-8 represents each ISO/IEC 6429 control
> code as a single octet, so it can be read without any endianness problem.
>
> Can anyone say something about this?
>
> Sorry, I misprinted ISO/IEC 6429 as 6420 in my previous post.
>
>
> >Chris White wrote:
> >> Those of you who work with bibliographic records, especially MARC
> >> records, will know that in the good old ASCII days the code point used
> >> for the subfield mark was hex 1F, and when a visual representation was
> >> needed, the dollar sign ($) was used (at least in the UK).
> >>
> >> I am now endeavouring to ascertain if there is an emerging de facto
> >> standard among Unicode users on what code point and glyph to use for
> >> the subfield mark.
> >>
> >> Any news of such a developing standard would be most welcome.
> >>
> >Subcommittees of the American Library Association's MARBI Committee are
> >working on such a standard. The mappings established to date can be
> >viewed at http://lcweb.loc.gov/marc/marc2ucs.html. Work is currently
> >underway on the mapping of CJK characters.
> >
> >In particular, the subfield delimiter has been assigned to U+001F, the
> >field terminator to U+001E, and the record terminator to U+001D. So far
> >as I know, no standard for visual representation has been proposed.
> >
> >Gary L. Smith
> >Senior Consulting Analyst
> >Database & Offline Products Development
> >OCLC
> >smithg@oclc.org
> >
>
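Because UTF-8 keeps U+0000 to U+007F as single octets and never uses bytes below 0x80 inside multi-octet sequences, the delimiters in the mapping above (U+001F, U+001E, U+001D) can still be found by a plain byte scan. The Python 3.9+ sketch below illustrates this under a simplified, hypothetical field layout (two indicators, then subfields, then a field terminator, with no leader or directory); `split_subfields` is an illustrative helper, not part of any MARC library.

```python
# Minimal sketch, assuming a hypothetical, simplified MARC-style data field:
# two indicator characters, then subfields each introduced by the subfield
# delimiter U+001F and a one-character subfield code, then the field
# terminator U+001E. Real MARC records also carry a leader and a directory,
# which this sketch ignores.
SUBFIELD_DELIMITER = 0x1F  # U+001F
FIELD_TERMINATOR = 0x1E    # U+001E


def split_subfields(field: bytes) -> list[tuple[str, str]]:
    """Split a UTF-8 encoded data field into (code, value) pairs."""
    # Safe at the byte level: every byte of a multi-octet UTF-8 sequence is
    # >= 0x80, so 0x1F and 0x1E can only be the control characters themselves.
    body = field.rstrip(bytes([FIELD_TERMINATOR]))
    chunks = body.split(bytes([SUBFIELD_DELIMITER]))
    # chunks[0] holds the indicators; each later chunk is code + value.
    return [(c[:1].decode("utf-8"), c[1:].decode("utf-8")) for c in chunks[1:]]


field = "10\u001faMüller, Anna\u001fbÅngström\u001e".encode("utf-8")
print(split_subfields(field))
```

With UTF-16, by contrast, a byte-level split on 0x1F is unsafe, because that byte also occurs inside other characters (U+1F00, for example, is 1F 00 in big-endian order).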

------------------------------------------------------------------------
Any views expressed in this message are those of the individual sender,
except where the sender specifically states them to be the views of
Reuters Ltd.



This archive was generated by hypermail 2.1.2 : Tue Jul 10 2001 - 17:20:40 EDT