Re: NLF (was Frank and Ed, was Plain Text)

From: Kevin Bracey (kbracey@e-14.com)
Date: Tue Jul 06 1999 - 10:16:21 EDT


In message <9907051432.AA29431@unicode.org>
          Peter_Constable@sil.org wrote:

>
>
> >I find myself dealing with Unicode text created by Windows and Windows
> applications quite frequently now, with line ends marked in little-endian
> fashion as
>
> 0D 00 0A 00
>
> Indeed, this practice has surprised me.
>
> Chris Pratley: can you comment on why Word 97 does this rather than using
> PS?
>

I think I can partially answer this from experience in our (non-MS)
environment. Our system continues to use our native line-ending type (LF
only) when dealing with Unicode data, for compatibility. In particular, when
converted to UTF-8, which is how Unicode is normally passed around our OS,
the data will have standard-looking line endings. If PS or LS were used,
many non-UTF-8-aware parts of the system would get confused.
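
To illustrate the point about UTF-8, here is a minimal Python sketch (not
code from our system; the helper name byte_lines is made up for the
example). It assumes a "non-UTF-8-aware" tool is one that splits lines on
the single byte 0x0A. LF survives UTF-8 encoding as that one byte, whereas
PS (U+2029) and LS (U+2028) become three-byte sequences that such a tool
never recognises as line breaks.

```python
# Sketch: how a byte-oriented line splitter sees LF versus PS/LS in UTF-8.
LF = "\n"        # U+000A, a C0 control
LS = "\u2028"    # LINE SEPARATOR
PS = "\u2029"    # PARAGRAPH SEPARATOR


def byte_lines(data):
    """Split on the byte 0x0A, as a tool unaware of UTF-8 would (hypothetical helper)."""
    return data.split(b"\n")


text_lf = "first" + LF + "second"
text_ps = "first" + PS + "second"

utf8_lf = text_lf.encode("utf-8")
utf8_ps = text_ps.encode("utf-8")

lines_lf = byte_lines(utf8_lf)   # two lines, as expected
lines_ps = byte_lines(utf8_ps)   # one long "line": the break is invisible

ps_bytes = PS.encode("utf-8")    # three bytes, none of them 0x0A
ls_bytes = LS.encode("utf-8")

print(lines_lf)
print(lines_ps)
print(ps_bytes.hex(" "), ls_bytes.hex(" "))
```

With LF the splitter finds two lines; with PS it sees a single line with
three unfamiliar bytes in the middle.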

Also, a lot of Unicode data is converted from non-Unicode sources, and
conversion will almost always leave C0 and C1 characters untouched. Changing
to PS and LS would need knowledge of the source data's line-ending
conventions, which is hard to determine automatically. If you also need
round-trip conversion (e.g. Shift-JIS data in an HTML form -> Unicode browser
workings -> Shift-JIS submission to server), messing with line endings is
almost out of the question.
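
The round-trip problem can be sketched in Python as follows. This is only
an illustration under stated assumptions, not a description of any real
browser: it uses Python's built-in shift_jis codec, made-up sample text with
mixed CRLF and LF endings, and a hypothetical "normalise everything to PS"
step.

```python
# Sketch: Shift-JIS -> Unicode -> Shift-JIS, with and without touching line ends.
PS = "\u2029"  # PARAGRAPH SEPARATOR

# Sample form data (invented for this example): two short Japanese lines,
# the first ended with CR LF and the second with a bare LF.
sample = "\u4e00\u884c\u76ee\r\n\u4e8c\u884c\u76ee\n"
original = sample.encode("shift_jis")

# 1. Leave the C0 controls alone: the round trip is byte-for-byte exact.
untouched = original.decode("shift_jis").encode("shift_jis")
round_trip_ok = untouched == original

# 2. Hypothetical normalisation of every line end to PS.
normalised = original.decode("shift_jis").replace("\r\n", PS).replace("\n", PS)

# PS has no Shift-JIS representation, so it cannot be sent back as-is.
try:
    normalised.encode("shift_jis")
    ps_encodable = True
except UnicodeEncodeError:
    ps_encodable = False

# Mapping PS back to LF is possible, but the CR LF / LF distinction is gone.
restored = normalised.replace(PS, "\n").encode("shift_jis")
restored_ok = restored == original

print(round_trip_ok, ps_encodable, restored_ok)
```

Leaving the controls untouched is the only one of the three paths that
returns the original bytes; once line ends are normalised, the converter
has to guess which convention the server expects.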

All other encodings use C0 controls for line endings - it's hard to
make a change for one particular encoding that does it differently.

-- 
Kevin Bracey, Senior Software Engineer
Pace Micro Technology plc                     Tel: +44 (0) 1223 725228
645 Newmarket Road                            Fax: +44 (0) 1223 725328
Cambridge, CB5 8PB, United Kingdom            WWW: http://www.acorn.co.uk/


