Re: Default character encoding for each operating system?

From: Philippe Verdy <verdy_p_at_wanadoo.fr>
Date: Thu, 15 Sep 2016 17:25:17 +0200

Not all internals. Many kernel drivers (notably bus drivers) still use an
8-bit OEM encoding in their debugging logs (most often based on a US English
locale, even if the installed version is localized to another language; but
I've seen CP850 still used, and you can see some samples in the Event
Viewer). Those messages are in fact not localized at all and are intended only
for debugging or analysis by developers, or for display on a Windows console.

Many console tools on Windows still use the default 8-bit OEM charset and
won't display any Unicode output, even when the console is set to use a
Unicode codepage (I can still see some mojibake, even on Windows 10). When
those output messages are read by other UI tools, they are not interpreted
in the OEM codepage they were written in but in the default "ANSI" codepage
(such as Windows-1252).
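
To make the OEM/"ANSI" mismatch concrete, here is a minimal Python sketch. It
assumes a tool that writes its output in CP850 and a reader that decodes the
same bytes as Windows-1252; the helper name reinterpret and the sample string
are hypothetical, chosen only for illustration.

```python
# Hypothetical illustration: the same bytes read under two Windows codepages.
def reinterpret(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one codepage, then decode the raw bytes in another."""
    raw = text.encode(written_as)
    # errors="replace" because Windows-1252 leaves a few byte values undefined
    return raw.decode(read_as, errors="replace")

message = "café"
oem_bytes = message.encode("cp850")                 # what an OEM console tool writes
garbled = reinterpret(message, "cp850", "cp1252")   # what an "ANSI" reader shows

print(oem_bytes)  # b'caf\x82'
print(garbled)    # caf‚  (byte 0x82 is U+201A in Windows-1252)
```

Pure ASCII text survives the round trip unchanged, which is why the problem
often goes unnoticed on US English systems.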

Filesystems still use legacy charsets in their basic directory structure,
e.g. when inserting a FAT or FAT32 volume formatted without the LFN
extensions for Windows (which also store filenames in UTF-16), such as an SD
card formatted by a digital camera. As the directories and filenames created
on those devices only use ASCII and uninformative names such as
IMG00001.JPG, this generally does not cause a problem, but no Unicode name
is stored. I've seen, however, some digital cameras storing some filenames in
a legacy Chinese or Japanese charset, which are incorrectly rendered when
viewing their content on a non-Japanese/Chinese system.
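
A rough Python sketch of that last case, assuming the camera wrote the name
in Shift_JIS and the viewing system decodes short names with CP437. Both
codepage choices, the sample filename, and the helper view_short_name are
assumptions for illustration; the directory entry itself carries no tag
saying which charset was used.

```python
# Hypothetical illustration: a legacy-charset filename seen from two systems.
def view_short_name(raw: bytes, oem_codepage: str) -> str:
    """Decode the raw bytes of a directory entry with the viewer's OEM codepage."""
    return raw.decode(oem_codepage, errors="replace")

original = "写真.JPG"                      # assumed name chosen by the camera
on_disk = original.encode("shift_jis")     # bytes stored in the directory entry

seen_on_jp_system = view_short_name(on_disk, "shift_jis")  # matches the original
seen_on_us_system = view_short_name(on_disk, "cp437")      # mojibake, same bytes

print(seen_on_jp_system)
print(seen_on_us_system)
```

The bytes on the volume are identical in both cases; only the codepage
assumed by the reader differs.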

2016-09-15 16:36 GMT+02:00 John W Kennedy <john.w.kennedy_at_gmail.com>:

> macOS, and its offspring, iOS, watchOS, and tvOS, use UTF-16LE for all
> internals, but readily import and export all versions of Unicode and a good
> many historic 8-bit and mixed-length codings.
>
> In the new Swift programming language, which is white-hot in the Apple
> community, Apple is moving toward a model of a transparent, generic Unicode
> that can be “viewed” as UTF-8, UTF-16, or UTF-32 if necessary, but in which
> a “character” contains however many code points it needs (“e” with a
> stacked macron, acute accent, and dieresis is algorithmically one
> “character” in Swift). Moreover, e-with-an-acute-accent and e followed by a
> combining acute accent, for example, compare as equal. At present, the
> underlying code is still UTF-16LE.
>
> --
> SKen Software, LLC
> Coming soon to an iPhone near you
>
> On Sep 15, 2016, at 9:19 AM, Philippe Verdy <verdy_p_at_wanadoo.fr> wrote:
>
> A better question is what is the default character encoding for the
> **installed** operating system.
>
> Unfortunately it has no single answer, because there are several default
> encodings for several parts of the OS. An OS has lots of components, and
> many of them are not transparent to the encoding they use.
>
> All three OSes you cite support several default character encodings, and
> in addition they support them in several encoding forms. All three support
> Unicode internally, but not in all software components, which will run with
> one or the other.
>
> And defaults will change according to your distribution or OS
> configuration options, and to your own current user settings.
>
> 2016-09-15 13:14 GMT+02:00 Costello, Roger L. <costello_at_mitre.org>:
>
>> Hi Folks,
>>
>> In a book that I am reading [1] the author mentions “the default
>> character encoding for the operating system.” What is the default character
>> encoding of:
>>
>> - Windows 10
>>
>> - Mac OS
>>
>> - Linux
>>
>>
>> /Roger
>>
>> [1] *Practical Common Lisp* by Peter Seibel, p. 165 (footnote 2).
>>
>
>
Received on Thu Sep 15 2016 - 10:26:13 CDT
