RE: Terminology question re ASCII

From: Shawn Steele <Shawn.Steele_at_microsoft.com>
Date: Tue, 29 Oct 2013 15:46:14 +0000

I would concur. When I hear “8 bit ASCII”, the speaker is usually confusing the term with any of what we call “ANSI Code Pages” in Windows (or similar ideas on other systems).

It’s also usually the prelude to a conversation asking the requestor to back up 5 or 6 steps and explain what they’re really trying to do, because something’s probably a bit confused.

-Shawn

From: unicode-bounce_at_unicode.org [mailto:unicode-bounce_at_unicode.org] On Behalf Of Philippe Verdy
Sent: Tuesday, October 29, 2013 7:49 AM
To: Mark Davis ☕
Cc: Donald Z. Osborn; unicode
Subject: Re: Terminology question re ASCII

"8-bit ASCII" is not so clear!

The reason for that is the historic documentation of much software, notably for the BASIC language and similar tools like Excel, and even more recent languages like PHP, offering functions like "CHR$(number)" and "ASC(string)" to convert a string to the numeric "8-bit ASCII" code of its first "character", or the reverse. The effective encoding of strings was in fact not specified at all and could be any 8-bit encoding used on the platform.

Only in more recent implementations of these languages is it specified that the encoding of their strings is based on Unicode (most often UTF-16, so that 8-bit values now produce the same result as ISO-8859-1), but this is not enforced if a "compatibility" working mode was kept (e.g. in PHP, which still uses unspecified 8-bit encodings for its strings in most of its API, or in Python, which distinguishes types for 8-bit encoded strings and Unicode-encoded strings).
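[Editor's note: the distinction Verdy describes can be seen directly in Python 3, where chr()/ord() operate on Unicode code points rather than raw bytes, and moving between text and bytes requires naming an encoding. A minimal sketch, not from the thread:]

```python
# In Python 3, chr()/ord() work on Unicode code points, not bytes.
# The first 256 code points coincide with ISO-8859-1 (Latin-1),
# which is why legacy "8-bit" values round-trip through Latin-1 unchanged.
c = chr(0xE9)                            # U+00E9 LATIN SMALL LETTER E WITH ACUTE
assert ord(c) == 0xE9

# Text strings and byte strings are distinct types; an explicit
# encoding is required to move between them.
assert c.encode("latin-1") == b"\xe9"    # one byte, same numeric value
assert c.encode("utf-8") == b"\xc3\xa9"  # two bytes under UTF-8
```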


2013/10/29 Mark Davis ☕ <mark_at_macchiato.com<mailto:mark_at_macchiato.com>>
Normally the term ASCII just refers to the 7-bit form. What is sometimes called "8-bit ASCII" is the same as ISO Latin 1. If you want to be completely clear, you can say "7-bit ASCII".
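[Editor's note: Mark's point — ASCII proper is the 7-bit range, shared by Latin-1 and UTF-8 alike, while "8-bit ASCII" is really Latin-1 — can be checked mechanically. A sketch in Python 3, not part of the original message:]

```python
# ASCII proper covers code points 0-127; every byte in that range
# is encoded identically by ASCII, ISO-8859-1 (Latin-1), and UTF-8.
for cp in range(128):
    ch = chr(cp)
    assert ch.encode("ascii") == ch.encode("latin-1") == ch.encode("utf-8")

# Code points 128-255 are where "8-bit ASCII" stops being ASCII:
try:
    chr(0xE9).encode("ascii")
except UnicodeEncodeError:
    pass  # é (U+00E9) has no 7-bit ASCII encoding
```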


Mark<https://plus.google.com/114199149796022210033>

— Il meglio è l’inimico del bene —

On Tue, Oct 29, 2013 at 5:12 AM, <dzo_at_bisharat.net<mailto:dzo_at_bisharat.net>> wrote:
Quick question on terminology use concerning a legacy encoding:

If one refers to "plain ASCII," or "plain ASCII text" or "... characters," should this be taken strictly as referring to the 7-bit basic characters, or might it encompass characters that appear only in an 8-bit character set (per the so-called "extended ASCII")?

I've always used the term "ASCII" in the 7-bit, 128 character sense, and modifying it with "plain" seems to reinforce that sense. (Although "plain text" in my understanding actually refers to lack of formatting.)

The reason for asking is that I encountered a reference to "plain ASCII" describing text that clearly (given the presence of accented characters) must be 8-bit.

The context is one of many situations where in attaching a document to an email, it is advisable to include an unformatted text version of the document in the body of the email. Never mind that the latter is probably in UTF-8 anyway(?) - the issue here is the terminology.
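[Editor's note: the strict 7-bit sense of "plain ASCII" that Don describes is easy to test for. A hypothetical helper in Python 3, not from the thread:]

```python
def is_plain_ascii(text: str) -> bool:
    """True if every character is in the 7-bit ASCII range (U+0000..U+007F).

    Python 3.7+ offers the equivalent built-in text.isascii().
    """
    return all(ord(ch) < 128 for ch in text)

assert is_plain_ascii("Hello, world!")
assert not is_plain_ascii("café")  # the accented é falls outside 7-bit ASCII
```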

TIA for any feedback.

Don Osborn

Sent via BlackBerry by AT&T



Received on Tue Oct 29 2013 - 10:48:37 CDT

This archive was generated by hypermail 2.2.0 : Tue Oct 29 2013 - 10:48:38 CDT