The Unicode Consortium Discussion Forum (CLOSED)

The forum has been closed, but prior postings are accessible for reading.

All times are UTC - 6 hours [ DST ]




Forum locked This topic is locked, you cannot edit posts or make further replies.  [ 6 posts ] 
 Post subject: Unicode control characters from the ASCII block
PostPosted: Tue Oct 12, 2010 3:40 pm 
Joined: Tue Oct 12, 2010 3:35 pm
Posts: 7
Is it safe to use the control characters from the ASCII block for our own uses? For instance STX, ETX, DC1-DC4, SO, SI, etc.?

Thanks a lot!


 Post subject: Re: Unicode control characters from the ASCII block
PostPosted: Tue Oct 12, 2010 4:04 pm 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 182
If you need some special codes for use inside your application, then you are encouraged to use any of the noncharacter code points. Their purpose is that they are not for interchange, so normally you would not be expected to accept them from another source. Hence, you can always express the full range of external character data, while still having a set of 66 "safe" code points for your own internal use.

If you need some special codes for use in a data (file) protocol that otherwise limits what is acceptable text, then the use of control codes may be OK. Unicode was designed to work with any of the many terminal protocols that were in wide use in the late 1980s, when Unicode was first conceived. These protocols would each make slightly different use of some control codes.

For the longest time, Unicode was officially designed to allow any use of all control codes, based on whatever protocol was in effect. Over the years, that turned out to be too general a position, because plain text files definitely use certain control codes (like LINE FEED) with a very limited range of interpretations. And, it turns out, in order for some of the Unicode algorithms to work well, one has to have an understanding of how to divide text into lines or segments (using e.g. TAB).

This affects only a small number of control codes. The remainder are still treated by default as if they were defined by ISO 6429, but other interpretations of them are also conformant.

So, formally, you can use these characters and give them your own interpretation - but be clear that you would be defining your own protocol. Don't expect processes and applications that expect "plain text" to handle these correctly or even sensibly.
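
As a rough illustration of the point above, here is a minimal Python sketch that reports which control codes in a piece of text fall outside the few that plain-text processes commonly interpret. The choice of TAB, LF and CR as the "common" set, and the names `COMMON_PLAIN_TEXT_CONTROLS` and `unusual_controls`, are assumptions made for this example, not something defined by the Unicode Standard.

```python
import unicodedata

# Assumption for this sketch: the control codes a plain-text process
# is likely to interpret. Adjust the set for your own protocol.
COMMON_PLAIN_TEXT_CONTROLS = {"\t", "\n", "\r"}


def unusual_controls(text):
    """Return the sorted control characters (General_Category Cc)
    found in text, excluding TAB, LF and CR."""
    return sorted(
        {
            ch
            for ch in text
            if unicodedata.category(ch) == "Cc"
            and ch not in COMMON_PLAIN_TEXT_CONTROLS
        }
    )


# STX (U+0002) and ETX (U+0003) used as private delimiters.
sample = "line one\n\x02payload\x03\tend"
found = unusual_controls(sample)
print([f"U+{ord(ch):04X}" for ch in found])
```

Any code point this reports is one whose meaning depends on a protocol you would have to define yourself.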


 Post subject: Re: Unicode control characters from the ASCII block
PostPosted: Sat Oct 16, 2010 5:37 am 
Joined: Tue Oct 12, 2010 3:35 pm
Posts: 7
Thanks a lot, asmus. My usage of control codes is really naive. The idea is that I need to do some processing over the text and place some markers in it that have meaning to my app, and I need to be sure that they are a product of its own processing and not from external sources. That is, I am using control codes only temporarily. I only wondered whether I am likely to encounter them in a plain Unicode (UTF-8 in my case) text file.

Thanks again!


 Post subject: Re: Unicode control characters from the ASCII block
PostPosted: Sat Oct 16, 2010 10:57 am 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 182
What you describe seems to fit the purpose for which the noncharacter code points were reserved. There are 2 such code points at the end of each plane (all codes ending in FFFE and FFFF) and 32 codes in the range from FDD0 to FDEF.
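
The two rules just described can be written down as a small predicate. This is a sketch in Python; the function name `is_noncharacter` is made up for the example. Counting the matches over the whole code space gives the 66 code points mentioned earlier in the thread (2 per plane across 17 planes, plus 32).

```python
def is_noncharacter(cp):
    """Return True if the integer code point cp is a Unicode noncharacter."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # The 32 contiguous noncharacters U+FDD0..U+FDEF in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+nFFFE and U+nFFFF.
    return (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 66
```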

These are the correct codes to use when you need to create temporary markers inside your program in a way that doesn't interfere with incoming data. When you send out data, you would, of course, clean out these markers.
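
A minimal sketch of that workflow in Python might look like the following. The specific markers (U+FDD0 and U+FDD1) and the helper names `mark` and `strip_markers` are hypothetical choices for illustration; any noncharacters would serve. Rejecting input that already contains a marker is one possible policy, chosen here so that markers found later are known to come from the program itself.

```python
# Hypothetical marker choice: two noncharacters from U+FDD0..U+FDEF.
MARK_START = "\uFDD0"
MARK_END = "\uFDD1"


def mark(text, word):
    """Wrap every occurrence of word in internal start/end markers."""
    # Noncharacters are not meant for interchange, so incoming text
    # should not contain them; refuse it if it does (a policy assumption).
    if MARK_START in text or MARK_END in text:
        raise ValueError("input already contains a marker code point")
    return text.replace(word, MARK_START + word + MARK_END)


def strip_markers(text):
    """Remove the internal markers before the text leaves the program."""
    return text.replace(MARK_START, "").replace(MARK_END, "")


incoming = "control codes and more codes"
working = mark(incoming, "codes")   # markers exist only internally
outgoing = strip_markers(working)   # cleaned before output
assert outgoing == incoming
```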


 Post subject: Re: Unicode control characters from the ASCII block
PostPosted: Sat Oct 16, 2010 11:53 am 
Joined: Tue Oct 12, 2010 3:35 pm
Posts: 7
So, asmus, would you confirm that I have understood you correctly?
  1. I shouldn't use any control characters from ASCII.
  2. Instead, I'd better use the ones in the ranges you specified. These are usually shown in black in the Unicode tables I've seen, and in the BMP they sit alongside the Arabic glyphs.

Thanks a lot! You've been very helpful! :)


 Post subject: Re: Unicode control characters from the ASCII block
PostPosted: Sat Oct 16, 2010 12:58 pm 
Unicode Guru

Joined: Tue Dec 01, 2009 2:49 pm
Posts: 182
You're welcome and your summary is indeed correct.

