|
|
Page 1 of 1
|
[ 6 posts ] |
|
| Author |
Message |
|
dumbledore
|
Post subject: Unicode control characters from the ASCII block Posted: Tue Oct 12, 2010 3:40 pm |
|
Joined: Tue Oct 12, 2010 3:35 pm Posts: 7
|
|
Is it safe to use the control characters from the ASCII block for our own uses? For instance STX, ETX, DC1-DC4, SO, SI, etc.?
Thanks a lot!
|
|
|
 |
|
asmus
|
Post subject: Re: Unicode control characters from the ASCII block Posted: Tue Oct 12, 2010 4:04 pm |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
If you need some special codes for use inside your application, then you are encouraged to use any of the noncharacter code points. They are not intended for interchange, so normally you would not be expected to accept them from another source. Hence, you can always express the full range of external character data, while still having a set of 66 "safe" code points for your own internal use.
If you need some special codes for use in a data (file) protocol that otherwise limits what is acceptable text, then the use of control codes may be OK. Unicode was designed to work with any of the many terminal protocols that were in wide use in the late 1980s, when Unicode was first conceived. These protocols would each make slightly different use of some control codes.
For the longest time, Unicode was officially designed to allow any use of all control codes, based on whatever protocol was in effect. Over the years, that turned out to be too general a position, because plain text files definitely use certain control codes (like LINE FEED) with a very limited range of interpretations. And, it turns out, in order to have some of the Unicode algorithms work well, one has to have an understanding of how to divide text into lines or segments (using e.g. TAB).
This affects only a small number of control codes. The remainder are still treated by default as if they were defined by ISO 6429, but other interpretations of them are also conformant.
So, formally, you can use these characters and give them your own interpretation - but be clear that you would be defining your own protocol. Don't expect processes and applications that expect "plain text" to handle these correctly or even sensibly.
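To get a feel for whether a given text already carries control codes that a private protocol would collide with, a scan like the following may help. It is only a sketch: the function name is made up for illustration, and the set of "commonly interpreted" controls (TAB, LINE FEED, CARRIAGE RETURN) is an assumption of this example, not a list taken from the Unicode Standard.

```python
# Control codes that plain-text tools commonly interpret. This set is an
# assumption made for the sketch, not a list from the Unicode Standard.
COMMON_TEXT_CONTROLS = {0x09, 0x0A, 0x0D}  # TAB, LINE FEED, CARRIAGE RETURN


def unusual_c0_controls(text):
    """Return the sorted code points of C0 controls (and DEL) in text
    that fall outside COMMON_TEXT_CONTROLS."""
    found = set()
    for ch in text:
        cp = ord(ch)
        if (cp < 0x20 or cp == 0x7F) and cp not in COMMON_TEXT_CONTROLS:
            found.add(cp)
    return sorted(found)


# STX (U+0002) and ETX (U+0003) are embedded in this sample on purpose.
sample = "name\tvalue\n\x02payload\x03\n"
print([f"U+{cp:04X}" for cp in unusual_c0_controls(sample)])
```

If the scan reports anything for real input, those code points are already in use by some other protocol or tool, and reusing them would be ambiguous.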
|
|
|
 |
|
dumbledore
|
Post subject: Re: Unicode control characters from the ASCII block Posted: Sat Oct 16, 2010 5:37 am |
|
Joined: Tue Oct 12, 2010 3:35 pm Posts: 7
|
|
Thanks a lot, asmus. My usage of control codes is really naive. I need to do some processing over the text and insert markers that have meaning to my app, and I need to be sure that they are a product of its own processing and not from external sources. That is, I am using control codes only temporarily. I only wondered whether I am likely to encounter them in a plain Unicode (UTF-8 in my case) text file.
Thanks again!
|
|
|
 |
|
asmus
|
Post subject: Re: Unicode control characters from the ASCII block Posted: Sat Oct 16, 2010 10:57 am |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
What you describe seems to fit the purpose for which the noncharacter code points were reserved. There are 2 such code points at the end of each plane (all code points ending in FFFE and FFFF) and 32 code points in the range from U+FDD0 to U+FDEF.
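The two groups described above can be checked with a small predicate. This is a minimal sketch; the function name is hypothetical, and the count of 17 planes (code points up to U+10FFFF) is the only fact used beyond what is stated here.

```python
def is_noncharacter(cp):
    """Return True if the integer code point cp is a Unicode noncharacter."""
    if not 0 <= cp <= 0x10FFFF:
        return False
    # The contiguous block of 32 noncharacters in the BMP.
    if 0xFDD0 <= cp <= 0xFDEF:
        return True
    # The last two code points of every plane: U+FFFE, U+FFFF, U+1FFFE, ...
    return (cp & 0xFFFE) == 0xFFFE


noncharacters = [cp for cp in range(0x110000) if is_noncharacter(cp)]
print(len(noncharacters))  # 32 + 2 * 17 planes = 66
```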
These are the correct codes to use when you need to create temporary markers inside your program in a way that doesn't interfere with incoming data. When you send out data, you would, of course, clean out these markers.
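A rough sketch of that workflow follows. The choice of U+FDD0 and U+FDD1 as markers and all function names are assumptions made for illustration; dropping the markers from incoming text (rather than rejecting the input) is likewise just one possible policy.

```python
# Hypothetical marker choice: any two noncharacters would serve.
START = "\uFDD0"
END = "\uFDD1"
MARKERS = {START, END}


def sanitize_incoming(text):
    """Drop marker code points from external input, so any marker seen
    later can only have been produced by this program."""
    return "".join(ch for ch in text if ch not in MARKERS)


def mark(text, word):
    """Wrap each occurrence of word in the internal markers."""
    return text.replace(word, START + word + END)


def strip_markers(text):
    """Remove the internal markers before the text leaves the program."""
    return "".join(ch for ch in text if ch not in MARKERS)


incoming = sanitize_incoming("plain text from a file")
working = mark(incoming, "text")    # internal form, carries markers
outgoing = strip_markers(working)   # external form, markers cleaned out
assert outgoing == incoming
```

The markers never appear in `incoming` or `outgoing`, only in the intermediate `working` string.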
|
|
|
 |
|
dumbledore
|
Post subject: Re: Unicode control characters from the ASCII block Posted: Sat Oct 16, 2010 11:53 am |
|
Joined: Tue Oct 12, 2010 3:35 pm Posts: 7
|
So, asmus, would you confirm that I have understood you correctly?
- I shouldn't use any control characters from ASCII.
- Instead, I'd better use the ones in the ranges you specified. These are usually shown in black in the Unicode tables I've seen, and in the BMP they sit alongside the Arabic glyphs.
Thanks a lot! You've been very helpful! :)
|
|
|
 |
|
asmus
|
Post subject: Re: Unicode control characters from the ASCII block Posted: Sat Oct 16, 2010 12:58 pm |
|
 |
| Unicode Guru |
Joined: Tue Dec 01, 2009 2:49 pm Posts: 172
|
|
You're welcome and your summary is indeed correct.
|
|
|
 |
|