RE: A basic question on encoding Latin characters

From: Robert Brady (
Date: Tue Sep 28 1999 - 18:31:26 EDT

On Tue, 28 Sep 1999, Kenneth Whistler wrote:

> The strawman I was addressing was Frank's implication that some
> processes would fail in this realm because they would sit and hang
> waiting forever for the combining mark that never showed up for dinner,
> or that you would get false negative and false positive matches that
> couldn't be programmed around because of the placement of combining
> marks. -- Not the redisplay flicker issue on a terminal when rendering
> trailing combining marks.

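Indeed. However, for the same reason the flicker happens, Unicode
combining characters break 0-lookahead string matching [1], such as that
done by "expect". This obviously cannot be fixed: a matcher that never
looks ahead has no way of knowing that a combining mark is still on its way.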
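A minimal sketch of the failure. It assumes a matcher that, like a zero-lookahead "expect" loop, tests its buffer after every chunk of input and fires on the first match. The `match_incrementally` helper is hypothetical and is not part of expect.

```python
import unicodedata

def match_incrementally(chunks, pattern):
    """Hypothetical zero-lookahead matcher: append each chunk to a buffer and
    report a match as soon as the buffer ends with `pattern`, without waiting
    to see what input comes next."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        if buffer.endswith(pattern):
            return buffer  # fires immediately; later input is never consulted
    return None

# The user's text is decomposed "a-ring": "a" followed by U+030A,
# arriving in two separate reads.
chunks = ["a", "\u030a"]
print(repr(match_incrementally(chunks, "a")))  # 'a', matched before the mark arrived
print(unicodedata.name("\u030a"))              # COMBINING RING ABOVE
```

The pattern "a" matches even though the text being typed was "a-ring-above". The matcher had already returned before the combining mark could change the meaning.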

In UTF-8 over a TTY, you cannot tell the difference between

  a <wait-forever>
  a <very-long-pause-indeed> <combining-ring-above>

unless you use a timeout-based system, which should be avoided for Very
Good Reasons.
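Python's standard incremental UTF-8 decoder shows that the byte stream itself gives no hint. An incomplete multi-byte sequence shows up as buffered decoder state. A complete base character that might later be followed by a combining mark leaves nothing pending. The `pending_bytes` wrapper below is a hypothetical convenience around the `codecs` module.

```python
import codecs

def pending_bytes(data):
    """Feed `data` to a fresh incremental UTF-8 decoder and return
    (decoded text, bytes still buffered waiting for the rest of a character)."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    text = decoder.decode(data)  # final=False: incomplete sequences are held back
    buffered, _flag = decoder.getstate()
    return text, buffered

# "a" on its own: a complete character, nothing buffered.
print(pending_bytes(b"a"))  # ('a', b'')

# "a" plus the first byte of U+030A (encoded as CC 8A): the decoder can see
# that a character is incomplete and keeps that byte buffered.
print(pending_bytes(b"a\xcc"))  # ('a', b'\xcc')

# Full encoding of "a" + COMBINING RING ABOVE.
print("a\u030a".encode("utf-8"))  # b'a\xcc\x8a'
```

When only `b"a"` has arrived, the decoder state for "a <wait-forever>" and for "a <very-long-pause-indeed> <combining-ring-above>" is identical.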

This is especially bad for keypress-based apps whose users may wish to
define different commands for "a" and "a-ring-above". This is an actual
practical problem.
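A sketch of the keypress dilemma, assuming a hypothetical keymap with separate bindings for "a" and for "a" followed by U+030A. Suppose the keys typed so far exactly match one binding and are also a proper prefix of a longer one. The dispatcher can then only commit now, so the longer binding is never reached, or wait for more input, which means waiting forever or using a timeout. The `dispatch` function, `KEYMAP`, and the command names are illustrative only.

```python
KEYMAP = {
    "a": "cmd_plain_a",             # hypothetical command bound to "a"
    "a\u030a": "cmd_a_ring_above",  # hypothetical command bound to "a" + U+030A
}

def dispatch(typed, keymap=KEYMAP):
    """Decide what to do with the keys typed so far.

    Returns ("run", cmd) when the input can only mean one binding,
    ("ambiguous", cmd) when it matches a binding but is also a prefix of a
    longer one, ("wait", None) when it is only a prefix of some binding,
    and ("none", None) otherwise."""
    exact = keymap.get(typed)
    longer = any(k != typed and k.startswith(typed) for k in keymap)
    if exact is not None and longer:
        return ("ambiguous", exact)  # commit now, or wait (possibly forever)?
    if exact is not None:
        return ("run", exact)
    if longer:
        return ("wait", None)
    return ("none", None)

print(dispatch("a"))        # ('ambiguous', 'cmd_plain_a')
print(dispatch("a\u030a"))  # ('run', 'cmd_a_ring_above')
```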

This cannot be fixed, but pressure to move to canonically decomposed forms
of text will make the problem more noticeable.
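Python's `unicodedata.normalize` shows why decomposed forms make this worse. Under NFC, "a-ring-above" is the single code point U+00E5 and arrives as one unit. Under NFD it becomes "a" followed by U+030A, so the character begins with a complete, unmarked base letter that a zero-lookahead consumer will act on.

```python
import unicodedata

composed = unicodedata.normalize("NFC", "a\u030a")
decomposed = unicodedata.normalize("NFD", "\u00e5")

print([f"U+{ord(c):04X}" for c in composed])    # ['U+00E5']
print([f"U+{ord(c):04X}" for c in decomposed])  # ['U+0061', 'U+030A']

# In NFD text the first code point is a plain base letter (combining class 0);
# only the following mark (non-zero combining class) turns it into "a-ring".
print(unicodedata.combining(decomposed[0]), unicodedata.combining(decomposed[1]))  # 0 230
```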

1. I'm not sure what the proper term for that type of matching is. You
   know what I mean anyway.

