RE: Utility to report and repair broken surrogate pairs in UTF-16 text

From: Doug Ewell (doug@ewellic.org)
Date: Wed Nov 03 2010 - 15:20:58 CST

    Jim Monty <jim dot monty at yahoo dot com> wrote:

    > Is there a utility, preferably open source and written in C, that inspects
    > UTF-16/UTF-16BE/UTF-16LE text and identifies broken surrogate pairs and illegal
    > characters? Ideally, the utility can both report illegal code units and "repair"
    > them by replacing them with U+FFFD.

    What's an "illegal" character, for purposes of this exercise? Do you
    mean a noncharacter, or something else?
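
    Setting that question aside, the surrogate half of the request is
    mechanical. The following is a minimal sketch only, not an existing
    utility: the names repair_utf16 and REPLACEMENT_CHAR are hypothetical,
    the input is assumed to be already decoded from UTF-16BE/LE into
    native-order 16-bit code units with any BOM handled by the caller, and
    noncharacters are deliberately left untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REPLACEMENT_CHAR 0xFFFDu  /* U+FFFD REPLACEMENT CHARACTER */

/* High (lead) surrogates occupy D800..DBFF. */
static int is_high_surrogate(uint16_t u)
{
    return u >= 0xD800u && u <= 0xDBFFu;
}

/* Low (trail) surrogates occupy DC00..DFFF. */
static int is_low_surrogate(uint16_t u)
{
    return u >= 0xDC00u && u <= 0xDFFFu;
}

/*
 * Hypothetical helper: replaces every unpaired surrogate in
 * units[0..len) with U+FFFD, in place, and returns the number of
 * code units replaced. Assumes native byte order.
 */
size_t repair_utf16(uint16_t *units, size_t len)
{
    size_t repaired = 0;

    assert(units != NULL || len == 0);

    for (size_t i = 0; i < len; i++) {
        if (is_high_surrogate(units[i])) {
            if (i + 1 < len && is_low_surrogate(units[i + 1])) {
                i++;  /* well-formed pair: skip over the low half */
            } else {
                units[i] = REPLACEMENT_CHAR;  /* high with no low after it */
                repaired++;
            }
        } else if (is_low_surrogate(units[i])) {
            units[i] = REPLACEMENT_CHAR;  /* low with no high before it */
            repaired++;
        }
    }
    return repaired;
}
```

    Given the code units 0041 D800 0042 D83D DE00 DC00, this leaves the
    D83D DE00 pair alone, rewrites the lone D800 and the trailing DC00 to
    FFFD, and returns 2. Reporting rather than repairing would only need
    the two assignments replaced by a message giving the offset.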

    --
    Doug Ewell | Thornton, Colorado, USA | http://www.ewellic.org
    RFC 5645, 4645, UTN #14 | ietf-languages @ is dot gd slash 2kf0s
    

