Re: Utility to report and repair broken surrogate pairs in UTF-16 text

From: Martin J. Dürst (duerst@it.aoyama.ac.jp)
Date: Thu Nov 04 2010 - 05:16:06 CST

Next message: Bjoern Hoehrmann: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

Previous message: Jim Monty: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"
In reply to: Jim Monty: "Utility to report and repair broken surrogate pairs in UTF-16 text"
Next in thread: Doug Ewell: "Re: Utility to report and repair broken surrogate pairs in UTF-16 text"

There is charlint (http://www.w3.org/International/charlint/), which is
based on UTF-8. It may be possible to adapt it to UTF-16/32.
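
As a rough illustration of the kind of check being asked about (this is
not charlint code, and it is not part of the original exchange), a minimal
sketch in C might look like the following. It assumes the text has already
been decoded from UTF-16BE/UTF-16LE bytes into host-order 16-bit code
units, and it only handles unpaired surrogates; other "illegal characters"
such as noncharacters are not checked. The function name repair_utf16 and
its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: scan `len` UTF-16 code units in `buf` (host byte
 * order assumed) and overwrite every unpaired surrogate with U+FFFD.
 * If `report` is non-NULL, one line per replaced code unit is written
 * to it. Returns the number of code units replaced.
 */
static size_t repair_utf16(uint16_t *buf, size_t len, FILE *report)
{
    size_t repaired = 0;

    for (size_t i = 0; i < len; i++) {
        uint16_t u = buf[i];
        int is_lead  = (u >= 0xD800 && u <= 0xDBFF);
        int is_trail = (u >= 0xDC00 && u <= 0xDFFF);

        if (is_lead && i + 1 < len &&
            buf[i + 1] >= 0xDC00 && buf[i + 1] <= 0xDFFF) {
            i++;            /* well-formed pair: skip its trail unit */
            continue;
        }

        if (is_lead || is_trail) {
            /* lead without a trail, or trail without a lead */
            if (report != NULL) {
                fprintf(report, "offset %zu: unpaired surrogate 0x%04X\n",
                        i, (unsigned)u);
            }
            buf[i] = 0xFFFD;
            repaired++;
        }
    }
    return repaired;
}
```

Passing NULL for the report stream gives a silent repair; passing stderr
gives a report alongside the repair. Replacing each bad code unit with
exactly one U+FFFD keeps the buffer length unchanged, which is why the
repair can be done in place.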

Regards, Martin.

On 2010/11/04 4:37, Jim Monty wrote:
> Is there a utility, preferably open source and written in C, that inspects
> UTF-16/UTF-16BE/UTF-16LE text and identifies broken surrogate pairs and illegal
> characters? Ideally, the utility can both report illegal code units and "repair"
> them by replacing them with U+FFFD.
>
> Jim Monty

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp


This archive was generated by hypermail 2.1.5 : Thu Nov 04 2010 - 05:21:32 CST