Re: Most commonly used characters not in BMP

From: Uriah Eisenstein (uriaheisenstein@gmail.com)
Date: Tue Jul 06 2010 - 18:29:36 CDT

    Regarding characters in the SIP, maybe the Unihan IICore field could be
    useful? There are 62 Extension B characters which are listed as IICore. Of
    course, these may just be characters which *should* be supported by
    implementations, given that quite a lot of software has problems with
    supplementary characters in general...
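
    For anyone who wants to check this against their own copy of the Unihan
    data, here is a minimal sketch. It assumes the usual tab-separated Unihan
    layout (code point, field name, value) and takes Extension B to be
    U+20000..U+2A6DF. The function name and the sample lines are made up for
    illustration; they are not real Unihan entries.

```python
# Sketch only: list the Extension B code points that carry a kIICore value.
# Assumes each data line looks like "U+XXXX<TAB>kField<TAB>value".

EXT_B = range(0x20000, 0x2A6E0)  # CJK Unified Ideographs Extension B block


def iicore_ext_b(lines):
    """Return the sorted Extension B code points that have a kIICore field."""
    found = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a "code point, field, value" record
        cp_text, field, _value = parts
        if field != "kIICore" or not cp_text.startswith("U+"):
            continue
        cp = int(cp_text[2:], 16)
        if cp in EXT_B:
            found.add(cp)
    return sorted(found)


# Invented sample records, for illustration only.
sample = [
    "# comment line",
    "U+4E00\tkIICore\t2.1",         # BMP, so it is ignored
    "U+20021\tkIICore\t2.1",        # inside Extension B, so it is kept
    "U+20021\tkTotalStrokes\t5",    # a different field, so it is ignored
    "U+2A6E0\tkIICore\t2.1",        # just past the block, so it is ignored
]
result = iicore_ext_b(sample)
print([f"U+{cp:04X}" for cp in result])
```

    To run it on real data, pass an open file instead of the sample list,
    e.g. iicore_ext_b(open(path, encoding="utf-8")). Which Unihan file
    carries kIICore depends on the version of the database, so the path is
    left to the reader.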

    Uriah

    On Tue, Jun 15, 2010 at 3:15 AM, Mark Davis ☕ <mark@macchiato.com> wrote:

    > From a sampling of the web (about .7M docs), the most common supplementary
    > characters are, curiously, private use. Top is U+FEB85. For Han, the
    > top few are: 𣿡, 𠀤, 𩇫, 𥑬, 𤥂, 𡛺, 𤎌, 𠜎,... There are also, oddly,
    > some Gothic and Shavian characters.
    >
    > However, the data gets pretty noisy; it would take a bigger sample to get
    > more reliable data.
    >
    > Mark
    >
    > — Il meglio è l’inimico del bene —
    >
    >
    > On Mon, Jun 14, 2010 at 09:10, John H. Jenkins <jenkins@apple.com> wrote:
    >
    >> Some characters in the SIP are more common in Chinese written in the HK
    >> SAR than any character in Extension A, either because they are Hong Kong
    >> toponyms (or the like), or are Cantonese-specific. (My own analysis of text
    >> on the Chinese Wikipediæ is that the most common are U+23D13, U+282E2,
    >> U+28B4E, and U+2A568, which occur seven times each.)
    >>
    >> I imagine that the best data would come from Google.
    >>
    >> And there are some Web sites out there in Deseret and Shavian, as well.
    >> (If nothing else, both Deseret and Shavian versions of xkcd are available.
    >> I'm not aware of any Linear B translations.)
    >>
    >> On 2010/6/14, at 8:48 AM, Frédéric Grosshans wrote:
    >>
    >> > Is there any data on the most commonly used characters that are not in
    >> > the BMP?
    >> >
    >> > I have the impression that SMP characters are mainly used by scholars
    >> > (historic scripts and math symbols). However, I have no idea whether the
    >> > SIP characters are mainly historical, or whether they include not-so-rare
    >> > characters needed for names and/or Chinese dialects.
    >> >
    >> > Frédéric Grosshans
    >> >


