Re: Database missing/erroneous information

From: Eric Muller via Unicode
Date: Wed, 12 Jul 2017 07:22:28 -0700
In the .grouped.xml file, if a <char> element does not have an attribute, it inherits it from its containing <group> element. The group containing the digits has IDC="Y" OIDC="N" XIDC="Y", so those values apply to the digits as well.

If you don't want to deal with the inheritance mechanism, just use the .flat.xml files, in which the <char> elements carry all the attributes themselves.
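
For anyone who wants to keep using the grouped file, here is a minimal Python sketch of the inheritance rule described above. It is only an illustration: the sample XML is hand-written (it is not real UCD data, and the second <char> is placed in the digit group purely to show an override), the helper name resolved_chars is made up, and the namespace URI is quoted from memory and should be checked against the actual file.

```python
import xml.etree.ElementTree as ET

# Namespace assumed for the UCD XML files; verify against your copy.
UCD_NS = "{http://www.unicode.org/ns/2003/ucd/1.0}"

# Hand-written sample in the general shape of ucd.all.grouped.xml.
# Not real UCD data: the second <char> is here only to show an override.
SAMPLE = """\
<ucd xmlns="http://www.unicode.org/ns/2003/ucd/1.0">
  <repertoire>
    <group gc="Nd" IDC="Y" OIDC="N" XIDC="Y">
      <char cp="0034" na="DIGIT FOUR" IDS="N" XIDS="N"/>
      <char cp="0F32" na="TIBETAN DIGIT HALF NINE" gc="No" IDC="N" XIDC="N"/>
    </group>
  </repertoire>
</ucd>
"""

def resolved_chars(root):
    """Yield one dict per <char>, with group attributes filled in as defaults."""
    for group in root.iter(UCD_NS + "group"):
        for char in group.findall(UCD_NS + "char"):
            attrs = dict(group.attrib)   # start from the inherited group values
            attrs.update(char.attrib)    # the <char>'s own attributes win
            yield attrs

root = ET.fromstring(SAMPLE)
chars = {c["cp"]: c for c in resolved_chars(root)}
print(chars["0034"]["IDC"])  # Y (inherited from the group)
print(chars["0F32"]["IDC"])  # N (set on the <char> itself)
```

This sketch only handles <char> elements that sit directly inside a <group>; anything else in the real file would need its own handling.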


On 7/12/2017 6:35 AM, J Decker via Unicode wrote:
I started looking more deeply at the JavaScript specification.  Identifiers are defined as starting with a character that has the ID_Start property and continuing with characters that have the ID_Continue property.
I grabbed the XML database (ucd.all.grouped.xml), in which I was able to find the IDS and IDC flags (also OIDS, OIDC, XIDS, and XIDC, whose meaning I'm not entirely sure of).

But then I started filtering to find characters that are NOT IDS|IDC...

Something as simple as the digits 0x30-0x39 is marked with IDS='N' but has no [ OX]IDC flags specified.  Is a missing flag assumed to be N or Y? The documentation on the XML file format doesn't specify.  I see 'ID_Continue characters include ID_Start characters, plus characters '

Most languages do support identifiers like a1, a2, etc., so certainly digits should have IDC even though they're not IDS.
Are there characters that are IDS without being IDC?  There are certainly characters that are IDC without being IDS.

Some examples:

found  char { cp: '0034',  na: 'DIGIT FOUR',  gc: 'Nd',  nt: 'De',  nv: '4',  bc: 'EN',  lb: 'NU',  sc: 'Zyyy',  scx: 'Zyyy',  Alpha: 'N',  Hex: 'Y',  AHex: 'Y',  IDS: 'N',  XIDS: 'N',  WB: 'NU',  SB: 'NU',  Cased: 'N',  CWCM: 'N',  InSC: 'Number' }

(this has IDC notation but not IDS; since it says 'digit' I assume this is a number type, and should not be IDS.)
found  char { cp: '0F32',  na: 'TIBETAN DIGIT HALF NINE',  gc: 'No',  nt: 'Nu',  nv: '17/2',  Alpha: 'N',  IDC: 'N',  XIDC: 'N',  SB: 'XX',  InSC: 'Number' }

This one might be IDC but not IDS?
found  char { cp: '203F',  na: 'UNDERTIE',  gc: 'Pc',  IDC: 'Y',  XIDC: 'Y',  Pat_Syn: 'N',  WB: 'EX' }

This one is sort of IDS but not IDC?
found  char { cp: '309B',  na: 'KATAKANA-HIRAGANA VOICED SOUND MARK',  gc: 'Sk',  dt: 'com',  dm: '0020 3099',  bc: 'ON',  lb: 'NS',  sc: 'Zyyy',  scx: 'Hira Kana',  Alpha: 'N',  Dia: 'Y',  OIDS: 'Y',  XIDS: 'N',  XIDC: 'N',  WB: 'KA',  SB: 'XX',  NFKC_QC: 'N',  NFKD_QC: 'N',  XO_NFKC: 'Y',  XO_NFKD: 'Y',  CI: 'Y',  CWKCF: 'Y',  NFKC_CF: '0020 3099',  vo: 'Tu' }

Received on Wed Jul 12 2017 - 09:22:56 CDT
