From: hiura@openi18n.org (Hideki Hiura)
To: Lisa Moore/Santa Teresa/IBM@IBMUS, rick@unicode.org, kenw@sybase.com, asmusf@ix.netcom.com, michelsu@windows.microsoft.com, mark.davis@jtcsv.com, emuller@adobe.com, goldsmit@apple.com
cc: ivs@unicode.org
11/06/2003 12:08 AM
Subject: Re: [ideograph vs:00086] Another thought on Collection identifier

For those attending UTC in person: is it possible to evaluate the following alternative idea, which I designed together with Michel Suignard and Anan at SunDance during the WG2 reception?

With this architecture and a reliable RA, we can avoid the relatively heavy process of registering individual variations (Mojikyo has 120,000 variations) and keep to an extremely lightweight process of registering only variation collections, without looking inside those collections.

Best Regards,
--
hiura@{freestandards.org,OpenI18N.org,li18nux.org,unicode.org,sun.com}
Chair, OpenI18N.org/The Free Standards Group http://www.OpenI18N.org
Architect/Sr. Staff Engineer, Sun Microsystems, Inc, USA eFAX: 509-693-8356

-----------------------------------------------------------------------------
Subject: [ideograph vs:00086] Another thought on Collection identifier
From: hiura@openi18n.org (Hideki Hiura)
To: ivs@unicode.org
Date: Wed, 5 Nov 2003 23:50:47 -0800 (PST)
Resent-Message-Id: <20031105.235047.68550688.hiura@openi18n.org (Hideki Hiura)>

Tonight's possible ad hoc may come up with a different idea, but as a fallback and seed for thought, I am posting a possible alternative for addressing the collection identifier, since I am not confident that people will agree on a single comprehensive variation set, and a collection identifier is needed in some form.

> 2. Variation collection identifier
> 3. Registration Authority
>

The concerns raised are mostly about 2 and 3. The biggest concern posted on the 2.
collection identifier is that it introduces state, and such an announcer may be lost during processing, which I personally do not think is a big deal as long as processing is done correctly ;-P.. anyway..

To make the collection identifier stateless, we can have the RA assign one Plane 14 collection identifier value to each registered collection, and use it as a single-shift modifier of the ideograph VS.

To go this route, we define the collection identifiers, say for now U+E1000 to U+E2000, as IDEOGRAPH COLLECTION IDENTIFIERS.

For example, suppose Adobe-Japan1-5 and Adobe-Japan1-6 are registered as collection identifiers U+E1000 and U+E1001. Also suppose Mojikyo is registered as U+E1002.

The variations of U-4E0E (与) can each be expressed as a sequence of 3 Unicode code points.

                               base     VS #      Collection #
                               ------+---------+-------------
Adobe-Japan1-5 C-3881:         U-4E0E   U+E0170   U+E1000
Adobe-Japan1-5 C-20073:        U-4E0E   U+E0171   U+E1000
Mojikyo U-4E0E variation 1:    U-4E0E   U+E0170   U+E1002
Mojikyo U-4E0E variation 2:    U-4E0E   U+E0171   U+E1002
Mojikyo U-4E0E variation 3:    U-4E0E   U+E0172   U+E1002
Mojikyo U-4E0E variation 4:    U-4E0E   U+E0173   U+E1002
Mojikyo U-4E0E variation 5:    U-4E0E   U+E0174   U+E1002

In the sample variation chart in proposal 0.7, the shape of U-4E0E U+E0170 U+E1000 is the same as U-4E0E U+E0171 U+E1002, and U-4E0E U+E0171 U+E1000 is the same as U-4E0E U+E0170 U+E1002. No matching shape exists in AJ1-5 for U-4E0E U+E0172 U+E1002, U-4E0E U+E0173 U+E1002, and U-4E0E U+E0174 U+E1002. However, it is not our business to identify which matches which, or which exists in which collection. We provide the namespace and the mechanism.

This approach would remove the stateful collection announcer. It therefore removes the problem Michael was mostly concerned about in databases and in cut & paste, as well as the nightmare of identifying which shapes are duplicates, or of conflicting opinions about which shape is a variation of which, etc.
I think this approach is much more lightweight than having the RA consolidate all Han variation registrations.
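
To make the stateless idea above concrete, here is a minimal sketch in Python of how the base + VS + collection identifier triples might be built and read back. Everything in it is an assumption taken from the example values in this mail: the U+E1000 to U+E2000 range, U+E0170 as the first ideograph VS, and the three registrations are illustrative, not real Unicode assignments, and the names `REGISTRY`, `encode` and `decode` are hypothetical. The mail does not give an upper bound for the VS values, so the sketch does not check one.

```python
# Sketch only: ranges and registrations below are the example values from
# this mail, not real Unicode assignments.
COLLECTION_FIRST = 0xE1000   # proposed start of the collection identifier range
COLLECTION_LAST = 0xE2000    # proposed end of the range ("say for now")
VS_FIRST = 0xE0170           # first ideograph VS used in the examples

# Hypothetical registry kept by the RA: one code point per collection.
REGISTRY = {
    0xE1000: "Adobe-Japan1-5",
    0xE1001: "Adobe-Japan1-6",
    0xE1002: "Mojikyo",
}


def encode(base, variation, collection):
    """Build base + VS + collection identifier; variation is 0-based."""
    if variation < 0:
        raise ValueError("variation index must be non-negative")
    if not COLLECTION_FIRST <= collection <= COLLECTION_LAST:
        raise ValueError("collection identifier outside U+E1000..U+E2000")
    return chr(base) + chr(VS_FIRST + variation) + chr(collection)


def decode(seq):
    """Split one 3-code-point sequence; no earlier context is consulted."""
    if len(seq) != 3:
        raise ValueError("expected exactly three code points")
    base, vs, collection = (ord(ch) for ch in seq)
    if not COLLECTION_FIRST <= collection <= COLLECTION_LAST:
        raise ValueError("last code point is not a collection identifier")
    # Unregistered identifiers in the range decode to a collection of None.
    return base, vs - VS_FIRST, REGISTRY.get(collection)


if __name__ == "__main__":
    aj = encode(0x4E0E, 0, 0xE1000)       # Adobe-Japan1-5 C-3881
    mojikyo = encode(0x4E0E, 4, 0xE1002)  # Mojikyo U-4E0E variation 5
    text = aj + mojikyo
    # Cutting the second sequence out of the text loses nothing, because the
    # collection identifier travels with each base character.
    print(decode(text[3:6]))
```

The point of the sketch is only that `decode` needs nothing but the three code points themselves, which is what makes cut & paste and database storage safe under this scheme. Note that the registry never records which shapes match across collections.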