Feedback on Coded Hashes of Arbitrary Images

L2/16-379

Date: Thu Dec 1 17:41:29 CST 2016
Name: Markus Scherer

Subject: Feedback on Coded Hashes of Arbitrary Images

Re L2/16-105R http://www.unicode.org/L2/L2016/16105r-unicode-image-hash.pdf

I like the approach but have feedback on the details:

2.1 Representing the emoji

The proposal envisions a single name field. At a minimum, the name should carry a language tag. Better would be a map from language tag to the name in that language.
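As a rough illustration of the map form, a description could carry names keyed by BCP 47 language tag. The field names ("content-type", "names"), the sample names, and the name_for helper below are assumptions made for this sketch; none of them come from the proposal.

```python
# Hypothetical description with a language-tagged name map.
# Field names and sample values are assumptions, not taken from L2/16-105R.
description = {
    "content-type": "image/png",
    "names": {
        "en": "party parrot",
        "de": "Partypapagei",
    },
}


def name_for(description, language_tag, fallback="en"):
    """Return the name for a language tag, falling back to a default language."""
    names = description["names"]
    return names.get(language_tag, names.get(fallback))
```

With this shape, adding a name in another language is a matter of adding one more key, and a lookup for a missing language falls back to a default.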

2.2 Secure hash

"The implementation will identify the emoji by taking the SHA-256 hash of the emoji description."

That is, the hash would cover the whole description (JSON, in the proposal's example), including metadata such as the content-type and the name(s).

Including the name seems like a mistake: any change to a name would change the hash, yet it should be possible to fix name typos and to add names in additional languages. Hashing the entire "description" would also make the hash sensitive to the addition of any other metadata.

I suggest hashing only the binary image data. Including the content-type would be harmless but redundant and seems unnecessary. Hashing the base64 form rather than the binary is unnecessary as well.
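A minimal sketch of this suggestion, assuming a description that carries the image as base64 in a "data" field (that field name, the sample names, and the stand-in image bytes are all assumptions for illustration): decode the transport encoding first, then hash only the binary.

```python
import base64
import hashlib


def image_hash(image_bytes: bytes) -> str:
    """SHA-256 over the raw image bytes only; names and other metadata are excluded."""
    return hashlib.sha256(image_bytes).hexdigest()


# Stand-in bytes for an image file (the PNG signature only, not a complete image).
image_bytes = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(image_bytes).decode("ascii")

# Two hypothetical descriptions of the same image that differ only in metadata:
# the second fixes a name typo and adds a name in another language.
desc_a = {"names": {"en": "exmaple"}, "data": encoded}
desc_b = {"names": {"en": "example", "fr": "exemple"}, "data": encoded}

# Decode the base64 form, then hash the binary image data.
hash_a = image_hash(base64.b64decode(desc_a["data"]))
hash_b = image_hash(base64.b64decode(desc_b["data"]))
```

Here hash_a and hash_b are equal, because the name changes never enter the hash input.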

2.3 Code points allocated to express the secure hash

How about a compromise? Set aside 0x100 or 0x1000 code points for this scheme. Maybe they could even remain "reserved" like noncharacters, with gc=Cn? Or use a range of Plane 16 PUA code points?

(Do avoid the actual noncharacter code points ending in FFFE and FFFF -- the original proposal for using 64k code points would probably have included two of those.)
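To make the trade-off concrete, here is a sketch of the arithmetic, assuming the full 256-bit hash is spelled out and each code point carries a fixed number of bits. A range of 0x100 code points carries 8 bits each, so 32 code points per hash; a range of 0x1000 carries 12 bits each, so 22 code points. The base U+100000 in Plane 16 and all function names are assumptions for illustration only.

```python
import hashlib

HASH_BITS = 256


def code_points_needed(range_size: int) -> int:
    """Code points needed for a 256-bit hash; range_size must be a power of two."""
    bits_per_code_point = range_size.bit_length() - 1
    return -(-HASH_BITS // bits_per_code_point)  # ceiling division


def contains_fffe_ffff(base: int, range_size: int) -> bool:
    """True if the range includes a code point ending in FFFE or FFFF."""
    return any((cp & 0xFFFE) == 0xFFFE for cp in range(base, base + range_size))


def encode_hash(digest: bytes, base: int, range_size: int) -> str:
    """Spell a digest as code points in [base, base + range_size), most significant first."""
    value = int.from_bytes(digest, "big")
    code_points = []
    for _ in range(code_points_needed(range_size)):
        value, digit = divmod(value, range_size)
        code_points.append(base + digit)
    return "".join(chr(cp) for cp in reversed(code_points))


# Hypothetical allocation: 0x1000 code points starting at U+100000 (Plane 16 PUA).
BASE = 0x100000
RANGE_SIZE = 0x1000
encoded_hash = encode_hash(hashlib.sha256(b"image bytes").digest(), BASE, RANGE_SIZE)
```

A range placed at the very end of a plane, such as one starting at U+10F000 with 0x1000 code points, would take in U+10FFFE and U+10FFFF, which is what contains_fffe_ffff is meant to catch.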

2.5 Receiving and rendering a CHAI

Please specify the display size of the image when rendered. I assume that it should always be shown in the aspect ratio of the image data. What about height or width?

Emoji are typically shown with square bounding boxes and something like "text height". (Not sure what that means -- as high as a CJK ideograph? What font/ascender/descender metric is or should be used? Maybe Unicode need not be that specific.)

What if the image is not square? Scale the image to that height and find out how wide it gets? (Someone could create a very wide image.)

I believe that stickers are often larger than emoji. Do we need modifiers that specify a scaling factor relative to the text height, for example some multiple of some fraction of the normal height?
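One possible reading of these questions, sketched under assumptions: the image is scaled so that its height equals the text height times an optional scaling factor, and the width follows from the aspect ratio. The function name, its parameters, and the use of a fraction for the factor are hypothetical; the proposal does not define any of them.

```python
from fractions import Fraction


def display_size(image_width, image_height, text_height, scale=Fraction(1)):
    """Scale an image to scale * text_height while preserving its aspect ratio."""
    height = text_height * float(scale)
    width = height * image_width / image_height
    return width, height


# A square image at a text height of 16 units stays square.
square = display_size(128, 128, 16)

# A very wide image keeps the text height but becomes ten times as wide.
wide = display_size(1280, 128, 16)

# A sticker-like rendering at three halves of the text height.
sticker = display_size(128, 128, 16, Fraction(3, 2))
```

The wide case shows why a specification may want to say something about width as well: with height fixed, nothing in this rule limits how much horizontal space an image can claim.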

markus