L2/14-195

Subject: Data on the usage of left-side spacing marks in New Tai Lue
Source: Roozbeh Pournader, Google Inc.
Date: August 5, 2014

In L2/14-090, Martin Hosken suggested various approaches to changing the encoding model of the New Tai Lue script in Unicode. The script subcommittee also made recommendations about this in L2/14-129. The UTC discussed the topic in meeting #139, but did not arrive at a decision.

The author tried to gather information on the usage of New Tai Lue on the web. The whole Google web corpus of public web pages was investigated. The research was limited to HTML pages that did not contain any malformed or unassigned Unicode text: non-HTML documents, and HTML documents that had unpaired surrogates or unassigned Unicode characters, were not considered.

The research confirmed suspicions that the visual "wrong" encoding of New Tai Lue (Hosken's Solution 3) is much more widespread than the logical "correct" Unicode encoding (Hosken's Solution 1). This can be verified by searching for perhaps the most frequent New Tai Lue word that exposes the encoding issue, ᦙᦵᦲᧂ (, instead of the standard <19B6>.

The author does not wish to draw conclusions from this information, but wishes to note that Microsoft products have supported logical New Tai Lue since at least October 2009, and that Google and Mozilla products (and various other products based on HarfBuzz) have done the same since at least February 2013. The amount of private data or public non-HTML data in the correct encoding is unknown to the author.
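
The filtering and counting described above can be sketched roughly as follows. This is a minimal illustration and not the tooling actually used for the survey. It assumes pages are already available as decoded strings, that the sample word in logical order is <1999, 19B5, 19B2, 19C2>, and that the visual encoding simply places the left-side vowel sign before its consonant. The names `is_clean`, `to_visual_order`, and `count_encodings` are hypothetical, and the "unassigned" check depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Left-side spacing vowel signs in New Tai Lue: E, AE, O, AY.
LEFT_SIDE_VOWELS = {"\u19b5", "\u19b6", "\u19b7", "\u19ba"}

# Assumed sample word in logical order (consonant first).
LOGICAL_WORD = "\u1999\u19b5\u19b2\u19c2"


def to_visual_order(word):
    """Move each left-side vowel sign in front of the consonant it follows."""
    chars = list(word)
    for i in range(1, len(chars)):
        prev_is_consonant = "\u1980" <= chars[i - 1] <= "\u19ab"
        if chars[i] in LEFT_SIDE_VOWELS and prev_is_consonant:
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


def is_clean(text):
    """Reject text with unpaired surrogates (Cs) or unassigned code points (Cn)."""
    return all(unicodedata.category(ch) not in ("Cs", "Cn") for ch in text)


def count_encodings(pages, logical_word=LOGICAL_WORD):
    """Count logical-order and visual-order spellings across clean pages."""
    visual_word = to_visual_order(logical_word)
    counts = {"logical": 0, "visual": 0, "skipped": 0}
    for text in pages:
        if not is_clean(text):
            counts["skipped"] += 1
            continue
        counts["logical"] += text.count(logical_word)
        counts["visual"] += text.count(visual_word)
    return counts
```

Counting one fixed word sidesteps the ambiguity of running text, where a left-side vowel sign usually has a consonant on both sides and its order alone does not reveal which encoding model was used.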