BETA Unicode® 11.0.0
            
              
                
Note: The beta review period for Unicode 11.0.0 closed on April 23, 2018. Feedback received during the public review can be accessed from PRI #372.

This beta review page is left active, however, for convenient access to the prepublication versions of the Unicode 11.0.0 data files and annexes, until the formal release planned for June 5, 2018.
                
               
The next version of the Unicode Standard will be Version 11.0.0, planned for release on
June 5, 2018. This version updates several annexes to deal with
segmentation issues and adds significant new repertoire.
A total of 684 new characters are encoded, including
66 new emoji characters and the characters of 7 new scripts,
as well as multiple additions to existing blocks.
A beta version of the 11.0.0 Unicode Character Database files is available for public review. 
We strongly encourage implementers to review the summary description, 
download the beta 11.0.0 Unicode Character Database files, 
and test their programs with the new data, well before the end of the beta period. It is especially important
to review the Notable Issues for Beta Reviewers.
 
We encourage users to check the code charts carefully 
to verify correctness of the new characters added to Unicode 11.0.0 and to ensure
that there are no regressions 
in glyph shapes for previously encoded characters.
		
Related Unicode Technical Standards
                
		In addition to the Unicode Standard proper, four other Unicode Technical 
		Standards have significant text and data file updates that are 
		correlated with the new additions for Unicode 11.0.0. Review of that text 
		and data is also encouraged during the beta review period.
                
		
Review and Feedback
For guidance on how to focus your review, see the section
Notable Issues for Beta Reviewers.
Any feedback should be reported using the contact form.
Comments on the Unicode Standard Version 11.0.0
or the Unicode Character Database data files should refer to the beta review
Public Review Issue #372.
Comments on specific Version 11.0.0 UAXes and UTSes should refer to the respective
Public Review Issue numbers for each document, where available.
The comment period ends April 23, 2018.
All substantive technical comments must have been received by that date for
consideration at the May UTC meeting. Editorial comments (typos,
etc.) may still be submitted after that date for consideration in the final
editorial work.
			  
Note: All beta files may be updated, replaced, or
superseded by other files at any time. The beta files will be
discarded once Unicode 11.0.0 is final. It is inappropriate to cite
these files as anything other than a work in progress. No
products or implementations should be released based on the beta
UCD data files; use only the final, approved Version 11.0.0 data
files, expected on June 5, 2018.
				  
			  
			  The Unicode Consortium provides early access to updated versions of the data files 
and text to give reviewers and developers as much time as possible to ensure a problem-free adoption of 
Version 11.0.0.
			  The assignment of characters for Unicode 11.0.0 is 
          now stable. There will be no further 
                          additions or modifications of code points and no further changes to character names.
                          Please do not submit feedback requesting changes to code points
                          or character names for Unicode 11.0.0, as such feedback is not actionable.
			  
			  One of the main purposes of the beta review period is to verify and 
correct the preliminary character property assignments in the Unicode Character 
Database. Reviewers should check for property changes to existing Unicode 10.0.0 
characters, as well as the property values for the new Unicode 11.0.0 character 
additions. The Auxiliary
 HTML charts include the new characters highlighted in yellow, with names 
 appearing when hovering over a cell. These charts
 may be useful for reviewing information such as the default collation order,
 Script property assignments, and so forth during beta review.
 
To facilitate verification of the property changes and additions, 
diffable XML versions 
of the Unicode Character Database are available. These XML 
files are dated, so that people can check the details of changes that occurred 
during the beta review period. For more information, 
see the
diffs.readme.txt 
file.
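As an illustration of how two dated XML snapshots might be compared, the following Python sketch diffs the attributes of per-character elements between two versions. It assumes UAX #42-style markup, where each single character is an element with a `cp` attribute and its property values as other attributes (for example `gc`); the actual layout of the diffable beta files may differ, so check diffs.readme.txt first. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def char_properties(xml_text: str) -> dict[str, dict[str, str]]:
    """Map each code point (its "cp" attribute) to its attribute dictionary.

    Assumes UAX #42-style markup; elements without a "cp" attribute
    (for example grouped ranges) are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    return {
        elem.get("cp"): dict(elem.attrib)
        for elem in root.iter()
        if elem.get("cp") is not None
    }


def diff_snapshots(old_xml: str, new_xml: str) -> dict[str, dict[str, tuple]]:
    """Return {cp: {attribute: (old_value, new_value)}} for every change.

    None means the attribute (or the whole character) is absent in that
    snapshot, so newly added characters show up with None old values.
    """
    old, new = char_properties(old_xml), char_properties(new_xml)
    changes = {}
    for cp in sorted(old.keys() | new.keys(), key=lambda s: int(s, 16)):
        before, after = old.get(cp, {}), new.get(cp, {})
        delta = {
            attr: (before.get(attr), after.get(attr))
            for attr in sorted(before.keys() | after.keys())
            if before.get(attr) != after.get(attr)
        }
        if delta:
            changes[cp] = delta
    return changes
```

For example, comparing a snapshot where `<char cp="10D0" gc="Lo"/>` becomes `<char cp="10D0" gc="Ll"/>` reports `{"10D0": {"gc": ("Lo", "Ll")}}`.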
			  The beta review period is a good opportunity to add support for the new 
Unicode 11.0.0 characters in internal versions of software, so that software can 
be tested to verify that the new characters and property assignments do not cause 
problems when upgraded to Version 11.0.0 of Unicode.
Notable Issues for Beta Reviewers
Changes to Unicode Standard Annexes
			  Some of the Unicode Standard Annexes have modifications for 
			Unicode 11.0.0, often in coordination with changes to character properties.
            Most notably for Unicode 11.0.0:
      
        - The grapheme cluster boundary rules in UAX #29 have undergone
          a significant update to better handle consonants linked by viramas,
          providing better segmentation of Indic phonological syllables. Implementers
          of segmentation should carefully check their property classes and rules.
See the Modifications section of each Annex for details of the relevant changes.
Core Specification Update
      The core specification is undergoing extensive review, with
        numerous additions for Version 11.0.0. Although the draft text for Version 11.0.0
        is not yet available, specific reports of any technical or editorial
        issues in the currently published core specification 
        are also welcome during the beta review
        period. Such reports will be taken into consideration for corrections
          to the Version 11.0.0 draft. (Note: The Unicode Consortium has ongoing 
          opportunities for subject-matter volunteers: experts interested in contributing to or
          editing relevant parts of the core specification or other Unicode specifications.)
Script-specific Issues
      Seven new scripts have been added in Unicode 11.0.0. Some of these scripts have
        particular attributes which may cause issues for implementations. The more
        important of these attributes are summarized here.
      
        - The Hanifi Rohingya script is a new RTL script, with numbers written LTR, as in Arabic.
- The tatweel (U+0640) has been extended for use in Hanifi Rohingya and Sogdian.
- There are two new sets of vigesimal (base 20) numerals, one for the Medefaidrin script, and another for Mayan. The Mayan numerals are added for specialty use, as for page numbers, in advance of the encoding of the full Mayan script.
- Indic Siyaq numerals have complex formatting requirements when combined to
          represent large numbers.
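The vigesimal numerals mentioned above encode one character per base-20 digit. The following Python sketch shows the positional decomposition and maps each digit to a Mayan numeral character. It assumes that MAYAN NUMERAL ZERO through MAYAN NUMERAL NINETEEN are encoded contiguously starting at U+1D2E0, and it writes digits most significant first in a linear string; traditional vertical stacking and calendrical counts that use a different place value are out of scope. The function names are hypothetical.

```python
# Assumption: MAYAN NUMERAL ZERO..NINETEEN are contiguous from U+1D2E0.
MAYAN_NUMERAL_ZERO = 0x1D2E0


def to_base20(n: int) -> list[int]:
    """Split a non-negative integer into base-20 digits, most significant first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = []
    while True:
        n, d = divmod(n, 20)
        digits.append(d)
        if n == 0:
            break
    return digits[::-1]


def to_mayan_numerals(n: int) -> str:
    """Spell n with one Mayan numeral character per base-20 digit."""
    return "".join(chr(MAYAN_NUMERAL_ZERO + d) for d in to_base20(n))
```

For instance, `to_base20(400)` yields `[1, 0, 0]`, since 400 = 1 × 20².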
New Data Files Added to the UCD
      
        - A new data file has been added to the UCD: EquivalentUnifiedIdeograph.txt.
          That data file contains the mapping values for the new property,
          Equivalent_Unified_Ideograph (EqUIdeo).
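A minimal Python sketch for loading the EqUIdeo mapping follows. It assumes the new file uses the usual UCD data-file layout (semicolon-separated fields, `#` comments, and either a single code point or an `XXXX..YYYY` range in the first field); confirm this against the header of EquivalentUnifiedIdeograph.txt. The function name is hypothetical.

```python
def parse_code_point_mapping(lines) -> dict[int, int]:
    """Parse "cp ; target # comment" lines into {code point: target code point}.

    The first field may be a single code point or an XXXX..YYYY range;
    every code point in a range receives the same target value.
    """
    mapping: dict[int, int] = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        fields = [field.strip() for field in data.split(";")]
        source, target = fields[0], int(fields[1], 16)
        if ".." in source:
            first_hex, last_hex = source.split("..")
            first, last = int(first_hex, 16), int(last_hex, 16)
        else:
            first = last = int(source, 16)
        for cp in range(first, last + 1):
            mapping[cp] = target
    return mapping
```

Typical use would be `with open("EquivalentUnifiedIdeograph.txt", encoding="utf-8") as f: equiv = parse_code_point_mapping(f)`.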
Casing Issues
      There has been a very significant change to casing behavior for the Georgian
        script. A new set of Mtavruli capital letters (U+1C90..U+1CBA, U+1CBD..U+1CBF)
        has been added to Unicode 11.0.0,
        with case mappings to the existing Mkhedruli letters (U+10D0..U+10FA, U+10FD..U+10FF).
        In prior versions of the Unicode Standard, Mkhedruli Georgian was considered to
        be a monocameral (non-casing) script, and the Mkhedruli Georgian letters were gc=Lo.
        Starting with Version 11.0.0, those Mkhedruli Georgian letters are now gc=Ll, and
        have uppercase mappings to Mtavruli Georgian capital letters. This change will
        have major implications for Georgian implementations, including changes for
        input methods, fonts, casing, and string matching. Existing implementations
        have treated Mtavruli headlines and other uses for textual emphasis as a text
        style, so there will also be significant issues for document conversion and
        upgrade.
      Another complication for Georgian is that the primary orthography does not use
        titlecasing, and the Mkhedruli Georgian letters do not have titlecase mappings to
        Mtavruli letters. This is unique among bicameral systems in the Unicode Standard,
        so casing implementations should be prepared for this exception.
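Because the Mtavruli capitals (U+1C90..U+1CBA, U+1CBD..U+1CBF) line up with the Mkhedruli letters (U+10D0..U+10FA, U+10FD..U+10FF), the case mappings described above reduce to a constant offset. The Python sketch below is Georgian-only and hypothetical; production code should use the UCD case data, and note that built-in `str.upper()` depends on the Unicode version bundled with the interpreter. It shows the missing titlecase mapping: titlecasing leaves an initial Mkhedruli letter unchanged. For caseless matching, this sketch assumes folding Mtavruli to Mkhedruli.

```python
# Offset between Mkhedruli small letters and Mtavruli capitals (U+10D0 -> U+1C90);
# the same offset holds for both ranges.
MTAVRULI_OFFSET = 0x1C90 - 0x10D0  # 0x0BC0


def is_mkhedruli(cp: int) -> bool:
    return 0x10D0 <= cp <= 0x10FA or 0x10FD <= cp <= 0x10FF


def is_mtavruli(cp: int) -> bool:
    return 0x1C90 <= cp <= 0x1CBA or 0x1CBD <= cp <= 0x1CBF


def georgian_upper(s: str) -> str:
    """Map Mkhedruli letters to Mtavruli; leave all other characters alone."""
    return "".join(
        chr(ord(c) + MTAVRULI_OFFSET) if is_mkhedruli(ord(c)) else c for c in s
    )


def georgian_lower(s: str) -> str:
    """Map Mtavruli letters back to Mkhedruli (also used here for folding)."""
    return "".join(
        chr(ord(c) - MTAVRULI_OFFSET) if is_mtavruli(ord(c)) else c for c in s
    )


def georgian_title(word: str) -> str:
    """Titlecase a Georgian word: there is no titlecase mapping to Mtavruli,
    so the initial letter keeps its form and only the rest is lowercased."""
    return word[:1] + georgian_lower(word[1:])
```

Note that U+10FB GEORGIAN PARAGRAPH SEPARATOR and U+10FC fall between the two letter ranges and are left untouched.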
General Character Property Issues
      There are a number of issues related to particular character properties:
      
        - New GCB and WB segmentation property values for the revised algorithms to better handle Indic phonological syllables (aksaras). (See also UAX #29.) A couple of emoji-related property values are no longer used for segmentation, as a consequence of the changes in UAX #29.
- GCB=Extend no longer matches Grapheme_Extend=Y, as a result of its partitioning to factor out a new class, GCB=Virama.  WB=Extend and SB=Extend are unaffected.
- In prior versions of the UCD, cursive joining scripts which had
          any Joining_Group values assigned included distinct values for all
          characters that participate in cursive joining, including all of
          the Joining_Group singletons (classes containing only a single
          character). Starting with Unicode 11.0.0 and going forward, 
          explicit Joining_Group values are assigned only to characters which
          do not constitute singleton classes. This new convention is applicable to 
          the two newly encoded cursive joining scripts: Hanifi Rohingya and Sogdian.
          Implementations may need to take into account this discontinuity in how
          Joining_Group values are assigned to cursive joining scripts.
- Bidi mirroring: Unicode 11.0.0 now adds formal recognition of a number of
        previously encoded mathematical 
          characters as forming mirroring pairs. This means that there is now a further 
          deviation between the mappings defined in BidiMirroring.txt and
          those defined in the OpenType mirroring list, which was frozen as of Unicode 5.1.
          Note that this does not change bidirectional formatting: there is no
          change to the Bidi_Mirrored binary property value here, but only to the listing
          of which pairs of encoded characters have nominally mirroring glyphs.
- Some property values have been added to the Indic_Syllabic_Category property.
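To illustrate the Joining_Group discontinuity described above, the following Python sketch derives the new-style assignment (singleton classes dropped) from an old-style one. It also shows a fallback that treats a joining character with no explicit value as its own singleton class. The characters and group names in any input are placeholders, and both function names are hypothetical. The fallback is only meaningful for characters that actually join (Joining_Type other than U or T).

```python
from collections import Counter

NO_JOINING_GROUP = "No_Joining_Group"


def drop_singleton_groups(joining_group: dict[str, str]) -> dict[str, str]:
    """Keep only explicit values whose group has more than one member."""
    sizes = Counter(joining_group.values())
    return {ch: g for ch, g in joining_group.items() if sizes[g] > 1}


def joining_class(ch: str, joining_group: dict[str, str]) -> str:
    """Return a key for a joining character's class.

    If no explicit group is recorded, the character is treated as a
    singleton class of its own (keyed by the character itself).
    """
    group = joining_group.get(ch, NO_JOINING_GROUP)
    return ch if group == NO_JOINING_GROUP else group
```

With this fallback, code that groups glyph shapes by Joining_Group can treat old-style and new-style data uniformly.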
The following assignments of Line_Break property values deserve careful review.  Implementers and specialists are invited to provide feedback on these assignments.
      
        - U+0C84 KANNADA SIGN SIDDHAM (lb=BB)
- Historic punctuation in the range U+2E43..U+2E4E (mostly lb=BA)
Additionally, implementers should take note of the following special Line_Break
        property values associated with a subset of the emoji additions to UCD 11.0:
        
      
        - New emoji base characters: U+1F9B5, U+1F9B6, U+1F9B8, U+1F9B9 (lb=EB). Note that
          most new emoji characters have the value lb=ID.
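The lb=EB value matters because UAX #14 rule LB30b (EB × EM) keeps a skin-tone modifier attached to its emoji base. The toy Python sketch below classifies only the bases listed above as EB, the emoji modifiers U+1F3FB..U+1F3FF as EM, and everything else as ID. It then applies LB30b alone. This is a deliberate simplification, not a line breaker: real code must read LineBreak.txt and apply the full rule set. The function names are hypothetical.

```python
# Emoji modifiers (skin tones) U+1F3FB..U+1F3FF have lb=EM.
EMOJI_MODIFIERS = range(0x1F3FB, 0x1F400)
# New emoji bases listed above with lb=EB.
NEW_EMOJI_BASES = {0x1F9B5, 0x1F9B6, 0x1F9B8, 0x1F9B9}


def line_break_class(cp: int) -> str:
    """Toy classifier: EB for the listed bases, EM for modifiers, else ID."""
    if cp in NEW_EMOJI_BASES:
        return "EB"
    if cp in EMOJI_MODIFIERS:
        return "EM"
    return "ID"  # simplification for this sketch only


def break_allowed(before: int, after: int) -> bool:
    """Apply only LB30b (EB x EM); other pairs are treated as break opportunities."""
    return not (
        line_break_class(before) == "EB" and line_break_class(after) == "EM"
    )
```

If a new emoji base were mistakenly left at lb=ID, `break_allowed` would permit a break between the base and its skin-tone modifier.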
Unihan-related Issues
      All Unihan
        properties should be reviewed carefully. Additionally, the following
        deserve special attention:
      
        - Additional CJK unified ideographs extend the end of the range of assigned characters in the main CJK block. (The same issue applies to Tangut, which also had a few new ideographs added at the end of the main Tangut block.)
- 5 new provisional Unihan properties have been added.
- In addition, the kHangul property values underwent a major revision.
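One way to avoid hardcoding the end of the main CJK or Tangut ideograph range is to derive it from UnicodeData.txt. That file represents such large ranges as a pair of lines named `<Label, First>` and `<Label, Last>`. The Python sketch below collects those pairs; the function names are hypothetical, and any sample lines fed to it are illustrative only.

```python
def named_ranges(lines) -> dict[str, tuple[int, int]]:
    """Collect ranges that UnicodeData.txt encodes as "<Label, First>" /
    "<Label, Last>" line pairs, keyed by Label."""
    firsts: dict[str, int] = {}
    ranges: dict[str, tuple[int, int]] = {}
    for line in lines:
        fields = line.rstrip("\n").split(";")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        cp, name = int(fields[0], 16), fields[1]
        if name.startswith("<") and name.endswith(", First>"):
            firsts[name[1:-len(", First>")]] = cp
        elif name.startswith("<") and name.endswith(", Last>"):
            label = name[1:-len(", Last>")]
            ranges[label] = (firsts.pop(label), cp)
    return ranges


def in_named_range(cp: int, ranges: dict[str, tuple[int, int]], label: str) -> bool:
    """Check cp against a range derived from the data, not a hardcoded constant."""
    first, last = ranges[label]
    return first <= cp <= last
```

After an upgrade, rerunning `named_ranges` over the new UnicodeData.txt picks up the extended end of range automatically.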
Standardized Variation Sequences
      One new standardized variation sequence has been added, to represent a short diagonal stroke form of U+FF10 FULLWIDTH DIGIT ZERO.
Code Charts
      As always, careful review of the updated code charts for Version 11.0.0 is advised,
        especially for all newly added scripts.
        Particular issues to take note of include:
      
        - The use of characters beyond the range of Latin-1 is now allowed in
          annotations in the names list. (See NamesList.html for details.) Some
          other adaptations have been made in the use of fonts in the names list
          part of the code charts.
Collation-related Issues
      The Default Unicode Collation Element Table (DUCET) was updated to the Unicode 11.0
        repertoire for UCA 11.0. For the most part, the additions for new scripts and other
        characters are unremarkable, but implementations should be checked to ensure
        the new additions do not cause problems.
Other Issues
			Please also check the following specific items carefully:
			
          - The versioning for the emoji data release associated with
            UTS #51 was bumped from 5.0 directly to 11.0, to enable a
            less confusing synchronization with the UCD proper.
- Data for the 66 new emoji characters associated with the
            Unicode 11.0 repertoire was officially released on February 7, 2018,
            to accommodate the long lead times involved in rolling out
            new emoji support. UCD 11.0 beta reviewers should note that
            property values for characters that depend in any direct way on the Emoji 11.0
            data can no longer be changed for Unicode 11.0, because of
            stability requirements.
The following blocks are new in Unicode 11.0.0. Check implementations
				carefully for any range or property value assumptions regarding
				these new blocks. See also the single-block delta charts.
			
			
				
| Range | Block Name |
|---|---|
| 1C90..1CBF | Georgian Extended |
| 10D00..10D3F | Hanifi Rohingya |
| 10F00..10F2F | Old Sogdian |
| 10F30..10F6F | Sogdian |
| 11800..1184F | Dogra |
| 11D60..11DAF | Gunjala Gondi |
| 11EE0..11EFF | Makasar |
| 16E40..16E9F | Medefaidrin |
| 1D2E0..1D2FF | Mayan Numerals |
| 1EC70..1ECBF | Indic Siyaq Numbers |
| 1FA00..1FA6F | Chess Symbols |
			
			 
      Some blocks have also had font updates; see the 
        single-block delta charts for details. 
        In such cases, careful review of the blocks in question
        is advised, to ensure that there have not been any
        regressions in representative glyph display.
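As a quick check for range assumptions, the following Python sketch looks up which of the new Unicode 11.0.0 blocks (taken from the table above) contains a code point, using binary search over the sorted block starts. It covers only the new blocks, and the function name is hypothetical. Block membership is not a substitute for the Script property.

```python
import bisect
from typing import Optional

# (first, last, name) for the blocks new in Unicode 11.0.0, in code point order.
NEW_BLOCKS_11 = [
    (0x1C90, 0x1CBF, "Georgian Extended"),
    (0x10D00, 0x10D3F, "Hanifi Rohingya"),
    (0x10F00, 0x10F2F, "Old Sogdian"),
    (0x10F30, 0x10F6F, "Sogdian"),
    (0x11800, 0x1184F, "Dogra"),
    (0x11D60, 0x11DAF, "Gunjala Gondi"),
    (0x11EE0, 0x11EFF, "Makasar"),
    (0x16E40, 0x16E9F, "Medefaidrin"),
    (0x1D2E0, 0x1D2FF, "Mayan Numerals"),
    (0x1EC70, 0x1ECBF, "Indic Siyaq Numbers"),
    (0x1FA00, 0x1FA6F, "Chess Symbols"),
]
_STARTS = [first for first, _, _ in NEW_BLOCKS_11]
assert _STARTS == sorted(_STARTS), "bisect requires ascending block starts"


def new_block_name(cp: int) -> Optional[str]:
    """Return the name of the new 11.0.0 block containing cp, or None."""
    i = bisect.bisect_right(_STARTS, cp) - 1
    if i >= 0 and cp <= NEW_BLOCKS_11[i][1]:
        return NEW_BLOCKS_11[i][2]
    return None
```

For example, `new_block_name(0x1C90)` returns `"Georgian Extended"`, while a code point just past a block's end returns `None`.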
General Issues
			For current proposed updates to the particular UAXes, see
			  Proposed Updates for Standard Annexes
                        or use the links in the navigation bar on this page.
			Particular issues in the UAXes may also be the focus of specific
			  Public Review Issues. 
			Each proposed textual change in a UAX is highlighted, so that you can focus 
			your review on those sections if you have limited time. The changes 
			are also listed in detail in the Modifications sections (linked from the table 
			of contents of each document), and are summarized in
                        UAX changes,
                        so you can check on those areas that might be of most 
			interest.
                        Some links between beta documents and the proposed
			updates for UAXes will not work correctly during the
			beta review period. This is a known problem which does
			not need to be reported, as such links point to
			the eventual final names or revision numbers for the
			released versions.
                        
Stability
                        
Certain character properties for newly assigned characters cannot be
changed after the formal release of each version of the standard, because of the
Character Encoding Stability Policy.
Such character property values need special attention during the beta review process, as they
cannot be corrected after publication. These include:
  - Any property affecting Unicode Normalization, including Decomposition_Mapping, Canonical_Combining_Class, and Composition_Exclusion.
- The determination of whether a character is included in identifiers (XID_Start, XID_Continue).
- Case mappings and case foldings.