
GB/T 13000-2025 English PDF

GB/T 13000: Evolution and historical versions

Standard ID	Contents [version]	USD	STEP2	[PDF] delivered in	Standard Title (Description)	Status	PDF
GB/T 13000-2025	English	RFQ	ASK	3 days [Need to translate]	Information technology - Universal coded character set (UCS)	Valid	GB/T 13000-2025
GB 13000-2010	English	RFQ	ASK	20 days [Need to translate]	[GB/T 13000-2010] Information technology -- Universal multiple-octet coded character set (UCS)	Valid	GB 13000-2010
GB 13000.1-1993	English	RFQ	ASK	20 days [Need to translate]	Information technology -- Universal multiple-octet coded character set (UCS) -- Part 1: Architecture and basic multilingual plane	Obsolete	GB 13000.1-1993

Standard similar to GB/T 13000-2025

GB/T 17710 GB 18030 GB/T 37036.1 GB 13000 GB/T 22320 GB/T 22321.1

Basic data

Standard ID	GB/T 13000-2025 (GB/T13000-2025)
Description (Translated English)	Information technology - Universal coded character set(UCS)
Sector / Industry	National Standard (Recommended)
Classification of Chinese Standard	L71
Classification of International Standard	35.040
Word Count Estimation	2938,279
Date of Issue	2025-01-24
Date of Implementation	2025-08-01
Issuing agency(ies)	State Administration for Market Regulation, China National Standardization Administration

GB/T 13000-2025: Information technology - Universal coded character set(UCS)

---This is a DRAFT version for illustration, not a final translation. Full copy of true-PDF in English version (including equations, symbols, images, flow-chart, tables, and figures etc.) will be manually/carefully translated upon your order.
ICS 35.040
CCS L71
National Standard of the People's Republic of China
GB/T 13000-2025
Replaces GB/T 13000-2010
Information technology - Universal coded character set (UCS)
(ISO/IEC 10646:2020, MOD)
Issued on 2025-01-24; implemented on 2025-08-01
Issued by the State Administration for Market Regulation and the National Standardization Administration

Foreword
1 Scope
2 Normative references
3 Terms and Definitions
4 Compliance
  4.1 General requirements
  4.2 Compliance of information exchange
  4.3 Equipment compliance
5 Electronic Data Annex
6 Overall structure of UCS
7 Basic structure and nomenclature
  7.1 Structure
  7.2 Character Encoding
  7.3 Code Type
  7.4 Character Naming
  7.5 Character Aliases
  7.6 Code Point Short Identifier (UID)
  7.7 UCS sequence identifier
  7.8 Octet Sequence Identifier
8 Revision and Update of UCS
9 Subset
  9.1 Overview
  9.2 Finite subsets
  9.3 Selecting a subset
10 UCS encoding form
  10.1 Overview
  10.2 UTF-8
  10.3 UTF-16
  10.4 UTF-32
11 UCS encoding scheme
  11.1 Overview
  11.2 UTF-8
  11.3 UTF-16BE
  11.4 UTF-16LE
  11.5 UTF-16
  11.6 UTF-32BE
  11.7 UTF-32LE
  11.8 UTF-32
12 Control functions and UCS combined use
13 Declaration of identification characteristics
  13.1 Purpose and reason of labeling
  13.2 Identification of the UCS coding scheme
  13.3 Identification of graphic character subsets
  13.4 Identification of control function sets
  13.5 Identification of the ISO/IEC 2022 coding system
14 Structure of code tables and character name lists
15 Block and collection names
  15.1 Block Name
  15.2 Collection Name
16 Mirror Characters in Bidirectional Contexts
  16.1 Mirror Characters
  16.2 Directionality of Bidirectional Context
17 Special characters
  17.1 Overview
  17.2 Spacer Characters
  17.3 Currency Symbols
  17.4 Format Characters
  17.5 Ideographic Descriptors
  17.6 Variant Selectors and Variant Sequences
18 Character appearance
19 Compatible characters
20 Character order
21 Combining characters
  21.1 Sequence of combining characters
  21.2 Combination categories and regular order
  21.3 Display in the code table
  21.4 Alternative Code Representations
  21.5 Multiple Combination Characters
  21.6 Collections containing combining characters
  21.7 Combining grapheme connectors
22 Normalized form
23 Characteristics of individual words and symbol vocabulary
  23.1 The syllable compounding of Hangul (Korean)
  23.2 Characteristics of the writing system in India and some South Asian countries
  23.3 Byzantine musical notation
  23.4 Etymological references for graphic symbols
24 CJK Chinese character etymology reference
  24.1 Etymology Reference List
  24.2 CJK Chinese Character Etymology Reference File
  24.3 CJK unified Chinese character etymology reference information representation method
  24.4 CJK compatible Chinese character etymology reference information representation method
25 Etymology of Xixia characters
  25.1 Etymology Reference List
  25.2 Etymology of Xixia characters Reference file
  25.3 The representation of etymological reference information of Xixia characters
26 Etymology of Nüshu characters
  26.1 Etymology Reference List
  26.2 Etymology of Nüshu characters Reference document
27 Character names and notes
  27.1 Entity Name
  27.2 Name structure
  27.3 Single name
  27.4 Name Invariance
  27.5 Name Uniqueness
  27.6 Character names of CJK Chinese characters
  27.7 Names of Xixia characters
  27.8 Names of characters in Nüshu
  27.9 Character names of Khitan small characters
  27.10 Character names of Hangul (Korean) syllables
28 Named UCS sequence identifier
29 The structure of the Basic Multilingual Plane (BMP)
30 Structure of the Supplementary Multilingual Plane (SMP) for the Coding of Texts and Symbols
31 Structure of the Supplementary Ideographic Plane (SIP)
32 Structure of the Third Ideographic Plane (TIP)
33 Auxiliary Special Purpose Plane (SSP) Structure
34 Code table and character name list
  34.1 Overview
  34.2 Code Table
  34.3 List of character names
  34.4 Standardized Variant Sequence Summary
  34.5 Code table and character name list
Appendix A (Informative) Characters used in identifiers
Appendix B (Informative) External References to Character Vocabularies
  B.1 Character Vocabulary and Its Encoding Method
  B.2 Identification of ASN.1 Character Abstract Syntax
  B.3 Identification of ASN.1 Character Transfer Syntax
Appendix C (Informative) Notation for Eight-bit Value Representation
Appendix D (Informative) Recommendation for combined receiving and transmitting equipment with internal memory
Appendix E (Normative) Collection of graphic characters for subsets
  E.1 Collection of coded graphic characters
  E.2 Block Name List
  E.3 Fixed collections of the entire UCS (except the Unicode collection)
  E.4 CJK Collection
  E.5 Other collections
  E.6 Unicode Collection
Appendix F (Informative) Alphabetical list of character names
Appendix G (Informative) Format Characters
  G.1 Overview
  G.2 General format characters
  G.3 Format characters applicable to specific text
  G.4 Interline comment characters
  G.5 Reverse format characters
  G.6 Shorthand format characters
  G.7 Invisible mathematical operators
  G.8 Western Music Notation
  G.9 Language notation using tag characters
Appendix H (Informative) Ideographic Descriptors
  H.1 Overview
  H.2 Syntax of ideographic character description sequences
  H.3 Individual definitions of ideographic descriptors
Appendix I (Informative) CJK Chinese Character Recognition and Sorting Rules
  I.1 Overview
  I.2 Approval Procedure
  I.3 Sorting Procedure
  I.4 Source Font Separation Example
  I.5 Disagreement with Example
  I.6 Supplementary Notes on "Recognition and Sorting Rules of CJK Chinese Characters"
Appendix J (Informative) Character Naming Guidelines
Appendix K (Informative) Names of Hangul (Korean) Syllables
Appendix L (Informative) Supplementary Notes on Some Chinese Characters and Symbols
  L.1 Overview
  L.2 Names of some Chinese characters and symbols and their English translations in this document
  L.3 Explanation of some Chinese characters in the code table
Appendix M (Informative) Additional Notes on CJK Unified Chinese Characters
Appendix N (Normative) Code table and character name list
References

Foreword

This document was drafted in accordance with the provisions of GB/T 1.1-2020 "Guidelines for standardization work - Part 1: Structure and drafting rules for standardization documents".

This document replaces GB/T 13000-2010 "Information technology - Universal multiple-octet coded character set (UCS)". Compared with GB/T 13000-2010, in addition to structural adjustments and editorial changes, the main technical changes are as follows:

a) Added the provisions for the Third Ideographic Plane (TIP), three encoding forms of UCS and seven encoding schemes in the scope (see Chapter 1);
b) Modified some of the terms and definitions (see Chapter 3; Chapter 4 of the 2010 edition);
c) Changed the provisions on information exchange conformity (see 4.2; 2.2 of the 2010 edition) and equipment conformity (see 4.3; 2.3 of the 2010 edition);
d) Added a chapter on "Electronic Data Annexes" (see Chapter 5);
e) Changed the scope of the UCS code space and added the Third Ideographic Plane (TIP) to the overall structure of UCS (see Chapter 6 and 7.1; Chapter 5 and 6.1 of the 2010 edition);
f) Changed the character encoding format (see 7.2; 6.2 of the 2010 edition), deleted the octet order (see 6.3 of the 2010 edition), added the code type specification (see 7.3), changed the format of the code point short identifier (UID) (see 7.6; 6.5 of the 2010 edition), and added the provisions for octet sequence identifiers (see 7.8);
g) Added a chapter on "UCS encoding form" (see Chapter 10);
h) Added a chapter on "UCS encoding scheme" (see Chapter 11);
i) Deleted the chapter "Level of Implementation" (see Chapter 14 of the 2010 edition);
j) Added the UCS encoding scheme identifier (see 13.2), and deleted the identifier of the UCS encoding representation with implementation level (see 16.2 of the 2010 edition);
k) Changed the provisions on mirror characters (see Chapter 16; Chapter 19 and Appendix E of the 2010 edition);
l) Added the provisions for ideographic descriptors (see 17.5), variant selectors and variant sequences (see 17.6);
m) Added the rules for combination categories and regular order (see 21.2) and combining grapheme connectors (see 21.7);
n) Added unique spelling rules for the scripts of India and some South Asian countries (see 23.2) and etymological references for graphic symbols (see 23.4);
o) Added the CJK Chinese character etymology standard classification (see 24.1), changed the etymology reference list (see 24.1; 27.1 of the 2010 edition), and added the CJK Chinese character etymology reference file (see 24.2), the CJK unified Chinese character etymology reference information representation method (see 24.3) and the CJK compatible Chinese character etymology reference information representation method (see 24.4);
p) Added the chapters "Etymology of Xixia characters", "Etymology of Nüshu characters", "Character names and notes" and "Named UCS sequence identifier" (see Chapters 25, 26, 27 and 28);
q) Added new languages (see Chapter 29, Chapter 30, Chapter 31 and Chapter 33);
r) Added a chapter on "The Structure of the Third Ideographic Plane (TIP)" (see Chapter 32);
s) Changed the code table, character name list and character source (see Chapter 34; Chapter 33 of the 2010 edition);
t) Changed the list of graphic character sets used for subsets (see Appendix E; Appendix A of the 2010 edition).

This document adopts, with modifications, ISO/IEC 10646:2020 "Information technology - Universal coded character set (UCS)" and ISO/IEC 10646:2020/Amd 1:2023.

This document has the following structural adjustments compared to ISO/IEC 10646:2020 and ISO/IEC 10646:2020/Amd 1:2023:
--- To avoid hanging paragraphs, 27.5.1, G.1, I.1, I.2.3.1 and I.2.4.1 are added, and the subsequent clauses are renumbered accordingly;
--- Note 2 of 34.1 corresponds to Appendix M in ISO/IEC 10646:2020;
--- Note 4 of 34.2 corresponds to Appendix Q in ISO/IEC 10646:2020;
--- Appendix A corresponds to Appendix U in ISO/IEC 10646:2020;
--- Appendix B corresponds to Appendix N in ISO/IEC 10646:2020;
--- Appendix C corresponds to Appendix K in ISO/IEC 10646:2020;
--- Appendix D corresponds to Appendix J in ISO/IEC 10646:2020;
--- Appendix E corresponds to Appendix A in ISO/IEC 10646:2020;
--- Appendix F corresponds to Appendix G in ISO/IEC 10646:2020;
--- Appendix G corresponds to Appendix F in ISO/IEC 10646:2020;
--- Appendix H corresponds to Appendix I in ISO/IEC 10646:2020, with H.2.1 and H.2.2 added;
--- Appendix I corresponds to Appendix S in ISO/IEC 10646:2020;
--- Appendix J corresponds to Appendix L in ISO/IEC 10646:2020;
--- Appendix K corresponds to Appendix R in ISO/IEC 10646:2020;
--- Appendix M corresponds to Appendix P in ISO/IEC 10646:2020;
--- Appendix N corresponds to 34.5 in ISO/IEC 10646:2020;
--- Annex B, Annex C, Annex D, Annex E, Annex H and Annex T of ISO/IEC 10646:2020 are deleted.

The technical differences between this document and ISO/IEC 10646:2020 and ISO/IEC 10646:2020/Amd 1:2023, and the reasons for them, are as follows:

--- Moved the supplementary content of the entity name formation rules applicable to non-English versions from the notes to the main text, to comply with the national standard drafting rules;
--- Deleted the contents of abolished documents or standards and explained them with footnotes, to comply with the national standard drafting rules;
--- Changed the inaccurate standard numbers cited in the international standards and explained them with footnotes, to comply with the national standard drafting rules.

Other new additions are consistent with ISO/IEC 10646:2020/Amd 1:2023.

The following editorial changes were made to this document:

--- Changed the note of the term "dedicated plane" and added other commonly used translations;
--- Added notes on the use of special characters;
--- Added content about CJK Chinese character recognition and sorting;
--- Added a note on the character source list, the content of which comes from Annex M of ISO/IEC 10646:2020;
--- Added a note on the query method of the Korean syllable code mapping table, the content of which comes from Appendix Q of ISO/IEC 10646:2020;
--- Added Chinese translations of block names;
--- Changed the appendix number of the referenced document in the content on identification of the ASN.1 character transfer syntax;
--- Added character names that are missing from the character name list file of ISO/IEC 10646:2020/Amd 1:2023;
--- Added Appendix L (Informative) "Supplementary explanation of some Chinese characters and symbols";
--- Added references.

Please note that some of the contents of this document may involve patents. The issuing organization of this document does not assume responsibility for identifying patents.

This document was proposed and coordinated by the National Technical Committee for Information Technology Standardization (SAC/TC28).

Drafting organizations of this document: China Electronics Technology Standardization Institute, Language and Literature Application Research Institute of the Ministry of Education, Software Research Institute of the Chinese Academy of Sciences, Beijing Founder Electronics Co., Ltd.

Main drafters of this document: Chen Zhuang, Chen Xiaoyan, He Zhengan, Huang Shanshan, Wang Xiaoming, Wu Jian, Chen Ken.

The previous versions of this document and the documents it replaces are as follows:

--- First published in 1993 as GB 13000.1-1993 "Information technology - Universal multiple-octet coded character set (UCS) - Part 1: Architecture and Basic Multilingual Plane";
--- First revised in 2010 as GB/T 13000-2010 "Information technology - Universal multiple-octet coded character set (UCS)";
--- This is the second revision.

Information technology - Universal coded character set (UCS)

1 Scope

This document:

--- Specifies the architecture of UCS;
--- Defines the terms used in UCS;
--- Describes the overall structure of the UCS code space;
--- Specifies the allocated planes of UCS: the Basic Multilingual Plane (BMP), the Supplementary Multilingual Plane (SMP), the Supplementary Ideographic Plane (SIP), the Third Ideographic Plane (TIP), and the Supplementary Special Purpose Plane (SSP);
--- Defines graphic character sets for writing in various languages around the world;
--- Specifies the names and encodings of graphic characters and format characters of the BMP, SMP, SIP, TIP and SSP of UCS;
--- Specifies the encoding of control characters and special characters;
--- Specifies three encoding forms of UCS: UTF-8, UTF-16 and UTF-32;
--- Specifies seven UCS encoding schemes: UTF-8, UTF-16, UTF-16BE, UTF-16LE, UTF-32, UTF-32BE and UTF-32LE;
--- Specifies the management methods for future supplementary coded characters.

This document is applicable to technical products that process and exchange information in the graphic characters of various languages around the world.

Note: This document does not specify whether the characters it encodes are suitable for use as identifiers in programming languages. Appendix A gives information on characters suitable for use as identifiers.
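The planes and encoding schemes listed in the scope can be illustrated with a short Python sketch. It is not taken from the standard. The plane numbers (0, 1, 2, 3 and 14) come from general knowledge of ISO/IEC 10646 and Unicode rather than from this page, and mapping Python's built-in codec names onto the seven encoding schemes is an assumption made for illustration. The helper names `plane_of` and `encode_all` are hypothetical.

```python
# Plane numbers for the five allocated planes named in the scope.
# Assumption: numbering follows ISO/IEC 10646 / Unicode (not stated on this page).
PLANES = {0: "BMP", 1: "SMP", 2: "SIP", 3: "TIP", 14: "SSP"}

# Python codec names assumed to correspond to the seven encoding schemes.
SCHEMES = [
    "utf-8",
    "utf-16", "utf-16-be", "utf-16-le",
    "utf-32", "utf-32-be", "utf-32-le",
]


def plane_of(ch: str) -> str:
    """Return the plane abbreviation for a single character."""
    plane = ord(ch) >> 16  # each plane holds 0x10000 code points
    return PLANES.get(plane, f"plane {plane}")


def encode_all(ch: str) -> dict:
    """Serialize one character under each of the seven schemes."""
    return {name: ch.encode(name) for name in SCHEMES}


if __name__ == "__main__":
    # One ASCII letter, one BMP ideograph, one SIP ideograph.
    for ch in ("A", "\u4e2d", "\U00020000"):
        print(f"U+{ord(ch):04X} is in the {plane_of(ch)}")
        for name, data in encode_all(ch).items():
            print(f"  {name:<9} {data.hex(' ')}")
```

In Python, the `utf-16` and `utf-32` codecs without a `-be` or `-le` suffix prepend a byte order mark and use the platform's native byte order, so their output can differ between machines; the suffixed codecs produce a fixed byte order with no mark.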

2 Normative references

The contents of the following documents constitute essential clauses of this document through normative references in the text. For dated references, only the edition corresponding to that date applies to this document; for undated references, the latest edition (including all amendments) applies to this document.

ISO/IEC 2022 Information technology - Character code structure and extension techniques
Note: GB/T 2311-2000 Information technology - Character code structure and extension techniques (ISO/IEC 2022:1994, IDT)
ISO/IEC 6429 Information technology - Control functions for coded character sets