GB 18030-2022 PDF in English
Standard ID | Contents [version] | USD | [PDF] delivered in | Name of Chinese Standard | Status |
GB 18030-2022 | English | 5005 | 0-9 seconds. Auto-delivery. | Information technology - Chinese coded character set | Valid |
GB 18030-2005 | English | 4690 | 0-9 seconds. Auto-delivery. | Information technology - Chinese coded character set | Obsolete |
GB 18030-2000 | English | RFQ | 3 days | Information technology - Chinese ideograms coded character set for information interchange - Extension for the basic set | Obsolete |
NATIONAL STANDARD OF THE
PEOPLE’S REPUBLIC OF CHINA
ICS 35.040
CCS L 71
GB 18030-2022
Replacing GB 18030-2005
Information technology - Chinese coded character set
ISSUED ON: JULY 19, 2022
IMPLEMENTED ON: AUGUST 01, 2023
Issued by: State Administration for Market Regulation;
Standardization Administration of the People's Republic of China.
Table of Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Repertoire
5 Overall structure
6 Sequence of characters
7 Code point allocation
8 Explanation of some characters and codes
9 Implementation level
Annex A (normative) Character table of double-byte
Annex B (normative) Ideographic descriptors
Annex C (normative) Character table of four-byte
Annex D (informative) Explanation of some characters and codes
Annex E (informative) Code positions of Chinese characters in "General Standard Chinese Character List"
Bibliography
Foreword
This document was drafted in accordance with the rules given in GB/T 1.1-2020
"Directives for standardization - Part 1: Rules for the structure and drafting of
standardizing documents".
This document replaces GB 18030-2005 "Information technology - Chinese coded
character set". Compared with GB 18030-2005, in addition to the structural
modifications and editorial changes, the main technical changes in this document are
as follows:
a) Added the objects to which this document applies (see Clause 1 of this edition);
b) In the double-byte coding area, changed the GB/T 13000 code positions
corresponding to 10 vertical punctuation marks and 8 Chinese character
components, and deleted 6 repeatedly coded Chinese character components and 9
repeatedly coded Chinese characters (see Annex D of this edition and Annex A of
the 2005 edition);
c) In the four-byte coding area, changed 18 GB/T 13000 code positions (see Annex D
of this edition and Annex D of the 2005 edition);
d) In the four-byte code range 0x82358F33~0x82359636, added the 66 Chinese
characters newly added to the CJK unified Chinese characters (see Annex C of this
edition);
e) In the four-byte code range 0x9835F738~0x98399E36, added the 4149 Chinese
characters of CJK unified Chinese character extension C (see Annex C of this
edition);
f) In the four-byte code range 0x98399F38~0x9839B539, added the 222 Chinese
characters of CJK unified Chinese character extension D (see Annex C of this
edition);
g) In the four-byte code range 0x9839B632~0x9933FE33, added the 5762 Chinese
characters of CJK unified Chinese character extension E (see Annex C of this
edition);
h) In the four-byte code range 0x99348138~0x9939F730, added the 7473 Chinese
characters of CJK unified Chinese character extension F (see Annex C of this
edition);
i) In the four-byte code range 0x81398B32~0x8139A035, added 214 Kangxi radicals
(see Annex C of this edition);
j) In the four-byte code range 0x8134F932~0x81358437, added 83 Xishuangbanna
New Dai characters (see Annex C of this edition);
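The code ranges in items d) to j) can be checked against the character counts they give, using only the byte ranges of Clause 5 and the order in which Clause 5 enumerates four-byte codes (the fourth byte varies fastest, then the third, second and first). The Python sketch below does this; `four_byte_ordinal`, `range_size` and `ITEMS` are hypothetical names, not part of the standard.

```python
# Sketch: count the four-byte code positions in an inclusive range, using only
# the byte ranges from Clause 5 (bytes 1 and 3: 0x81~0xFE; bytes 2 and 4: 0x30~0x39).


def four_byte_ordinal(code: int) -> int:
    """Return the position of a four-byte code in Clause 5 order.

    0x81308130 -> 0, 0x81308131 -> 1, ...; the fourth byte varies fastest,
    then the third, second and first bytes. (Hypothetical helper name.)
    """
    b1, b2, b3, b4 = code.to_bytes(4, "big")
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"not a four-byte code: {code:#010x}")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def range_size(start: int, end: int) -> int:
    """Number of four-byte code positions in the inclusive range start~end."""
    return four_byte_ordinal(end) - four_byte_ordinal(start) + 1


# (start, end, characters added) taken from items d) to j) of the Foreword.
ITEMS = {
    "d) CJK unified Chinese characters": (0x82358F33, 0x82359636, 66),
    "e) extension C": (0x9835F738, 0x98399E36, 4149),
    "f) extension D": (0x98399F38, 0x9839B539, 222),
    "g) extension E": (0x9839B632, 0x9933FE33, 5762),
    "h) extension F": (0x99348138, 0x9939F730, 7473),
    "i) Kangxi radicals": (0x81398B32, 0x8139A035, 214),
    "j) Xishuangbanna New Dai": (0x8134F932, 0x81358437, 83),
}

if __name__ == "__main__":
    for name, (start, end, added) in ITEMS.items():
        print(f"{name}: {range_size(start, end)} code positions, {added} characters added")
```

Run as a script, it reports that the ranges in e) to i) contain exactly as many code positions as characters added (4149, 222, 5762, 7473 and 214). The ranges in d) and j) contain 74 and 96 positions for 66 and 83 added characters, so those two ranges also hold positions that are not counted as additions.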
Information technology - Chinese coded character set
1 Scope
This document specifies the Chinese graphic characters used in information technology
and the hexadecimal representation of their binary codes.
This document applies to the processing, exchange, storage, transmission, presentation,
input and output of Chinese and other graphic character information.
This document applies to technical products that process and exchange information
consisting of Chinese and other textual and graphic characters, including but not
limited to software products such as input methods, optical character recognition
(OCR), editing and proofreading, machine translation, speech synthesis, text
transcription and intelligent writing, and hardware products such as computers,
communication terminal equipment, e-book readers and learning machines.
2 Normative references
The following referenced documents are indispensable for the application of this
document. For dated references, only the edition cited applies. For undated references,
the latest edition of the referenced document (including any amendments) applies.
GB/T 2312-1980, Code of Chinese graphic character set for information
interchange - Primary set
GB/T 11383-1989, Information processing - 8-bit code for information interchange -
Structure and rules for implementation
GB/T 13000, Information technology - Universal multiple-octet coded character
set (UCS)
3 Terms and definitions
For the purposes of this document, the following terms and definitions apply.
3.1 character
An element in a collection of elements used to organize, control, or represent data.
3.2 coded character
Character (3.1) and its coded representation.
3.3 private use area
An area that can be specified by the user of a product conforming to this document.
3.4 repertoire
A specified set of characters (3.1) represented by a set of coded characters (3.2).
3.5 reserved zone
An area reserved for future specification by this document.
4 Repertoire
4.1 Overview
The characters included in this document are coded as single-byte, double-byte or
four-byte codes.
4.2 Part of single-byte
In this document, the part of single-byte includes all 128 characters from 0x00 to 0x7F
of GB/T 11383-1989.
4.3 Part of double-byte
The part of double-byte includes all graphic characters of GB/T 2312-1980, the CJK
unified Chinese characters, and some other graphic characters of GB/T 13000. The
characters in the part of double-byte shall be as specified in Annex A; among them,
the glyphs, code positions and functions of the ideographic descriptors shall comply
with Annex B.
NOTE: GB/T 13000 encodes the Chinese characters used in China, Japan, South Korea,
Vietnam and other countries and regions in a unified way. Each Chinese character with a
distinct abstract shape is assigned its own code position; Chinese characters from
different sources that share the same abstract shape are assigned a common code position.
The encoded Chinese characters are called CJK unified Chinese characters (CJK Unified
Ideographs), where CJK means China, Japan, and Korea.
4.4 Part of four-byte
The part of four-byte includes the 66 CJK unified Chinese characters of GB/T 13000
that are not among the double-byte characters above (9FA6~9FEF, excluding
9FB4~9FBB); CJK unified Chinese character extensions A, B, C, D, E and F; and the
characters of ethnic minority scripts that have been encoded in GB/T 13000.
The characters in the part of four-byte shall be as specified in Annex C.
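The four-byte part also holds the supplementary-plane repertoire (extensions B to F). In widely used GB 18030 implementations, such as the WHATWG Encoding Standard and Python's `gb18030` codec, four-byte codes from 0x90308130 upward correspond one-to-one, in order, to the GB/T 13000 code positions from U+10000 upward. This excerpt does not state that rule, so the Python sketch below is an assumption-based illustration. It does not cover BMP characters in the four-byte part (such as extension A), whose mapping skips positions already used in the double-byte part. The names `ordinal`, `code_to_scalar` and `scalar_to_code` are hypothetical.

```python
# ASSUMPTION: the linear supplementary-plane mapping used by common GB 18030
# implementations (not stated in this excerpt):
# 0x90308130 <-> U+10000, 0x90308131 <-> U+10001, ... up to U+10FFFF.
# BMP characters coded in the four-byte part are NOT handled here.

FIRST_SUPPLEMENTARY_CODE = 0x90308130  # assumed to correspond to U+10000


def ordinal(code: int) -> int:
    """Position of a valid four-byte code in Clause 5 order (0x81308130 -> 0)."""
    b1, b2, b3, b4 = code.to_bytes(4, "big")
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError(f"not a four-byte code: {code:#010x}")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


SUPPLEMENTARY_BASE = ordinal(FIRST_SUPPLEMENTARY_CODE)  # 189000


def code_to_scalar(code: int) -> int:
    """Four-byte code -> GB/T 13000 code position (supplementary planes only)."""
    scalar = ordinal(code) - SUPPLEMENTARY_BASE + 0x10000
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError(f"{code:#010x} is outside the supplementary-plane range")
    return scalar


def scalar_to_code(scalar: int) -> int:
    """GB/T 13000 code position U+10000..U+10FFFF -> four-byte code."""
    if not 0x10000 <= scalar <= 0x10FFFF:
        raise ValueError(f"U+{scalar:X} is not a supplementary-plane code position")
    n = scalar - 0x10000 + SUPPLEMENTARY_BASE
    n, b4 = divmod(n, 10)    # fourth byte varies fastest (10 values)
    n, b3 = divmod(n, 126)   # third byte: 126 values
    b1, b2 = divmod(n, 10)   # second byte: 10 values; the rest is the first byte
    return int.from_bytes(bytes([b1 + 0x81, b2 + 0x30, b3 + 0x81, b4 + 0x30]), "big")
```

For example, `scalar_to_code(0x2A700)` returns 0x9835F738, the first code listed in item e) of the Foreword, and `code_to_scalar(0x98399E36)` returns 0x2B734, the code position corresponding to the last code in that item.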
5 Overall structure
In the text, numbers prefixed with 0x are hexadecimal and numbers without the prefix
are decimal. In the annexes, all coded representations are expressed in hexadecimal
and all other numbers in decimal.
The part of single-byte adopts the encoding structure of GB/T 11383-1989 and uses the
code points 0x00~0x7F.
The part of double-byte represents each character with two bytes. The first byte
ranges over 0x81~0xFE, and the second (trailing) byte ranges over 0x40~0x7E and
0x80~0xFE.
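A minimal Python sketch of these ranges follows; `is_double_byte` and the constants are hypothetical names. It counts positions in the double-byte code space only; which positions actually carry characters is given in Annex A, which is not reproduced in this excerpt.

```python
# Sketch of the double-byte structure: first byte 0x81~0xFE,
# second byte 0x40~0x7E or 0x80~0xFE (0x7F is excluded).

LEAD = range(0x81, 0xFF)                          # 0x81..0xFE: 126 values
TRAIL = [*range(0x40, 0x7F), *range(0x80, 0xFF)]  # 63 + 127 = 190 values


def is_double_byte(b1: int, b2: int) -> bool:
    """True if the byte pair (b1, b2) lies in the double-byte code space."""
    return 0x81 <= b1 <= 0xFE and (0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE)


# Size of the double-byte code space (not every position is assigned).
DOUBLE_BYTE_POSITIONS = len(LEAD) * len(TRAIL)  # 126 * 190 = 23940
```

Since 0x30~0x39 is not a valid second byte here, a decoder can tell a double-byte code from the start of a four-byte code by examining the second byte alone.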
The part of four-byte extends the double-byte code by using 0x30~0x39, which are not
used in GB/T 11383-1989, as a suffix. The encoding range is 0x81308130~0xFE39FE39:
the first byte ranges over 0x81~0xFE, the second byte over 0x30~0x39, the third byte
over 0x81~0xFE and the fourth byte over 0x30~0x39. That is:
0x81308130 ~ 0x81308139;
0x81308230 ~ 0x81308239;
...
0x8130FE30 ~ 0x8130FE39;
0x81318130 ~ 0x81318139;
...
0x8131FE30 ~ 0x8131FE39;
...
0x82308130 ~ 0x82308139;
...
0x8230FE30 ~ 0x8230FE39;
...
0xFE308130 ~ 0xFE308139;
...
0xFE39FE30 ~ 0xFE39FE39.
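The four-byte ranges above, together with the single-byte and double-byte ranges, are enough to split a byte string into code units. The Python sketch below is a structural check only, assuming the input is meant to be GB 18030 text: it does not verify that a code position is assigned a character, and `split_codes` is a hypothetical name.

```python
from __future__ import annotations


def split_codes(data: bytes) -> list[bytes]:
    """Split GB 18030 bytes into 1-, 2- and 4-byte code units (structure only)."""
    units = []
    i = 0
    while i < len(data):
        b1 = data[i]
        if b1 <= 0x7F:                       # single-byte part: 0x00~0x7F
            units.append(data[i:i + 1])
            i += 1
            continue
        if not 0x81 <= b1 <= 0xFE:           # 0x80 and 0xFF never start a code
            raise ValueError(f"invalid first byte {b1:#04x} at offset {i}")
        if i + 1 >= len(data):
            raise ValueError(f"truncated code at offset {i}")
        b2 = data[i + 1]
        if 0x30 <= b2 <= 0x39:               # four-byte part: second byte 0x30~0x39
            if i + 3 >= len(data):
                raise ValueError(f"truncated four-byte code at offset {i}")
            b3, b4 = data[i + 2], data[i + 3]
            if not (0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
                raise ValueError(f"invalid four-byte code at offset {i}")
            units.append(data[i:i + 4])
            i += 4
        elif 0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE:  # double-byte part
            units.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"invalid second byte {b2:#04x} at offset {i + 1}")
    return units
```

For example, `split_codes(b"A\xd6\xd0\x95\x32\x82\x36")` returns `[b"A", b"\xd6\xd0", b"\x95\x32\x82\x36"]`: one single-byte code, one double-byte code and one four-byte code.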
......
9 Implementation level
This document specifies three implementation levels. A system software product that
conforms to a given implementation level shall provide input and output functions
for all characters within that level.
9.2 Implementation level 1
Implementation level 1 supports the single-byte part and the double-byte part of this
document, together with the following portions of the four-byte part: the CJK unified
Chinese characters (i.e., 0x82358F33~0x82359636) and CJK unified Chinese character
extension A (i.e., 0x8139EE39~0x82358738).
Any product to which this document applies shall meet the requirements for
implementation level 1.
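Read as covering the whole single-byte and double-byte parts plus the two four-byte ranges named in 9.2, level 1 membership of a single code can be checked mechanically. In this Python sketch, `in_level1` and `LEVEL1_FOUR_BYTE_RANGES` are hypothetical names, the input is assumed to be one structurally valid code, and the optional Table 3 characters are ignored because Table 3 is not part of this excerpt.

```python
# Sketch of the level 1 repertoire as read from 9.2: the single-byte and
# double-byte parts, plus two four-byte ranges. Optional Table 3 characters
# are NOT included (Table 3 is not in this excerpt).

LEVEL1_FOUR_BYTE_RANGES = (
    (0x8139EE39, 0x82358738),  # CJK unified Chinese character extension A
    (0x82358F33, 0x82359636),  # CJK unified Chinese characters in the four-byte part
)


def in_level1(code: bytes) -> bool:
    """True if one structurally valid code unit falls within level 1."""
    if len(code) in (1, 2):
        return True  # single-byte and double-byte parts
    if len(code) == 4:
        value = int.from_bytes(code, "big")
        return any(lo <= value <= hi for lo, hi in LEVEL1_FOUR_BYTE_RANGES)
    raise ValueError("expected a 1-, 2- or 4-byte code")
```

Because each byte range is contiguous, comparing valid four-byte codes as big-endian integers preserves their order, so a simple range test is enough.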
NOTE: Depending on the needs of software applications, an implementation of level 1 may
also support any one or more of the non-Chinese characters listed in Table 3.
9.3 Implementation level 2
Implementation level 2 includes implementation level 1 and, in addition, supports the
encoded Chinese characters of the "General Standard Chinese Character List" that are
not included in implementation level 1. Annex E gives the code positions in this
document, and the glyphs, of the Chinese characters in the "General Standard Chinese
Character List".
System software and supporting software shall meet the requirements for
implementation level 2.
NOTE: System software and supporting software include but are not limited to operating
systems, database management systems and middleware (see GB/T 36475 for information
on software product classification).
9.4 Implementation level 3
Implementation level 3 includes implementation level 2 and, in addition, supports all
Chinese characters specified in this document and the Kangxi radicals listed in
Table 3.
Products used for government services and public services shall meet the requirements
for implementation level 3.
NOTE: Government services and public service industries include but are not limited to railway
transportation, road transportation, water transportation, air transportation, multimodal
transportation and transportation agency, postal services, monetary and financial services, insurance,
land management, health, national institutions, social security, etc. (see GB/T 4754 for industry
classification information).
...... Source: The above contents are excerpted from the PDF, translated/reviewed by www.chinesestandard.net / Wayne Zheng et al.