
GB/T 38667-2020 English PDF

GB/T 38667-2020: Information technology - Big data - Guide for data classification
Status: Valid

Standard similar to GB/T 38667-2020

GB/T 39117   GB/T 37741   GB/T 38319   GB/T 38664.4   GB/T 38664.1   

Basic data

Standard ID: GB/T 38667-2020 (GB/T38667-2020)
Description (Translated English): Information technology - Big data - Guide for data classification
Sector / Industry: National Standard (Recommended)
Classification of Chinese Standard: L70
Classification of International Standard: 35.240.70
Word Count Estimation: 18,117
Date of Issue: 2020-04-28
Date of Implementation: 2020-11-01
Quoted Standard: GB/T 4754-2017; GB/T 35295-2017
Issuing agency(ies): State Administration for Market Regulation; Standardization Administration of China
Summary: This standard gives recommendations and guidance on the big data classification process and on its classification perspectives, classification dimensions, and classification methods. This standard is applicable to guiding the classification of big data.

GB/T 38667-2020: Information technology - Big data - Guide for data classification

---This is a DRAFT version for illustration, not a final translation. Full copy of true-PDF in English version (including equations, symbols, images, flow-chart, tables, and figures etc.) will be manually/carefully translated upon your order.
ICS 35.240.70
L70
National Standard of the People's Republic of China
Information technology - Big data - Guide for data classification
Issued on 2020-04-28; implemented on 2020-11-01
Issued by the State Administration for Market Regulation and the Standardization Administration of China

Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Acronyms
5 Classification process
5.1 Overview
5.2 Classification planning
5.3 Classification preparation
5.4 Classification implementation
5.5 Evaluation of results
5.6 Maintenance improvement
6 Classification perspective
6.1 Overview
6.2 Technical selection perspective
6.3 Business Application Perspective
6.4 Security and privacy protection perspective
7 Classification dimension
7.1 Overview
7.2 Technical selection dimension
7.3 Business Application Dimension
7.4 Security and privacy protection dimension
8 Classification method
8.1 Line classification
8.2 Face Classification
8.3 Hybrid taxonomy
Appendix A (Informative Appendix) Big Data Classification Example

Foreword

This standard was drafted in accordance with the rules given in GB/T 1.1-2009.
Please note that some content of this document may involve patents. The issuer of this document does not assume responsibility for identifying these patents.
This standard is proposed and managed by the National Information Technology Standardization Technical Committee (SAC/TC28).
Drafting organizations of this standard: Institute of Information Engineering, Chinese Academy of Sciences (State Key Laboratory of Information Security), National Information Center, Inspur Group Co., Ltd., Wisdom China (Beijing) Technology Co., Ltd., Founder International Software (Beijing) Co., Ltd., State Grid Anhui Electric Power Co., Ltd. (Electric Power Research Institute), China Railway Research Institute Group Co., Ltd., China Institute of Electronic Technology Standardization, Shanghai Sanlingwei Information Security Co., Ltd., Unicom Big Data Co., Ltd., China Insurance Information Technology Management Co., Ltd., Ninth Party Big Data Information Group Co., Ltd., CLP Great Wall Internet System Application Co., Ltd., Guangdong Power Grid Co., Ltd. Information Center, CEC Big Data Research Institute Co., Ltd., Peking University, Shandong Provincial Computing Center (National Supercomputing Jinan Center).
Main drafters of this standard: Chen Chi, Ma Hongxia, Ma Shunan, Tian Xue, Gao Yanan, Huang Xianzhi, Shan Zhen, Zhang Huimin, Zhang Yu, Gu Guangyu, Wu Yanhua, Zheng Jinjin, Yin Zhuo, Ye Lin, Ganlu, Guan Tailu, Li Yanchao, Lang Peipei, Min Jinghua, Wei Lihao, Lu Kai, Zhang Jicai, Feng Nianci, Zhao Junfeng, Shi Congcong, Sun Jiayang.

1 Scope

This standard provides advice and guidance on the big data classification process and its classification perspective, classification dimensions, and classification methods. This standard is applicable to guide the classification of big data.

2 Normative references

The following documents are essential for the application of this document. For dated references, only the dated version applies to this document. For undated references, the latest version (including all amendments) applies to this document.
GB/T 4754-2017 Classification of National Economic Industries
GB/T 35295-2017 Information technology - Big data - Terminology

3 Terms and definitions

The terms and definitions given in GB/T 35295-2017 and the following apply to this document. For ease of use, certain terms and definitions from GB/T 35295-2017 are repeated below.

3.1 Big data
Extensive data sets characterized by huge volume, diverse sources, extremely fast generation, and variability, which are difficult to process effectively with traditional data architectures.
Note: Internationally, the four characteristics of big data are generally expressed directly, without modification, as volume, variety, velocity and variability, each defined in the context of big data:
a) Volume: the size of the data sets that constitute big data.
b) Variety: data may come from multiple data repositories, data domains, or data types.
c) Velocity: the rate of data flow per unit time.
d) Variability: the other characteristics of big data, namely volume, velocity and variety, are all in a state of change.
[GB/T 35295-2017, definition 2.1.1]

3.2 Data set
A form of data in which data records are aggregated.
Note: A data set can have the big data characteristics of volume, velocity, variety and variability. The characteristics of a data set describe the data itself, that is, data at rest; when the data is being transmitted over a network or temporarily resides in computer memory for reading or updating, they describe data in motion.
[GB/T 35295-2017, definition 2.1.46]

3.3 Big data classification
The process of distinguishing and categorizing big data by its attributes or characteristics, according to certain principles and methods, and of establishing a classification system and an arrangement order.

3.4 Classification subject
An organization or individual that organizes and classifies big data during its collection, storage, use, distribution, and deletion.

3.5 Classification perspective
The viewpoint from which the classification subject observes and carries out big data classification activities.

3.6 Classification dimension
One or more common characteristics of the data that are used to carry out classification.
Note: Common data classification dimensions include source of generation, structured features, business attribution, and processing timeliness requirements.

3.7 Classification method
The logical method of arranging and organizing data categories in a certain form according to the selected classification dimensions.

3.8 Data distribution
The process of transferring data, in the form of raw data, processed data, or analysis results, to internal or external entities.
Note: Data distribution may take place online or offline and includes data exchange, data transaction, data sharing, data disclosure, etc.

3.9 Category
A collection of data with common attributes (or characteristics).

4 Acronyms

The following abbreviations apply to this document.
ETL: Extract-Transform-Load
FTP: File Transfer Protocol
SQL: Structured Query Language

5 Classification process

5.1 Overview
The big data classification process is divided into five stages: classification planning, classification preparation, classification implementation, result evaluation, and maintenance and improvement, as shown in Figure 1.
Figure 1 Big data classification process
This chapter specifies the big data classification process. Based on the actual application scenarios of big data, the three key steps of that process, namely classification perspective, classification dimension, and classification method, are specified in Chapter 6, Chapter 7, and Chapter 8 respectively. For specific classification examples, see Appendix A.

5.2 Classification planning
5.2.1 Select classification perspective
The process of selecting a classification perspective includes:
a) Clarify the business scenario for classification;
b) Select the classification perspective according to the business scenario.
Note: See Chapter 6 for classification perspectives.
5.2.2 Work plan
The process of developing a work plan includes:
a) Clarify the scope of the data to be classified;
b) Clarify the classification dimensions and methods to be adopted;
c) Clarify the expected classification results;
d) Clarify the implementation plan and schedule of the classification work;
e) Clarify the evaluation method for the classification results;
f) Clarify the maintenance plan for the classification results.

5.3 Preparation for classification
5.3.1 Survey the data status
The process of surveying the data status includes:
a) Survey data generation, including but not limited to data generation scenarios, subjects, methods, frequency, sparseness, and legal compliance;
b) Survey the current status of data storage, including but not limited to data content format, storage method, storage location, and storage volume;
c) Survey data quality, including but not limited to the standardization, completeness, accuracy, consistency, timeliness, and accessibility of the data;
d) Survey data business types, such as organization and personnel management data, business data, and financial data;
e) Survey data sensitivity, including but not limited to data confidentiality, security, and protection needs;
f) Survey the application of data, including but not limited to the purpose of use, application field, and method of use;
g) Survey the timeliness of data, including but not limited to the timeliness requirements of data processing and the timeliness of data value;
h) Survey data rights, including but not limited to data ownership, management rights, and use rights.
5.3.2 Determine the classification object
The process of determining the classification object includes:
a) Determine the business scenario of data classification;
b) Determine the start and end time of data generation;
c) Determine the amount of data;
d) Determine the frequency of data generation;
e) Determine the structured characteristics of the data;
f) Determine the data storage method;
g) Determine the timeliness of data processing;
h) Determine the data exchange method;
i) Determine the source of the data;
j) Determine the type of data circulation;
k) Determine the data quality;
l) Determine the data sensitivity.
5.3.3 Select classification dimension
The process of selecting classification dimensions includes:
a) Sort out the data characteristics relevant to the classification perspective;
b) Select the classification dimension according to the data characteristics.
Note: See Chapter 7 for classification dimensions.
5.3.4 Select classification method
The process of selecting a classification method should clarify the order and combination of the classification dimensions.
Note 1: See Chapter 8 for classification methods.
Note 2: If a hybrid classification method is chosen, decide which classification dimension is primary and which is supplementary.

5.4 Classification implementation
5.4.1 Draft implementation process
Drafting the implementation process should take the big data life cycle into account and produce a specific classification implementation process, including but not limited to clarifying the implementation steps, starting the implementation work, carrying out the implementation work, and summarizing the implementation process.
5.4.2 Develop tools/scripts
Developing tools/scripts should implement the classification algorithms according to the implementation process, classification dimensions, and classification methods, and should follow software development or scripting specifications.
5.4.3 Record implementation process
Recording the implementation process should capture each step of the classification implementation process and its classification results, and produce documentation.
5.4.4 Output classification results
Outputting the classification results should consolidate the classification results of each step into a data classification table.

5.5 Evaluation of results
5.5.1 Verify implementation process
Verifying the implementation process includes:
a) Check the data classification table to determine whether the classification is reasonable;
b) Check the classification process records to establish how far the classification results deviate from the expected goals;
c) Check the classification dimensions to ensure that they meet business needs and classification goals;
d) Check the rationality of the classification method;
e) Adjust the big data classification process according to the verification results.
5.5.2 Interview relevant personnel
Interviewing relevant personnel includes:
a) Interview the person who carried out the data classification about the relevance of the classification perspective, scope, dimensions, and methods to the business scenarios;
b) Interview the data owner about whether the data ownership classification and frequency classification in the data classification results match the actual situation;
c) Interview the data manager about whether the data structure classification, data storage method classification, sparseness classification, and sensitivity classification in the data classification results match the actual situation;
d) Interview data users about whether the processing timeliness classification, exchange method classification, business attribution classification, circulation type classification, etc. in the data classification results match the actual application situation;
e) Review the opinions and issues raised and adjust the big data classification process.
5.5.3 Test classification results
Testing the classification results includes:
a) Run a classification script or program on the classified data to check whether any classification results do not conform to the classification strategy;
b) Review the opinions and issues raised and adjust the big data classification process.

5.6 Maintenance improvement
5.6.1 Change control
Change control includes:
a) Analyze the necessity and rationality of the change to determine whether to implement it;
b) Formulate a change plan and assess the impact of the change on big data classification, including changes to classification dimensions and classification methods;
c) Perform the change, update the classification results, and record the change process;
d) Evaluate the new big data classification results;
e) Publish the new big data classification results.
5.6.2 Regular evaluation
Regular evaluation includes:
a) Regularly evaluate the rationality of the big data classification dimensions and methods, and check whether they remain consistent with changes in business scenarios and classification perspectives;
b) Regularly evaluate the validity and application of the big data classification results, and check whether updated business application needs are still met;
c) Review the opinions and issues raised and adjust the big data classification process.
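
As an illustration of step 5.5.3 a), the check of classified data against the classification strategy could be sketched in Python as follows. This is a minimal sketch and not part of the standard: the `ClassifiedRecord` row layout, the `find_violations` helper, and the representation of a strategy as a mapping from dimension name to permitted categories are all assumptions made for this example. The category names are taken from 7.2.5.3 and 7.2.6.3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClassifiedRecord:
    """One row of a hypothetical data classification table (see 5.4.4)."""
    record_id: str
    dimension: str  # e.g. "sparseness"
    category: str   # e.g. "dense"


def find_violations(table, strategy):
    """Return the rows whose category is not permitted for their dimension.

    `strategy` maps each dimension name to the set of categories it permits.
    A dimension that is missing from the strategy permits nothing.
    """
    return [
        row for row in table
        if row.category not in strategy.get(row.dimension, set())
    ]


# Illustrative strategy (assumed, not prescribed by the standard).
strategy = {
    "sparseness": {"dense", "sparse"},
    "processing_timeliness": {"real-time", "quasi-real-time", "batch"},
}
table = [
    ClassifiedRecord("r1", "sparseness", "dense"),
    ClassifiedRecord("r2", "processing_timeliness", "hourly"),
]
violations = find_violations(table, strategy)
print([row.record_id for row in violations])  # prints ['r2']
```

Rows reported by such a check would feed step 5.5.3 b), where the classification process is adjusted.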

6 Classification perspective

6.1 Overview
The big data classification perspectives are the technical selection perspective, the business application perspective, and the security and privacy protection perspective.

6.2 Technical selection perspective
The technical selection perspective includes but is not limited to:
a) Clarify the frequency of data generation and the data generation rules, determine the data update cycle and storage strategy, and determine storage resource allocation schemes such as the data storage platform configuration type;
b) Clarify the data generation method, analyze the source and quality of the data, and determine the position of the data in the overall data processing flow as well as the data processing and storage technologies;
c) Analyze the structured characteristics of the data and determine the data storage and processing scheme;
d) Clarify the data storage method, determine the data modeling model and data access method, and support various data application scenarios;
e) Sort out data sparseness, clarify the rules for sparse and dense data, determine the data storage strategy and analysis method, and select the data storage and analysis schemes;
f) Clarify the data processing timeliness requirements and the timing of data processing, determine the data processing strategy, and select a data processing scheme, including the computing platform and resource allocation;
g) Sort out the data exchange methods, determine the data sharing methods and strategies, and support the construction of information exchange systems.

6.3 Business application perspective
The business application perspective includes but is not limited to:
a) Clarify the source of data generation and the data ownership and access rights, to facilitate data tracking and tracing;
b) Clarify data application scenarios, determine data business topics, judge data application value, and select data analysis schemes;
c) Clarify data distribution scenarios, determine the industry in which the data is applied, and clarify the types and scope of available data;
d) Sort out data quality, clarify data application requirements, and determine the data quality management plan.

6.4 Security and privacy protection perspective
The security and privacy protection perspective includes but is not limited to:
a) Clarify the security requirements of big data with different degrees of sensitivity during storage, transmission, access and distribution;
b) Clarify the privacy protection requirements of big data with different degrees of sensitivity;
c) Guide the classification subject in formulating a privacy protection plan;
d) Guide the classification subject in formulating a security management plan.

7 Classification dimension

7.1 Overview
This chapter gives the classification dimensions under each of the three perspectives (technical selection, business application, and security and privacy protection) and describes the classification elements, data categories, and applicable scenarios of each dimension.

7.2 Technical selection dimension
7.2.1 Classification by generation frequency
7.2.1.1 Overview
Classification by generation frequency means classifying data according to how frequently it is generated (the amount of data generated per unit time, or how often a specified amount of data is reached).
7.2.1.2 Classification elements
The elements of classification by generation frequency include:
a) The data generation cycle, such as seconds, minutes, hours, days, weeks, months, quarters, half-years, or years;
b) The amount of data generated in a unit period, expressed either as a number of records or as the space occupied by the data, such as millions of records, tens of millions of records, GB-level data, or TB-level data.
7.2.1.3 Category
By generation frequency, data can be divided into: annually updated data, monthly updated data, weekly updated data, daily updated data, hourly updated data, per-minute updated data, per-second updated data, data that is never updated, etc.
7.2.1.4 Applicable scenarios
Applicable scenarios for classification by generation frequency include judging the reasonableness of resource allocation and the value of data analysis based on how frequently data is generated.
7.2.2 Classification by generation method
7.2.2.1 Overview
Classification by generation method means classifying data according to how it is generated.
7.2.2.2 Classification elements
The elements of classification by generation method include:
a) The way the data is obtained or collected, such as manual collection or collection through an information system;
b) The degree of data processing, such as original data or secondary processed data.
7.2.2.3 Category
By generation method, data can be divided into: manually collected data, data generated by information systems, data generated by sensing devices, original data, secondary processed data, etc.
7.2.2.4 Applicable scenarios
Applicable scenarios for classification by generation method include determining data collection schemes, data protection schemes, and data processing schemes.
7.2.3 Classification by structured features
7.2.3.1 Overview
Classification by structured features means classifying data according to its degree of structure.
7.2.3.2 Classification elements
The elements of classification by structured features include:
a) Whether there is a predefined data model;
b) Whether the data structure is regular;
c) Whether the data length is standardized;
d) Whether the data type is fixed.
7.2.3.3 Category
By structured features, data can be divided into: structured data, such as retail, finance, bioinformatics, and geographic data; unstructured data, such as images, videos, sensor data, and web pages; and semi-structured data, such as application system logs and e-mail.
7.2.3.4 Applicable scenarios
Applicable scenarios for classification by structured features include planning the data processing and storage architecture according to the data structure.
7.2.4 Classification by storage method
7.2.4.1 Overview
Classification by storage method means classifying data according to the storage method that suits it.
7.2.4.2 Classification elements
The elements of classification by storage method include:
a) The data model suitable for modeling the data, such as the relational model, document model, or graph model;
b) The query language used for data access, such as SQL, SQL-like languages, or graph query languages.
7.2.4.3 Category
By storage method, data can be divided into:
data stored in relational databases, data stored in key-value databases, data stored in columnar databases, data stored in graph databases, data stored in document databases, etc.
7.2.4.4 Applicable scenarios
Applicable scenarios for classification by storage method include selecting the database system used for data storage and determining how the application system accesses the data storage system.
7.2.5 Classification by sparseness
7.2.5.1 Overview
Classification by sparseness means classifying data according to how sparse or dense it is.
7.2.5.2 Classification elements
The main element of classification by sparseness is the evaluation criterion for data sparseness, that is, the proportion of data in the data set whose values are missing or zero. Data in which less than 50% of the values are null or zero is dense data; data in which 50% or more of the values are null or zero is sparse data.
7.2.5.3 Category
By sparseness, data can be divided into: dense data and sparse data.
7.2.5.4 Applicable scenarios
Applicable scenarios for classification by sparseness include analyzing and judging data value density based on the magnitude of data per unit time.
7.2.6 Classification by processing timeliness
7.2.6.1 Overview
Classification by processing timeliness means classifying data according to the latency requirements of data processing.
7.2.6.2 Classification elements
The elements of classification by processing timeliness include:
a) The processing latency requirement, that is, whether the application scenario sets a clear upper limit on processing latency;
b) The timeliness of data value, that is, how the application value of the data holds up over time;
c) The data processing volume, that is, the order of magnitude of data that must be processed within the latency limit.
7.2.6.3 Category
By processing timeliness, data can be divided into:
real-time processing data, quasi-real-time processing data, and batch processing data.
7.2.6.4 Applicable scenarios
Applicable scenarios for classification by processing timeliness include arranging business sequence and resource investment according to data timeliness requirements.
7.2.7 Classification by exchange method
7.2.7.1 Overview
Classification by exchange method means classifying data according to the way it is exchanged between the provider and the receiver.
7.2.7.2 Classification elements
The elements of classification by exchange method include:
a) The network status between the two exchange parties, that is, whether their networks are interconnected;
b) The synchronization and real-time requirements for data between the two exchange parties;
c) The amount of data exchanged in a single operation;
d) The frequency of data exchange, such as fixed-frequency exchange, fixed-time exchange, or on-demand exchange.
7.2.7.3 Category
By exchange method, data can be divided into: ETL method, system interface method, FTP method, removable media copy method, etc.
7.2.7.4 Applicable scenarios
Applicable scenarios for classification by exchange method include assessing the impact of different exchange methods on the convenience of big data sharing and planning the architecture of the information exchange system.

7.3 Business Application Dimensions
7.3.1 Classification by source
7.3.1.1 Overview
Classifica...
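
The 50% criterion in 7.2.5.2 can be illustrated with a short Python sketch. This is an assumption-laden example, not text from the standard: the function name `sparseness_category`, the use of `None` to represent a missing value, and the choice to evaluate a flat list of values are all made up for illustration.

```python
def sparseness_category(values):
    """Classify a collection of values as "dense" or "sparse" per 7.2.5.2.

    A value counts as empty when it is None (assumed here to mean "missing")
    or equal to zero. Data with 50% or more empty values is sparse;
    otherwise it is dense.
    """
    values = list(values)
    if not values:
        raise ValueError("cannot assess sparseness of an empty data set")
    empty = sum(1 for v in values if v is None or v == 0)
    return "sparse" if empty / len(values) >= 0.5 else "dense"


print(sparseness_category([3, 0, None, 7]))  # 2 of 4 empty, prints sparse
print(sparseness_category([3, 5, 0, 7]))     # 1 of 4 empty, prints dense
```

The boundary case matters: exactly half empty values is classified as sparse, because the criterion reads "greater than or equal to 50%". How missing values are encoded in a real data set (empty strings, NaN, sentinel codes) would need to be decided by the classification subject.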
