
GB/T 38667-2020 English PDF

US$339.00 · In stock
Delivery: <= 4 days. True-PDF full-copy in English will be manually translated and delivered via email.
GB/T 38667-2020: Information technology - Big data - Guide for data classification
Status: Valid
Standard ID | USD | Buy PDF | Lead-Days | Standard Title (Description) | Status
GB/T 38667-2020 | 339 | Add to Cart | 4 days | Information technology - Big data - Guide for data classification | Valid

Similar standards

GB/T 39117   GB/T 37741   GB/T 38319   GB/T 38664.4   GB/T 38664.1   

Basic data

Standard ID: GB/T 38667-2020 (GB/T38667-2020)
Description (Translated English): Information technology - Big data - Guide for data classification
Sector / Industry: National Standard (Recommended)
Classification of Chinese Standard: L70
Classification of International Standard: 35.240.70
Word Count Estimation: 18,117
Date of Issue: 2020-04-28
Date of Implementation: 2020-11-01
Quoted Standard: GB/T 4754-2017; GB/T 35295-2017
Issuing agency(ies): State Administration for Market Regulation, China National Standardization Administration
Summary: This standard specifies recommendations and guidance on the big data classification process and its classification perspective, classification dimensions, and classification methods. This standard is applicable to guide the classification of big data.

GB/T 38667-2020: Information technology - Big data - Guide for data classification

---This is a DRAFT version for illustration, not a final translation. Full copy of true-PDF in English version (including equations, symbols, images, flow-chart, tables, and figures etc.) will be manually/carefully translated upon your order.
Information technology - Big data - Guide for data classification
ICS 35.240.70
L70
National Standard of the People's Republic of China
Issued: 2020-04-28
Implemented: 2020-11-01
Issued by the State Administration for Market Regulation and the China National Standardization Administration

Contents

Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Acronyms
5 Classification process
5.1 Overview
5.2 Classification planning
5.3 Classification preparation
5.4 Classification implementation
5.5 Evaluation of results
5.6 Maintenance and improvement
6 Classification perspective
6.1 Overview
6.2 Technology selection perspective
6.3 Business application perspective
6.4 Security and privacy protection perspective
7 Classification dimension
7.1 Overview
7.2 Technology selection dimension
7.3 Business application dimension
7.4 Security and privacy protection dimension
8 Classification method
8.1 Line classification
8.2 Facet classification
8.3 Hybrid classification
Appendix A (informative) Big data classification example

Foreword

This standard was drafted in accordance with the rules given in GB/T 1.1-2009.

Please note that some content of this document may involve patents. The issuing body of this document assumes no responsibility for identifying these patents.

This standard was proposed by, and is under the jurisdiction of, the National Information Technology Standardization Technical Committee (SAC/TC 28).

Drafting organizations: Institute of Information Engineering, Chinese Academy of Sciences (State Key Laboratory of Information Security), National Information Center, Inspur Group Co., Ltd., Wisdom China (Beijing) Technology Co., Ltd., Founder International Software (Beijing) Co., Ltd., State Grid Anhui Electric Power Co., Ltd. (Electric Power Research Institute), China Railway Research Institute Group Co., Ltd., China Institute of Electronic Technology Standardization, Shanghai Sanlingwei Information Security Co., Ltd., Unicom Big Data Co., Ltd., China Insurance Information Technology Management Co., Ltd., Ninth Party Big Data Information Group Co., Ltd., CLP Great Wall Internet System Application Co., Ltd., Guangdong Power Grid Co., Ltd. Information Center, CEC Big Data Research Institute Co., Ltd., Peking University, Shandong Provincial Computing Center (National Supercomputing Jinan Center).

Main drafters: Chen Chi, Ma Hongxia, Ma Shunan, Tian Xue, Gao Yanan, Huang Xianzhi, Shan Zhen, Zhang Huimin, Zhang Yu, Gu Guangyu, Wu Yanhua, Zheng Jinjin, Yin Zhuo, Ye Lin, Ganlu, Guan Tailu, Li Yanchao, Lang Peipei, Min Jinghua, Wei Lihao, Lu Kai, Zhang Jicai, Feng Nianci, Zhao Junfeng, Shi Congcong, Sun Jiayang.

Information technology - Big data - Guide for data classification

1 Scope

This standard provides advice and guidance on the big data classification process and its classification perspective, classification dimensions, and classification methods. This standard is applicable to guide the classification of big data.

2 Normative references

The following documents are essential for the application of this document. For dated references, only the dated edition applies to this document. For undated references, the latest edition (including all amendments) applies to this document.
GB/T 4754-2017 Classification of National Economic Industries
GB/T 35295-2017 Information technology - Big data - Terminology

3 Terms and definitions

The terms and definitions given in GB/T 35295-2017 and the following apply to this document. For ease of use, some terms and definitions from GB/T 35295-2017 are repeated below.

3.1 Big data
Extensive data sets, characterized by huge volume, diverse sources, extremely fast generation, and variability, that are difficult to handle effectively with traditional data architectures.
Note: Internationally, the four characteristics of big data are generally expressed directly, without modifiers, as volume, variety, velocity and variability, and are defined in the context of big data as follows:
a) Volume: the size of the data set that constitutes big data.
b) Variety: data may come from multiple data repositories, data domains, or data types.
c) Velocity: the data flow per unit time.
d) Variability: the other characteristics of big data, namely volume, velocity and variety, are all in a state of change.
[GB/T 35295-2017, definition 2.1.1]

3.2 Data set
A form of data in which data records are aggregated.
Note: A data set can have the big data characteristics of volume, velocity, variety and variability. The characteristics of a data set describe the data itself, that is, static data; when the data is transmitted over a network or temporarily resides in computer memory for reading or updating, they describe dynamic data.
[GB/T 35295-2017, definition 2.1.46]

3.3 Big data classification
The process of distinguishing and grouping big data according to its attributes or characteristics, following certain principles and methods, and establishing a classification system and ordering.

3.4 Classification subject
An organization or individual that sorts out and classifies big data during its collection, storage, use, distribution, and deletion.

3.5 Classification perspective
The viewpoint from which the classification subject observes and carries out big data classification activities.

3.6 Classification dimension
One or more common characteristics of the data on which classification is based.
Note: Common data classification dimensions include source of generation, structured characteristics, business attribution, and processing timeliness requirements.

3.7 Classification method
The logical method of arranging and organizing data categories in a certain form according to the selected classification dimension.

3.8 Data distribution
The process of transferring data, in the form of raw data, processed data, or analysis results, to internal or external entities.
Note: Data distribution covers online and offline methods, such as data exchange, data trading, data sharing, and data disclosure.

3.9 Category
A collection of data with common attributes (or characteristics).

4 Acronyms

The following abbreviations apply to this document.
ETL: Extract-Transform-Load
FTP: File Transfer Protocol
SQL: Structured Query Language

5 Classification process

5.1 Overview
The big data classification process is divided into five stages: classification planning, classification preparation, classification implementation, result evaluation, and maintenance and improvement, as shown in Figure 1.
Figure 1 Big data classification process
This chapter specifies the big data classification process. Chapters 6, 7 and 8 respectively specify its three key steps (classification perspective, classification dimension, and classification method) according to the actual application scenarios of big data. For specific classification examples, see Appendix A.

5.2 Classification planning
5.2.1 Select the classification perspective
Selecting a classification perspective includes:
a) Clarify the business scenario for classification;
b) Select the classification perspective according to the business scenario.
Note: See Chapter 6 for classification perspectives.
5.2.2 Develop the work plan
Developing a work plan includes:
a) Clarify the scope of the data to be classified;
b) Clarify the classification dimensions and methods to be adopted;
c) Clarify the expected classification results;
d) Clarify the implementation plan and schedule of the classification work;
e) Clarify the evaluation method for the classification results;
f) Clarify the maintenance plan for the classification results.

5.3 Classification preparation
5.3.1 Survey the data status
Surveying the data status includes:
a) Survey data generation, including but not limited to the scenarios, subjects, methods, frequency, degree of sparseness, and legal compliance of data generation;
b) Survey the current status of data storage, including but not limited to data content format, storage method, storage location, and storage volume;
c) Survey data quality, including but not limited to the standardization, completeness, accuracy, consistency, timeliness, and accessibility of the data;
d) Survey data business types, such as organization and personnel management data, business data, and financial data;
e) Survey data sensitivity, including but not limited to data confidentiality, security, and protection needs;
f) Survey the application of data, including but not limited to the purpose of use, application field, and method of use;
g) Survey the timeliness of data, including but not limited to the timeliness requirements of data processing and the timeliness of data value;
h) Survey data rights, including but not limited to data ownership, management rights, and use rights.
5.3.2 Determine the classification object
Determining the classification object includes:
a) Determine the business scenario of data classification;
b) Determine the start and end time of data generation;
c) Determine the amount of data;
d) Determine the frequency of data generation;
e) Determine the structured characteristics of the data;
f) Determine the data storage method;
g) Determine the timeliness of data processing;
h) Determine the data exchange method;
i) Determine the source of the data;
j) Determine the type of data circulation;
k) Determine the data quality;
l) Determine the data sensitivity.
5.3.3 Select the classification dimension
Selecting classification dimensions includes:
a) Sort out the data characteristics relevant to the classification perspective;
b) Select the classification dimension according to the data characteristics.
Note: See Chapter 7 for classification dimensions.
5.3.4 Select the classification method
When selecting a classification method, the order and combination of classification dimensions should be made clear.
Note 1: See Chapter 8 for classification methods.
Note 2: If a hybrid classification method is chosen, consider which classification dimension is primary and which is supplementary.

5.4 Classification implementation
5.4.1 Draft the implementation process
A specific classification implementation process should be formulated in light of the big data life cycle, including but not limited to clarifying the implementation steps, starting the implementation work, carrying out the implementation work, and summarizing the implementation process.
5.4.2 Develop tools/scripts
Classification algorithms should be written according to the implementation process, classification dimensions and classification methods, and classification tools/scripts should be developed in accordance with software development or scripting specifications.
5.4.3 Record the implementation process
Each step of the classification implementation process and its classification results should be recorded, and documents output.
5.4.4 Output the classification results
The classification results of each step should be consolidated to form a data classification table.

5.5 Evaluation of results
5.5.1 Verify the implementation process
Verifying the implementation process includes:
a) Check the data classification table to determine whether the classification is reasonable;
b) Check the classification process records to determine how far the classification results deviate from the expected goals;
c) Check the classification dimensions to ensure that they meet business needs and classification goals;
d) Check the rationality of the classification method;
e) Adjust the big data classification process according to the verification results.
5.5.2 Interview relevant personnel
Interviewing relevant personnel includes:
a) Interview the person carrying out the data classification, and ask how the classification perspective, scope, dimensions and methods relate to the business scenarios;
b) Interview the data owner, and ask whether the data ownership classification and frequency classification in the data classification results match the actual situation;
c) Interview the data manager, and ask whether the classification of data structure categories and data storage methods, and the division of sparseness degree and sensitivity degree, in the data classification results match the actual situation;
d) Interview data users, and ask whether the division of data processing real-time requirements and the classification of exchange methods, business attribution categories, circulation types, etc. in the data classification results match the actual application situation;
e) Review the opinions and issues raised and adjust the big data classification process.
5.5.3 Test the classification results
Testing the classification results includes:
a) Run a classification script or program on the classified data to check whether any classification results do not conform to the classification strategy;
b) Review the opinions and issues raised and adjust the big data classification process.

5.6 Maintenance and improvement
5.6.1 Change control
Change control includes:
a) Analyze the necessity and rationality of the change to determine whether to implement it;
b) Formulate a change plan and assess the impact of the change on big data classification, including changes to classification dimensions and classification methods;
c) Carry out the change, update the classification results, and record the change process;
d) Evaluate the new big data classification results;
e) Publish the new big data classification results.
5.6.2 Regular evaluation
Regular evaluation includes:
a) Regularly evaluate the rationality of the big data classification dimensions and methods, and check whether they remain consistent with changes in business scenarios and classification perspectives;
b) Regularly evaluate the validity and application of the big data classification results, and check whether they meet the updated needs of business applications;
c) Review the opinions and issues raised and adjust the big data classification process.
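
The test step in 5.5.3 a) can be illustrated with a short script. The following Python sketch is a minimal illustration only, not part of the standard: it assumes the data classification table from 5.4.4 is a list of rows and that the classification strategy is a set of allowed categories per dimension. The names `ClassificationRow`, `STRATEGY`, `find_violations`, and all dimension, category and data set names are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical strategy: the categories allowed for each classification dimension.
# Dimension and category names are illustrative, not taken from the standard.
STRATEGY = {
    "generation_frequency": {"daily", "hourly", "no_update"},
    "sensitivity": {"public", "internal", "confidential"},
}


@dataclass(frozen=True)
class ClassificationRow:
    """One row of a data classification table (assumed layout)."""
    dataset: str
    dimension: str
    category: str


def find_violations(table, strategy):
    """Return rows whose dimension is unknown or whose category is not allowed."""
    violations = []
    for row in table:
        allowed = strategy.get(row.dimension)
        if allowed is None or row.category not in allowed:
            violations.append(row)
    return violations


table = [
    ClassificationRow("meter_readings", "generation_frequency", "hourly"),
    ClassificationRow("meter_readings", "sensitivity", "internal"),
    ClassificationRow("customer_profiles", "sensitivity", "secret"),  # not in the strategy
]

violations = find_violations(table, STRATEGY)
for row in violations:
    print(f"{row.dataset}: {row.dimension}={row.category} does not conform to the strategy")
```

Rows reported by such a check would feed into step b), where the opinions and issues are reviewed and the classification process is adjusted.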

6 Classification perspective

6.1 Overview
The big data classification perspectives are the technology selection perspective, the business application perspective, and the security and privacy protection perspective.

6.2 Technology selection perspective
The technology selection perspective includes but is not limited to:
a) Clarify the frequency and rules of data generation, determine the data update cycle and storage strategy, and determine the storage resource allocation scheme, such as the configuration type of the data storage platform;
b) Clarify the data generation method, analyze the source and quality of the data, and determine the position of the data in the overall data processing flow as well as the data processing and storage technology;
c) Analyze the structured characteristics of the data and determine the data storage and processing scheme;
d) Clarify the data storage method and determine the data modeling approach and data access method, to support various data application scenarios;
e) Sort out the degree of data sparseness, clarify the rules governing it, determine the data storage strategy and analysis method, and select the data storage scheme and analysis scheme;
f) Clarify the timeliness requirements and timing of data processing, determine the data processing strategy, and select a data processing solution covering the computing platform, resource allocation, etc.;
g) Sort out data exchange methods and determine data sharing methods and strategies, to support the construction of information exchange systems.

6.3 Business application perspective
The business application perspective includes but is not limited to:
a) Clarify the source of data generation and the data ownership and access rights, to facilitate data tracking and tracing;
b) Clarify data application scenarios, determine data business topics, judge data application value, and select data analysis solutions;
c) Clarify data distribution scenarios, determine the industries in which the data is applied, and clarify the types and scope of available data;
d) Sort out the data quality, clarify the data application requirements, and determine the data quality management plan.

6.4 Security and privacy protection perspective
The security and privacy protection perspective includes but is not limited to:
a) Clarify the security requirements of big data with different degrees of sensitivity during storage, transmission, access and distribution;
b) Clarify the privacy protection requirements of big data with different degrees of sensitivity;
c) Guide the classification subject in formulating a privacy protection plan;
d) Guide the classification subject in formulating a security management plan.
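
As an illustration of how a chosen perspective narrows the data characteristics to examine (5.2.1 and 5.3.3), the following Python sketch maps each perspective to the characteristics named in 6.2 to 6.4. It is a sketch under stated assumptions: the identifiers are paraphrases of the list items above, not dimension names quoted from Chapter 7, and `candidate_characteristics` is a hypothetical helper.

```python
# Characteristics examined under each perspective, paraphrased from 6.2 to 6.4.
# The identifiers are illustrative assumptions, not names defined by the standard.
PERSPECTIVE_CHARACTERISTICS = {
    "technology_selection": [
        "generation_frequency",
        "generation_method",
        "structured_characteristics",
        "storage_method",
        "sparseness",
        "processing_timeliness",
        "exchange_method",
    ],
    "business_application": [
        "generation_source",
        "application_scenario",
        "distribution_scenario",
        "data_quality",
    ],
    "security_and_privacy_protection": [
        "sensitivity",
    ],
}


def candidate_characteristics(perspectives):
    """Collect the characteristics for the chosen perspectives, in order, without duplicates."""
    selected = []
    for perspective in perspectives:
        if perspective not in PERSPECTIVE_CHARACTERISTICS:
            raise KeyError(f"unknown perspective: {perspective}")
        for characteristic in PERSPECTIVE_CHARACTERISTICS[perspective]:
            if characteristic not in selected:
                selected.append(characteristic)
    return selected


print(candidate_characteristics(["business_application", "security_and_privacy_protection"]))
```

A lookup table keeps the perspective-to-characteristic mapping easy to review and to change under the change control described in 5.6.1.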

7 Classification dimension

7.1 Overview
This chapter gives the classification dimensions from the three perspectives of technology selection, business application, and security and privacy protection, and describes the classification elements, data categories and applicable scenarios of each classification dimension.

7.2 Technology selection dimension
7.2.1 Classification by generation frequency
7.2.1.1 Overview
Classification by generation frequency classifies data according to how frequently it is generated (the amount of data generated per unit time, or how often a specified amount of data is reached).
7.2.1.2 Classification elements
The elements of classification by generation frequency include:
a) The data generation cycle, such as second, minute, hour, day, week, month, quarter, half-year, or year;
b) The amount of data generated per unit period, which can be expressed as a number of records or as the space the data occupies, such as millions of records, tens of millions of records, GB-level data, or TB-level data.
7.2.1.3 Categories
By generation frequency, data can be divided into: annually updated data, monthly updated data, weekly updated data, daily updated data, hourly updated data, data updated every minute, data updated every second, data that is not updated, etc.
7.2.1.4 Applicable scenarios
Classification by generation frequency applies to scenarios such as judging the rationality of resource allocation and the value of data analysis based on how frequently data is generated.
7.2.2 Classification by generation method
7.2.2.1 Overview
Classification by generation method classifies data according to how it is generated.
7.2.2.2 Classification elements
The elements of classification by generation method include:
a) The way the data is obtained or collected, such as manual collection or collection through an information system;
b) The degree of data processing, such as original data or secondary processed data.
7.2.2.3 Catego......
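
The generation-frequency categories listed in 7.2.1.3 can be illustrated with a short classifier. The following Python sketch is an assumption-laden illustration, not the standard's method: the standard names the categories but gives no thresholds, so the interval boundaries, the rule of rounding up to the next named period, and the function name `classify_by_frequency` are all hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Upper bound on the update interval for each category, finest first.
# These thresholds are assumptions; the standard only names the categories.
_THRESHOLDS = [
    (timedelta(seconds=1), "updated every second"),
    (timedelta(minutes=1), "updated every minute"),
    (timedelta(hours=1), "updated hourly"),
    (timedelta(days=1), "updated daily"),
    (timedelta(weeks=1), "updated weekly"),
    (timedelta(days=31), "updated monthly"),
    (timedelta(days=366), "updated annually"),
]


def classify_by_frequency(interval: Optional[timedelta]) -> str:
    """Map the typical interval between updates to a frequency category.

    None means the data set is never updated. Intervals are rounded up to the
    next named period; intervals longer than a year fall back to the coarsest
    named category (an assumption made for this sketch).
    """
    if interval is None:
        return "not updated"
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    for upper_bound, category in _THRESHOLDS:
        if interval <= upper_bound:
            return category
    return "updated annually"


print(classify_by_frequency(timedelta(minutes=5)))
print(classify_by_frequency(None))
```

Under these assumed thresholds, a data set refreshed every five minutes falls into the hourly category, because five minutes is longer than one minute but no longer than one hour.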

Tips & Frequently Asked Questions:

Question 1: How long will the true-PDF of GB/T 38667-2020_English be delivered?

Answer: Upon your order, we will start to translate GB/T 38667-2020_English as soon as possible, and keep you informed of the progress. The lead time is typically 2 ~ 4 working days. The lengthier the document the longer the lead time.

Question 2: Can I share the purchased PDF of GB/T 38667-2020_English with my colleagues?

Answer: Yes. The purchased PDF of GB/T 38667-2020_English will be deemed to be sold to your employer/organization who actually pays for it, including your colleagues and your employer's intranet.

Question 3: Does the price include tax/VAT?

Answer: Yes. Our tax invoice, downloaded/delivered in 9 seconds, includes all tax/VAT and complies with 100+ countries' tax regulations (tax exempted in 100+ countries) -- See Avoidance of Double Taxation Agreements (DTAs): List of DTAs signed between Singapore and 100+ countries

Question 4: Do you accept my currency other than USD?

Answer: Yes. If you need your currency to be printed on the invoice, please write an email to Sales@ChineseStandard.net. Within 2 working hours, we will create a special link for you to pay in any currency. Otherwise, follow the normal steps: Add to Cart -- Checkout -- Select your currency to pay.