GB/T 38673-2020 PDF in English

Standard ID: GB/T 38673-2020 (English version)
Name of Chinese Standard: Information technology -- Big data -- Basic requirements for big data systems
Status: Valid

GB/T 38673-2020
ICS 35.240
L 67
Information technology - Big data - Basic
requirements for big data systems
Issued by: State Administration for Market Regulation;
Standardization Administration of the People’s Republic of China
Table of Contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviations
5 Big data system framework
6 Functional requirements
7 Non-functional requirements
1 Scope
This Standard specifies the functional requirements and non-functional
requirements of big data systems.
This Standard is applicable to the requirements design, model selection,
acceptance and testing of various big data systems.
2 Normative references
The following documents are indispensable for the application of this document.
For dated references, only the dated version applies to this document. For
undated references, the latest edition (including all amendments) applies to this
document.
GB/T 35295-2017, Information technology - Big data - Terminology
GB/T 35589-2017, Information technology - Big data - Technical reference
3 Terms and definitions
Terms and definitions determined by GB/T 35295-2017 and the following ones
are applicable to this document. For ease of use, some of the terms and
definitions in GB/T 35295-2017 are repeated below.
3.1 Big data system
The system that implements all or part of the big data reference architecture.
[GB/T 35295-2017, Definition 2.1.14]
3.2 Distributed computing
A computing mode that covers the storage layer and the processing layer and
is used to implement multi-type programming algorithm models.
c) It shall provide column conversion, row conversion and table conversion
functions of structured data;
d) It shall provide data loading function, to support the loading of cleaned and
converted data to the data analysis module;
e) It should provide data comparison function before and after cleaning;
f) It should support data conversion function of unstructured data.
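As an illustration of items c) and e), the following minimal Python sketch shows a column conversion on structured records and a comparison of the data before and after cleaning. The record layout and the function names (`convert_column`, `clean_rows`, `compare_cleaning`) are assumptions made for this example; they are not defined by this Standard.

```python
from typing import Callable

Row = dict[str, object]

def convert_column(rows: list[Row], column: str, fn: Callable[[object], object]) -> list[Row]:
    """Column conversion: apply fn to one column of every row."""
    return [{**row, column: fn(row[column])} for row in rows]

def clean_rows(rows: list[Row], required: list[str]) -> list[Row]:
    """Cleaning: drop rows in which a required field is missing or empty."""
    return [r for r in rows if all(r.get(c) not in (None, "") for c in required)]

def compare_cleaning(before: list[Row], after: list[Row]) -> dict[str, int]:
    """Comparison before and after cleaning, expressed as row counts."""
    return {"before": len(before), "after": len(after), "removed": len(before) - len(after)}

# Hypothetical sample data.
raw = [
    {"id": 1, "temp_c": "21.5"},
    {"id": 2, "temp_c": ""},  # incomplete record, removed by cleaning
    {"id": 3, "temp_c": "19.0"},
]
cleaned = clean_rows(raw, required=["temp_c"])
converted = convert_column(cleaned, "temp_c", float)
report = compare_cleaning(raw, cleaned)
```

A real system would typically report richer comparison metrics (for example per-column null rates); row counts are used here only to keep the sketch short.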
6.3 Data storage module
The data storage module requirements are as follows:
a) It shall provide data storage function, to support the storage of structured
data, unstructured data and semi-structured data.
b) It shall provide the function of exchanging data or files with relational
databases and other file systems.
c) Support distributed file storage, to realize the following functions:
1) It shall support basic operations of the file system, including upload,
download, read and write, copy, move, delete, rename, permission
modification, etc.;
2) It shall support multi-copy storage and recovery functions of data blocks;
3) It should support the function of fast retrieval of files, and support the
unified retrieval, cataloging, adding and deleting operations of data
4) It should support data compression storage function.
d) Support distributed column data storage, to achieve the following functions:
1) It shall support the function of storing data in the form of key-value;
2) It should support user authority management functions that are based
on tables, column families, and columns. Authority management
operations include read, write, and create.
e) Support distributed structured data storage, to achieve the following
functions:
1) It should support distributed storage of structured data, to ensure the
scalability and consistency of data storage;
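The multi-copy storage and recovery requirement for data blocks (item c) 2) above) can be pictured with a toy in-memory model. The class `BlockStore`, its methods and the node names are hypothetical and exist only for this sketch; the Standard does not prescribe a replication algorithm.

```python
class BlockStore:
    """Toy in-memory model: each node maps block_id -> bytes."""

    def __init__(self, nodes: list[str], replicas: int = 3) -> None:
        if replicas > len(nodes):
            raise ValueError("not enough nodes for the requested replica count")
        self.replicas = replicas
        self.nodes: dict[str, dict[str, bytes]] = {n: {} for n in nodes}

    def put(self, block_id: str, data: bytes) -> None:
        # Write the copies to the least-loaded nodes first.
        targets = sorted(self.nodes, key=lambda n: len(self.nodes[n]))[: self.replicas]
        for n in targets:
            self.nodes[n][block_id] = data

    def holders(self, block_id: str) -> list[str]:
        return [n for n, blocks in self.nodes.items() if block_id in blocks]

    def fail_node(self, node: str) -> None:
        """Remove a node, then re-replicate every block it held."""
        lost = self.nodes.pop(node)
        for block_id in lost:
            survivors = self.holders(block_id)
            if not survivors:
                continue  # every copy is gone, nothing to recover from
            data = self.nodes[survivors[0]][block_id]
            spare = [n for n in self.nodes if n not in survivors]
            for n in spare[: self.replicas - len(survivors)]:
                self.nodes[n][block_id] = data

store = BlockStore(["n1", "n2", "n3", "n4"], replicas=3)
store.put("blk-0001", b"payload")
store.fail_node(store.holders("blk-0001")[0])
```

After the simulated failure the block is again held by three nodes, because the surviving copy is re-replicated to the spare node.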
1) Built-in graph data query API, support synchronous or asynchronous
computing model to write iterative algorithms;
2) Online graph analysis and query function;
3) Graph data expression that is based on the attribute graph model,
including the label and attribute type definition on the node/edge;
4) Built-in common graph index calculation function, to describe the
topological structure characteristics of graphs.
d) It should support memory computing, to realize the following functions:
1) Provide data processing capabilities through distributed memory
computing and DAG execution engine;
2) Support multiple data types, including data processing of structured data,
unstructured data, and semi-structured data.
e) It should support an integrated batch and stream computing framework, to
achieve the following functions:
1) A unified SQL query language for integrated batch and stream processing;
2) Streaming SQL in multiple scenarios, such as location information
analysis, etc.;
3) Common time windows, including jumping windows, sliding windows, etc.
f) It should support automatic scheduling of tasks according to the
dependencies between tasks.
g) It should support the description of multi-task dependencies within the job
in the form of a directed acyclic graph.
h) It should provide the ability to dispatch complex tasks.
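Items f) and g) describe dependency-driven scheduling over a directed acyclic graph. A minimal sketch using the Python standard library's `graphlib.TopologicalSorter` follows; the job definition and the helper name `schedule` are assumptions made for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical job: task name -> set of tasks it depends on.
job = {
    "extract": set(),
    "clean_orders": {"extract"},
    "clean_users": {"extract"},
    "join": {"clean_orders", "clean_users"},
    "report": {"join"},
}

def schedule(dag: dict[str, set[str]]) -> list[list[str]]:
    """Group tasks into waves; every task in a wave has all dependencies met."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable output
        waves.append(ready)
        sorter.done(*ready)
    return waves

waves = schedule(job)
```

Tasks in the same wave (here `clean_orders` and `clean_users`) have no mutual dependency, so a scheduler could dispatch them in parallel.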
6.5 Data analysis module
The data analysis module requirements are as follows:
a) Support data query, to realize the following functions:
1) It shall provide the function of querying through a standard database
connection interface;
2) It shall provide the function of querying through the REST API query
3) It should support data statistics on real-time streams;
4) It should support the sorting of streaming data;
5) It should support the association with static tables;
6) It should support the associated processing of multiple data streams.
f) It should support interactive on-line analysis, to achieve the following
functions:
1) Perform distributed on-line analysis of data through structured query
language, such as OLAP;
2) Perform ad hoc query of data through structured query language;
3) Use visualization middleware to display data analysis results;
4) Define the calculation formula and parameter configuration during the
interactive analysis process;
5) Automatically save and roll back during interactive analysis;
6) Save and publish analysis results during interactive analysis;
7) Perform interactive data analysis based on on-line analysis.
g) It should support visual process editing operations, to achieve the
following functions:
1) Perform process editing and revision by drag and drop;
2) Support workflow dispatch trigger mechanism, configurable trigger time
or trigger event;
3) Support the persistent storage of process editing results.
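The streaming items listed earlier in this clause (statistics on real-time streams, sorting of streaming data, association with static tables) can be sketched together as a sliding-window counter that looks up each event in a static table. The table `DEVICE_REGION`, the class `SlidingWindowCounter` and the event layout are hypothetical.

```python
from collections import Counter, deque

# Static dimension table (hypothetical): device id -> region.
DEVICE_REGION = {"d1": "north", "d2": "south"}

class SlidingWindowCounter:
    """Count events per region over the last `window` seconds of event time."""

    def __init__(self, window: float) -> None:
        self.window = window
        self.events = deque()    # (timestamp, region), oldest first
        self.counts = Counter()  # region -> events currently in the window

    def add(self, ts: float, device_id: str) -> None:
        # Association of the stream with the static table.
        region = DEVICE_REGION.get(device_id, "unknown")
        self.events.append((ts, region))
        self.counts[region] += 1
        # Evict events that have slid out of the window (ts - window, ts].
        while self.events and self.events[0][0] <= ts - self.window:
            _, old = self.events.popleft()
            self.counts[old] -= 1

    def top(self) -> list[tuple[str, int]]:
        """Current counts in descending order, zero counts omitted."""
        return [(r, c) for r, c in self.counts.most_common() if c > 0]

w = SlidingWindowCounter(window=10.0)
for ts, dev in [(0.0, "d1"), (4.0, "d2"), (8.0, "d1"), (12.0, "d2")]:
    w.add(ts, dev)
result = w.top()
```

The sketch assumes events arrive in timestamp order; handling late or out-of-order events would need watermarks or a similar mechanism, which is outside the scope of this example.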
6.6 Data visualization module
The requirements of the visualization module are as follows:
a) It should support the use of conventional charts to display data, such as
tables, bar charts, pie charts, line charts, heat maps;
b) It should support the API of third-party data visualization tools.
6.7 Data access module
d) It shall provide service management functions, including the management
of big data system component services;
e) It should provide the health check management function, to support the
realization of cluster health check through a graphical interface.
7 Non-functional requirements
7.1 Reliability requirements
7.1.1 High availability
High availability requirements are as follows:
a) It shall provide the system automatic fault detection and management
b) It shall ensure that there is no single point failure risk for system
c) When any node of the cluster fails, there shall be no service interruption,
data loss or data inconsistency;
d) When any unit of the cluster fails, the system operation shall not be
e) It shall guarantee that the system operates continuously and without
faults over long periods.
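One common way to approach automatic fault detection is a heartbeat timeout combined with failover to a standby. The sketch below is an assumption-laden illustration, not a mechanism required by this Standard; `HeartbeatMonitor`, `pick_active` and the node names are hypothetical, and the clock is passed in explicitly so the example is deterministic.

```python
from typing import Optional

class HeartbeatMonitor:
    """Mark a node as failed when no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: dict[str, float] = {}

    def heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list[str]:
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)

def pick_active(candidates: list[str], failed: list[str]) -> Optional[str]:
    """Fail over to the first healthy candidate, in priority order."""
    for node in candidates:
        if node not in failed:
            return node
    return None  # no healthy candidate left

monitor = HeartbeatMonitor(timeout=5.0)
monitor.heartbeat("master-1", now=0.0)
monitor.heartbeat("master-2", now=0.0)
monitor.heartbeat("master-2", now=4.0)  # master-1 has gone silent
failed = monitor.failed_nodes(now=6.0)
active = pick_active(["master-1", "master-2"], failed)
```

With at least two candidates for every role, the loss of one node leaves a healthy node to take over, which is the intent behind avoiding a single point of failure.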
7.1.2 Data redundant storage and distribution
Data redundant storage and distribution requirements are as follows:
a) It shall provide the metadata multi-copy storage function; the failure of
any node will not affect the system's ability to continue to provide services;
b) It shall provide the master copy planning function that is based on partition
fault tolerance, with the ability to plan the physical distribution of each
data copy in advance.
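Planning the physical distribution of copies in advance can be sketched as a deterministic placement function that puts each copy of a partition on a different rack, with the first entry acting as the master copy. The rack layout, the function `plan_replicas` and the round-robin rule are assumptions for this example only.

```python
def plan_replicas(partition: int, racks: dict[str, list[str]], copies: int) -> list[str]:
    """Pick one node from each of `copies` distinct racks; index 0 is the master copy."""
    rack_names = sorted(racks)
    if copies > len(rack_names):
        raise ValueError("need at least one rack per copy")
    plan = []
    for i in range(copies):
        # Rotate the starting rack by partition number to spread master copies.
        rack = rack_names[(partition + i) % len(rack_names)]
        nodes = racks[rack]
        plan.append(nodes[partition % len(nodes)])
    return plan

# Hypothetical cluster layout: rack name -> nodes in that rack.
RACKS = {
    "rack-a": ["a1", "a2"],
    "rack-b": ["b1", "b2"],
    "rack-c": ["c1", "c2"],
}
layout = {p: plan_replicas(p, RACKS, copies=2) for p in range(4)}
```

Because the plan depends only on the partition number and the rack map, it can be computed before any data is written, and the loss of one rack never removes every copy of a partition.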
7.1.3 Data backup and recovery
The data backup and recovery requirements are as follows:
a) It shall provide distributed file storage backup and recovery functions;
b) It shall configure authority for users according to the principle of minimizing
c) It shall support the allocation of authority for users according to the
granularity of the data table level and the data column level;
d) It shall support the allocation of authority for users according to different
operation types (such as adding, deleting, modifying, querying, executing).
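Items c) and d) combine two dimensions: granularity (table level or column level) and operation type. A minimal default-deny sketch follows; the `Op` enumeration, the `Grants` class and the user and table names are hypothetical, and treating a table-level grant as covering all columns is a design assumption of this example.

```python
from enum import Enum
from typing import Optional

class Op(Enum):
    ADD = "add"
    DELETE = "delete"
    MODIFY = "modify"
    QUERY = "query"
    EXECUTE = "execute"

class Grants:
    """Default-deny grants at table level (column=None) or column level."""

    def __init__(self) -> None:
        self._grants = set()  # (user, table, column or None, op)

    def grant(self, user: str, table: str, op: Op, column: Optional[str] = None) -> None:
        self._grants.add((user, table, column, op))

    def allowed(self, user: str, table: str, op: Op, column: Optional[str] = None) -> bool:
        # A table-level grant covers every column of that table.
        if (user, table, None, op) in self._grants:
            return True
        return column is not None and (user, table, column, op) in self._grants

acl = Grants()
acl.grant("alice", "orders", Op.QUERY)                  # table level
acl.grant("bob", "orders", Op.QUERY, column="amount")   # column level
```

Anything that has not been granted explicitly is refused, so each user holds only the authority that was deliberately assigned.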
7.3.3 Log management
The log management requirements are as follows:
a) It shall provide the function of recording system operation logs, to record
important operations of users;
b) It shall ensure that the system operation log cannot be deleted, modified
or overwritten;
c) The operation log shall include date, time, operator information, operation
type, operation description and operation result;
d) It shall provide functions of statistics, query, analysis and report generation
of system operation logs.
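The fields required in item c) and the integrity requirement in item b) can be illustrated with an append-only log in which each entry carries the hash of the previous one. Hash chaining is a technique chosen for this sketch, not one named by the Standard; it makes modification detectable rather than impossible. The class `OperationLog` and its field names are hypothetical.

```python
import hashlib
import json

class OperationLog:
    """Append-only log; each entry stores the hash of the previous entry."""

    FIELDS = ("date", "time", "operator", "op_type", "description", "result")

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(prev: str, body: dict) -> str:
        payload = prev + json.dumps(body, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, **record: str) -> None:
        missing = [f for f in self.FIELDS if f not in record]
        if missing:
            raise ValueError(f"missing fields: {missing}")
        prev = self._entries[-1]["hash"] if self._entries else ""
        body = {f: record[f] for f in self.FIELDS}
        self._entries.append({**body, "prev": prev, "hash": self._digest(prev, body)})

    def verify(self) -> bool:
        """Recompute the chain; any edited, removed or reordered entry breaks it."""
        prev = ""
        for entry in self._entries:
            body = {f: entry[f] for f in self.FIELDS}
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, body):
                return False
            prev = entry["hash"]
        return True

log = OperationLog()
log.append(date="2020-01-01", time="10:00:00", operator="alice",
           op_type="delete", description="drop table tmp", result="success")
log.append(date="2020-01-01", time="10:05:00", operator="bob",
           op_type="query", description="select from orders", result="success")
intact = log.verify()
```

In a production system the chain head would also need to be anchored somewhere the log writer cannot alter, for example write-once storage; that part is not shown here.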
7.3.4 Data security
The data security requirements are as follows:
a) It shall provide data storage encryption and decryption functions, to
support database-level data encryption;
b) It shall provide encrypted transmission function of system sensitive data,
and the encryption key can be replaced;
c) It should support data encryption at the data column level.
7.4 Scalability requirements
The system scalability requirements are as follows:
a) It shall provide online cluster expansion and reduction functions;
b) It shall provide offline cluster expansion and reduction functions.
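This Standard does not prescribe how expansion and reduction are implemented. One widely used technique for limiting data movement when nodes join or leave a cluster is consistent hashing; the sketch below is an illustrative assumption, and `HashRing`, its parameters and the node names are hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes: list[str], vnodes: int = 64) -> None:
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []  # sorted (hash, node) points
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")

    def add(self, node: str) -> None:
        """Expansion: insert the node's virtual points into the ring."""
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove(self, node: str) -> None:
        """Reduction: drop the node's points; its keys fall to the next point."""
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def owner(self, key: str) -> str:
        # First ring point at or after the key's hash, wrapping around.
        i = bisect.bisect_left(self._ring, (self._hash(key),)) % len(self._ring)
        return self._ring[i][1]

keys = [f"key-{i}" for i in range(1000)]
ring = HashRing(["n1", "n2", "n3"])
before = {k: ring.owner(k) for k in keys}
ring.add("n4")  # expansion
after = {k: ring.owner(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only keys that now belong to the new node change owner; keys on the other nodes stay in place, which is what allows the cluster to keep serving requests while it is resized.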
7.5 Maintainability requirements
The system maintainability requirements are as follows:
Source: Above contents are excerpted from the PDF -- translated/reviewed by: www.chinesestandard.net / Wayne Zheng et al.