GB/T 37722-2019 English PDF
US$199.00 · In stock
Delivery: <= 3 days. The full true-PDF copy in English will be manually translated and delivered via email.
GB/T 37722-2019: Information technology - Technical requirements for big data storage and processing systems
Status: Valid
Basic data
Standard ID: GB/T 37722-2019 (GB/T37722-2019)
Description (Translated English): Information technology - Technical requirements for big data storage and processing systems
Sector / Industry: National Standard (Recommended)
Classification of Chinese Standard: L67
Classification of International Standard: 35.240
Word Count Estimation: 10,125
Date of Issue: 2019-08-30
Date of Implementation: 2020-03-01
Issuing agency(ies): State Administration for Market Regulation, China National Standardization Administration

GB/T 37722-2019: Information technology - Technical requirements for big data storage and processing systems
---This is a DRAFT version for illustration, not a final translation. A full copy of the true-PDF in English (including equations, symbols, images, flow charts, tables, figures, etc.) will be manually and carefully translated upon your order.

ICS 35.240
L67
National Standard of the People's Republic of China
Information technology - Big data storage and processing system functional requirements
Issued on 2019-08-30
Implemented on 2020-03-01
Issued by the State Administration for Market Regulation and the China National Standardization Administration

Table of contents
Foreword
1 Scope
2 Normative references
3 Terms and definitions
4 Abbreviations
5 Overview
6 Big data storage subsystem functional requirements
6.1 Basic requirements
6.2 Distributed file storage
6.3 Distributed structured data storage
6.4 Distributed columnar data storage
6.5 Distributed graph data storage
7 Big data processing subsystem functional requirements
7.1 Basic requirements
7.2 Batch processing framework
7.3 Stream processing framework
7.4 Graph calculation framework
7.5 Memory computing framework
7.6 Batch stream fusion computing framework

Foreword
This standard was drafted in accordance with the rules given in GB/T 1.1-2009.
Please note that certain contents of this document may involve patents. The issuing agency of this document is not responsible for identifying these patents.
This standard was proposed and is managed by the National Information Technology Standardization Technical Committee (SAC/TC28).
Drafting organizations of this standard: Huawei Technologies Co., Ltd., China Electronics Standardization Research Institute, Inspur Electronic Information Industry Co., Ltd., Shanghai Computer Software Technology Development Center, Qinzhi Digital Technology Co., Ltd., Shenzhen Kingdee Tianyan Middleware Co., Ltd., New H3C Technology Co., Ltd., ZTE Corporation, Hangzhou Zhongao Technology Co., Ltd., Tianjin Nanda General Data Technology Co., Ltd.
The main drafters of this standard: Zhao Hua, Fu Haifang, Wei Fenglin, Zhang Qun, Su Zhiyuan, Zhao Jiang, Chen Mingang, Liu Zhenyu, Cai Lizhi, Liu Yufeng, Li Zheng, Lin Lin, Pan Zijian, Wu Wenfeng, Zhang Dongtao, Zhu Song, Shen Beilun, Lu Yun, Wu Xin, Zhang Shaoyong, Li Bing, Yin Zhuo, Sun Jiayang.

Information technology - Big data storage and processing system functional requirements

1 Scope
This standard specifies the functional requirements for the distributed file storage, distributed structured data storage, distributed columnar data storage, distributed graph data storage, batch processing framework, stream processing framework, graph computing framework, memory computing framework, and batch stream fusion computing framework of big data storage and processing systems.
This standard applies to the design, development and application deployment of big data storage and processing systems.

2 Normative references
The following documents are indispensable for the application of this document. For dated references, only the dated version applies to this document. For undated references, the latest version (including all amendments) applies to this document.
GB/T 35295-2017 Information technology - Big data - Terminology

3 Terms and definitions
The terms and definitions defined in GB/T 35295-2017 and the following terms and definitions apply to this document.
3.1 Graph database
A non-relational database that uses graph theory to store entities and the relationships between them.
Note 1: The data model of a graph database is composed of nodes and edges (i.e., the relationships between nodes).
Note 2: A graph database supports functions such as graph query, graph traversal and graph analysis, and is suitable for exploring and discovering complex relationships.
3.2 Batch processing
A computing framework that decomposes a large job into multiple tasks processed separately by multiple nodes and then aggregates the results of those tasks to obtain the final analysis result, with capabilities such as high availability, high scalability and high concurrency.
3.3 Stream processing
The computing capability to process, in real time, streaming data that is real-time, high-speed, unbounded and transient.
3.4 Graph calculation
An abstract representation of data as a "graph" structure based on graph theory, together with the computing mode that operates on this data structure.
Note: In graph calculation, the basic data structure includes nodes, edges, weights, etc.
3.5 Memory computing
A data processing technology that prioritizes the use of memory to calculate and analyze data.
3.6 Batch stream fusion computing
A computing mode that supports batch processing and stream processing at the same time.
3.7 Scatter-aggregate
A form of processing large data sets in which the required computation is divided and distributed across multiple nodes, and the overall result is formed by combining the results from each node.
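The scatter-aggregate form defined in 3.7 can be illustrated with a minimal Python sketch. It is not part of the standard: worker threads stand in for nodes, word counting stands in for the distributed computation, and the names scatter, count_words, aggregate and scatter_aggregate are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def scatter(records, parts):
    """Divide the records into `parts` chunks, one per (simulated) node."""
    return [records[i::parts] for i in range(parts)]


def count_words(chunk):
    """Per-node task: count word occurrences within a single chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def aggregate(partials):
    """Combine the per-node partial results into the overall result."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


def scatter_aggregate(records, parts=3):
    """Scatter the work, run one task per chunk, then aggregate the results."""
    chunks = scatter(records, parts)
    # Threads are an assumption made for illustration; a real system would
    # dispatch each chunk to a separate node.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(count_words, chunks))
    return aggregate(partials)


if __name__ == "__main__":
    lines = ["big data storage", "big data processing", "stream processing"]
    print(scatter_aggregate(lines))
```

The aggregation step only works this simply because word counts can be merged in any order; a computation whose partial results cannot be combined independently would need a different aggregate function.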
3.8 Tenant
One or more cloud service users who share access to a set of physical and virtual resources.

4 Abbreviations
The following abbreviations apply to this document.

5 Overview
The big data storage and processing system consists of a big data storage subsystem and a big data processing subsystem. The overall framework is shown in Figure 1, where:
a) Big data storage subsystem: provides distributed storage management of big data, covering a variety of storage methods, including distributed file storage, distributed structured data storage, distributed columnar data storage and distributed graph data storage;
b) Big data processing subsystem: provides processing of structured, unstructured and semi-structured data, involving multiple computing/processing frameworks, including the batch processing framework, stream processing framework, graph computing framework, memory computing framework and batch stream fusion computing framework.

6 Big data storage subsystem functional requirements
6.1 Basic requirements
The basic requirements of the big data storage module are as follows.
a) It should support operations such as data upload, data download, directory viewing, directory creation, directory deletion, and permission modification;
b) Standard and open data access APIs should be supported for manipulating data;
c) Data loading tools should be provided to exchange data and files between the big data storage and processing system, traditional relational databases, and other file systems;
d) A high-availability design should be in place for key nodes (components);
f) It should support data management functions such as batch update and deletion of data;
g) It should support real-time storage and real-time query of streaming data.
6.2 Distributed file storage
The requirements for distributed file storage are as follows.
a) File upload, download, read and write, copy, move, delete, access control and other functions should be provided;
b) A fault-tolerance mechanism for files and a high-availability mechanism for the system should be provided, including functions such as data block backup and rapid system recovery;
c) Verification and synchronization of file data should be provided to ensure the integrity and consistency of the data;
d) Distributed elastic scaling should be provided, supporting dynamic addition and removal of nodes;
e) Compression, encryption and decryption functions for stored data should be provided;
f) Quick search functions should be provided to support unified search, cataloging, addition and deletion operations on data resources;
g) Document search, batch operation, recycle bin, snapshot and other functions shall be provided;
h) It is advisable to provide the function of packaging small files into large files for centralized storage;
i) A storage quota function should be provided, which can control quotas based on the storage space of a directory and the number of files.
6.3 Distributed structured data storage
The requirements for distributed structured data storage are as follows.
a) A distributed storage mechanism for structured data should be provided to make data storage scalable;
b) APIs should be provided to support various data query operations;
c) Multi-table join functions should be provided;
d) The consistency of distributed data storage should be supported;
e) It should support hybrid row and column storage, that is, the storage of tables organized in row or column format;
f) It should support row-column conversion.
6.4 Distributed columnar data storage
The requirements for distributed columnar data storage are as follows.
a) The function of storing data in key-value form should be provided;
b) User rights management functions based on tables, column families, and columns should be provided.
Rights management operations include read, write, create, etc.;
c) Column-level encryption of data in the database should be provided according to user needs;
d) Data backup and recovery functions should be provided, including library-level backup and recovery, viewing of backup and recovery progress and history, etc.;
f) The function of combining and storing multiple business tables that are functionally similar or related should be provided.
6.5 Distributed graph data storage
The requirements for distributed graph data storage are as follows.
a) A data model consisting of nodes and edges (i.e., the relationships between nodes) should be supported;
b) Graph query, graph traversal and graph analysis functions should be provided;
d) It should support single-node and multi-node multi-layer relationship expansion queries;
e) It should support shortest-path and optimal-path traversal search;
f) It should support the inheritance operation of vertices and attributes;
g) It should support an asynchronous session mechanism for long-running tasks.

7 Big data processing subsystem functional requirements
7.1 Basic requirements
The basic requirements of the big data processing module are as follows.
a) It should support the scheduling and configuration of heterogeneous resources such as CPU, memory, and GPU;
b) It should support horizontal scaling of the computing framework;
c) It should support setting priorities for tasks and scheduling resources according to task priority;
d) It should support centralized management of global resources;
e) It should support static resource allocation strategies and dynamic resource allocation strategies;
f) A hierarchical queue structure matching the organization should be provided to support multi-level queue resource management; queue resources are strictly isolated, that is, a queue does not exceed the upper limit of the resources allocated to it;
g) Resource elasticity and preemption should be supported, that is, when there are free resources, tenants can use more than their configured resources; when the system is busy, if the resources of other tenants fall short of their originally configured amounts, those tenants can preempt the resources that a tenant uses in excess of its configuration;
h) It should support resource management, job scheduling and data loading, and the scheduling of various distributed computing frameworks;
i) It should support automatic scheduling of tasks according to the dependencies between tasks, to improve the automation of the processing system;
j) It is advisable to support the dynamic allocation of computing resources according to job requirements, and to manage and reclaim resources automatically;
k) It should support automatic job scheduling, and support describing the dependencies among multiple tasks within a job as a directed acyclic graph (DAG);
l) It should support the scheduling of complex tasks.
7.2 Batch processing framework
The batch processing framework requirements are as follows.
a) Offline analysis of multiple data types should be supported, including structured and unstructured data;
b) Real-time reporting of the progress and status of offline computing tasks should be supported;
c) It should support the coordinated execution of offline tasks across multiple nodes;
d) Multi-language development interfaces for analysis tasks should be supported;
e) It should support job scheduling;
f) The scatter-aggregate approach should be supported;
g) It should support running the batch computing framework on distributed resource management.
7.3 Stream processing framework
The stream processing framework requirements are as follows.
a) It should support obtaining real-time message data from data sources, completing high-throughput, low-latency real-time calculations, and outputting the results to a message queue or persisting them;
Note: The data source of streaming data is generally a message queue, a TCP connection, etc.
b) User-level access control functions should be provided to support operations such as the creation, browsing, suspension, activation and deactivation of message processing tasks, and audit logs should be recorded for user-level operations;
c) Real-time analysis tasks using sliding windows should be supported, and the time window size should be adjustable;
d) It should support fault tolerance, that is, in the event of a fault, the system has a fault-tolerance mechanism to handle the fault;
e) It should support high fault tolerance, that is, when nodes, processes, etc. become abnormal during message processing, it should have the capability to redeploy the processing units.
7.4 Graph calculation framework
The graph calculation framework requirements are as follows.
a) Built-in APIs for graph data queries should be provided, and writing iterative algorithms with synchronous or asynchronous computing models should be supported;
b) Full import, incremental import and custom import of detailed data should be supported;
d) It should support graph data representation based on the property graph model, including label and attribute type definitions on nodes and edges;
e) It should support built-in functions for computing common graph metrics, to describe the topological characteristics of the graph;
f) Horizontally scalable distributed graph calculation and query should be supported;
g) Concurrent query of graph data should be supported.
7.5 Memory computing framework
The memory computing framework requirements are as follows.
a) It should support the provision of data processing capabilities through distributed memory computing and a DAG execution engine;
b) It should support horizontal scaling and automatic load balancing;
c) It should support the processing of multiple data types, including structured, semi-structured and unstructured data;
d) It is advisable to provide highly abstract operators for quickly building distributed data processing applications;
e) It is advisable to support integration with non-relational databases, that is, reading data in non-relational databases without data migration.
7.6 Batch stream fusion computing framework
The requirements of the batch stream fusion computing framework are as follows.
a) It should support batch and stream integration and a unified SQL query language;
b) It should support streaming SQL in multiple scenarios, such as location information analysis;
c) It should support common time windows, including jumping windows, sliding windows, etc.;
d) It should support SQL-based pattern recognition on batch and stream data;
f) It should support event-driven stream processing to reduce processing delay;
g) It should support processing of out-of-order event streams, window calculations, CEP, etc.;
h) It should support the scheduling of complex tasks, such as deep learning training and MPI tasks.
......

Tips & Frequently Asked Questions:
Question 1: How long will it take to deliver the true-PDF of GB/T 37722-2019_English?
Answer: Upon your order, we will start to translate GB/T 37722-2019_English as soon as possible, and keep you informed of the progress. The lead time is typically 1 ~ 3 working days. The lengthier the document, the longer the lead time.
Question 2: Can I share the purchased PDF of GB/T 37722-2019_English with my colleagues?
Answer: Yes. The purchased PDF of GB/T 37722-2019_English will be deemed to be sold to your employer/organization who actually pays for it, including your colleagues and your employer's intranet.
Question 3: Does the price include tax/VAT?
Answer: Yes. Our tax invoice, downloaded/delivered in 9 seconds, includes all tax/VAT and complies with 100+ countries' tax regulations (tax exempted in 100+ countries) -- See Avoidance of Double Taxation Agreements (DTAs): List of DTAs signed between Singapore and 100+ countries
Question 4: Do you accept my currency other than USD?
Answer: Yes. If you need your currency to be printed on the invoice, please write an email to Sales@ChineseStandard.net. Within 2 working hours, we will create a special link for you to pay in any currency. Otherwise, follow the normal steps: Add to Cart -- Checkout -- Select your currency to pay.