GB/T 45288.2-2025: Artificial intelligence - Large-scale model - Part 2: Testing and evaluation for metrics and methods
Status: Valid
Basic data
| Standard ID | GB/T 45288.2-2025 (GB/T45288.2-2025) |
| Description (Translated English) | Artificial intelligence - Large-scale model - Part 2: Testing and evaluation for metrics and methods |
| Sector / Industry | National Standard (Recommended) |
| Classification of Chinese Standard | L70 |
| Classification of International Standard | 35.240 |
| Word Count Estimation | 30,396 |
| Date of Issue | 2025-02-28 |
| Date of Implementation | 2025-02-28 |
| Issuing agency(ies) | State Administration for Market Regulation, China National Standardization Administration |
ICS 35.240
CCS L70
National Standard of the People's Republic of China
Artificial intelligence - Large-scale model
Part 2: Evaluation indicators and methods
Artificial intelligence - Large-scale model - Part 2: Testing and evaluation for metrics and methods
Issued on 2025-02-28
Implemented on 2025-02-28
Issued by the State Administration for Market Regulation and the National Standardization Administration
Table of Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and Definitions
4 Abbreviations
5 Evaluation indicators
5.1 Comprehension Ability Evaluation Indicators
5.2 Generation capability evaluation indicators
6 Evaluation Methods
6.1 Overview
6.2 Evaluation Dataset
6.3 Evaluation Environment
6.4 Evaluation Tools
6.5 Evaluation Implementation
Appendix A (Informative) Evaluation Index Calculation Method
A.1 Objective evaluation method
A.2 Subjective evaluation method
References
Foreword
This document was drafted in accordance with the provisions of GB/T 1.1-2020 "Guidelines for standardization work - Part 1: Structure and drafting rules for standardization documents".
This document is Part 2 of GB/T 45288 "Artificial intelligence - Large-scale model". GB/T 45288 has been published in the following parts:
--- Part 1: General requirements;
--- Part 2: Evaluation indicators and methods;
--- Part 3: Service capability maturity assessment.
Please note that some of the contents of this document may involve patents. The issuing organization of the document does not assume the responsibility for identifying patents.
This document was proposed by and is under the jurisdiction of the National Information Technology Standardization Technical Committee (SAC/TC 28).
This document was drafted by: China Electronics Technology Standardization Institute, Shanghai Artificial Intelligence Innovation Center, Institute of Automation, Chinese Academy of Sciences, Ant Group Co., Ltd., Beijing University of Aeronautics and Astronautics, Tsinghua University, Hangzhou Lianhui Technology Co., Ltd., China Railway Construction Corporation Co., Ltd., Beijing Baidu Netcom Technology Co., Ltd., China Southern Power Grid Co., Ltd., China Mobile Communications Co., Ltd. Research Institute, China Energy Investment Group Information Technology Co., Ltd., Huawei Cloud Computing Technology Co., Ltd., Shanghai SenseTime Intelligent Technology Co., Ltd., Alibaba Cloud Computing Co., Ltd., Shenzhen Tencent Computer Systems Co., Ltd., Beijing Qihoo Technology Co., Ltd., Beijing Zhiyuan Artificial Intelligence Research Institute, China Railway Fifth Survey and Design Institute Group Co., Ltd., Beijing Zhipu Huazhang Technology Co., Ltd., Inspur Cloud Information Technology Co., Ltd., iFlytek Co., Ltd., China Electric Power Research Institute Co., Ltd., Tianjin University, China Telecom Research Institute, China Central Radio and Television, Beijing Baichuan Intelligent Technology Co., Ltd., Tongfang Knowledge Network Digital Publishing Technology Co., Ltd., Beijing Zhongguancun Laboratory, Shanghai Harbin Artificial Intelligence Industry Association, China Southern Power Grid Research Institute Co., Ltd., Xidian University, Southwest University of Science and Technology, Harbin University of Science and Technology, Institute of Software, Chinese Academy of Sciences, Wuhan Institute of Artificial Intelligence, Peking University, Qingdao Hisense Electronic Technology Service Co., Ltd., Beijing DeepGlint Information Technology Co., Ltd., Beijing University of Technology, China Southern Power Grid Artificial Intelligence Technology Co., Ltd., China Telecom Group Co., Ltd., Tianyi Cloud Technology Co., Ltd., Beijing Software Product Quality Testing and Inspection Center Co., Ltd., Beijing Century Good Future Education Technology Co., Ltd., Beijing Xiaomi Mobile Software Co., Ltd., Beijing Zhixin Microelectronics Technology Co., Ltd., China Mobile Communications Group Co., Ltd., Cloud Zhisheng Intelligent Technology Co., Ltd., Beijing Zhongguancun Kejin Technology Co., Ltd., Qingdao Haier Technology Co., Ltd., Hangzhou Hikvision Digital Technology Co., Ltd., BOE Technology Group Co., Ltd., Kunlun Digital Intelligence Technology Co., Ltd., Inspur Electronic Information Industry Co., Ltd., Inspur Software Technology Co., Ltd., Mashang Consumer Finance Co., Ltd., Pengcheng Laboratory, Pingtouge (Shanghai) Semiconductor Technology Co., Ltd., Qilin Hesheng Network Technology Co., Ltd., Shandong Inspur Science Research Institute Co., Ltd., Shandong Artificial Intelligence Research Institute, Shanghai Computer Software Technology Development Center, Shanghai Artificial Intelligence Research Institute Co., Ltd., Beijing Ansheng Technology Co., Ltd., Shanghai Suiyuan Technology Co., Ltd., Shanghai Tianshu Zhixin Semiconductor Co., Ltd., Shenzhen Qianhai Weizhong Bank Co., Ltd., Shenzhen Simo Information Technology Co., Ltd., Northwestern Polytechnical University, Siemens (China) Co., Ltd., CloudWalk Technology Group Co., Ltd., Shanghai Wenyue Information Technology Co., Ltd., Zhejiang Dahua Technology Co., Ltd., Wanda Information Co., Ltd., Shanghai Xuanwu Information Technology Co., Ltd., China Mobile Internet Co., Ltd., Sichuan Changhong Electronics Holding Group Co., Ltd.
The main drafters of this document:
Huang Xiancui, Sun Chuanxing, Ma Shanshan, Li Dong, Yu Dianhai, Long Yun, Liu Weidong, Jing Dichun, Zheng Zimu, Jiang Hui, Peng Juntao, Hu Zhichao, Zhang Xiangzheng,
Yang Xi, Zheng Zhong, Feng Tao, Zheng Jiajia, Liu Cong, Zhou Fei, Chen Xi, Li Jianxin, Xiong Deyi, Yang Mingchuan, Wang Feng, Mei Jianping, Chen Weipeng, Zhang Hongwei,
Zhang Songyang, Peng Jin, Liu Jing, Liu Aishan, Wang Jiakai, Gao Donghui, Ma Tongsen, Zhang Tianlin, Gao Tiezhu, Chen Xi, Liang Zhihong, He Gang, Yu Wenxin,
Yang Muyun, Meng Lingzhong, Zhu Guibo, Wang Jinqiao, Zheng Ruolin, Shen Zhiyue, Nie Jiandi, Ren Haifeng, Shi Xian, Wu Xihong, Liu Shang, Liu Weiwei, Shi Congcong,
Ding Peng, Liu Xiaoou, Xiang Chao, Xue Dejun, Wang Longyue, Liu Wei, Hu Quanyi, Sun Haoyuan, Sun Lin, Zhao Bimei, Xuan Richeng, Zhao Chunhao, Suo Siliang,
Chen Liming, Jiang Yixin, Wu Shanshan, Gao Pengjun, Kong Hao, Xue Yunzhi, Liu Zitao, Yu Lei, Zheng Zhe, Deng Chao, Liang Jiaen, Cui Mingfei, E Lei, Ren Ye,
Zhang Zhigang, Chen Hongzhi, Wu Shaohua, Wang Kechen, Feng Yue, Li Rui, Li Jinwei, Long Zhenyue, Gao Hui, Zhang Xu, Duan Qiang, Shan Ke, Chen Mingang, Song Haitao,
Liu Yifan, Wang Sishan, Yu Xuesong, Li Bin, Zhang Chi, Zhang Tao, Sheng Ruogu, Sun Jin, Rui Ziwen, Kong Weisheng, Tong Qing, Yang Dengfeng, Sun Wenqing, Zhu Lin,
Yang Lan.
Introduction
Large models have become an important technical means for the development of artificial intelligence and play an important role in leading industrial transformation.
Relevant institutions have successively researched and developed more than 100 large-scale model products and evaluation leaderboards, making it difficult for users to effectively evaluate the technical level of artificial intelligence products.
GB/T 45288 "Artificial intelligence - Large-scale model" aims to specify the technical requirements, evaluation indicators and service capabilities of general large models, and is proposed to consist of five parts.
--- Part 1: General requirements. The purpose is to establish a reference architecture for large models and specify general technical requirements.
--- Part 2: Evaluation indicators and methods. The purpose is to establish the evaluation indicators of large models and describe the evaluation methods.
--- Part 3: Service capability maturity assessment. The purpose is to provide the large model service capability maturity levels and assessment method.
--- Part 4: Computer vision large models. The purpose is to define the concept and function of computer vision large models and specify the technical requirements and test methods.
--- Part 5: Multimodal large models. The purpose is to define the concept and function of multimodal large models and specify the technical requirements and test methods.
Artificial intelligence - Large-scale model
Part 2: Evaluation indicators and methods
1 Scope
This document establishes the evaluation indicators for large AI models and describes the evaluation methods for large AI models.
This document is applicable to model providers, application service providers, and application consumers for evaluating and testing the capabilities of large models, and for guiding the design, development and application of large models.
2 Normative references
The contents of the following documents constitute essential clauses of this document through normative references in this document.
For dated references, only the version corresponding to that date applies to this document; for undated references, the latest version (including all amendments) applies to this document.
GB/T 42755-2023 Artificial Intelligence Data Labeling Procedure for Machine Learning
GB/T 45288.1 Artificial intelligence - Large-scale model - Part 1: General requirements
3 Terms and definitions
The terms and definitions defined in GB/T 45288.1 apply to this document.
4 Abbreviations
The following abbreviations apply to this document.
API: Application Programming Interface
BLEU: Bilingual Evaluation Understudy
5 Evaluation Indicators
5.1 Comprehension Ability Evaluation Indicators
5.1.1 Overview
The evaluation of large model comprehension ability is divided into a single-modal dimension and a multimodal dimension. The single-modal dimension mainly includes text, image, and audio.
The multimodal dimension mainly includes four secondary dimensions: image-text, text-audio, image-audio, and image-text-audio.
The types of tasks are shown in Table 1.
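As a rough illustration (not part of the standard), the four multimodal secondary dimensions listed above coincide with every combination of two or more of the three single modalities (text, image, audio). The sketch below enumerates them. The names `MODALITIES` and `multimodal_dimensions` are hypothetical and chosen only for this example.

```python
from itertools import combinations

# The three single-modal dimensions named in 5.1.1.
# The identifiers here are illustrative assumptions, not terms from the standard.
MODALITIES = ("image", "text", "audio")


def multimodal_dimensions(modalities=MODALITIES):
    """Return every combination of two or more modalities, smallest first."""
    return [
        combo
        for size in range(2, len(modalities) + 1)
        for combo in combinations(modalities, size)
    ]


if __name__ == "__main__":
    for combo in multimodal_dimensions():
        print("-".join(combo))
```

With three modalities this yields three pairs and one triple, matching the four secondary dimensions in the text.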
...