YY/T 1833.2-2022: Artificial intelligence medical device - Quality requirements and evaluation - Part 2: General requirements for datasets
Status: Valid
This is an excerpt.

YY
PHARMACEUTICAL INDUSTRY STANDARD
ICS 11.040.99
CCS C 30

Artificial intelligence medical device - Quality requirements and evaluation - Part 2: General requirements for datasets

Issued on: JULY 01, 2022
Implemented on: JULY 01, 2023
Issued by: National Medical Products Administration

Table of Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Requirements for dataset description
5 Requirements for dataset quality
6 Evaluation of dataset quality compliance
Annex A (normative) Explanation of dataset types
Annex B (informative) Data screening and cleaning instructions
Bibliography

1 Scope

This document specifies the general quality requirements and evaluation methods for datasets used throughout the life cycle of artificial intelligence medical devices.

This document is applicable to the development and evaluation of datasets used in the research and development, production, testing, quality control and other aspects of artificial intelligence medical devices.

2 Normative references

The following referenced documents are indispensable for the application of this document. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.
GB/T 2828.4, Sampling procedures for inspection by attributes - Part 4: Procedures for assessment of declared quality levels
GB/T 2828.11, Sampling procedures for inspection by attributes - Part 11: Procedures for assessment of declared quality levels for small population
GB/T 6378.4, Sampling procedures for inspection by variables - Part 4: Procedures for assessment of declared quality levels for mean
YY/T 1833.1, Artificial Intelligence Medical Devices Quality Requirements and Evaluation - Part 1: Terminology

3 Terms and definitions

For the purposes of this document, the terms and definitions given in YY/T 1833.1 and the following apply.

3.1 inspection by attributes
An inspection that is carried out with respect to a specified requirement or group of requirements, or that only classifies unit products as qualified or unqualified, or that only counts the number of unqualified units.

4.1.2.1 Compliance statement
The dataset description shall provide a compliance statement of the data source.

4.1.2.2 Privacy protection
The dataset description shall describe the technical means used to protect the privacy of subjects, such as data de-identification, data anonymization, etc. When appropriate, the dataset description document shall describe the rules for data de-identification or data anonymization.

4.1.2.3 Diversity
The dataset description shall provide a description of the diversity of data sources, such as population, collection location, collection equipment, parameter settings, operator qualifications, collection process, collection time, etc.

4.1.2.4 Principles of compliance for data collection
The dataset description shall provide the regulations, technical standards, clinical specifications, expert consensus or other references on which the data were collected.

4.1.2.5 Data screening
The dataset description shall describe the data inclusion and exclusion criteria, as well as the methods for data screening, such as manual cleaning and automatic cleaning.
NOTE: See Annex B for examples.

4.1.3 Data preprocessing
When appropriate, the dataset description shall describe the steps and content of data preprocessing.

4.1.4 Dataset annotation

4.1.4.1 Principles for dataset annotation
If the dataset has annotation information, the dataset description shall describe the regulations, technical standards, clinical specifications, expert consensus or other references on which the dataset is annotated.

4.1.4.2 Reference standards
If the dataset has annotation information, the dataset description shall describe the establishment rules, scope, storage format and data specifications of the dataset reference standard. If the reference standard is verifiable, the verification method of the reference standard shall be described.

4.1.4.3 Annotation process
If the dataset has annotation information, the dataset description shall describe the data annotation and quality control process and clarify the decision-making mechanism. In the case of multiple annotators or multiple rounds of annotation, the arbitration mechanism for annotation differences shall be described.

4.1.4.4 Other annotation information
If the dataset has annotation information, the dataset description shall describe the scope, data specifications and storage format of other annotation information in addition to the reference standard.

4.1.5 Dataset storage information
The dataset description shall describe the data storage information, such as the dataset storage method and storage path, security control, backup, and recovery instructions. If the dataset is stored using cloud services, the cloud service provider name and qualifications, access path, and usage permission instructions shall be provided.

4.1.6 Dataset user access

4.1.6.1 Access control
The dataset description shall describe the user access control mechanism, such as user type, permission allocation, and authorization mechanism.
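As an illustration of what a description of user type, permission allocation, and authorization might correspond to in practice, the following is a minimal Python sketch. It is not taken from the standard: the user types, the permission names, and the `is_authorized` helper are all hypothetical, and a real dataset would define its own.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Permission(Enum):
    READ = auto()
    ANNOTATE = auto()
    EXPORT = auto()
    ADMIN = auto()


# Hypothetical user types and their permission allocation (assumption,
# not defined by the standard).
ROLE_PERMISSIONS = {
    "annotator": {Permission.READ, Permission.ANNOTATE},
    "reviewer": {Permission.READ, Permission.ANNOTATE, Permission.EXPORT},
    "administrator": set(Permission),
}


@dataclass(frozen=True)
class User:
    name: str
    user_type: str


def is_authorized(user: User, permission: Permission) -> bool:
    """Return True if the user's type has been allocated the permission."""
    # Unknown user types get an empty permission set, i.e. deny by default.
    return permission in ROLE_PERMISSIONS.get(user.user_type, set())
```

Denying by default for unknown user types keeps the table the single place where access is granted, which also makes it straightforward to reproduce in a written description.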
4.1.6.2 Access conditions
The dataset description shall describe the conditions required to access the dataset, such as software and hardware configuration, access method, data interface, protocol, tools, etc.

4.1.6.3 Visualization
The dataset description shall describe the visual presentation of the dataset information.

4.1.7 Development management
The dataset description shall describe the governing standards to which the dataset was developed.

4.2 Dataset identification

4.2.1 Identification
The dataset shall display a unique identifier, including the dataset name, version number, and information about the dataset manufacture responsible organization. This can be provided in the form of an attached file and described in detail in the dataset description.

The dataset description shall state the extent to which the dataset is authentic and trustworthy, including the acquisition and processing of data and metadata. Verifiable evidence shall be presented in written form.

4.3.5 Timeliness
The dataset description shall state the extent to which the timeframes required for each step in the dataset development phase meet expectations, taking into account preprocessing, cleaning, labeling, etc. Provide verifiable metrics in written form.

4.3.6 Accessibility
The dataset description shall state the extent to which the dataset is accessible. Demonstrate verifiable evidence in written form.

4.3.7 Compliance
The dataset description shall state the standard specifications, expert consensus, operating procedures or other references to which the dataset complies.

4.3.8 Confidentiality
The dataset description shall describe the measures related to information security and data confidentiality. Verifiable evidence shall be presented in written form.

The dataset description shall state the resource consumption required to perform the dataset-related tasks.
Verifiable compliance evidence shall be presented in written form, such as the software, hardware, and network configuration required for tasks such as accessing, reading data, previewing, and retrieving.

4.3.10 Precision
The dataset description shall describe the closeness of the quantitative information of the data to the true value, taking into account the data elements, metadata, and data annotation results. Provide verifiable indicators in written form, such as spatial/temporal resolution, significant figures, and minimum measurement units.

4.3.11 Traceability
The dataset description shall describe the extent to which the data can be traced, taking into account the data collection history, data annotation history, data access traces, and data change traces.

4.3.12 Understandability
The dataset description shall use terms that are understandable to the users of the dataset. Provide explanations of the meaning of the data elements, metadata, and annotation results. Present verifiable evidence in written form.

4.3.13 Availability
The dataset description shall state the extent to which the dataset can be used and retrieved by authorized users. Verifiable evidence shall be presented in written form.

4.3.14 Portability
The dataset description shall state the ability of the dataset to be installed, replaced, or moved from one system to another. Maintain the attributes of the existing quality. Consider the efficiency of data installation, replacement, and movement. Show verifiable evidence in written form.

4.3.15 Recoverability
The dataset description shall state the extent to which the dataset can be recovered. Verifiable evidence shall be presented in written form. The dataset description may provide measures for data recovery. The dataset description may provide measures to prevent interruption or failure in the use of the dataset.
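One possible form of verifiable evidence for portability and recoverability is a checksum manifest recorded before a dataset is moved or backed up and compared again afterwards. The Python sketch below illustrates the idea under stated assumptions: the standard does not prescribe this method, and the function names `build_manifest` and `compare_manifests` are hypothetical.

```python
import hashlib
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to the SHA-256 digest of its content."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or changed after a move or restore."""
    common = before.keys() & after.keys()
    return {
        "missing": sorted(set(before) - set(after)),
        "unexpected": sorted(set(after) - set(before)),
        "changed": sorted(k for k in common if before[k] != after[k]),
    }
```

If all three lists come back empty, the file content is byte-identical on both sides. This says nothing about whether the target system interprets the data the same way, so it covers only part of what a portability statement would need to show.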
4.3.16 Representativeness
The dataset description shall analyze the sample composition, proportion, population distribution characteristics, data diversity, and the degree of closeness to the application scenario. Verifiable indicators shall be provided in written form.

5 Requirements for dataset quality

5.1 Overview
The content of this document focuses on the quality characteristics and overall risk of the dataset. It is advisable to conduct a quality assessment of the dataset based on its intended use and application scenarios, and form a technical report as a verification of the dataset quality.

5.2 Quality characteristics

5.2.1 Completeness

5.2.1.1 Accuracy
The dataset shall comply with the accuracy statements in the dataset description, such as:
a) Accuracy of recorded information;
b) Accurate, clear and unambiguous text description;
d) Original records, intermediate records and final records shall be consistent.

External consistency refers to the correlation between data from different sources, such as:
a) Data from different sources shall be consistent in terms of data characteristics;
b) Outliers shall be explainable;
c) Data from different sources shall comply with the same regulations, technical standards, medical specifications, and other literature requirements during collection and annotation.

5.2.4 Authenticity
The dataset shall comply with the dataset description with statements about authenticity, such as:
a) Data shall come from real clinical data collection processes. When appropriate, the equipment, personnel, and methods involved in data collection shall comply with technical standards, clinical norms, or expert consensus;
b) Data amplification, data synthesis activities, and results shall be traceable and interpretable;
c) Metadata shall describe the data truthfully.

5.2.5 Timeliness
The time limit for data collection, labeling, circulation, archiving, and change activities shall comply with the statement of timeliness in the dataset description.
Dynamically updated datasets shall specify the data update cycle, update method, and update ratio. If the data involves the time series process in clinical diagnosis and treatment, the rationality of the data in terms of clinical timeliness shall be demonstrated.

5.2.6 Accessibility
Datasets shall meet the access needs within the scope of the intended use and application scenarios of the dataset.

5.2.7 Compliance
Datasets shall comply with the statements about compliance in the dataset description.

5.2.8 Confidentiality
Datasets shall comply with the confidentiality statements in the dataset description. Take steps to ensure that they can only be accessed by authorized users. Isolated datasets shall have an authorized access mechanism and an isolation protection mechanism. There shall be measures to prevent data leakage, data tampering, and data loss, such as data anonymization, physical isolation, data auditing, etc.

5.2.9 Resource utilization
The processing and use of the dataset shall be in accordance with the statement of resource availability in the dataset description.

5.2.10 Precision
The dataset shall comply with the precision statement in the dataset description document.

5.2.11 Traceability
The dataset shall comply with the dataset description regarding traceability, with relevant records, such as:
a) Original data source, metadata source, compliance proof;
b) Data collection activity records;
c) Personnel management records;
d) Data annotation process records;
e) Blind management records;
f) Data circulation records;
g) Data questioning, auditing, deactivation, and correction records;
h) Records of labeling tools and platform usage;
i) Statistical information query of dataset labeling results, including labeling progress, label statistics, labeler progress statistics, difficult case set, etc.;
j) Data service anomaly and failure records;
k) Data maintenance and backup records;
l) Data update records;
m) Cloud service provider name, contact information, cloud service type, etc.

5.2.12 Understandability
Datasets shall conform to the dataset description's statements regarding understandability.

5.2.13 Availability
Datasets shall comply with the availability statements in the dataset description document.

5.2.14 Portability
The dataset shall comply with the statement about portability in the dataset description. If the dataset is allowed to be used on different platforms and systems, the data quality shall not change with the platform and system.

5.2.15 Recoverability
Measures used to maintain the quality of the dataset and protect against failure events shall be consistent with the statements about recoverability in the dataset description.

5.2.16 Representativeness
The data feature hierarchy, epidemiological statistics, sample source diversity, data diversity, etc. of the dataset shall comply with the representativeness statements in the dataset description.

5.3 Dataset risk analysis

5.3.1 Selection bias
The dataset manufacture responsible organization shall analyze the representativeness of the dataset (whether the subject population, collection site, equipment selection, parameter settings, etc. represent clinical reality).
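A selection bias analysis of this kind can be supported by a simple quantitative indicator. As a hedged illustration only, the Python sketch below compares the proportion of records per subgroup (for example per collection site) with a reference distribution, using the total variation distance. The choice of metric, the function names, and the example values are assumptions for illustration; the standard does not prescribe a specific indicator.

```python
from collections import Counter


def subgroup_proportions(labels):
    """Return the share of each subgroup label in the sample."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def total_variation_distance(observed, reference):
    """Half the sum of absolute differences: 0 means identical, 1 means disjoint."""
    groups = set(observed) | set(reference)
    return 0.5 * sum(abs(observed.get(g, 0.0) - reference.get(g, 0.0)) for g in groups)


# Hypothetical values: collection site of each record, and the share of each
# site expected in the intended clinical setting.
dataset_sites = ["hospital_a"] * 70 + ["hospital_b"] * 20 + ["hospital_c"] * 10
reference = {"hospital_a": 0.4, "hospital_b": 0.4, "hospital_c": 0.2}

observed = subgroup_proportions(dataset_sites)
distance = total_variation_distance(observed, reference)
```

A distance near zero suggests the sample composition is close to the reference for that one attribute. Any acceptance threshold would have to be justified by the responsible organization, and the same comparison can be repeated for population, equipment, or parameter settings.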
5.3.2 Coverage bias
The dataset manufacture responsible organization shall analyze whether the disease composition of the dataset can cover all situations of the target population in the application scenario (single, multiple, concurrent, complex pathology, etc.).

5.3.3 Reference standard bias
The dataset manufacture responsible organization shall verify the data annotation results or the process of establishing the reference standard. Analyze the bias of the reference standard.

5.3.4 Verification bias
The dataset manufacture responsible organization shall describe the verification method of the reference standard. Analyze the correlation, differences and impact between the data annotation process and the clinical diagnosis process.

5.3.5 Annotation order bias

6.3.6 Timeliness
The extraction and inspection of time information from the sampled sample set shall comply with the requirements of 5.2.5.

6.3.7 Accessibility
Write test cases based on statements about accessibility in the dataset description. Actual operational verification shall comply with the requirements of 5.2.6.

6.3.8 Compliance
Write test cases. Check the compliance of the sampled sample set with relevant regulations, technical standards, and technical specifications, which shall comply with the requirements of 5.2.7.

6.3.9 Confidentiality
Write test cases. The authorization access mechanism and data isolation protection mechanism of the sampled sample set shall comply with the requirements of 5.2.8.

6.3.10 Resource utilization
Write test cases. Under the operating environment specified by the dataset manufacture responsible organization, operational verification of the dataset shall comply with the requirements of 5.2.9.

6.3.11 Precision
If applicable, perform process validation on the dataset annotation tool.
The data, metadata, quantitative features included in the annotated results, and the accuracy of the quantitative description of the dataset in the sample set shall meet the requirements of 5.2.10.

6.3.12 Traceability
The traceable records of the sampled sample set shall be checked in accordance with the requirements of 5.2.11.

6.3.13 Understandability
Refer to the dataset description for actual operation. Preview and interpretation of data shall comply with the requirements of 5.2.12.

6.3.14 Availability
Refer to the dataset description for actual operation. The use and retrieval of data shall comply with the requirements of 5.2.13.

Annex A (normative)
Explanation of dataset types

The dataset types are divided into the following five dimensions.
- Expected use: model training and tuning set, model validation set, third-party performance testing set, clinical evaluation dataset, product quality control set, etc.;
- Data sources: public datasets, private datasets, mixed datasets, etc.;
- User type: self-use dataset, other use dataset, etc.;
- Access management methods: open datasets, closed datasets, etc.;
- Update format: static dataset, dynamic dataset, etc.

The model training and tuning set refers to the dataset used during the training and tuning stages of artificial intelligence algorithm models. This algorithm model is directly called by artificial intelligence medical devices.

The model validation set refers to the medical device manufacturer's own testing set. It shall be independent of the model training and tuning set, with no overlap between the two.

Product quality control set refers to the dataset used for verification, daily quality control, and other activities during the clinical use of artificial intelligence medical device products. The product quality control set shall be stored on dedicated media.

Third-party performance testing set refers to a third-party testing set used for performance testing.
The third-party performance test set shall ensure that there is no overlap between the test data and the training and internal validation data used by the tested product (including data indirectly used by the tested product through pre-trained models, etc.). This type of dataset is only used for closed testing. After testing is completed, the dataset shall be removed from the tested product and from third-party systems, so as to avoid targeted tuning. The third-party performance test set shall be stored independently and physically isolated from the outside world. The third-party performance test set shall have measures to prevent unauthorized access and record access activities. The data source of the third-party performance test set shall be traced back to the collection institution and annotator.

Clinical evaluation dataset refers to the dataset used for clinical evaluation of artificial intelligence medical devices. The clinical evaluation dataset is also independent of medical device manufacturers. There is no overlap with the training and internal ......

Source: Above contents are excerpted from the full-copy PDF, translated/reviewed by: www.ChineseStandard.net / Wayne Zheng et al.
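The no-overlap requirement for test sets in Annex A can be checked mechanically in part. The following Python sketch is an illustration under stated assumptions, not a method given by the standard: it assumes each record is available as a pair of a de-identified subject identifier and its raw bytes, and the names `content_hash` and `find_overlap` are hypothetical.

```python
import hashlib


def content_hash(payload: bytes) -> str:
    """SHA-256 digest of a record's raw bytes."""
    return hashlib.sha256(payload).hexdigest()


def find_overlap(training, test):
    """Find subjects and record contents that appear in both sets.

    Each item is a (subject_id, payload_bytes) pair; this record layout is
    an assumption made for the example.
    """
    train_subjects = {sid for sid, _ in training}
    train_hashes = {content_hash(p) for _, p in training}
    shared_subjects = sorted({sid for sid, _ in test} & train_subjects)
    shared_content = sorted({content_hash(p) for _, p in test} & train_hashes)
    return shared_subjects, shared_content
```

Both lists being empty is necessary but not sufficient evidence of independence. Exact hashing does not detect re-encoded or cropped copies of the same data, and it cannot see data that reached the tested product indirectly through a pre-trained model.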