GB/Z 43768-2024: Information and documentation - Statistics and quality issues for web archiving
 Status: Valid
 
	
		
Basic data
| Standard ID | GB/Z 43768-2024 |
| Description (Translated English) | Information and documentation - Statistics and quality issues for web archiving |
| Sector / Industry | National Standard |
| Classification of Chinese Standard | A14 |
| Classification of International Standard | 01.140.20 |
| Word Count Estimation | 54,591 |
| Date of Issue | 2024-03-15 |
| Date of Implementation | 2024-10-01 |
| Issuing agency(ies) | State Administration for Market Regulation, China National Standardization Administration |

This is a draft version for illustration, not a final translation.
ICS 01.140.20
CCS A14
National Standardization Guiding Technical Document of the People's Republic of China
Information and documentation - Statistics and quality issues for web archiving
(ISO/TR 14873:2013, IDT)
Released on 2024-03-15
Implemented on 2024-10-01
Issued by the State Administration for Market Regulation and the National Standardization Administration
Table of Contents
Foreword
Introduction
1 Scope
2 Normative references
3 Terms and Definitions
4 Methods and Purpose of Web Archiving
4.1 Collection methods
4.2 Access and Description Methods
4.3 Saving Method
4.4 Legal basis for web archiving
4.5 Other reasons for web archiving
5 Statistics
5.1 Overview
5.2 Resource Collection Construction
5.3 Resource Collection Representation
5.4 Resource Collection Usage
5.5 Network Archive Storage
5.6 Network Archive Cost
6 Quality indicators
6.1 Overview
6.2 Restrictions
6.3 Description
7 Uses and Benefits
7.1 Overview
7.2 Intended use and audience
7.3 Benefits to User Groups
7.4 Statistics by user group
7.5 Network archiving process and related performance indicators
References
Figure 1 Statistics of usage by user group
Figure 2 Network archiving process and corresponding performance indicators
Table 1 HTTP status code list
Table 2 Core statistical data of resource collection construction
Table 3 Core statistics of resource collection representation
Table 4 Basic statistics for evaluating archive usage
Table 5 Summary statistics of high-level representation of archive usage
Table 6 Core statistics on resource collection usage
Table 7 Statistics related to metadata preservation
Table 8 Core statistics of resource collection preservation
Table 9 Core statistics of resource collection costs
Table 10 Intended use and audience
Table 11 Terminology used in Figure 1
Foreword
This document was drafted in accordance with the provisions of GB/T 1.1-2020 "Guidelines for standardization work - Part 1: Structure and drafting rules for standardization documents".
This document is identical to ISO/TR 14873:2013 "Information and documentation - Statistics and quality issues for web archiving". The document type has been changed from an ISO technical report to a national standardization guiding technical document of China.
This document adds a chapter on “Normative References”.
Please note that some of the contents of this document may involve patents. The issuing organization of this document does not assume the responsibility for identifying patents.
The following minimal editorial changes were made to this document:
--- To enhance readability, some examples are replaced with domestic examples while retaining the examples in the international standard;
--- Because China has no legal deposit agency for archiving online information, the relevant statements in Chapter 1 have been modified.
This document is proposed and coordinated by the National Technical Committee for Information and Documentation Standardization (SAC/TC4).
The drafting organizations of this document are: Documentation and Information Center of Chinese Academy of Sciences, National Library of China, Archives of Chinese Academy of Sciences, and Peking University Library.
The main drafters of this document are: Wu Zhenxin, Zhang Dongrong, Pan Yanan, Dun Wenjie, Zhu Jiali, Qu Yunpeng, Sun Chao, Xie Jing, Fu Honghu, Shan Songyan, Xue Jie, Wu Xinyu, Kong Beibei, Hu Jiying, Chen Zijun, Zhang Jing.
Introduction
This document was developed to guide the management and evaluation of web archiving and web archiving products in China.
Web archiving refers to the selection, capture, storage, and preservation of snapshots of Internet resources over time. Web archiving began in the late 1990s, when it was foreseen that an archive of Internet resources would become an important record for future research, business, and government. Internet resources are considered part of cultural heritage and can be preserved like printed books. Many institutions involved in web archiving see it as an extension of their long-standing mission to protect the nation's cultural heritage, a view that is recognized and supported in many countries by laws and regulations such as legal deposit.
The Internet provides a variety of resources, including text, pictures, movies, audio, and other multimedia formats. In addition to interlinked web pages, there are newsgroups, newsletters, blogs, and interactive services (such as games), provided using various transmission and communication protocols.
A web archive brings together copies of Internet resources that are collected automatically by collection software, usually on a regular basis. The aim is to replay the resources, including their internal relationships (for example, through hypertext links), so that they appear as far as possible as they did in their original environment. The main goal of a web archive is to preserve a record of the Web permanently, in a state as close to the original as possible, for a variety of academic, professional, and private purposes.
Web archiving is an emerging but expanding activity that requires the continued introduction of new methods and tools to keep pace with the rapid development of web technologies. Archiving institutions differ in their perception of its strategic importance, in the approaches available to them, and in their legal requirements, which has led to a variety of methods for archiving Internet resources, ranging from crawling a single web page to crawling an entire top-level domain. The maturity of web archiving also differs between organizations. For some, web archiving is already a regular part of their business, while for others it is still only an experimental project.
Based on the scale and purpose of the collection, web archiving strategies can be divided into two categories: bulk collection and selective collection. A bulk collection, such as a national domain collection, aims to capture a snapshot of an entire domain (or a subset of it). A selective collection is much smaller, more focused, and more frequent, and is often based on a criterion such as subject matter, event, format (such as audio or video files), or agreement with content owners. The key difference between these two strategies is the degree of quality control, that is, the evaluation of the collected websites to determine whether they meet predefined quality standards. The scale of a domain collection is so large that it is impossible to manually compare the collected resources with the live versions of those resources, which is a common quality assurance method in selective collection.
This document aims to demonstrate that web archives, as part of a broader collection of cultural heritage resources, can be measured and managed in a way similar to traditional library workflows. It describes collection construction, characterization, description, preservation, use, and organizational structure, and shows that, while some adjustments may be needed in practice, most aspects of the traditional collection management workflow are still applicable to web archiving in principle.
This document provides an overview of the current state of web archiving, with an emphasis on the definition and use of web archiving statistics and quality indicators. The generation of statistical data depends on the collection, indexing, or browsing software used, and choosing different software may lead to different results. This document does not present or recommend specific software, but rather provides a set of indicators to help assess the overall performance and quality of web archives.
Information and documentation - Statistics and quality issues for web archiving
1 Scope
This document defines statistics, terminology, and quality standards for web archiving. It takes into account the needs and practices of a wide range of organizations, including libraries, archives, museums, research centres, and cultural heritage foundations.
This document is intended for experts directly involved in web archiving, typically web archiving agency leadership, engineers, and preservation managers. It is also useful to funding agencies and stakeholders in web archiving. The terminology used in this document attempts to represent the wide range of interests and expertise held by the audience, with a balance between computer science, management, and librarianship.
This document is not intended for the management of academic and commercial electronic resources, such as electronic journals, electronic newspapers, or electronic books, which are usually stored and processed separately using different management systems. Although they are considered Internet resources, they are not treated in this document as a specific part of web archives.
Some organizations also collect electronic documents that are distributed over networks, such as through publishers' electronic deposits and repository systems. The principles and techniques used in this type of collection are very different from those used in web archiving, so this document does not discuss them in detail, and its statistics and quality indicators may not be applicable to them.
This document focuses on the principles and methods of web archiving and does not cover other ways of collecting Internet resources. Some Internet resources, especially those that are not distributed over the Web (such as communications distributed in the form of e-mail), are not collected through web archiving technology but by other methods, which are not within the scope of application of this document.
2 Normative references
This document has no normative references.
3 Terms and definitions
The following terms and definitions apply to this document.
3.1
Access
A successful request (3.36) for an online service provided by the library.
Note 1: An access is a cycle of user activity that usually begins when the user connects to an online service provided by the library and ends with a terminating activity that is either explicit (leaving the database by logging off or exiting) or implicit (timeout due to user inactivity).
Note 2: Accesses to the library website (3.52) are considered virtual visits.
Note 3: Requests (3.36) to general entry or gateway pages (3.33) are not included.
Note 4: Requests (3.36) initiated by search engines are excluded as far as possible.
[Source: ISO 2789:2022, 3.2.1]
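The counting rule in the notes to 3.1 can be illustrated with a small sketch. It is not part of the standard: the 30-minute inactivity timeout, the `Request` fields, the `is_robot` flag, and the `GATEWAY_PATHS` set are all assumptions chosen for illustration, since the definition gives no timeout value and no log format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: 30 minutes of inactivity ends an access implicitly.
# The definition does not prescribe a timeout value.
INACTIVITY_TIMEOUT = timedelta(minutes=30)

# Hypothetical set of general entry or gateway pages (excluded by Note 3).
GATEWAY_PATHS = {"/"}


@dataclass(frozen=True)
class Request:
    """One successful request from a log; all field names are illustrative."""
    user: str
    time: datetime
    path: str
    is_robot: bool = False   # hypothetical flag for search engine traffic
    is_logout: bool = False  # explicit termination (log-off or exit)


def count_accesses(requests, timeout=INACTIVITY_TIMEOUT):
    """Count accesses: cycles of user activity ended by logout or timeout."""
    last_seen = {}  # user -> time of the latest request in the open access
    accesses = 0
    for req in sorted(requests, key=lambda r: r.time):
        # Notes 3 and 4: skip gateway pages and search engine requests.
        if req.is_robot or req.path in GATEWAY_PATHS:
            continue
        previous = last_seen.get(req.user)
        # A new access starts when the user has no open access,
        # or when the previous one ended implicitly by timeout.
        if previous is None or req.time - previous > timeout:
            accesses += 1
        if req.is_logout:
            last_seen.pop(req.user, None)  # explicit end of the access
        else:
            last_seen[req.user] = req.time
    return accesses
```

Under these assumptions, two requests by the same user ten minutes apart count as one access, while a request two hours later, or one made after an explicit log-off, opens a new access.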
3.2
Access tool
Specialized software for finding, retrieving, and playing back archived Internet resources.
Note: This tool is implemented by running multiple independent software packages in combination.
3.3
Information necessary to properly manage digital objects in repositories.
 