Metadata is information about data that makes the data easier to understand. Among other things, it explains to users how raw data were transformed into information, and information into knowledge. Metadata accompanies the entire statistical process, following the phases of the Generic Statistical Business Process Model (GSBPM): specifying needs, designing, building, collecting, processing, analyzing, disseminating, and evaluating.
Metadata makes data accessible to users: it lets them locate, sort, filter, and use data more easily. Data without metadata, and metadata without data, are meaningless; together they allow users to understand the data and to explore and analyze them.
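To illustrate how metadata lets users locate, filter, and sort data, the Python sketch below searches a small catalogue by keyword and orders the results by update date. The `DatasetRecord` class, its fields, and the catalogue entries are hypothetical, chosen only for the example; they do not represent an actual CBS system.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical catalogue entry pairing a dataset with descriptive metadata."""
    title: str
    subject: str
    keywords: list[str] = field(default_factory=list)
    last_updated: str = ""  # ISO 8601 date, e.g. "2024-01-31"


def find(records, keyword):
    """Locate datasets whose title or keywords mention `keyword`, newest first."""
    kw = keyword.lower()
    hits = [r for r in records
            if kw in r.title.lower() or kw in (k.lower() for k in r.keywords)]
    # ISO 8601 date strings sort chronologically as plain strings.
    return sorted(hits, key=lambda r: r.last_updated, reverse=True)


# Hypothetical catalogue entries for illustration only.
catalogue = [
    DatasetRecord("Consumer Price Index", "Prices", ["CPI", "inflation"], "2024-03-15"),
    DatasetRecord("Producer Price Index", "Prices", ["PPI", "inflation"], "2024-02-29"),
    DatasetRecord("Quarterly GDP", "National accounts", ["GDP"], "2024-02-16"),
]

for record in find(catalogue, "inflation"):
    print(record.title)  # prints Consumer Price Index, then Producer Price Index
```

Without the keyword and date fields, the same search would have to inspect the data values themselves; with them, locating and ordering datasets is a simple filter and sort.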
There are several types of metadata:
- Descriptive Metadata
- Reference Metadata
- Structural and Process Metadata
- Technical Metadata
Descriptive Metadata: Describes the information and data, usually through the use of titles of tables/charts, column headers in tables/databases, source of the data, subject, keywords, detailed descriptions, notes, etc.
Reference Metadata: Describes the content and quality of the statistical data. The CBS has adopted SIMS, the standard chosen by international organizations for presenting reference metadata. SIMS is used both for the reference metadata published on the CBS website and to meet the requirement to transmit metadata to the International Monetary Fund (IMF), as defined in the metadata components of SDDS Plus, using the SDMX standard.
Structural and Process Metadata: Describes the physical structure of complex data and how they are produced, detailing the collection, processing, and production steps. It also includes definitions of terms.
Technical Metadata: Metadata collected automatically by the data management system, for example the file creation date, the file creator, the location in the database, etc.
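Technical metadata of this kind can be captured without any manual input. The following Python sketch assumes the data are stored as files on a local file system and reads their metadata with the standard `pathlib` module; the function name `technical_metadata` and the output keys are hypothetical. File systems record a file's owner rather than its creator, and a true creation timestamp is not available on every platform, so the sketch reports `None` where it is missing.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def technical_metadata(path):
    """Collect file-system metadata for `path` automatically (no manual input)."""
    p = Path(path)
    st = p.stat()
    # A true creation time is exposed only on some platforms (st_birthtime);
    # where it is missing, report None rather than guess.
    birth = getattr(st, "st_birthtime", None)
    return {
        "location": str(p.resolve()),
        "size_bytes": st.st_size,
        "modified": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        "created": (datetime.fromtimestamp(birth, tz=timezone.utc).isoformat()
                    if birth is not None else None),
        # File systems store a numeric owner ID, not the "creator";
        # it is not meaningful on Windows.
        "owner_uid": st.st_uid,
    }


# Example: record metadata for a small file written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "cpi.csv"
    sample.write_bytes(b"year,value\n2023,1.0\n")
    meta = technical_metadata(sample)

print(meta)
```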
Reference Metadata – SIMS (Single Integrated Metadata Structure)
International Quality Standard
- Contributes to improving the quality of statistical products and to methodological transparency.
- Permits the simple comparison of the same statistical processes across several countries.
- Aims to enhance harmonization and standardization.
SIMS is an integrated standard composed of two standards: ESMS and ESQRS (see explanation on pages 26–28 in the ESS Handbook for QR 2021).
ESMS (Euro SDMX Metadata Structure): A user-oriented report that focuses on meeting users' needs. From the metadata, data users can determine whether the information meets their needs and requirements.
ESQRS (ESS Standard for Quality Reports Structure): A quality report for data producers, focusing on improving statistical processes and on monitoring and standardizing the work of statistical producers, thereby helping to enhance the quality of the statistics.
In other words, quality reports for producers focus on aspects of quality, while user reports focus on satisfying the needs of data users.
SIMS integrates and harmonizes these two report structures so that every concept is included and appears only once. It is also aligned with the SDMX (Statistical Data and Metadata Exchange) standard.
The standard includes 19 main sections, which contain dozens of sub-sections:
- Contact
- Metadata update
- Statistical presentation
- Unit of measure
- Reference period
- Institutional mandate
- Confidentiality
- Release policy
- Frequency of dissemination
- Accessibility and clarity
- Quality management
- Relevance
- Accuracy and reliability
- Timeliness and punctuality
- Coherence and comparability
- Cost and burden
- Data revision
- Statistical processing
- Comment
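The list above can serve as a completeness checklist for a draft report. The Python sketch below stores the 19 top-level concepts under codes S.1 to S.19 (assumed here to follow the numbering of the SIMS concept list; verify against the official documentation) and reports which sections of a draft are missing or empty. Representing a report as a flat dictionary keyed by these codes is a simplifying assumption: real SIMS reports also contain the sub-sections, and the draft content shown is hypothetical.

```python
# The 19 top-level SIMS concepts; the S.n codes are assumed to follow the
# numbering of the official SIMS concept list.
SIMS_SECTIONS = {
    "S.1": "Contact",
    "S.2": "Metadata update",
    "S.3": "Statistical presentation",
    "S.4": "Unit of measure",
    "S.5": "Reference period",
    "S.6": "Institutional mandate",
    "S.7": "Confidentiality",
    "S.8": "Release policy",
    "S.9": "Frequency of dissemination",
    "S.10": "Accessibility and clarity",
    "S.11": "Quality management",
    "S.12": "Relevance",
    "S.13": "Accuracy and reliability",
    "S.14": "Timeliness and punctuality",
    "S.15": "Coherence and comparability",
    "S.16": "Cost and burden",
    "S.17": "Data revision",
    "S.18": "Statistical processing",
    "S.19": "Comment",
}


def missing_sections(report):
    """Return (code, name) pairs for SIMS sections absent or blank in `report`.

    `report` is a hypothetical flat dict mapping section codes to text.
    """
    return [(code, name) for code, name in SIMS_SECTIONS.items()
            if not str(report.get(code, "")).strip()]


# Hypothetical draft: two sections filled in, S.19 present but blank.
draft = {"S.1": "Prices Division", "S.3": "Monthly consumer price index", "S.19": "   "}
for code, name in missing_sections(draft):
    print(code, name)
```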
Defining the SDMX Standard
The SDMX standard was developed to allow statistical organizations and international bodies to share data and metadata. SDMX is recognized as an international standard by ISO (ISO 17369:2013). Dozens of central banks and statistical bureaus worldwide have adopted the standard and work according to it.
The standard regulates how data are transmitted (machine-to-machine, machine-to-human, and human-to-machine), improving the quality of data transmission through standardization, automation, validation, and data sharing (a common language). It also allows each data series to be characterized by uniform metadata.
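In SDMX, a series is characterized by the values of its dimensions, as defined in a data structure definition (DSD); in SDMX REST queries this series key is written as the dimension values joined with dots, in DSD order. The Python sketch below assumes a hypothetical DSD with three dimensions (`FREQ`, `REF_AREA`, `INDICATOR`) and placeholder codes (`XX` for the area, `CPI` for the indicator); real DSDs, such as those used for IMF transmissions, define their own dimensions and code lists.

```python
# Hypothetical data structure definition (DSD): dimension IDs in key order.
DIMENSIONS = ("FREQ", "REF_AREA", "INDICATOR")


def series_key(**values):
    """Join one code per dimension, in DSD order, into a dot-separated series key."""
    missing = [d for d in DIMENSIONS if d not in values]
    if missing:
        raise ValueError(f"missing dimension values: {missing}")
    return ".".join(values[d] for d in DIMENSIONS)


def parse_key(key):
    """Split a series key back into its dimension values."""
    parts = key.split(".")
    if len(parts) != len(DIMENSIONS):
        raise ValueError(f"expected {len(DIMENSIONS)} parts, got {len(parts)}")
    return dict(zip(DIMENSIONS, parts))


key = series_key(FREQ="M", REF_AREA="XX", INDICATOR="CPI")
print(key)             # M.XX.CPI
print(parse_key(key))  # {'FREQ': 'M', 'REF_AREA': 'XX', 'INDICATOR': 'CPI'}
```

Because every sender and receiver interprets the key through the same DSD, the key alone tells a machine what the series measures, how often, and for which area.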
Data description standards enable a more precise understanding of the data; better retrieval, filtering, and sorting; integration and linkage of data from separate systems; easier data investigation; and international comparisons.
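Retrieval and filtering in practice can be seen in the SDMX 2.1 RESTful web-service syntax, where a data query names a dataflow and a key: an empty key position acts as a wildcard, and `+` combines several codes for one dimension. The Python sketch below builds such a query URL. The base URL, the dataflow `CPI_FLOW`, and the codes `XX`/`YY` are hypothetical, and the function covers only the `startPeriod` and `endPeriod` parameters.

```python
from urllib.parse import urlencode


def data_query_url(base, flow, key_parts, start=None, end=None):
    """Build an SDMX 2.1 REST data query URL.

    key_parts holds one list of codes per dimension, in DSD order:
    an empty list leaves that position blank (wildcard), and several
    codes are combined with '+' (logical OR).
    """
    key = ".".join("+".join(codes) for codes in key_parts)
    params = {name: value
              for name, value in (("startPeriod", start), ("endPeriod", end))
              if value}
    url = f"{base.rstrip('/')}/data/{flow}/{key}"
    return f"{url}?{urlencode(params)}" if params else url


# Monthly series for two hypothetical areas, any indicator, from January 2020 on.
print(data_query_url("https://sdmx.example.org/rest", "CPI_FLOW",
                     [["M"], ["XX", "YY"], []], start="2020-01"))
```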
At CBS, the SDMX standard is mainly used for transmitting information to the IMF. The transmitted series include, among others, national accounts, the producer price index, and the consumer price index.
The SDMX website provides explanations of and guides to the standard, as well as a range of tools for its implementation.