Principles
defining objectives and means.
questionnaires are:
by means of a process that is not only economically efficient but also rational. The central consideration guiding these objectives is optimizing the division of activity (correcting mistakes in values found in questionnaires, filling in missing values, and coding) between the data capture stage and data processing in the central computer.
capture process for the 1995 Israeli Census of Population and Housing was the Windows environment and the improved Optical Character Recognition (OCR) technology.
sources, external and internal to the system, and of different types: ASCII files and files of scanned images. Windows technology, along with powerful computers, enabled easier and more convenient access to external files such as the National Population Register (NPR) and to data tables such as the various coding dictionaries and process-information tables.
technology as a useful method for the data capture process.
effectively brought together in the optical data capture system described in this paper.
Capture Process
questionnaires in pre-defined units, correcting values, and coding the texts on the questionnaire pages and keying them into the computer. These tasks are essentially no different with the optical system, but the technological improvements enable a number of essential changes in the process:
files (first and final). The first file is the product of the data capture process, and in the improved technological environment it becomes a raw data file per se.
marriage preceding date of birth) or not unique (both Romania and Hungary marked as country of birth).
(Enumeration Area - EA), as they were received from the field (individual and household records), as they were originally planned, if they were confirmed in the field (residential building identified by address), and those defined as physical units for data capture (separate questionnaire sheets and all pages of a questionnaire).
reconstruct editing actions (correction / completion / imputation), enables us to create a high-quality census file (end-file).
beneficial to the data capture process in several respects:
editing" to be carried out throughout the data capture process. Errors are not corrected, even those that may be trivial; all of the respondents' answers, with all the logical and factual errors they contain, are captured as they are. The main editing task becomes verifying that data capture is precise. This alternative notion of editing within the data capture process shifts traditional editing tasks to the central computer, where they are carried out as macro operations, and the ability to recreate the raw file and the interim files therefore improves significantly. Sweeping, uniform actions can be canceled and recreated more readily than micro-editing, because micro-editing, in spite of the general guidelines, also involves individual, subjective judgment in handling each and every record.
undergone different editing processes. This attribute enables the evaluation of editing processes using comparative methods, and designated editing of the raw file according to various needs, both internal and external. It contributes not only to the current census file but also to the decision-making process regarding how large data files should be edited in the future.
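The idea of capturing answers as they are, while only recording that a problem exists, can be illustrated with a minimal Python sketch. The record layout, field names, and flag names below are assumptions made for illustration; they are not taken from the census system itself. The two checks mirror the examples of a logical error and a non-unique answer given earlier.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PersonRecord:
    # Hypothetical field names, chosen only for this illustration.
    birth_date: Optional[date]
    marriage_date: Optional[date]
    countries_of_birth: List[str] = field(default_factory=list)


def detect_problems(rec: PersonRecord) -> List[str]:
    """Return problem flags for a record without changing any captured value."""
    problems = []
    # Logical error: marriage recorded as earlier than birth.
    if rec.birth_date and rec.marriage_date and rec.marriage_date < rec.birth_date:
        problems.append("marriage_before_birth")
    # Non-unique answer: more than one country of birth marked.
    if len(rec.countries_of_birth) > 1:
        problems.append("country_of_birth_not_unique")
    return problems


raw = PersonRecord(date(1970, 1, 1), date(1965, 5, 5), ["Romania", "Hungary"])
flags = detect_problems(raw)
print(flags)
```

Because the function only reads the record, the raw values stay available for any later macro-editing rule, which is the property the text relies on.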
operations designed to verify:
editing activities which define the process units and the census analysis units. Automation of the definition of the structural units is based on prior planning of all the variables on the questionnaire which will enable automatic definitions, and on linking records with auxiliary files which serve as an external backup source.
incorrect printing on questionnaires, and when there is an identification failure by the optical reader. Most problems of identification of structural units are system-related problems, arising in actions that replace the manual handling of the questionnaires.
work and staff control. Automated work control means verifying that the result of any action carried out during the data capture process is corroborated by at least two sources, and that it does not logically contradict other results (variable values or structural units). Automated staff control means continuous production of statistical reports which enable both managers and employees to see that the problems they were working on have been solved and that new problems have not arisen as a result of their handling. These statistical reports are based on attaching identified work packets, such as an enumeration area, to an identified person who has a specified function in the data capture process.
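The rule that a result must be corroborated by at least two sources can be sketched as follows. This is only an illustration under stated assumptions: the source names (`ocr`, `keying`, `npr`) and the function `corroborated_value` are hypothetical, and the actual system may have combined its sources differently.

```python
from collections import Counter
from typing import Dict, Optional


def corroborated_value(candidates: Dict[str, Optional[str]],
                       min_sources: int = 2) -> Optional[str]:
    """Return the value reported by at least `min_sources` sources, else None.

    `candidates` maps a source name to the value it reported
    (None when the source has no value for this field).
    """
    counts = Counter(v for v in candidates.values() if v is not None)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= min_sources else None


# Two sources agree, so the value is accepted.
accepted = corroborated_value({"ocr": "1234", "keying": "1234", "npr": None})
# The sources disagree, so the field would be routed for further handling.
rejected = corroborated_value({"ocr": "1234", "keying": "1284"})
print(accepted, rejected)
```

A field for which the function returns `None` is exactly the kind of unresolved result that a work-control report would surface.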
capture system with a modular structure, but one whose stages are not homogeneous task-wise. This is a system whose main process includes operations which are similar in essence but are carried out in a different order than before, while the sub-processes involve an internal and external support system, both within and between the stages.
of the EA are scanned and transferred from one stage to the next together. A file is sent to the central computer and to the archive in EA units.
is composed of characters from the whole EA, whereas an editing item, like a coding item, is composed of problems detected in one household.
logically contradicting values in many fields, several editing problems are detected. However, since all problems are in the same household record, only one editing item is created.
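The relationship between detected problems and editing items described above, many problems but one item per household, can be sketched in a few lines of Python. The tuple layout and the identifiers are hypothetical and serve only to illustrate the grouping.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A detected problem: (household id, field name, description). Hypothetical layout.
Problem = Tuple[str, str, str]


def build_editing_items(problems: Iterable[Problem]) -> Dict[str, List[Tuple[str, str]]]:
    """Group all detected problems so that each household yields one editing item."""
    items: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for household_id, field_name, description in problems:
        items[household_id].append((field_name, description))
    return dict(items)


problems = [
    ("H1", "marriage_date", "precedes date of birth"),
    ("H1", "country_of_birth", "two countries marked"),
    ("H2", "birth_date", "missing value"),
]
items = build_editing_items(problems)
print(len(items), len(items["H1"]))
```

Household `H1` has two problems but contributes a single editing item, so an editor sees the whole household record once.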
questionnaires and identifying the values written in their fields, smart keying from images, micro-editing and coding, and preparing (and sending) files to the central computer and to an optical archive.
end of the process, after the final census file was produced. The shift to an optical reading system and the creation of a retrievable optical archive enable us to advance this operation to the beginning of the data capture process, immediately after scanning. Having questionnaire images instead of paper questionnaires makes the retrieval environment more user-friendly and shrinks the storage space needed for the questionnaires (from a huge warehouse to about 80 CD-ROMs).
from the paper questionnaires. Keying precedes editing and coding, meaning that we first make sure that the data capture (optical recognition and keying) is accurate and only then do we start editing.
linkage with external files (the NPR in this case), which was done only in the central computer in the preparation of the final census file, is integrated into the data capture process.
different type of data capture file.
expressed in the mutual dependency of adjacent stages. In every stage, tasks of other stages are performed. For example:
administrative forms and files (Enumeration Area leading form and the organizational file of all Enumeration Areas in the census). The data capture process refers to this support system from its very beginning and throughout its course.
Capture Stage
after the keying stage:
unit in the ODE system). An editing item includes all editing problems found in one household.
question ("other") but rather in subjects asked in open questions and answered in alphabetic characters. That is, geocoding of addresses and economic coding of economic branch and occupation.
received with a numeric code from the field, address 5 years ago, and the address of place of employment, all of which undergo keying and automatic coding.
the receiving of only a partial code. A logical failure in coding signifies that the lowest address unit is not included in the next highest unit, i.e., the code for the street that was entered does not exist in this specific locality, or the particular street in the locality has no house number such as the one which was entered. Each of these failures creates a coding item which is sent to the geocoder.
when the optical character reader identifies written text in the relevant question fields. The OCR was not developed to recognize Hebrew and Arabic characters, but the existence of handwritten text is a trigger for creating an economic coding item.
sees the image of the questionnaire page on which the text is written, defines a query to the coding table using text or numeric code, and selects the appropriate option. The economic sector and occupation fields are not keyed at all, and coding is performed directly from the questionnaire images.
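The logical geocoding failures described above (a street code that does not exist in the locality, or a house number that does not exist on the street) amount to a containment check along the address hierarchy. The sketch below assumes a hypothetical gazetteer structure and made-up codes; the real coding dictionaries are not described in enough detail to reproduce.

```python
from typing import Dict, Optional, Set

# Hypothetical gazetteer: locality code -> street code -> valid house numbers.
Gazetteer = Dict[str, Dict[str, Set[int]]]


def check_address(locality: str, street: str, house: int,
                  gazetteer: Gazetteer) -> Optional[str]:
    """Return a failure label if a lower unit is not contained in the next higher unit."""
    streets = gazetteer.get(locality)
    if streets is None:
        return "unknown_locality"
    houses = streets.get(street)
    if houses is None:
        return "street_not_in_locality"
    if house not in houses:
        return "house_not_on_street"
    return None  # address is logically consistent


# Made-up codes for illustration only.
gazetteer = {"L01": {"S100": {1, 3, 5}, "S200": {2, 4}}}
print(check_address("L01", "S100", 3, gazetteer))
print(check_address("L01", "S300", 3, gazetteer))
print(check_address("L01", "S200", 9, gazetteer))
```

In the process described here, each non-`None` result would correspond to a coding item sent to the geocoder.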
file of Israel, which contained names, codes, and addresses of places of employment. This file helps to code partial or unclear information given by the respondents.
does depend on its technology. Coding items are retrieved from the optical archive, sampled, and sent for second coding on stand-alone PCs. An expert receives a coding item in cases where the first code, given by the ODE coder, does not match the second code. The expert's code replaces the code generated by the ODE only when it is different from the first code.
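The adjudication rule for sampled coding items can be written down as a small decision function. This is a sketch under assumptions: the function name, the use of `None` for items that were not sampled, and the example codes are all hypothetical.

```python
from typing import Optional


def final_code(first_code: str,
               second_code: Optional[str] = None,
               expert_code: Optional[str] = None) -> str:
    """Resolve the code for one coding item.

    first_code  : code given by the ODE coder
    second_code : independent second code, or None if the item was not sampled
    expert_code : expert's code, needed only when the first two codes disagree
    """
    # Not sampled, or both coders agree: the first code stands.
    if second_code is None or second_code == first_code:
        return first_code
    # Disagreement: the item goes to an expert.
    if expert_code is None:
        raise ValueError("codes disagree; an expert code is required")
    # If the expert confirms the first code, nothing changes; otherwise
    # the expert's code replaces it. Both cases reduce to the expert's code.
    return expert_code


print(final_code("6120"))
print(final_code("6120", "6130", "6130"))
print(final_code("6120", "6130", "6120"))
```

Keeping the three codes as separate inputs also makes it easy to tabulate agreement rates between first and second coding for evaluation.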
archive. This file includes all the data from the questionnaires, administrative data, all statistical reports produced and used in the process, and all audit trails of each field. This file can be retrieved and already serves us for evaluation purposes.
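A per-field audit trail of the kind mentioned above can be modeled as a raw value plus an ordered list of actions, so that any intermediate state can be recovered. The class, action names, and actor labels below are hypothetical; the sketch only illustrates why such a trail makes editing actions reconstructible.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FieldValue:
    raw: str  # value exactly as captured
    # Each entry: (action, old value, new value, actor). Hypothetical layout.
    trail: List[Tuple[str, str, str, str]] = field(default_factory=list)

    @property
    def current(self) -> str:
        """Value after all recorded actions (the raw value if there are none)."""
        return self.trail[-1][2] if self.trail else self.raw

    def apply(self, action: str, new_value: str, actor: str) -> None:
        """Record an action instead of overwriting the captured value."""
        self.trail.append((action, self.current, new_value, actor))

    def value_after(self, n_actions: int) -> str:
        """Reconstruct the value as it stood after the first n_actions actions."""
        return self.raw if n_actions == 0 else self.trail[n_actions - 1][2]


year = FieldValue("19O5")  # capture misread: letter O instead of zero
year.apply("keying_correction", "1905", "keyer_07")
year.apply("imputation", "1950", "central_computer")
print(year.raw, year.value_after(1), year.current)
```

Since the raw value is never overwritten, the raw file and any interim file can be rebuilt by replaying a prefix of each trail.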
information accumulated in the ODE. It includes the complete census information and a "tail" for each field, which enables
final census file.
created a functional system for Israel's 1995 census of population and housing. However, the optical data capture system is a modular system, which enables use of some of the modules or use of existing modules in combination with alternative modules that can be developed in accordance with specific needs. This feature allowed for differential intervention in the modules during the course of development, where limited resources and minimum requirements have led to developmental priorities.
reduction of micro-editing tasks while taking maximum advantage of the technological improvements, have contributed to the building of a swift and high-quality census data capture system.
Copyright © 1997-1999 The State of Israel. All rights reserved.