questionnaires that have been scanned and the ASCII values of the fields that have been optically identified. Theoretically, if the identification level of the optical reader were similar to that of the human eye, we could have waived the keying stage. But the identification is incomplete for several reasons:
data entry system is about 30%, so the keying stage, used as a support for the optical identification, was not optional at all but rather a necessity.
short form serves as an external supporting file for the ODE system. Its introduction immediately following the scanning stage enables verification of the field values assigned by the OCR and automatic determination of a field value in a batch procedure, without any human involvement. This saves keying of about 60% (!) of the characters listed on the census questionnaires.
suggested by the OCR, and since not all the variables on the census questionnaires are found in the National Population Register, a supplementary action is required. This action is keying from an image, performed in two rounds, as presented in the table below:
file record
throughout the entire data capture process. During the keying stage, this saves human intervention in entering field values and enables definition of the individual census records; during the editing stage, supplementary actions are carried out to link the records to the Register.
census-related processes are aided by administrative records that can also be used as a partial or full alternative to the census. Therefore, as in cases where the entire census was conducted from administrative records, the ability to link the records, and the way the linkage is done, are very important.
identifying variable: the identification number. But since linkage by identification number is not sufficiently reliable, the linkage is divided into two parts:
is a zero, and in all cases the last digit is a control digit. In the process of finding the identification number, we use these characteristics, so that manipulations on the number increase as the handling of the field makes the captured value more reliable.
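The use of the control digit described above can be sketched as follows. This is a minimal illustration, not the procedure used in the census system: it assumes a nine-digit identification number padded with leading zeros and a Luhn-style checksum (alternating weights 1 and 2, with two-digit products reduced to the sum of their digits). The text only states that the last digit is a control digit, so the checksum scheme, the `?` marker for an unreadable character, and both function names are assumptions.

```python
def has_valid_control_digit(id_number: str) -> bool:
    """Return True if the number passes an assumed Luhn-style checksum."""
    if not (id_number.isascii() and id_number.isdigit()) or len(id_number) > 9:
        return False
    digits = [int(c) for c in id_number.zfill(9)]  # pad with leading zeros
    total = 0
    for position, digit in enumerate(digits):
        # Weights alternate 1, 2, 1, 2, ... from the leftmost digit.
        product = digit * (1 if position % 2 == 0 else 2)
        # A two-digit product contributes the sum of its digits.
        total += product if product < 10 else product - 9
    return total % 10 == 0


def recover_single_digit(captured: str, unknown: str = "?") -> list[str]:
    """List the completions of one unreadable character that pass the check."""
    if captured.count(unknown) != 1:
        return []
    candidates = []
    for digit in "0123456789":
        candidate = captured.replace(unknown, digit)
        if has_valid_control_digit(candidate):
            candidates.append(candidate)
    return candidates
```

Under these assumptions, a captured value with a single unreadable character can be completed without keying, because only one digit satisfies the checksum at any given position.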
is performed on the basis of criteria which remain fixed throughout the three attempts. Each criterion is a profile composed of four variables that are found on the short form and in the Register. Each criterion contains the identification number plus another three variables from among the following:
linkage purposes are those dealing with relation to the reference person and parents' country of birth.
automatically linked to the National Population Register. The remaining 20% are composed of records that were identified, but whose characteristics did not enable automatic linkage (for example, a single person born in Israel who recorded only the year of birth), records that are not listed in the Register (tourists and foreigners who have been in Israel for over a year), records which have no identification number (dwellings of those who refused to participate and closed dwellings), and records where the identification number was garbled. During the editing stage, where queries to the National Population Register are interactive and can include names, about another 15% are linked.
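The profile-based linkage described above can be sketched as follows. This is a rough illustration under stated assumptions: the variable names in `CRITERIA` are hypothetical (the list of candidate variables is not reproduced here), the records are plain dictionaries, and a link is accepted only when exactly one Register record matches the four-variable profile. The same fixed list of criteria would be reused unchanged on every attempt.

```python
# Hypothetical criteria: each is the identification number plus three
# other variables assumed to exist on both the short form and the Register.
CRITERIA = (
    ("id_number", "sex", "birth_year", "birth_month"),
    ("id_number", "sex", "birth_year", "marital_status"),
    ("id_number", "birth_year", "birth_month", "marital_status"),
)


def build_indexes(register, criteria=CRITERIA):
    """Index the Register once per criterion, keyed by the profile values."""
    indexes = {}
    for criterion in criteria:
        index = {}
        for person in register:
            key = tuple(person.get(name) for name in criterion)
            index.setdefault(key, []).append(person)
        indexes[criterion] = index
    return indexes


def link_record(record, indexes):
    """Return the unique Register match for a census record, or None."""
    for criterion, index in indexes.items():
        key = tuple(record.get(name) for name in criterion)
        if None in key:
            continue  # a variable is missing, so this profile cannot be tested
        matches = index.get(key, [])
        if len(matches) == 1:
            return matches[0]
    return None  # left for interactive linkage during editing
```

A record that fails one profile, for instance because only the year of birth was recorded, can still be linked through another profile that does not need the missing variable.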
assigned to them by the OCR is not supported by an external file.
status of the level of identification by the optical reader:
the three previous identification sources (OCR, Register, first keying round). This occurrence is relatively rare (less than 2%), and characterizes fields from questionnaires on which the recording is particularly weak or which were not scanned sensitively enough. A second round of keying is also reserved for fields which, based on an examination of their values, fall outside the legitimate range (for example, year of birth 1790).
and it is performed at two levels of keying: correction and full-field keying. For this round, too, the basic unit is the character, so that all comparative tests between identification sources are at the character level. This characteristic contributes to the reduction in the rate of keying, since only those characters that are not agreed upon are sent for keying, rather than whole fields.
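The character-level comparison between identification sources can be sketched as follows. This is a minimal illustration, assuming that a character is accepted when at least two sources assign the same value at that position and that no other value is equally supported; the function name, the `min_agreement` parameter, and the use of `None` for an unresolved position are hypothetical.

```python
from collections import Counter


def resolve_characters(sources, min_agreement=2):
    """Compare sources position by position.

    Returns (resolved, to_key): resolved holds the agreed character or None
    for each position, and to_key lists the positions still needing keying.
    """
    if not sources:
        return [], []
    width = max(len(value) for value in sources)
    resolved, to_key = [], []
    for position in range(width):
        votes = Counter(v[position] for v in sources if position < len(v))
        ranked = votes.most_common(2)
        best_char, best_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == best_count
        if best_count >= min_agreement and not tied:
            resolved.append(best_char)
        else:
            resolved.append(None)
            to_key.append(position)  # only this character is sent for keying
    return resolved, to_key
```

For three readings of a year of birth such as `"1953"`, `"1958"` and `"1950"`, only the last position would be sent for keying, while the first three characters are accepted as they stand.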
completed during the editing stage, in spite of the controls embedded in the procedure and despite the corrections: fields that could not be positively identified within partial images or characters, and fields which, although they have been assigned 3-4 identification values, still do not have two sources that have assigned the same value. However, the rate of such cases is minimal, and they can be solved within the system through an additional round of keying: keying from the image of the full questionnaire page, or at least from a portion of the entire question (not just the box that was filled in).
homogeneous. The central component which aids this is the Register file:
capture process: getting a structured raw file. Creating a structured file is completed in the editing stage.
Copyright © 1997-1999 The State of Israel. All rights reserved.