questionnaires that have been scanned and the ASCII values of the fields that have
been optically identified. Theoretically, if the optical reader identified characters
as well as the human eye does, we could have waived the keying stage. But
the identification is incomplete for several reasons:
data entry system is about 30%, so that the keying stage, used as a support for the
optical identification, was not optional but a necessity.
short form serves as an external supporting file for the ODE system. Its
introduction immediately following the scanning stage enables verification of the field
values that have been assigned by the OCR and automatic determination of a field
value, in a batch procedure, without any human involvement. This saves keying of
about 60% (!) of the characters listed on the census questionnaires.
suggested by the OCR, and since not all the variables on the census questionnaires
are found in the National Population Register, a supplementary action is required.
This action is keying from an image, performed in two rounds, as presented in table
throughout the entire data capture process. During the keying stage, this saves
human intervention in entering field values and enables definition of the individual
census records; during the editing stage, supplementary actions are carried out to
link the records to the Register.
census-related processes are aided by administrative records that can also be used
as a partial or full alternative to the census. Therefore, as in cases where the
entire census was conducted from administrative records, the ability to link the
records, and the way the linkage is done, are very important.
identifying variable: the identification number. But since linkage by
identification number alone is not sufficiently reliable, the linkage is divided into two parts:
is a zero, and in all cases, the last digit is a control digit. In the process of
finding the identification number, we use these characteristics, so that manipulations
on the number increase as the handling of the field makes the value captured more
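
The control digit makes it possible to reject a garbled identification number before any lookup in the Register. The text does not state the rule itself, so the following is only a minimal sketch. It assumes the Luhn-style check commonly described for nine-digit Israeli identification numbers, and the function name is hypothetical.

```python
def has_valid_control_digit(id_number: str) -> bool:
    """Return True if the number passes a Luhn-style check (assumed rule)."""
    # Restore leading zeros that may have been dropped during capture.
    digits = id_number.strip().zfill(9)
    if len(digits) != 9 or not (digits.isascii() and digits.isdigit()):
        return False
    total = 0
    for position, char in enumerate(digits):
        # Weights alternate 1, 2, 1, 2, ... from the leftmost digit.
        product = int(char) * (1 if position % 2 == 0 else 2)
        # A two-digit product contributes the sum of its digits.
        total += product - 9 if product > 9 else product
    return total % 10 == 0
```

Under this assumed rule, "123456782" passes and "123456789" fails, while "12345674" passes once the missing leading zero is restored.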
is performed on the basis of criteria which remain permanent throughout the three
attempts. Each criterion is a profile composed of four variables that are found on
the short form and in the Register. Each criterion contains the identification
number plus another three variables from among the following:
linkage purposes are those dealing with relation to the reference person and
parents' country of birth.
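
The profile-based linkage described above can be sketched as follows. This is an illustration under stated assumptions, not the census system's actual procedure: the variable names in CANDIDATE_VARIABLES, the dictionary layout, and the function name are all hypothetical.

```python
from itertools import combinations

# Hypothetical variable names, chosen only for illustration.
CANDIDATE_VARIABLES = ("sex", "birth_year", "birth_month", "first_name_initial")

# Each criterion is the identification number plus three other variables.
CRITERIA = [("id_number",) + combo for combo in combinations(CANDIDATE_VARIABLES, 3)]


def link_record(record, register_by_id):
    """Return the first criterion on which the record and the Register entry
    agree in all four variables, or None if no criterion is satisfied."""
    entry = register_by_id.get(record.get("id_number"))
    if entry is None:
        return None  # identification number not found in the Register
    for criterion in CRITERIA:
        if all(
            record.get(name) is not None and record.get(name) == entry.get(name)
            for name in criterion
        ):
            return criterion
    return None
```

A record with one missing variable can still be linked through a criterion that does not use that variable, while a record with too few recorded variables satisfies no criterion and is left for later handling.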
automatically linked to the National Population Register. The remaining 20% are
composed of records that were identified, but whose characteristics did not enable
automatic linkage (for example, a single person born in Israel who recorded only the year of birth),
records that are not listed in the Register (tourists and foreigners who have been
in Israel for over a year), records which have no identification number (dwellings
of those who refused to participate and closed dwellings), and records where the
identification number was garbled. During the editing stage, where queries to the
National Population Register are interactive and can include names, about another
15% are linked.
assigned to them by the OCR is not supported by an external file.
status of the level of identification by the optical reader:
the three previous identification sources (OCR, Register, first keying round). This
occurrence is relatively rare (less than 2%) and is typical of fields from
questionnaires on which the writing is particularly faint or which were not
scanned with sufficient sensitivity. A second round of keying is also reserved for fields
whose values, on examination, fall outside the legitimate range (for
example, a year of birth of 1790).
and it is performed at two levels of keying: correction and full-field keying. For
this round, too, the basic unit is the character, so that all comparative tests between
identification sources are at the character level. This characteristic contributes to
the reduction in the rate of keying, since only those characters that are not agreed
upon are sent for keying, rather than whole fields.
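
The character-level comparison can be illustrated with a short sketch. It assumes each identification source delivers its reading of a field as a plain string; the function name is hypothetical.

```python
from itertools import zip_longest


def characters_to_key(*readings):
    """Return the character positions at which the readings of one field disagree."""
    positions = []
    # Pad shorter readings with "" so a missing character counts as a disagreement.
    for index, chars in enumerate(zip_longest(*readings, fillvalue="")):
        if len(set(chars)) > 1:
            positions.append(index)
    return positions
```

For the readings "1957", "1951" and "1957", only position 3 is returned, so a single character rather than the whole field would go to keying.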
completed during the editing stage, in spite of the controls embedded in the
procedure and despite the corrections: fields that could not be positively identified
within partial images or characters and fields which, although they have been
assigned 3-4 identification values, still do not have two sources that have assigned
the same value. However, such cases are rare and can be resolved
within the system through an additional round of keying from the image of
the full questionnaire page, or at least from a portion of the entire question (not just
the box that was filled in).
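
The agreement rule implied here, under which a field value is settled once two sources assign the same value, can be sketched as below. This is a simplified illustration: the function name is hypothetical, and treating two competing pairs as unresolved is an assumption of the sketch, not something the text specifies.

```python
from collections import Counter


def resolve_field(values):
    """Return the value that at least two sources agree on, or None if no
    such value exists or if two different values are tied."""
    counts = Counter(value for value in values if value is not None)
    ranked = counts.most_common(2)
    if not ranked or ranked[0][1] < 2:
        return None  # no two sources assigned the same value
    if len(ranked) == 2 and ranked[1][1] == ranked[0][1]:
        return None  # two competing pairs: still ambiguous (assumed behaviour)
    return ranked[0][0]
```

A field that comes back as None after three or four readings is the kind of case left for the additional keying round from the fuller image.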
homogeneous. The central component which aids this is the Register file:
capture process: getting a structured raw file. Creating a structured file is
completed in the editing stage.
Copyright © 1997-1999 The State of Israel. All rights reserved.