Editing and Coding Module






Olivia Blum
Census Planning & Evaluation Division

Tel. 972-2-655 3303
FAX: 972-2- 655 3531
e-mail: blum@cbs.gov.il




Defining a structured raw data file as the objective of the data capture process
changes, somewhat, the definition of editing tasks. We no longer wish to
receive a file in which the respondent's incorrect answers and the enumerator's
errors in filling out the questionnaire have been corrected. Rather, we want a file
which reflects the respondent's answers as they appeared on the questionnaire,
with only functional correction of the process fields which the enumerator fills in.

This approach defines four main editing tasks (see also paper 3.1):
1. correcting and completing the keying stage, in cases where a final value
was not determined, and in those cases where keying operators were unable
to read what was written;
2. approval or correction of data entry in fields which were found to have
logical contradictions;
3. coding open categories in closed questions ("other");
4. defining the structural units.

The first three tasks are designed to respond to the need to capture the answers
of the respondent with maximum accuracy. The last task means receiving a file in
structural units as they were meant to be received in the field. The structural units
are: Enumeration Area (EA) (which also serves as the work unit in the data entry
process), household record, individual record, a building identified according to a
coded address, a page from the questionnaire, and a complete questionnaire (all
pages). Most of these units are defined by values which the enumerator places in
the appropriate fields. Proper values in these fields enable automatic definition
of the structural units. For example, in order to link two parts of a questionnaire
to form an extended individual record, the enumerator must fill in three common
fields: year of birth, column number, and gender. If automatic definition of the
extended individual record is unsuccessful, this is a result of the enumerator's
error in filling out these fields. The functional correction of the enumerator's
fields in this case is assigning the right column number.
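
The linkage described above can be illustrated with a minimal sketch. The record layout and field names below are hypothetical and are not taken from the census system; the sketch only shows the idea of matching the two parts on the three enumerator fields and sending unmatched parts to an editor.

```python
# Hypothetical linking key: the three fields the enumerator fills in on both parts.
LINK_FIELDS = ("year_of_birth", "column_number", "gender")


def link_key(record: dict) -> tuple:
    """Return the linking key (year of birth, column number, gender) of a record."""
    return tuple(record.get(field) for field in LINK_FIELDS)


def link_parts(short_records: list, long_records: list):
    """Pair each long-form part with the short-form part sharing its key.

    Returns (linked, unlinked): linked is a list of (short, long) pairs,
    unlinked holds the long-form parts that would need an editor's correction.
    """
    by_key = {}
    for rec in short_records:
        by_key.setdefault(link_key(rec), []).append(rec)

    linked, unlinked = [], []
    for long_rec in long_records:
        candidates = by_key.get(link_key(long_rec), [])
        if len(candidates) == 1:      # unambiguous match: link automatically
            linked.append((candidates[0], long_rec))
        else:                         # no match or ambiguous match: editing problem
            unlinked.append(long_rec)
    return linked, unlinked
```

In this sketch, a long-form part whose column number was filled in wrongly ends up in the unlinked list, which corresponds to the case where the editor assigns the right column number.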

Editing tasks are assigned to editors in cases of a failure in an edit check. An
editing item is the working unit that includes all the problems (failed edit checks) that
have been found in one household.

The different nature of the editing tasks relating to the field value, in contrast with
those relating to definition of the units, enabled the differentiation of the tasks and
the specialization of the two types of editors: senior editors, who receive problems
involving more than one questionnaire (usually problems in defining the household);
and regular editors, who receive problems within the household.
A questionnaire in the data entry system is defined according to the questionnaire
number printed on it. The problem of defining the household during the editing stage
usually involves pages which bear different questionnaire numbers, and therefore,
the problem is given to a senior editor for handling.

The process of creating items for handling
during the editing and coding stage

The data entry system has no homogeneous data entry stages. Processes that are
identified as characteristic to one stage are integrated in other stages as well.
This means that editing tasks and other procedures which serve the editing stage
are performed from the beginning of the data entry process until the end; in
the scanning, keying and editing stages, and in between.

During the scanning stage, the system opens three main records: an EA, household
and individual.
1. EA records are opened manually, when the scanning operator feeds a
computerized form containing details found on the leading form of the EA
which comes with the paper questionnaires.
2. A household record is opened when the optical reader identifies the
existence of the first page of the questionnaire and that at least one field
has been filled in, except for the field with the EA number, which is usually
printed on a label.
3. An individual record is opened when the optical reader identifies one of the
three following conditions:
a value in one of the two identification number fields;
a value in the first name field;
values in at least three other fields.
4. A long form individual record (with socio-economic questions) is opened when
the optical reader identifies one of the following three conditions:
first name and a value in an additional field;
year of birth (in fields which are filled in by the enumerator) and a value
in an additional field;
column number and gender (in fields which are filled in by the enumerator)
and a value from an additional field.
At this stage, the working units are also defined: both sides of the page, and all
pages of the questionnaire.
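
The rules for opening individual records can be sketched as simple predicates over the fields recognized by the optical reader. This is only an illustration under stated assumptions: the field names (id_number_1, first_name, and so on) are hypothetical, and a field counts as filled when it holds any non-blank value.

```python
# Hypothetical field names; the real questionnaire layout is not given in the text.
ID_FIELDS = ("id_number_1", "id_number_2")


def _filled(fields: dict, name: str) -> bool:
    """A field counts as filled if it holds a non-blank value."""
    value = fields.get(name)
    return value is not None and str(value).strip() != ""


def opens_individual_record(fields: dict) -> bool:
    """Any one of: an ID number, a first name, or at least three other fields."""
    if any(_filled(fields, f) for f in ID_FIELDS):
        return True
    if _filled(fields, "first_name"):
        return True
    others = [f for f in fields
              if f not in ID_FIELDS and f != "first_name" and _filled(fields, f)]
    return len(others) >= 3


def opens_long_form_record(fields: dict) -> bool:
    """Any one of the three conditions, each requiring one additional filled field."""
    def extra_besides(*used: str) -> bool:
        return any(_filled(fields, f) for f in fields if f not in used)

    if _filled(fields, "first_name") and extra_besides("first_name"):
        return True
    if _filled(fields, "year_of_birth") and extra_besides("year_of_birth"):
        return True
    return (_filled(fields, "column_number") and _filled(fields, "gender")
            and extra_besides("column_number", "gender"))
```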

During the keying stage, in addition to keying, linkage between the individual records
and the National Population Register is performed; following this stage, the system
contains:
1. EA records, household records (in the physical sense, although not yet in
the logical sense), individual records and records of questionnaire pages
for which there was no first page.
2. For each identification number there is a status which indicates that it has
or has not been found in the Register; each individual record has a status
showing if it has been linked with a Register record. In each census record
that was linked to this external file, an identification number, first name and
last name from the Register are added;
3. Every X and numeric field has an ASCII value within the database;
4. All alphabetic fields that are planned to undergo automatic coding have
keyed values.

When preparing for the editing and coding stage, the following actions are
automatically performed:
1. location of the identification number and linkage with the Population Register.
2. automatic coding of alphabetic fields (country of birth, relation to reference
person in the household).
3. identifying the existence of text in an alphabetic field which is not slated
for automatic coding (education, apartment ownership etc.).
4. automatic definition of the structural units (EA, household, individual record).
5. checking, for every X or numerical field, that the result of the keying
stage is a value that has been agreed to by two identification sources.
6. applying to each field in the questionnaire at least two logical edit checks
between fields, to detect inconsistencies.

Editing items
All failures that have been identified in one household create the working unit
known as an "editing item".
When the problems found in one household would create editing items for both types of
editors, the senior editor has priority. His editing process may result in
joining pages of different questionnaires into one household; more fields are then
involved in the edit checks, and new logical inconsistencies within that
household can be found. Therefore the work procedure is as follows:
1. Senior editing items have top priority; if definition of a household has
failed during the automatic procedure, a senior editor first receives the
problem of the definition, whether or not that household has additional
problems within it.
2. A household that has reached a senior editor for handling will not receive
further handling from someone who is not a senior editor, even if the
editing tasks that have been created have been defined as the responsibility
of a regular editor. The working assumption is that when handling problems
of defining a household, the senior editor develops a kind of specific
expertise on that particular household, so that his handling of its problems does
not need further "acquaintance" with the household attributes.
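
The grouping and priority rule can be sketched as follows. The tuple layout and the names of the edit checks are hypothetical; the sketch assumes each failed edit check is already flagged as a household definition problem (senior) or a problem within the household (regular).

```python
from collections import defaultdict


def build_editing_items(failures):
    """Group failed edit checks by household and route each item to one editor.

    failures: iterable of (household_id, check_name, needs_senior) tuples,
    a hypothetical layout used only for this illustration.
    A household with at least one senior-level problem goes entirely to a
    senior editor, including its regular-level problems.
    """
    grouped = defaultdict(list)
    for household_id, check_name, needs_senior in failures:
        grouped[household_id].append((check_name, needs_senior))

    items = {}
    for household_id, problems in grouped.items():
        senior = any(flag for _, flag in problems)
        items[household_id] = {
            "problems": [name for name, _ in problems],
            "editor": "senior" if senior else "regular",
        }
    return items
```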

Editing items and new problems within editing items are not only created during the
preparation for editing stage, but also during the editing stage itself. Edit checks
are performed throughout the editing process, both at the PC stations of each
editor and at the server:
In the PC station we check if the values in the fields that have been sent to
an editor for handling are within a legitimate range, and check the
completeness of the handling (did the editor handle all fields that were
involved in all problems of the editing item). Failure which is found by the
PC station returns the item for handling by the same editor (regular or
senior).
Failure which is found by the server goes for a second round of handling
by a regular editor and up to three rounds of handling by a senior editor.
A senior editor receives items for another round of editing since the first
two rounds are different suggestions to solve a problem of defining a
structural unit, while the third is reserved for problems within the
household.
Failure which is found by the server following the second round of
handling by a regular editor is sent to a senior editor.
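
The routing of failed items between rounds can be summarized in a small decision function. This is a sketch of the rules as stated above, not the system's actual logic; the function name and the string labels are hypothetical, and the text does not say what happens after a senior editor's third round, so that case is returned as "unspecified".

```python
def next_handler(editor: str, round_no: int, failed_at: str) -> str:
    """Decide who handles an item after a failed edit check.

    editor:    "regular" or "senior", the editor who just handled the item
    round_no:  number of the round just completed (1-based)
    failed_at: "pc" (check at the editor's PC station) or "server"
    """
    if editor not in ("regular", "senior"):
        raise ValueError(f"unknown editor type: {editor}")
    if failed_at not in ("pc", "server"):
        raise ValueError(f"unknown check location: {failed_at}")

    if failed_at == "pc":
        return editor                    # PC station failure: same editor again
    if editor == "regular":
        # Server failure: one more regular round, then escalate to a senior editor.
        return "regular" if round_no < 2 else "senior"
    # Senior editor: up to three rounds; the text is silent on what follows.
    return "senior" if round_no < 3 else "unspecified"
```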

Special Coding items
Coding problems of open categories in closed questions are referred to the
editors; coding items, however, are created for coding open questions. The
stipulation for creating them is the existence of text in alphabetic fields or a logical
failure involving one of these fields. The stipulations for creating special coding
items are:
1. text or logical failure in one of the three address fields (home address,
address from five years ago, and address of place of work).
2. text in one of the fields which characterize the economic branch where the
respondent works.
3. text in one of the fields which characterize the respondent's occupation.
Coding items are differentially sent to stations for geocoding, coding for economic
branch and coding for occupation, respectively.
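
A minimal sketch of how these stipulations could translate into routing follows. The field names and station labels are hypothetical; only the three conditions themselves come from the text.

```python
# Hypothetical field names for the open (alphabetic) questions.
ADDRESS_FIELDS = ("home_address", "address_five_years_ago", "work_address")
BRANCH_FIELDS = ("economic_branch_text",)
OCCUPATION_FIELDS = ("occupation_text",)


def coding_stations(record: dict, failed_fields: set) -> list:
    """Return the coding stations that should receive an item for this record.

    record:        field name -> keyed text (missing or blank means no text)
    failed_fields: names of fields involved in a failed logical check
    """
    def has_text(name: str) -> bool:
        return bool(str(record.get(name) or "").strip())

    stations = []
    # Address fields: text OR a logical failure creates a geocoding item.
    if any(has_text(f) or f in failed_fields for f in ADDRESS_FIELDS):
        stations.append("geocoding")
    # Economic branch and occupation: text alone creates the item.
    if any(has_text(f) for f in BRANCH_FIELDS):
        stations.append("economic_branch")
    if any(has_text(f) for f in OCCUPATION_FIELDS):
        stations.append("occupation")
    return stations
```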


Editing and Coding Work environment and
activities


The work environment at the editing and coding stations creates a user-friendly
interface which enables:
viewing the image of an entire page of the questionnaire, images of all the
pages from the household being handled and images of any questionnaire in
the EA;
simultaneous display of everything written on the questionnaire by the
enumerator and the respondents, as well as the ASCII values of those
fields, as they are found in the database at that time;
field corrections, confirmations or completions for households arriving for
handling;
accessibility to the external file (the National Population Register file,
process-information tables and coding tables);
separation and joining of questionnaire pages;
granting a status of "canceled" to individual records or household records;
transfer, usually one-way, of households to virtual EAs for solving editing
and coding problems by experts, or to await handling at a later time by the
same editor, or to be allocated later to the appropriate EA.


Editing actions of the optical data entry system are not homogeneous in terms of
resources and difficulty of their performance. The relatively quick and simple
actions are those whose objective is to correct or confirm values in the
questionnaire fields. Even coding open "other" categories is not complicated since
the number of entries in the coding dictionary for these variables is relatively small
and finite.

The complicated actions are those relating to the definition of the structural units.

The basic working unit, both in the field and in the data entry process, is the EA.
The single-value variable which identifies the EA is the EA number. An editor who
receives a questionnaire from a household with an EA number that is different
from the others being handled in the system at the same time, uses the address
listed on the first page of the questionnaire to verify that this is, indeed, a
questionnaire from a different EA, and transfers it to the virtual box. At a later
time, the questionnaire is allocated to its appropriate enumeration area.

The basic analysis unit is the individual unit. The definition of an individual unit is
expressed by linking two census records belonging to that individual (the short
demographic section and the long socio-economic section, which is filled in by 20%
of those respondents over age 15), and linking the individual's census record to the
record of the same individual in the National Population Register.
The individual's two census sections are linked through three linking fields which
the enumerator fills in. Errors in these fields create an editing problem and the
editor must correct the column number in order to link both parts.
Linking individual census records to the Register is performed throughout the data
entry process, from the preparation for keying stage through the editing actions.
The linkage itself is divided into two components: locating the census identification
number in the Register, and verifying the identification of the individual using rigid
criteria, based on demographic variables. Several manipulations are performed on
the identification number, in order to completely match the census number with the
Register number (see also paper 3.5).
Non-identification or non-linkage creates a problem which is sent to an editor. The
editor can perform flexible queries to the Register using a flexible variable sample
or by flexible definition of values. This is done by entering free strings which
represent a character or several characters, for the values in the query. For
example, if the third digit in the year of birth is not clear, the query can use
a profile including a variable that looks like this: 19%7. The system opens a window
with suggestions that are relevant to the query. In this case, all the records of
people with identical values as defined in the profile that were born in 1907, 1917,
1927, 1937... would appear in the window.
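
The 19%7 example can be reproduced with a small sketch. It assumes that "%" stands for one or more characters and that records are plain dictionaries with a year_of_birth value; the Register's real query interface is not described in the text, so the function names here are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a query value such as '19%7' into a regular expression.

    '%' is taken to mean one or more arbitrary characters (an assumption);
    every other character must match literally.
    """
    parts = [".+" if ch == "%" else re.escape(ch) for ch in pattern]
    return re.compile("^" + "".join(parts) + "$")


def query_by_year(records: list, pattern: str) -> list:
    """Return the records whose year of birth matches the wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [r for r in records if rx.match(str(r["year_of_birth"]))]
```

With four-digit years, the pattern 19%7 selects exactly the years 1907, 1917, ..., 1997, which is the behavior described in the example.
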
At the end of this process, about 95% of the individual records are linked to the
Register, with an additional 2% not found in the Register at all (tourists, foreigners).
In other words, the scope of unsuccessful linkage is no more than 3%.
Another editing task is logical (not physical) cancellation of the record, due to a
duplicate record for an individual in the same EA or due to deletion in the field by
a large X on the individual column in the questionnaire. The individual record was
opened in spite of the X because the X went through fields that define the opening
of an individual record.

The basic census unit is the household. This is also the working unit at the editing
and coding stations. Defining the household is the most complex among the editing
tasks and the need for it arises when the enumerator does not follow instructions.
These problems are basically system related problems, since the optical scanning
required a long form questionnaire with separate pages.
All tasks of defining a household are handled by a senior editor as an editing item.
The problems associated with household definition are:
1. a suggestion to join multiple questionnaires to a single household in cases
where the enumerator filled out two questionnaires for the same household
but did not give them clear cut signs enabling them to be automatically
joined together;
2. a request to verify the joining of three questionnaires to a single household;
since this phenomenon is rare, the editor is requested to verify that the
automatic definition of a three-questionnaire household is correct;
3. a suggestion to join questionnaire pages with no first page to other pages
which do include a first page in cases where the enumerator used pages
from different questionnaires for enumerating a single household;
4. other problems whose solutions require separation of the pages of one
questionnaire and joining them to other households in cases where the
enumerator used pages bearing the same questionnaire number to
enumerate different households;
5. fictitious duplication of a household in cases where the enumerator gave
two households the same single-value identification number. The editor must
change these identical variables.
6. Logical cancellation of household records, whether this is due to
duplication, to opening a household record which was already erased in the
field or an outcome of canceling duplicate individual records (the
household no longer has any residents).

While the editing is being performed, special coding actions are being carried out.
Geographic coding is, in essence, the completion of automatic coding.
The most difficult and most complex coding involves the economic branch and
occupation. Coding takes place via a process which simulates manual coding; a
query is sent on the text containing key words selected by the coder, or with a
numeric value, if the coder remembers the expected code. Whether the query was an
alphabetic query or a numeric query, the coder must choose the suitable option from
the suggestion window he gets, and must not rely on his memory.
All of the dictionaries in the optical data entry (ODE) system are dictionaries that are updated during
the data capture process, only if approved by the experts. In other words, coding
improves over time due to the assimilation of the learning process in the dictionaries
themselves.
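
A coder's query against a coding dictionary might be sketched as below. The dictionary contents, the prefix matching of numeric queries and the requirement that all key words appear in the description are assumptions made for illustration; the text only states that the coder queries by key words or by a numeric value and then chooses from a suggestion window.

```python
def suggest_codes(dictionary: dict, query: str) -> list:
    """Return (code, description) suggestions for the coder to choose from.

    dictionary: code (as a string) -> description; contents are hypothetical.
    A numeric query matches codes starting with the typed digits;
    an alphabetic query matches descriptions containing every key word.
    """
    query = query.strip().lower()
    if not query:
        return []
    if query.isdigit():
        hits = [(code, desc) for code, desc in dictionary.items()
                if code.startswith(query)]
    else:
        words = query.split()
        hits = [(code, desc) for code, desc in dictionary.items()
                if all(word in desc.lower() for word in words)]
    return sorted(hits)
```

The function always returns a list of options rather than a single code, mirroring the rule that the coder must choose from the suggestion window and not rely on memory.
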
Coding control is also performed in the conventional manner; the computerization
speeds the process and simplifies it, but does not change it in principle. Coding
control is carried out on stand-alone PCs, and not within the ODE. The texts are
coded independently a second time and each case of a mismatch between the first
code and the second code is sent to an expert for decision. A learning process is
also assimilated within this process: at the beginning, each
coding item was sent for coding a second time, but as knowledge accumulated
regarding problematic codes, it became possible to re-code only a sample and so to
utilize more effectively the resources offered by man and machine.
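
The coding control procedure can be sketched as independent second coding with mismatches collected for an expert. The function signature, the sample_rate parameter and the use of simple random sampling are assumptions; the text does not describe how the sample was drawn.

```python
import random


def coding_control(items, first_codes, second_coder, sample_rate=1.0, rng=None):
    """Re-code items independently and collect mismatches for an expert.

    items:        list of (item_id, text) pairs
    first_codes:  item_id -> code assigned in the first coding
    second_coder: function text -> code, standing in for the second coder
    sample_rate:  share of items re-coded (1.0 = every item, as at the start
                  of the process; lower values model the later sampling stage)
    """
    rng = rng or random.Random()
    mismatches = []
    for item_id, text in items:
        if rng.random() >= sample_rate:
            continue                      # item not selected for control
        second = second_coder(text)
        if second != first_codes[item_id]:
            mismatches.append((item_id, first_codes[item_id], second))
    return mismatches
```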


Summary

The editing, in its unique definition regarding the optical data entry system, serves
the objective of the entire process: production of a structured file containing raw
census data. The technological environment and the logical instructions were
planned for their functionality, efficiency, and effectiveness in reaching this
objective.
Outgoing quality control shows a rate of error of only half a percent in entering
the values recorded on the questionnaire, and a similar rate in defining the
structural units.

Undoubtedly, the reduction of the subjective components in the data entry process,
maintenance of rigid criteria for the performance of automatic and manual
procedures, and the integration of quality control at each of the system's
components, all contributed a great deal to the achievement of this goal.



Copyright © 1997-1999 The State of Israel. All rights reserved.