Development of the method was based on several new technical abilities in the field
of image processing, improved computer capabilities, and the ability of databases to
perform production line management tasks in addition to their traditional role of file
management. To reduce development risk, we used a development method
known as "evolutionary prototyping." The Bureau managed the project, while four
implementation centers were operated by companies that could carry out the
assignment according to the Request for Proposal (RFP). The task of development
of applications was given to a single supplier, Israel IBM Science & Technology
(IIS&T), which was able to perform all the programming tasks independently. The
tasks of supplying computer equipment, locating the operations and hiring
personnel were given to companies that were awarded the jobs through a public
tender.

Important milestones of the system:

In 1992, the Bureau conducted an experiment, partially based on the system used
in Switzerland, to check whether it could be used for data capture from the
Israeli census questionnaires. The controlled experiment ran for half a year,
and the key results were:

1. Special attention must be given to the construction of the questionnaire,
   primarily the paper itself, the background colors, the black frames and
   the way in which the questionnaire is to be filled out.
2. Scanning was the bottleneck of the system: it determined the speed of the
   entire process, and a massive loss of data could occur at this step.
3. Workers who sit in front of nothing but a computer screen for hours on end,
   without a break, experience a hostile work environment, which leads to a
   great many errors entering the system.
4. An advanced work environment could be designed, and for it we needed to
   develop several unique tools:
   - for keying, we developed three of the six methods that had been tested;
   - for editing, we developed Windows systems linked to a specific question
     on the questionnaire;
   - for coding, a file system for quick queries.

In 1993, we began developing the system for data entry from the Census of
Population and Housing questionnaires, whose principal requirements were:

1. no manual actions would be performed on the questionnaires once they left
   the responding household;
2. the scanning system would not restrict the operator to a specific page
   order, or require counting the pages of the questionnaires;
3. keying would be performed using advanced methods;
4. editing and coding would be performed on the basis of a (screen) image
   consisting of five layers:
   - the filled-in responses,
   - the questionnaire,
   - the combination of the two,
   - the combination of the two together with the ASCII values,
   - linking windows (secondary lists, dialog boxes);
5. the census equipment would be determined through open tender, and therefore
   the code would be written to an international standard;
6. the cost of the project could not exceed the budgetary framework earmarked
   for the previous census (the 1983 Census).

In 1994, we performed a dress rehearsal with the system, and several issues
required special attention:

1. Performance (system throughput): it became clear that, by building separate
   modules that were linked together, we had not reached the required pace
   (we reached about 25% of what was planned).
2. Accuracy: all points were handled surprisingly well, and our achievements
   were impressive.
3. We implemented a drastic change to the system, which included, inter alia:
   - replacing the hardware with computers four times more powerful than
     planned;
   - changing the architecture of the system and giving the PCs additional
     tasks at the expense of the network server (in the area of logical
     checks and preventing editing errors);
   - creating a line management system that would enable three important
     tasks: dividing the work between day and night, setting proper
     priorities for completion of processes, and guarding against
     overloading the system to prevent a collapse;
   - improving the efficiency of the workers vis-à-vis the computers by
     creating simple and convenient operating tools.

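The three line-management tasks named above (day/night division, priorities,
overload protection) can be illustrated with a minimal Python sketch. It is not
the ODE implementation: the class names, the `max_active` limit and the
`night_only` flag are assumptions introduced only for illustration.

```python
import heapq
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Job:
    priority: int                                   # lower number = handled sooner
    ea_number: int = field(compare=False)           # Enumeration Area being processed
    stage: str = field(compare=False)               # e.g. "scan", "key", "edit"
    night_only: bool = field(compare=False, default=False)


class LineManager:
    """Toy scheduler: priority order, day/night split, overload guard."""

    def __init__(self, max_active: int):
        self.max_active = max_active                # hypothetical cap on running jobs
        self.active = 0
        self._queue = []

    def submit(self, job: Job) -> None:
        heapq.heappush(self._queue, job)

    def next_job(self, is_night: bool) -> Optional[Job]:
        """Return the highest-priority runnable job, or None."""
        if self.active >= self.max_active:
            return None                             # overload guard: accept no new work
        deferred = []
        chosen = None
        while self._queue:
            job = heapq.heappop(self._queue)
            if job.night_only and not is_night:
                deferred.append(job)                # batch work waits for the night shift
                continue
            chosen = job
            break
        for job in deferred:                        # put skipped jobs back in the queue
            heapq.heappush(self._queue, job)
        if chosen is not None:
            self.active += 1
        return chosen

    def finish(self) -> None:
        """Mark one running job as completed."""
        self.active = max(0, self.active - 1)
```

A priority queue keeps the ordering rule in one place, and the overload guard is
a single comparison made before any job is released.
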
In 1995, we handled three large-scale projects:

1. adjusting the system to the final questionnaire format and switching the
   system over to Compaq and Data General computer hardware;
2. setting up an ideal operations site, at a carefully selected location,
   where special attention was paid to the operators' working conditions;
3. recruiting, selecting, training and placing about 150 people for system
   assignments:
   - system managers, editors, key operators, scanning operators,
   - computer operators (system administrators and system operators),
   - managers and administrators.

In 1996, with the system in operation, the main issues are:

1. speed of work: it is possible to achieve greater speed than was planned;
2. accuracy: we achieved better accuracy than with any other method, but
   there are still technical opportunities for improving it;
3. cost: identical to what was planned (about $1 per respondent). Today it is
   clear that the system cost can be reduced even further.

Structure of the System

The ODE system is composed of three technological components developed in
partnership with IIS&T (Israel IBM Science and Technology):

1. an image processing system: Optical Mark Recognition (OMR), Optical
   Character Recognition (OCR), Form Drop-Out (FDO), file compression, cut
   and paste, and smart-key for operator effectiveness;
2. client-server capability and database management;
3. a queue management and organization system capable of controlling a
   production line.

The ODE application consists of six sub-systems, each of which stands on its own,
with a dedicated data flow system (including rate, scale and required accuracy of
work). An additional system handles data transfer from one stage to the next. The
six sub-systems are:

1. Scanning sub-system, through which the paper questionnaire is turned into an
image. There are several processes in this sub-system:

   1. scanning management system;
   2. image processing system and identification of the form's unique number;
   3. registration system of the questionnaire (24 different pages);
   4. questionnaire adjusting and straightening system (stretching and
      contracting) in order to implement FDO, OMR and OCR;
   5. insertion of the data into the database.

2. Smart-keying sub-system, through which an operator improves the machine's
results:

   1. verification of every character, together with its OCR identification
      status, by keying with the smart-key tools;
   2. edit checks and record linkage to the National Population Register for
      verification of values;
   3. comparison of values in order to determine the type of further handling
      (keying regimes 2 and 3, or referral directly to the editing stage).

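The comparison step in item 3 can be pictured with a small Python sketch. The
decision rule here is an assumption, not the documented ODE rule: a field is
referred to editing as soon as two independent sources (the OCR and a keying
pass, or two keying passes) agree, and otherwise moves to the next keying
regime. The function name and return strings are hypothetical.

```python
from typing import Optional


def next_step(ocr_value: Optional[str], keyed_values: list) -> str:
    """Decide further handling for one field (hypothetical rule).

    ocr_value    -- value proposed by the OCR, or None if it gave no answer
    keyed_values -- values entered so far by successive keying regimes
    """
    candidates = ([ocr_value] if ocr_value is not None else []) + list(keyed_values)

    # Agreement between any two independent sources is treated as confirmation.
    for i, value in enumerate(candidates):
        if value in candidates[i + 1:]:
            return "editing"

    # No agreement yet: send the field to the next keying regime, up to regime 3.
    if len(keyed_values) < 3:
        return f"keying regime {len(keyed_values) + 1}"

    # Still unresolved after three keyings: leave the decision to an editor.
    return "editing"
```

For example, under this assumed rule `next_step("1956", ["1958"])` asks for
keying regime 2, while `next_step("1956", ["1956"])` refers the field directly to
editing.
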
3. Editing and coding sub-system, in which the working unit is an item defined by
logical checks at several levels: the field, the question, the individual record, the
household record or the Enumeration Area (EA). The main activity of the editor /
coder is correcting data capture problems that remain after the keying stage:
coding fields that were not coded, linking each record with the National Population
Register, and checking the data capture in fields that were out of range or
contradicted data in another field. The work method is based on:

   1. images of all pages of the questionnaires belonging to the same household,
      and the ASCII values of their fields;
   2. a secondary window in which it is possible to flip through the pages of
      the same household, or the pages of another household in the EA;
   3. dialog boxes and list boxes for the tables existing in the database;
   4. tools that enable the editors and coders to get information on the EA,
      the household or the questionnaire they are handling. They can also
      choose among various on-screen displays of editing and coding problems.
      Throughout the editing and coding steps, they can use external
      auxiliary files.

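The idea of logical checks at several levels can be sketched in Python as
follows. The field names (`age`, `marital_status`, `relation`) and the specific
rules are invented for illustration and are not taken from the census
questionnaire; only the layering (field, individual record, household record)
follows the text.

```python
def field_checks(person: dict) -> list:
    """Field level: a value that is missing or outside its allowed range."""
    problems = []
    age = person.get("age")
    if age is None:
        problems.append("age: not coded")
    elif not 0 <= age <= 120:
        problems.append("age: out of range")
    return problems


def record_checks(person: dict) -> list:
    """Individual-record level: a value that contradicts another field."""
    problems = []
    age = person.get("age")
    if age is not None and age < 15 and person.get("marital_status") == "married":
        problems.append("marital_status contradicts age")
    return problems


def household_checks(household: list) -> list:
    """Household level: a rule that involves several individual records."""
    heads = [p for p in household if p.get("relation") == "head"]
    return [] if len(heads) == 1 else ["household must have exactly one head"]


def edit_household(household: list) -> dict:
    """Collect the problems an editor would be shown for one household."""
    report = {}
    for i, person in enumerate(household):
        problems = field_checks(person) + record_checks(person)
        if problems:
            report[f"person {i}"] = problems
    problems = household_checks(household)
    if problems:
        report["household"] = problems
    return report
```

Keeping each level in its own function mirrors the description above: an item
reaches the editor only when a check at one of the levels fails.
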
4. Make-ASCII-file sub-system, which produces the file to be sent to the mainframe
(the ICBS central computer):

Three information systems are created for this stage, including the scanning
images and the statistical information created while transferring the
information and data to the central computer. The transfer included, among others:

   1. extraction of the values of the fields from the database table, including
      the "flags" (the status) of each field;
   2. construction of a hierarchical file at three levels: EA, household,
      individual;
   3. detailed statistical data for each EA, including administrative data.

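As an illustration of the three-level hierarchical file, the Python sketch below
groups flat database rows into EA, household and individual records and writes
them as ASCII lines. The row layout, the record prefixes (`E`, `H`, `I`), the `;`
delimiter and the flag values are all assumptions; the actual ODE file format is
not described here.

```python
from collections import defaultdict


def build_hierarchy(rows):
    """Group flat rows into EA -> household -> individual -> {field: (value, flag)}.

    Each row is assumed to be (ea, household, individual, field, value, flag).
    """
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))
    for ea, household, person, name, value, flag in rows:
        tree[ea][household][person][name] = (value, flag)
    return tree


def to_ascii_lines(tree):
    """Emit one line per EA, per household and per individual, in sorted order."""
    lines = []
    for ea in sorted(tree):
        lines.append(f"E;{ea}")
        for household in sorted(tree[ea]):
            lines.append(f"H;{ea};{household}")
            for person in sorted(tree[ea][household]):
                fields = tree[ea][household][person]
                # Every field carries its value together with its status flag.
                body = ";".join(
                    f"{name}={value}:{flag}"
                    for name, (value, flag) in sorted(fields.items())
                )
                lines.append(f"I;{ea};{household};{person};{body}")
    return lines
```

Carrying the flag next to each value lets the receiving system see how every
field was obtained without consulting the production database.
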
5. Archive sub-system, in which all the information at the EA level is saved in
the following formats:

   1. WORM: Write Once Read Many (for image and ASCII);
   2. DAT cassette of the file that is sent to the central computer (ASCII only);
   3. CD/R disk (recordable) containing the information (image and ASCII)
      arranged in a way that facilitates quick retrieval by pre-determined
      keys.

The archive sub-system is built so that it enables a quick return to the final
status of a given activity. Three retrieval keys have been defined: the
individual identification number; the EA number together with the dwelling
number in the EA; and the questionnaire number. The existing system
enables all paper questionnaires to be sent for recycling, because all the
information is stored on only a few dozen CDs.

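Retrieval by the three keys described above can be sketched in Python as three
lookup tables that point to the same stored item. The class, its method names and
the form of a "location" (here a disc label and an offset) are hypothetical and
serve only to make the idea concrete.

```python
class ArchiveIndex:
    """Maps each of the three retrieval keys to a storage location."""

    def __init__(self):
        self._by_person = {}          # individual ID number -> location
        self._by_dwelling = {}        # (EA number, dwelling number) -> [locations]
        self._by_questionnaire = {}   # questionnaire number -> location

    def add(self, location, questionnaire_no, ea_no, dwelling_no, person_ids):
        """Register one archived questionnaire under all three keys."""
        self._by_questionnaire[questionnaire_no] = location
        self._by_dwelling.setdefault((ea_no, dwelling_no), []).append(location)
        for person_id in person_ids:
            self._by_person[person_id] = location

    def by_person(self, person_id):
        return self._by_person.get(person_id)

    def by_dwelling(self, ea_no, dwelling_no):
        # A dwelling may have several questionnaires, so a list is returned.
        return list(self._by_dwelling.get((ea_no, dwelling_no), []))

    def by_questionnaire(self, questionnaire_no):
        return self._by_questionnaire.get(questionnaire_no)
```

Building the index once, when an EA is archived, is what makes later retrieval a
direct lookup rather than a search through the discs.
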
6. Command and line-control sub-system. Three control components operated
automatically in the system:

   1. inspections of hardware, basic software and communications software;
      testing of the number of files created at the start and at the end of
      the process; and examination of the computers' work load. The data
      received enabled detailed planning of the system's daily and weekly
      activities.
   2. a statistical mechanism for the data collected during processing,
      enabling analysis of the massive amount of data created by the system
      (numbers of problems, general and relative work times, automatic vs.
      manual record linkage, quality of keying and level of accuracy of the
      OCR system).
   3. detailed information for the system manager on all the activities within
      the system: the status of the flow of questionnaires of the EAs being
      handled at the time, the status of work at the PCs, and the production
      information with which an activity is terminated. This information
      enabled the manager to identify the point at which an EA had completed
      the process and could be transferred to the Bureau's central computer.

Lessons and conclusions

The technological lessons that can be drawn from the ODE system fall into three
spheres:

1. the scanning process: improving scanning quality and expanding OCR
   capability;
2. the PC station: reducing errors and technical problems, and widening the
   range of tasks carried out at the station;
3. control: integration of all control tasks (staff control, process control
   and product control) is called for.

Organization and management lessons:

1. more efficient preparation of work vis-à-vis the companies and service
   suppliers (implementation of trials and preliminary tests);
2. construction of a non-dedicated, general-purpose system that will permit
   rapid and cheap conversion to data capture for other surveys.

The process of transferring information from paper to ASCII code, as was done with
the Israeli census, is the first stage towards development of a system that
should include additional components:

1. staff control (in addition to what already exists for keying);
2. process control (mainly for editing and coding);
3. completeness of data;
4. inclusion of CAPI and CATI at the beginning of the process, in addition to
   the paper questionnaires;
5. automatic or semi-automatic coding;
6. improved accuracy;
7. improved speed.

The overall cost of $1 per respondent enables us to carry out the appropriate
development and high-quality operation, and to achieve three important results:

1. accurate and reliable information;
2. in-house personnel who are professional and highly motivated to perform
   additional tasks;
3. valuable, advanced computer equipment that will improve the general
   performance of the CBS.

In conclusion, I propose that a discussion be held on the subject of the information
flow process (scheduler). It is necessary to decide on the desired system
characteristics in light of the following four variables:

1. The level of identification of the OCR system, at five levels: 100% with 0%
   identification errors, 80% with 5% identification errors, 60% with 20%
   identification errors, 40% with 50% identification errors, and 0%, i.e.,
   cannot be deciphered.
2. Identification by the OCR is also based on prior information on the
   reasonable response, but the identification level of an entire field can
   be checked.
3. Handling of information begins from the isolated position, but 15% of the
   information includes erasures, cancellations or just plain extraneous
   lines that were added in error. The question then is whether to handle
   the information at the household level or the individual level prior to
   handling it at the isolated position level.
4. Every field on the questionnaire has different accuracy requirements. The
   question is whether to establish one process (for maximum accuracy) or
   several processes according to the type of field and the control files
   which exist for it.

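To make the first and fourth variables concrete, the Python sketch below
estimates the accuracy of a whole field from the identification levels of its
characters and compares it with an accuracy requirement for that field. The
error rates are those quoted in item 1. The rest is assumption: that character
errors are independent, that level 0 counts as a certain error, and the names
`field_accuracy` and `route_field`.

```python
# Identification level (%) -> share of identification errors, as listed above.
# Treating level 0 ("cannot be deciphered") as a certain error is an assumption.
ERROR_RATE = {100: 0.00, 80: 0.05, 60: 0.20, 40: 0.50, 0: 1.00}


def field_accuracy(char_levels):
    """Estimated probability that every character in the field is correct.

    Assumes character errors are independent, which is a simplification.
    """
    probability = 1.0
    for level in char_levels:
        probability *= 1.0 - ERROR_RATE[level]
    return probability


def route_field(char_levels, required_accuracy):
    """Accept the OCR result, or send the field to keying (hypothetical rule)."""
    if field_accuracy(char_levels) >= required_accuracy:
        return "accept"
    return "key"
```

Under these assumptions a single process "for maximum accuracy" corresponds to
one high `required_accuracy` for all fields, while several processes correspond
to a different threshold per field type.
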