Technical Information Paper No. 12
Digital-Imaging and Optical Digital Data Disk Storage Systems: Long-Term Access Strategies for Federal Agencies
July 1994
A Report by:
The Technology Research Staff
The National Archives at College Park
8601 Adelphi Road
College Park, Maryland 20740-6001
Appendix A: Federal Agency Site Visit Reports
Site Visit Selection Criteria
NARA's Technology Research Staff conducted a nationwide survey of Federal government agencies to identify existing optical digital data disk installations. This data collection process obtained up-to-date user experiences and helped to gather insights into system administrators' plans for applying optical digital data disk technology within their respective agencies. The survey process identified a diverse universe of small, mid-range, and large systems storing raster image and digital data. System criteria used to select the fifteen site visits included:
Size of System (Small or Large)
Small systems were defined as having under twenty optical digital data disks in use, few image capture and user workstations, and no jukebox. In practice, small systems may often be pilot projects that have not fully scaled up, or are serving as a research test platform. Larger systems, on the other hand, typically store optical digital data disks in a jukebox, employ network communications linking multiple imaging and user retrieval workstations, and feature high speed image capture equipment.
Type of Digital Information Stored (Image or Data)
The systems described in this report store information on optical digital data disks. This digital information is in the form of scanned document images, or digital ASCII data, databases, numerical information, or scientific data.
Information Retention (Temporary or Long-term)
Temporary information retention includes records scheduled (or likely to be scheduled) for a life span of under seven years, which is also the approximate life span of a typical computer system. Long-term retention includes scheduled or unscheduled records with a life span greater than seven years, regardless of whether or not the records will ever end up in the National Archives.
Functionality of System (Stand Alone or Integrated)
Stand alone systems often have a single, narrowly defined purpose, even if the system is linked to the agency database. In many cases, stand alone systems have at best a fax link to gain access to other information systems. In comparison, integrated systems serve the core mission of the agency, or are linked to other automated systems that are administered by other units or even by other agencies.
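The two numeric thresholds given above (twenty disks, seven years) can be expressed as a small classification sketch. The record type, field names, and function names are illustrative assumptions and are not taken from the survey instrument. The size rule is also simplified: the report additionally weighs workstation counts, networking, and capture equipment.

```python
from dataclasses import dataclass

SMALL_SYSTEM_MAX_DISKS = 20   # "under twenty optical digital data disks in use"
TEMPORARY_MAX_YEARS = 7       # "a life span of under seven years"


@dataclass
class SurveyedSystem:
    """Hypothetical record describing one surveyed installation."""
    disks_in_use: int
    has_jukebox: bool
    retention_years: float


def size_class(system: SurveyedSystem) -> str:
    # Simplified rule: few disks and no jukebox means "small".
    if system.disks_in_use < SMALL_SYSTEM_MAX_DISKS and not system.has_jukebox:
        return "small"
    return "large"


def retention_class(system: SurveyedSystem) -> str:
    # The report does not say how exactly seven years is treated;
    # this sketch assumes it counts as long-term.
    if system.retention_years < TEMPORARY_MAX_YEARS:
        return "temporary"
    return "long-term"


pilot = SurveyedSystem(disks_in_use=12, has_jukebox=False, retention_years=3)
print(size_class(pilot), retention_class(pilot))  # small temporary
```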
The research study site selection process for this report also included other criteria, such as: identification and availability of knowledgeable agency resource people able to cooperate with this research project; access to full technical documentation that describes each system; and a diversity of agency missions and types of information processed, contributing to balanced report coverage.
Site Visit Record Holdings
A majority of the fifteen Federal agency systems surveyed maintain multi-page case files, composed of records containing mixed forms where a single index point (typically personal name, corporate body, or case number) provides access to a single image or logical file. Examples of case file storage systems are official military personnel records, hazardous waste site documentation, and records released under the Freedom of Information Act. Other systems visited maintain images of single or multi-page standard forms where access is provided by unique identifying number (e.g., social security number) or personal name. Examples of standard form systems are those for patent and trademark applications, applications for licenses and grants, and income tax returns. The remaining systems surveyed contain non-image format data files formerly stored magnetically or as computer output microfiche, and mixed records covering a variety of records applications. A summary of the records classifications stored on optical digital data disks by the Federal agency sites visited includes:
- Case files: Construction engineering documentation containing a mixture of electronic and non-electronic formats including construction documents, tech reports, maps, microforms, engineering drawings, video tapes, 35mm films, books and periodicals.
- Case files: Official personnel records from paper and microfiche that need purging; government personnel forms, evaluations, awards, and medical forms.
- Case files: Federal land records of survey notes, plats, tract books that form the basis of land title searches--old, often fragile (brittle) handwritten information.
- Case files: Technical reports and documents describing hazardous waste sites, used for evaluating health risks and emergency events involving toxic substances.
- Case files: Environmental cost recovery reports, legal documents associated with cleanup of high priority toxic waste sites.
- Case files: Documents to be released under the FOIA that need redaction or clean-up prior to release.
- Case files/standard forms: Official agency records of judicial rule making and adjudicatory matters, applications for licenses and grants, and reports filed by cable system operators, often requiring next-day turnaround.
- Case files/standard forms: Claims processing; royalty collection documents for payments to the government for natural resources extracted from US lands; government forms fiscal records.
- Standard forms: Applications and approvals for patent documents, scanned off site at document storage repository.
- Data files: Seismic data (earth tremors) captured by remote sensors, useful for earthquake monitoring, replaces magnetic tape storage.
- Data files: Environmental and coastal satellite data for water temperatures, weather patterns, ocean currents, and other US coastal and Great Lakes data measured with instruments or observed.
- Data files: Microfilm replacement system for self-employment tax information.
- Mixed records: Captured war documents, maps, and miscellaneous items used for intelligence, often in foreign languages.
- Mixed records: Daily newsclips and legal docket records.
- Mixed records: Newly released public policy documents.
Listing of Federal Agency Sites
Of the fifteen Federal agency systems examined in detail, the range of responsibilities included: Armed Forces units (3); Federal land management office (1); public health care oversight agency (1); financial trading regulator (1); environmental oversight (1); communications regulation office (1); library references and services provider (1); natural resources (1); climatic monitoring office (1); invention registry office (1); wage and retirement benefits claims processing (1); Freedom of Information Act processing unit (1); and, earth tremors or seismology events monitoring (1).
Detailed site descriptions are provided for the following fifteen Federal agencies:
- Site Visit Report #1--Agency for Toxic Substances and Disease Registry
- Site Visit Report #2--U.S. Army Corps of Engineers
- Site Visit Report #3--Bureau of Land Management (Eastern States)
- Site Visit Report #4--Commodity Futures Trading Commission
- Site Visit Report #5--Department of the Army (Chief of Staff Office)
- Site Visit Report #6--Department of the Army (PERMS)
- Site Visit Report #7--Environmental Protection Agency
- Site Visit Report #8--Federal Communications Commission
- Site Visit Report #9--Library of Congress
- Site Visit Report #10--Minerals Management Service
- Site Visit Report #11--National Oceanic and Atmospheric Administration
- Site Visit Report #12--Patent and Trademark Office
- Site Visit Report #13--Social Security Administration
- Site Visit Report #14--State Department
- Site Visit Report #15--United States Geological Survey
SITE VISIT REPORT #1
AGENCY: Agency for Toxic Substances and Disease Registry
SYSTEM: Toxicological Profile Image System; Public Health Assessments Image System; Cost Recovery Image System
CONTACT: Sharon O. Jacobs, Director, Office of Information Resources Management, Agency for Toxic Substances and Disease Registry, Atlanta, GA
SUMMARY DESCRIPTION:
Since 1988, the Agency for Toxic Substances and Disease Registry (ATSDR) has utilized a state-of-the-art information system for document image management. A Wang Integrated Imaging System (WIIS) is used to convert scientific and administrative documents to optically stored digital images. The digital images describe the links between human exposure to hazardous substances and an increased incidence of adverse effects to health. The image data is recorded onto twelve-inch write once, read many times (WORM) optical digital data disks in a multi-platter jukebox retrieval system. A single unified computer interface provides access to the document indexing system, and preserves the complex structure of ATSDR reports and documents. This dual purpose computer interface also provides user access to HazDat, a scientific database containing environment and health data stored on a mainframe computer. Additionally, the ATSDR imaging subsystem conforms, to the extent feasible, with existing industry and government information technology standards.
The ATSDR information system is considered important to this study for several reasons, including: the system's potential to support interagency sharing of health related digital image data; the agency-wide approach concept applied to system development; the use of a single imaging integration vendor for system design, personnel training, and follow on technical support; the automated computer linkage to the mainframe database access system; and, the agency's recognition of document imaging legal admissibility issues.
BACKGROUND:
The mission of the Agency for Toxic Substances and Disease Registry (ATSDR) is to "prevent or lessen harmful effects to people and their quality of life caused by hazardous materials in or near their communities." The Agency was created as a separate entity of the Public Health Service (PHS) in 1980, within the Department of Health and Human Services. The creation of ATSDR as a Federal agency is one of several initiatives resulting from the Comprehensive Environmental Response, Compensation, and Liability Act (CERCLA), or what is more commonly known as "Superfund" Legislation. Congress generated this body of legislation as part of its response to two highly publicized and catastrophic events of the late 1970's: discovery of the Love Canal waste site in Niagara Falls, New York, and the industrial fire in Elizabethtown, New Jersey, which set off the release of highly toxic fumes into the air in a densely populated area. Although the ATSDR functions as an autonomous agency, it receives administrative support from the Centers for Disease Control, also headquartered in Atlanta, Georgia. The public health mandate of ATSDR differs substantially from the regulatory function of the Environmental Protection Agency (EPA), although both agencies exchange information on sites.
More than 400 ATSDR scientists and science administrators are tasked with collecting information on the release of hazardous substances from toxic waste sites or from emergency events involving hazardous materials. They are also concerned with the health effects of these substances on human populations. This information, compiled in the HazDat database, is then used as a scientific technical repository when creating agency products such as public health assessments and supporting documentation, medical health consultations, toxicological profiles and other site characterization documents. These important information sources are revised regularly as new research findings are released. The agency is currently responsible for assessing the health risks of more than 1,350 National Priorities List (NPL) toxic waste sites identified by the EPA. This is only a small part of the more than 38,000 toxic waste sites listed in the EPA CERCLIS database.
The need for imaging capability within the ATSDR is based on several factors, including: the complexity of the HazDat database; the incompatibility of at least four stand-alone database systems used by the research scientists; and, the volume and complexity of the published and unpublished output of the agency. Maintaining an effective audit trail throughout the technical report production process is one of the agency's biggest challenges. In addition, the wide distribution of the agency's products in hard-copy form is an expensive and time-consuming task. The agency's products are written for scientists and public health officials, but are provided on request to those individuals with any interest in toxic waste sites. These requesters include Congressional staff and the general public, state and local governments, other Federal Agencies and academia.
Origins: A Needs Assessment study, undertaken by the Office of Information Resources Management in 1987, identified key integrated system functions needed to support the agency's primary responsibilities. Document tracking was difficult due to lack of communication between existing stand-alone ATSDR database systems. The principal system requirements identified in that study included: linkages to the HazDat scientific database; remote access from ten regional offices; data portability; the creation of a report production audit trail; provision for records management and disaster recovery; the capability to retrieve selected portions of complex documents and reports compiled from a number of diverse sources; and storage and space considerations.
ATSDR's goal was to create a system for electronically capturing, processing, storing, and retrieving ATSDR toxic substance data linked to the scientific HazDat database. Important system criteria included the need for a user-friendly interface, accurate and timely data, security, and the ability to integrate existing hardware into the new system configuration. In 1988, a "try and buy" prototype document imaging system was installed and tested under operational conditions. A full-scale document imaging system was subsequently developed and installed. Full scale document conversion began in March 1990, and to date all of the National Priorities List (NPL), public health assessment documents and toxicological profiles have been digitally scanned and stored on optical digital data disks, as well as Toxicological Profile references numbering over 35,660. The Agency's next imaging priority is inclusion of site files documentation. ATSDR's ultimate goal is to include all relevant documentary sources and agency products into a fully integrated information system readily accessible by users.
SYSTEM CONFIGURATION:
Date system installed: 1988
System Installed by: Combination of OIRM staff and Vendor (Wang).
System Configuration Changed Since Installation? Yes. Two optical disk drives have been added to the system, one in the jukebox and one stand alone. These two drives support the Cost Recovery Image System.
- Communication Environment: Novell LAN interface between an IBM 9070 Model 520 (mainframe), a Wang VS 7310 minicomputer, and multiple desktop personal computers.
- Database development was performed under ADABAS/Natural.
- Document Scanning: Controlled by a Wang VS computer as part of Wang's Integrated Imaging System (WIIS).
- Index: The index for retrieving the document images is maintained separately on magnetic media.
- Image Storage: Wang 80-disk optical jukebox with 3 optical drives.
DIGITAL IMAGE CAPTURE:
Data Access: Public Health Assessments and toxicological profile documents are scanned even though a majority of recently produced documents are available electronically. The rationale behind this is that the imaging system serves as an agency-wide repository for accurate, timely, and complete scientific and administrative information. Further, imaging serves as a vital element in the agency's corporate electronic enterprise. The other elements include text, data, and voice.
Document Scanning: Four document scanning workstations, each operating at a rate of 7-8 pages per minute, convert the original hard copy reports. The physical condition and visual appearance of the documents vary considerably, requiring scanner contrast control adjustments to ensure image quality.
Scanning Personnel: Operational responsibilities for all applications are handled by ATSDR staff.
Estimated Number of Documents/Records Converted: 75,405, divided as follows (an asterisk marks each figure counted toward this total):
- TOXICOLOGY DIVISION
Profiles 108*
References (records) 35,660*
Reference Image 20,474
Total pages scanned 316,316
- HEALTH ASSESSMENTS
Profiles 1,315*
Total pages scanned 19,171
- COST RECOVERY
Time Sheets records 6,631
Linked T/S entries 38,322*
Total pages scanned 6,631
GRAND TOTAL PAGES SCANNED 342,118
- Number of Platters used:
- Tox Division: 12 double sided (2 Gigabytes each) and 24 single sided (1 Gigabyte each)
- DHAC: 4 single sided (1 Gigabyte each)
- CRS: 2 single sided (1 Gigabyte each)
- Total Information in optical disks (approximately): 54 Gigabytes
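The totals reported above can be cross-checked with a few lines of arithmetic. The sketch below assumes that the four asterisked entries are the ones that make up the 75,405 record count; the dictionary keys are descriptive labels chosen for this example, not agency terminology.

```python
# Figures transcribed from the conversion summary above.
records_converted = {
    "toxicological profiles": 108,
    "toxicology references": 35_660,
    "health assessment profiles": 1_315,
    "linked time sheet entries": 38_322,
}
pages_scanned = {
    "Toxicology Division": 316_316,
    "Health Assessments": 19_171,
    "Cost Recovery": 6_631,
}
# Each value is (platter count, gigabytes per platter).
platters = {
    "Tox Division, double sided": (12, 2),
    "Tox Division, single sided": (24, 1),
    "DHAC, single sided": (4, 1),
    "CRS, single sided": (2, 1),
}

total_records = sum(records_converted.values())
total_pages = sum(pages_scanned.values())
total_gigabytes = sum(count * size for count, size in platters.values())

print(total_records)    # 75405
print(total_pages)      # 342118
print(total_gigabytes)  # 54
```

All three computed values agree with the reported figures of 75,405 records, 342,118 pages, and approximately 54 Gigabytes.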
Disposition of Original Records: Records scheduled for 1-10 years.
Quality Control: ATSDR digital image quality assurance procedures require ongoing evaluation and maintenance of scanner performance. Operator procedures specify that the scanners be calibrated in accordance with the manufacturer's specifications. Scanner operators visually inspect each image captured to ensure conformance with established quality criteria. A major image acceptance factor is eye-readability (i.e., not too dark or light). No follow-up image quality sampling inspection or image evaluation test targets are used.
Scanning Resolution: ATSDR's scanner resolution settings are selected based on document type and physical characteristics. For example, toxicological profile files are typically scanned at 200 dots per inch (dpi), while health assessments are routinely scanned at 300 dpi. Testing with actual ATSDR documents showed that 300 dpi retains fine-line details of the graphs and other complex graphics features, while also providing a significant improvement in screen display and laser print qualities.
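The storage cost of the higher resolution can be estimated with a rough calculation. The sketch below assumes a letter-size page (8.5 by 11 inches) and one bit per pixel, since the agency reports no need for color or gray scale; the page size and the function name are assumptions made for illustration. It gives uncompressed sizes only and says nothing about the sizes achieved after compression.

```python
def raw_bitonal_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Approximate uncompressed size, in bytes, of a 1-bit-per-pixel scan."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels // 8


for dpi in (200, 300):
    size = raw_bitonal_bytes(8.5, 11, dpi)
    print(f"{dpi} dpi: {size:,} bytes uncompressed")
# 200 dpi: 467,500 bytes uncompressed
# 300 dpi: 1,051,875 bytes uncompressed
```

Because pixel count grows with the square of the resolution, a 300 dpi scan holds 2.25 times as much raw data as a 200 dpi scan of the same page.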
Color and Gray Scale: ATSDR has identified no immediate need for either color or gray scale scanning.
Image Enhancement: No special image enhancement techniques are used other than basic light/dark contrast adjustments.
Compression/Decompression: Wang's proprietary software efficiently reduces and restores the electronic digital image files.
DOCUMENT INDEXING:
Creation of Index Database: During the scanning process, computer programs take over the task of indexing. These programs make the process transparent to the varied report structures, which was an important system design factor. The system accepts new information as well as the (sometimes substantial) updates to report content and structure. After document scanning, index data is key entered using the display screen images.
Location of Index Database: The IBM 9070 mainframe computer, containing the central HazDat database, is linked to the Wang WIIS document index files. The initial HazDat database modules were correlated from several agency stand-alone information systems. The 9070 computer system's magnetic disks contain the document structures and formats. The document index database is maintained and located on the Wang Image server VS7310.
Index Structures: The indexing system preserves the structure of ATSDR documents using a complex hierarchical indexing scheme. Report components (e.g., table of contents, charts, graphs, chapter breaks) are tagged after scanning using a series of PF (function) keys. This function key capability, of particular value to ATSDR scientists, allows users to quickly switch from report text to technical references. The hierarchical indexing scheme follows the sample format below:
3.2.2 Health Assessments
3.2.2.1 Summary/Executive Summary
3.2.2.2 Background/Introduction
3.2.2.2.1 Site Description and History
3.2.2.2.2 Site Visit
3.2.2.2.3 Demographics
3.2.2.2.4 State and Local Health Data
3.2.2.3 Community Health Concerns
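A numbering scheme like the sample above lends itself to simple parent and child lookups. The sketch below is one possible representation, not a description of the WIIS index files; the section numbers and titles come from the sample, while the data structure and function names are assumptions.

```python
from __future__ import annotations

# Section numbers and titles transcribed from the sample format above.
SECTIONS = {
    "3.2.2": "Health Assessments",
    "3.2.2.1": "Summary/Executive Summary",
    "3.2.2.2": "Background/Introduction",
    "3.2.2.2.1": "Site Description and History",
    "3.2.2.2.2": "Site Visit",
    "3.2.2.2.3": "Demographics",
    "3.2.2.2.4": "State and Local Health Data",
    "3.2.2.3": "Community Health Concerns",
}


def children(parent: str) -> list[str]:
    """Return the section numbers exactly one level below `parent`."""
    depth = parent.count(".") + 1
    return [
        number for number in SECTIONS
        if number.startswith(parent + ".") and number.count(".") == depth
    ]


def path(number: str) -> list[str]:
    """Return the titles from the top indexed level down to `number`."""
    parts = number.split(".")
    prefixes = [".".join(parts[:i]) for i in range(1, len(parts) + 1)]
    return [SECTIONS[p] for p in prefixes if p in SECTIONS]


print(children("3.2.2"))
# ['3.2.2.1', '3.2.2.2', '3.2.2.3']
print(path("3.2.2.2.3"))
# ['Health Assessments', 'Background/Introduction', 'Demographics']
```

Keying on the dotted number keeps both views available: a reader can browse downward by chapter or jump straight to a section while still seeing which document it belongs to.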
An intellectual challenge arose when ATSDR grappled with the philosophy of the indexing scheme for linking document images and the image database. Top management voiced strong support for an Agency-wide indexing scheme, while some of the scientists favored at least two different approaches: indexing by section/chapter and indexing by document. This issue took considerable time to resolve, and it was finally agreed to adopt an agency-wide standard that allowed the two indexing schemes to coexist. The reasoning behind these two approaches was simple--the indexing scheme was based upon how the scientists were accustomed to locating information from the agency products.
OPTICAL DIGITAL DATA DISK STORAGE:
Scanned image pages are stored on magnetic disk cache until each document page passes quality control inspection. The Wang WIIS software allows the in-process images to be overwritten (re-scanned) to correct image quality problems. Approved document images are subsequently recorded onto the write once optical digital data disk media for permanent retention. Exact duplicates of the original optical digital data disks are created using the Wang system's backup procedures, with Fort Knox used for off-site archival disk storage and disaster recovery.
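The staging workflow described above (rewritable cache first, write-once media only after approval) can be sketched as follows. The class and method names are hypothetical, and the in-memory dictionaries merely stand in for the magnetic cache and the WORM platters; this is not the Wang WIIS software interface.

```python
from __future__ import annotations


class WriteOnceDisk:
    """Toy stand-in for WORM media: a page is recorded once and never replaced."""

    def __init__(self) -> None:
        self._pages: dict[str, bytes] = {}

    def record(self, page_id: str, image: bytes) -> None:
        if page_id in self._pages:
            raise ValueError(f"{page_id} is already recorded on write-once media")
        self._pages[page_id] = image

    def read(self, page_id: str) -> bytes:
        return self._pages[page_id]


class ScanCache:
    """Toy stand-in for the magnetic disk cache holding in-process images."""

    def __init__(self) -> None:
        self._pending: dict[str, bytes] = {}

    def scan(self, page_id: str, image: bytes) -> None:
        # Re-scanning a page simply overwrites the in-process image.
        self._pending[page_id] = image

    def approve(self, page_id: str, disk: WriteOnceDisk) -> None:
        # Intended to be called only after the page passes quality control.
        disk.record(page_id, self._pending[page_id])
        del self._pending[page_id]


cache, disk = ScanCache(), WriteOnceDisk()
cache.scan("page-1", b"too dark")
cache.scan("page-1", b"clean rescan")   # corrected before approval
cache.approve("page-1", disk)
print(disk.read("page-1"))              # b'clean rescan'
```

Keeping corrections in the rewritable stage means no write-once capacity is spent on rejected scans.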
Image File Headers: The ATSDR's images are in compliance with the tagged image file format (TIFF) (Class B, Type 3 or 4, Version 5.0).
Error Detection/Correction: An optical media data error checking capability runs in system background and is transparent to users. The ATSDR optical disk subsystem has error reporting capability, but the specific system software capabilities are unknown. No optical digital data disk failures have been reported to date.
Recording Process: 12-inch, write once, read many (WORM), dual-sided optical digital data disk media.
Optical Digital Data Disk Composition: Glass substrate.
Capacity: Data storage of two Gigabytes per platter.
Number of Optical Digital Data Disks in Use: 60
Jukebox: Wang jukebox with three optical drives, 76 disks.
Storage Environment: A computer room environment with controlled temperature/humidity conditions is maintained for the operational system.
RETRIEVAL AND OUTPUT:
ATSDR's imaging system automates an important segment of the agency's total information processing needs. The ATSDR system's primary function is to support information retrieval and produce hard copy reports on demand. The system's search and retrieval software assists in the rapid identification of appropriate segments of relevant reports, and automatically spools the images to a print server. Access to both index data and the HazDat database is through a single software interface, with EPA's CERCLIS toxic waste site identifier used as the common link. All users may access the imaging subsystem, and facsimile transmission (FAX) provides image and text data transmittal and receipt.
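The idea of one identifier linking image index entries with database records can be illustrated with a minimal lookup. Everything in the sketch below is invented for illustration: the identifier strings do not follow the real CERCLIS format, and the record layouts are assumptions, since the report does not describe the HazDat schema or the WIIS index layout.

```python
# All identifiers and values below are made up for illustration.
image_index = {
    "SITE-0001": ["Health Assessment, Summary", "Health Assessment, Site Visit"],
    "SITE-0002": ["Toxicological Profile references"],
}
database_records = {
    "SITE-0001": {"contaminants": ["example substance A"]},
    "SITE-0003": {"contaminants": ["example substance B"]},
}


def lookup(site_id: str) -> dict:
    """Combine image index entries and database data that share one site identifier."""
    return {
        "site_id": site_id,
        "images": image_index.get(site_id, []),
        "data": database_records.get(site_id),
    }


result = lookup("SITE-0001")
print(len(result["images"]), result["data"])
# 2 {'contaminants': ['example substance A']}
```

A shared key of this kind lets a single interface present both kinds of information without merging the two stores.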
Primary System Users: Scientists and Science Administrators
User Interface: The ATSDR's image applications were developed with ease of use in mind, designed in "electronic book" format. The first menu screen provides a list of topics, chemicals or sites. After user selection, the sections or chapters available are displayed. Using the display screen menu prompts under keyboard control, users specify that an image be displayed, printed, or facsimile transmitted anywhere in the world.
Display Output: Users view images on 19 inch high resolution monochrome image display monitors and/or standard or super VGA color monitors.
Laser Printing: Hewlett-Packard 3SI laser printing equipment.
DATA MIGRATION POLICY ISSUES:
ATSDR is committed to building and maintaining an agency-wide imaging capability as one part of a comprehensive management information system. The overall system includes: a bulletin board service; a geographical information system; a regional information system; as well as an administrative and personnel database, and a Cost Recovery application. Although still in the early stages, the imaging system was designed with remote access and data transfer capabilities in mind.
Linkages with Other Agency ADP Applications: ATSDR in-house computer scientists developed the IBM mainframe computer linkages to the proprietary Wang hardware and software, resulting in a considerable cost savings. Remote access to the HazDat database is currently offered, but remote images are only available through a FAX request interface.
Network Transmission: Full LAN capability exists for transfer of image and index data. The system is also equipped with an Internet gateway for transferring index data only.
Backup of Image and Index Data: Magnetic disks are used for daily incremental backups of image and index data, with bi-weekly magnetic disk backups of the image data. Image data is then written to magnetic tape. When the master optical digital data disks are completely filled with images, mirror-image optical digital data disk copies are created. These backup optical digital data disks, with a descriptive naming convention for identification, are stored in Fort Knox under environmentally controlled conditions.
Technical Support And Documentation: System users and managers use a combination of in-house and Wang-supplied technical and administrative documentation. In addition, full time Wang computer technicians are located on-site. A Wang senior systems specialist is available as needed for additional technical system consultation.
Interoperability: Wang's Integrated Imaging System (WIIS) open image architecture supports the capture, storage, retrieval, management, and control of digital image data stored on magnetic or optical media. Wang's imaging system is compatible within the Wang VS minicomputer family. WIIS applications can be developed using Wang software and standard programming languages, while also supporting third party software packages. Stand alone optical disk drives and jukeboxes appear as any other storage (magnetic) device through the SCSI and RS 232 interfaces.
Migration Plans: ATSDR system administrators are committed to wholesale optical digital data disk recopying upon expiration of the media's warranty. They intend to stay with the existing system for the short term, but the long-term strategic plan is to migrate imaging to a different platform. ATSDR's Office of Information Resources Management and Wang are beta testing, on-site at ATSDR, Wang imaging on an IBM RISC, RS6000 platform.
OVERVIEW OF SIGNIFICANT ISSUES:
Business Process Re-Engineering: Performed to obtain greater benefits from the new imaging system? Yes. The paper flow of the organization has been changed. Documents are no longer simply placed in file cabinets or stored in boxes in a warehouse. They are now indexed according to subject, site, employee, or whatever the application might call for, stored on optical digital data disks, and made available to those scientists and science administrators needing the information.
The health assessments, toxicological profiles, and other documentation describing the 1,350 hazardous waste sites contain data and information that is frequently accessed, retrieved, and copied by scientists in Atlanta, Washington, D.C., and the ten regional offices as ATSDR staff conduct their work. Each year, some of the health assessments and toxicological profiles may need to be updated. This may require reviewing old or new references, the total of which may be in the tens of hundreds for an individual document, containing everything from chemical compound listings, maps, photographs, and site sampling data to handwritten notes. All of this data has to be identified, retrieved, and copied by the reviewers.
Updating a paper-based health assessment or toxicological profile was a time-consuming task and involved a considerable amount of staff effort. Often, time was lost trying to find the most current version of the document of interest. Time was also wasted just trying to find a misplaced file or document. Imaging changed the way the scientists did their work. No longer would they be required to keep numerous paper documents on desks. Originals would be scanned onto optical digital data disks, where they would be easily accessible to those with the need to review such material.
With imaging, scientists are confident that they are working on the bona fide and latest version of the document--the one that is in electronic form in the system--and need not question which paper version is the most current. And with OCR, they can manipulate, edit, and update the information using local word processing packages.
The imaging system enabled ATSDR to identify and to measure a series of notable benefits. Most significantly, it has saved the scientists considerable time in accomplishing their tasks. Time once spent searching for paperwork can now be spent addressing complex and pressing public health problems.
The integrated imaging system also provides scientists greater accessibility to timely, complete, and credible information and thus has enhanced the Agency's ability to respond to both public and private sector inquiries. Information on a hazardous waste site may be needed on short notice when ATSDR is called to testify before Congress. Before document imaging, this often involved long searches and there was little control in place as to who had what material, or whether duplicates or outdated versions existed.
Without a doubt, document imaging has enabled ATSDR to provide a better public health response to its constituents.
Agency-wide Imaging: ATSDR decided to build an imaging capability in a phased approach, beginning with functions that promise to have the largest agency staff "payback". The imaging system currently supports agency functions toward the end of the work flow process, namely, the retrieval and dissemination of ATSDR products. The introduction of imaging, however, is already affecting working relationships and work flow in other agency components. Another critical issue was evaluating and educating top level management on the implications of adopting imaging technology on an agency-wide level. One major challenge was to foster an understanding among all of the Agency's staff of how imaging technology could support their individual and joint efforts.
ATSDR's Office of Information Resources Management pointed out that Information Technology in general and imaging in particular served as a stimulus toward changing agency policy for establishing and maintaining comprehensive, readily accessible public health findings, as mandated by Congress.
Single Vendor: The ATSDR utilized Wang office information and automation equipment prior to the installation of the WIIS imaging system. The agency's administrators have a firm commitment to an imaging system; the platform may change, however, in 3-5 years.
Access System: The ATSDR's indexing system enhances access to and preserves the structure of the complex imaged reports. The agency's codified indexing system and the process followed to develop it (including the resolution of internal differences) may provide useful guidance for other Federal agencies in converting records with complex filing structures.
Legal Admissibility: ATSDR recognizes the potential legal implications of maintaining the record copy of agency documents on an imaging system. The agency's Assistant Administrator sought a legal opinion from the general counsel of the Department of Health and Human Services concerning the admissibility of optical images in cost recovery litigation. The response noted that courts have been very willing to admit evidence stored in computers. "As long as the printout is readable, and there is a witness who can testify as to the originality and authenticity of the computer records and the printout, there should be no problem of admissibility."
SITE VISIT REPORT #2
AGENCY: U.S. Army Corps of Engineers (USACE)
SYSTEM: USACE ODI Pilot Project
CONTACT: Linda Worthington, USACE Records Administrator, Washington, DC
AGENCY OVERVIEW:
Effective utilization of information resources is critical to daily operations in the US Army Corps of Engineers. The Corps mission is to provide quality, responsive engineering and environmental services to the American nation. To do this, the Corps employs about 40,000 civilian and 600 military personnel worldwide. The annual budget is about $12 billion.
The Corps plans, designs, builds and operates water resources and other civil works projects, provides military construction including design, construction management and real estate work for the Army and Air Force and design and construction management for other Defense and Federal Agencies. The Corps remediates hazardous and toxic wastes at Army and Air Force installations and at Formerly Used Defense sites. The Corps has four research and development laboratories. Its regulatory program, established in the 19th century to protect navigation, has been expanded so that today the Corps implements environmental protection statutes, preserves wetlands and protects other natural values. The Corps responds directly to natural disasters and other emergencies as the nation's primary engineering agency through its own authorities and in support of other agencies.
These mission critical functions require ready access to the agency's records holdings. This information management effort is made more difficult due to the variety of incompatible information storage media and formats in the Army Corps records holdings.
The Corps is looking for more effective approaches to accessing and sharing information with offices throughout the Corps as well as enhancing customer service. One example is a recent Corps of Engineers information management initiative to pilot digital imaging systems. Five Corps of Engineers offices will serve as pilot imaging system test sites and will evaluate imaging technology under real world conditions. The Corps is pursuing pilot test systems with open system architectures, avoiding proprietary or unique vendor-specific solutions.
The Army Corps of Engineers optical imaging system is important for this study because of: the agency's need to make mission critical image and index information available Corps-wide; the need to integrate multi-media formats into one cohesive information system; and, the integration of simultaneous multiple site pilot imaging systems connected in a network configuration.
BACKGROUND:
The US Army Corps of Engineers optical digital data disk imaging pilot system responds to a need to manage large volumes of information currently maintained in a variety of non-electronic formats. The US Army Corps of Engineers monitored the optical imaging marketplace for several years. The proprietary solutions offered by the imaging industry, combined with the lack of Federal Government standards or policies related to optical media, resulted in minimal Corps involvement to date.
Although the existing imaging technology industry environment can result in incompatible systems, many Corps Offices nationwide were planning to adopt digital imaging systems into their business operations. As a result of this interest and the lack of standards, the US Army Corps of Engineers chose to conduct a pilot test to determine the feasibility of using digital imaging technology.
Pilot System Approach: The pilot systems will help determine the role of digital imaging technology in the Corps' future information strategy. Corps management is seeking to identify imaging requirements and eventually adopt a Corps-wide imaging solution, utilizing off-the-shelf, commercially available technology as much as possible. The Corps adopted a three-phased pilot system approach:
- Phase I - Conduct Requirements Analysis Study
- Phase II - Design Pilot System; Develop Unique Functional and Technical Specifications
- Phase III - Install, test and evaluate the pilot systems
Pilot System Overview: The Corps' existing non-electronic information exists in diverse formats, including documents, technical reports, engineering drawings, and maps, and searching it is manual, labor intensive, and time consuming. The Corps of Engineers Information Management goal is to improve the efficiency of its records and information management programs by making existing and future data available Corps-wide in electronic format. This includes converting incompatible data formats to digital images, creating a computerized index system for improved search and retrievals, and permanently storing the information on write once, read many (WORM) optical digital data disks using Group IV compression.
The US Army Corps of Engineers expects to more effectively store and retrieve diversely formatted information once it is converted to a single, user friendly digital form. Adoption of digital information technology will provide a future capability to electronically route and share the agency's information more efficiently. The index database will provide electronic access to the valuable records collection. The Corps of Engineers expects to derive tangible benefits from digital imaging technology including improved staff productivity. These benefits will be based on: multiple, simultaneous access to electronic information; faster access to information; enhanced decision making processes due to improved access to information; increased record integrity; lower costs and space needs for records storage; and, improved efficiency and service to Corps of Engineers customers.
A series of pilot projects will help determine the suitability of imaging technology for Corps applications and imaging's ability to support inter-office and intra-office workflow and information exchange. No records will be destroyed since this is a pilot test.
Pilot System Description: In 1992, the Directorate of Information management initiated an Optical Disk Imaging (ODI) Pilot Test to evaluate the feasibility of integrating the latest commercially available technologies to provide Corps offices greater information access, storage and retrieval capabilities; determine appropriate policies, standards, and procedures; and, determine the most cost-effective solution.
The final phase of the ODI Pilot Test is in progress and will be completed in 1994. During this integration phase, the following items will be tested and evaluated:
- Integrate ODI technology into the Corps 95 open systems architecture.
- Scan and index documents, drawings, photographs and maps.
- Link documents to a Corporate Database.
- Provide remote access to images among test sites and HQUSACE personnel.
- Use and evaluate a Corps developed records management indexing system.
- Use and evaluate a Corps Scanning contract for digitizing E-size drawings and aerial photographs.
- Determine feasibility of importing digital microfiche and 35mm slides.
- Evaluate impact on LANs and the Corps WAN based on adding image traffic.
Functional users were recently trained on how to retrieve information from the image database. The database resides on a multi-function optical digital data disk jukebox connected to the CD4000 platform. The users search, retrieve, and display the images on their own locally networked 386 and 486 PCs. Windows, Oracle SQL and imaging software were added to their PC configuration (See Pilot Configuration Section).
An evaluation of the pilot test will be conducted. Functional users will be asked to comment on how ODI helped them. Value added benefits we hope to achieve include providing an additional tool for re-engineering some of our business processes; increased productivity; enhanced decision-making; reduced storage and paper costs; and, enhanced customer service.
Plans are to use existing contracts to acquire ODI equipment and software. By late Spring, ODI policies, standards, procedures, and lessons learned will also be developed.
In 1992, a moratorium was issued on the purchase of ODI equipment/systems. This moratorium remains in effect until ODI policies, standards, and procedures are in place.
Pilot System Configuration:
The pilot design is based on open systems architecture. Pilot sites will use the Corps' existing wide area network (WAN) to support the image traffic via T-1 lines. They will use locally owned Unix computer systems along with their relational database software to run a Corps-developed records management indexing database system. The pilot imaging system will utilize commercially available, off-the-shelf (COTS) hardware and software components.
In the Fall of 1993, a systems integrator installed imaging systems at the following pilot locations.
- Mobile District, Mobile, Alabama
- Albuquerque District, Albuquerque, New Mexico
- Huntington District, Huntington, West Virginia
- HQ Health and Safety Office, Washington, DC
- Army Environmental Center, Aberdeen, MD
Client/Server with Multifunction Optical Jukebox:
- UNIX Client Server with SCSI.
- Multifunction Optical Jukebox.
- UNIX Operating System.
- Relational Database Software.
- Optical Disk Software.
Scan/Index Workstation:
- Tabletop Document Scanner.
- PC Platform: 486/50 MHz, 16MB RAM, 5.25 and 3.5-inch floppy drives, 500 MB Hard Drive, SCSI Controller, Mouse, Network Interface Card, 101-style keyboard.
- 19-inch High Resolution Monitor, Dual Page 150 DPI.
- Image Scan/Display/Compression and Decompression Components.
- PC Operating System Software, Graphical User Interface Software, Imaging Software, and SQL Software.
- Desktop Laser Printer.
Retrieval Workstation:
- PC Platform: 486/33 MHz PC with 8MB RAM, VGA 14-inch Color Monitor, Mouse, 101-style keyboard, Network Interface Card.
- PC Operating System Software, Graphical User Interface Software, Imaging Software, and SQL Software.
OVERVIEW OF SIGNIFICANT ISSUES
Interagency agreements: The Corps of Engineers is identifying Federal Government policies and standards applicable to imaging systems and records management. The Corps plans to establish a working group to assist with their technology projects, and coordinate with the National Archives. The Army Corps of Engineers and the National Archives, under a formalized Memorandum of Understanding, plan to examine legal admissibility and long term archival requirements related to digital imaging and optical digital data disk technologies.
SITE VISIT REPORT #3
AGENCY: Bureau of Land Management (Eastern States)
SYSTEM: Federal Land Patents System
CONTACT: James F. Gegen, Project Manager, Bureau of Land Management, General Land Office Records, Springfield, Virginia
SUMMARY DESCRIPTION:
In 1989, the Bureau of Land Management's (BLM) Eastern States office initiated a multi-year project to digitally scan, enhance, index, store, and retrieve approximately nine million pages of historic Federal land grant patents and related survey documents using twelve-inch optical digital data disks. The major goals of the General Land Office Automated Records System (GLOARS) include the preservation of the land patent documents dating back over two hundred years, and improving user access to the information. The records, which chronicle land title transfers for over 1.5 billion acres of public domain properties, are important in adjudicating land ownership. The actual conversion to digital images is an ongoing effort performed by an on-site contractor. The conversion process includes document preparation, scanning, indexing, image/data quality control, and image recording onto write once, read many times (WORM) optical media. The retrieval system supports full Boolean searching of index fields, and offers access to the index and image data. Digital images identified through an index search may be retrieved from an optical "jukebox", displayed on a 19" high resolution monitor, and printed on letter-size paper using laser printing equipment. The land grant patent retrieval system went on-line in February 1993 and operates on a fee-based cost recovery basis.
The BLM System is important for this study because of: the historical significance of the land grant records; the comprehensive indexing and searching capabilities; lessons learned about information technology standards; positive experience with a system integrator; and, cost recovery plans based on user retrievals.
BACKGROUND:
The functions of the Bureau of Land Management and its predecessors date back to the Land Ordinance of 1785, establishing for the first time the rectangular survey system for public lands. The public domain initially consisted of western territory claimed by the original 13 states eventually ceded to the Federal Government. Additional acquisitions over the years resulted in the public domain consisting of about 1.8 billion acres. The General Land Office, then part of the Treasury Department, was tasked with surveying these lands and maintaining the land status records. The BLM's present management of public lands and resources is based on the Federal Land Policy and Management Act of 1976. The BLM now manages over 270 million acres of public lands, including the resources they contain such as soil, water, air, timber, surface and subsurface minerals, oil and gas, geothermal energy, wildlife habitat, wild and scenic rivers, and open space.
According to its mission statement, the Bureau of Land Management Eastern States Organization "is responsible for the stewardship of public lands and resources under the jurisdiction of the BLM in the 31 states east of and bordering the Mississippi River on the west. These public lands and resources will be managed to protect the environment and provide a diverse array of products and outdoor experiences. The Eastern States is also responsible for the maintenance and protection of the official land records and cadastral surveys for the Department of the Interior." Customer service and public outreach are stated components of this mission and the BLM intends to use state-of-the-art technology and related research efforts to accomplish its goals.
BLM Eastern States has custody of more than 9 million Federal land documents such as survey notes and plats, tract books, and land patent records that document the country's westward expansion. The information in these records includes areas, boundaries, ownership, limitations on titles such as rights-of-way, and other characteristics that affect the value and use of the land. The tract books, first put into use in 1810, are large, bound volumes with public domain transactions recorded. In many cases, the Eastern States' copy is the only extant version. The original (often brittle) paper versions must often be consulted to decipher handwriting and other important information. Although the entire collection exists on microfilm, the quality of the film is poor and the originals are relied on for accurate information. Access to the microfilm and paper versions of the land patent records is only available via tract book indexes that require the user to supply a specific legal description of the land coordinates.
Improved public access to land records is a critical part of the BLM mission. These land records, some dating as far back as 1788, document the initial transfer of sovereignty to private individuals. They form the cornerstone of the title search process that is required by law whenever property is sold. In addition to title companies, BLM customers include other Federal, State and local government agencies, lands and minerals consultants, scholars, and private citizens. The Eastern States is responsible for maintaining the documents relating to the public lands for the 31 states geographically located east of and bordering the Mississippi River on the west. (The public land states are Indiana, Illinois, Michigan, Wisconsin, Minnesota, Iowa, Missouri, Arkansas, Louisiana, Mississippi, Alabama, Florida, and part of Ohio. Federal lands in any of the other eastern states are generally lands acquired for parks, forests, wildlife refuges, Native American reservations, etc.)
Origins: A key question facing the BLM was how to protect the fragile, historical general land office documents and continue to meet the needs of its users including: Federal, State, and local government agencies; title companies; lands and minerals consultants; and, private citizens. This growing concern led to a contract to Stone and Webster Company to evaluate preservation alternatives and costs. This 1986 study recommended the continuation of the existing microfilming program augmented with an automated indexing system at a cost of $20 million over seven years. An ensuing feasibility study was conducted in 1988 by West Coast Information Systems (WESCO). This WESCO study recommended that the Bureau's land records be digitally scanned, indexed, and stored on optical digital data disks over a four year period at a cost of $6 million. BLM management decided in April 1989 to develop a digital imaging system. This decision was based largely on two factors: reliability of optical digital data disk storage systems; and, the ability to provide improved public access to the land records.
In 1990 the Department of Energy (DOE) entered into an interagency agreement with BLM to contract with a private firm to develop system requirements and attribute database specifications. A Science Applications International Corporation (SAIC) team, based in Oak Ridge, Tennessee, identified these requirements and specifications and in 1990 developed a prototype system that was used to scan and retrieve 163,000 Arkansas patents and other documents. The imaging system prototype was especially helpful in validating the PC-based architecture and throughput conversion rates. After some minor modifications in the prototype design, a production system was implemented in 1991. To date, over one million land patent records and related indexes for eight states (Arkansas, Louisiana, Florida, Michigan, Minnesota, Ohio, Mississippi, and Wisconsin) have been digitally converted. The conversion process is scheduled for completion by late 2000, costing up to $15 million to develop the full image and index database.
SYSTEM CONFIGURATION:
The BLM imaging system is a PC-based architecture divided into four functional subsystems - scanning, indexing, quality control, and retrieval. The PC-based environment is client/server oriented and supports the four subsystems and the ORACLE Relational Database Management System, Ethernet for a local area network, and ORACLE Structured Query Language (SQL) for database communications.
- LaserData LaserView LVNET Imaging System; LV-6000 Corvette Video Boards for Image Compression and Display; LV-8010 Scanner Controller Cards; QEMM Extended Memory Device Drivers.
- 80486 33-MHz PC with two 1.4 gigabyte hard drives and two 425MB external Fujitsu magnetic disc drives.
- Scanning Workstations--IBM PC/AT compatible 80386; Ricoh IS400 Document Scanners.
- Indexing/QA Workstations--IBM PC/AT compatible 80386 with 40MB hard disks.
- Optical Disk Subsystem--Sony WDD-600 Disk Drive; Sony WDC-610 Disk Controller; Sony WDA-610 50-platter optical disk jukebox.
- Image Retrieval Workstations--IBM PC/AT compatible 80386 with 40MB hard disks; Hewlett-Packard Laser Printers.
- Operating Systems: PC-based (NEC 386-486/20) Client/Server Environment; MS-DOS 5.0 on Novell File Servers, Optical Servers and Workstations; SCO-UNIX on the Database Servers.
- Microsoft C v5.1 for Compiler/Assembler Functions.
- ORACLE Relational Database Management System version 7.0; ORACLE Lanserver for UNIX; Oracle SQL*Net for database communications.
- Ethernet EXCELAN XLN, NOVELL v3.11 LAN, TCP/IP, Group 3; Ethernet Controller Boards, Ethernet and ThinNet cables; Ethernet Standard Transceivers and Receivers.
- SQL*Forms serves as Application Development System.
Technical System Specifications of Retrieval Components:
- SCO UNIX/ORACLE DATABASE SERVER (Qty=1)
(For Database and Operating System Services)
Gateway 2000 486/33 Tower PC
64Mb Random Access Memory (RAM)
Color VGA Monitor
(2) 1.3Gb Micropolis SCSI Hard Drive
(2) 425Mb Power Drive External SCSI Hard Drive
Equinox MegaPort (12 serial ports)
3COM 503-16 Network Adapter
Mountain FileSafe 1200Plus External Tape Drive
(12) 9600b External Hayes-compatible Modem
SCO UNIX System V Operating System
SCO TCP/IP Runtime System
Oracle Relational Database Management System
Oracle SQL*Net TCP/IP
- MS-DOS/ORACLE CLIENT PC (Qty=3)
(For Accounting Administration, System Administration, and System Development)
Gateway 2000 486/33 Tower PC
8Mb Random Access Memory (RAM)
Color VGA Monitor
120Mb ESDI Hard Disk
EXOS 205T-512K Network Adapter
MS-DOS v5.0
Oracle Tools for MS-DOS
SQL*Net TCP/IP for DOS
Microdyne LAN Workplace for DOS
- MS-DOS PRINT SERVER & LASER PRINTER (Qty=1)
(For Printing Document Images)
NEC PowerMate 486SX/25e PC
2Mb Random Access Memory (RAM)
Samsung 14" VGA Monochrome Monitor
120Mb Hard Disk
EXOS 205T-512K Network Adapter
LaserData LV6004 Image Processing Board
LaserData LV8030 Printer Controller
LaserData LV8023 LaserJet III Adapter
Hewlett-Packard LaserJet IIID Printer
MS-DOS 5.0
LaserData LV9100, LV914B Software
SQL*Net TCP/IP for DOS
- MS-DOS IMAGING WORKSTATIONS (Qty=3)
(For Users in the BLM Eastern States Public Services Section)
NEC PowerMate 386/20 PC
8Mb Random Access Memory (RAM)
LaserData (Monoterm) LV719 19" High-Resolution Monochrome Monitor
40Mb Hard Disk
EXOS 205T-512K Network Adapter
LaserData LV6004 Image Processing Board
Oracle Tools for MS-DOS
SQL*Net TCP/IP for DOS
Microdyne LAN Workplace for DOS
LaserData LV9100, LV9150, LV914, LV9910 Software
- NETWORK LASER PRINTER (Qty=1)
(For Printing Data Reports)
Hewlett-Packard LaserJet II
Parallel Link HPL-100 Print Extender
- MS-DOS FAX SERVER (Qty=1)
(For Unattended Fax-Out Services)
NEC PowerMate 386/20 PC
2Mb Random Access Memory (RAM)
Samsung Monochrome Monitor
40Mb Hard Disk
LaserData LV6004 Image Processing Board
Gammalink GammaFax CP Board
LaserData LV9100, LV916 Software
Alcom Easygate LanFax/10 Software
- MS-DOS DOCUMENT SERVER & JUKEBOX (Qty=2)
(For Storage and Retrieval of Document Images)
NEC 486/25 PC
4Mb Random Access Memory (RAM)
Samsung 14" VGA Monochrome Monitor
120Mb Hard Disk
Adaptec 1540B/1542B SCSI Controller
LaserData LV6004 Image Processing Board
LaserData LV914B, LV912S, LV913S, LV9910 Software
Sony 50-Platter DSDD 2-Drive Optical Disk Autochanger
System Installed by: Staff and SAIC
System Configuration Changed Since Installation? (Yes)
The system originally started out with single density SONY optical media which was converted to accommodate double density media when it became available. Prior to installation of the Novell file server, the images were routed through a two cache system. The imaging components and systems software have been maintained and upgraded to maximize effectiveness of components.
DIGITAL IMAGE CAPTURE:
BLM's conversion processing includes nine distinct production stages: document preparation; logging volumes to processing queue; digital scanning; indexing; transfer to optical digital data disk; quality control review; BLM quality assurance; logging volumes from processing queue; and migration of attribute database to retrieval platform. The image conversion work flow process, performed on-site, uses a Novell file server configuration to reduce device and/or access contention problems.
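The nine production stages can be pictured as an ordered workflow that each volume moves through. The following is a minimal sketch, not BLM's actual workflow software: the stage names are paraphrased from the list above, the order simply follows that list, and the `Stage` and `next_stage` names are hypothetical.

```python
from enum import IntEnum


class Stage(IntEnum):
    """Production stages in the order listed in the text (names are illustrative)."""
    DOCUMENT_PREPARATION = 1
    LOG_VOLUME_TO_QUEUE = 2
    DIGITAL_SCANNING = 3
    INDEXING = 4
    TRANSFER_TO_OPTICAL_DISK = 5
    QUALITY_CONTROL_REVIEW = 6
    BLM_QUALITY_ASSURANCE = 7
    LOG_VOLUME_FROM_QUEUE = 8
    MIGRATE_ATTRIBUTE_DATABASE = 9


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the final stage."""
    if stage == Stage.MIGRATE_ATTRIBUTE_DATABASE:
        return None
    return Stage(stage + 1)


# Walk one volume through every stage in order.
stage = Stage.DOCUMENT_PREPARATION
while stage is not None:
    print(stage.value, stage.name)
    stage = next_stage(stage)
```

Modeling the stages as an ordered enumeration makes it easy to record, for each volume, which stage it last completed.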
Estimated Number of Documents/Records Converted to Date: Over 1.1 million.
Conversion of Records Performed By: Contractor - Dynamic Concepts, Inc.
Disposition of Original Records: Designated as Permanent Records. With approval from the Director, Bureau of Land Management, and Director of Eastern States, retire patent documents to National Archives after Project is completed and data in system is verified to be correct.
Document Preparation: After the scanned images are accepted, the patent documents are placed in acid-free archival boxes and stored in temperature-controlled vaults. We are not scanning or indexing the tract books.
Document Scanning: Ricoh IS400 scanners capture documents up to 11 X 17 inches, using a 6-page-per-minute manual feed transport system. The Ricoh scanners are calibrated prior to converting each discrete volume of land patent records. Contrast settings are also adjusted as needed during scanning to compensate for visible signs of document deterioration such as aged yellowed records, and volume-wide water stains. Dynamic Concepts, Inc., was awarded the conversion contract, providing on-site staff including: site manager; systems specialist; line supervisors; and, production workers rotating between scanning, indexing, and quality control stations. The document scanner throughput rates currently average 1,250 pages per day per scanner. (The prospects for any increases in project funding are very dim. We began the Project with two production teams - one working on BLM-owned equipment the other on leased equipment. The leased equipment has not been fully utilized for some time and the lease will expire in 1994. Because of funding limitations, we are not planning on renewing the lease and will continue to operate the production facility at the current rate of production.)
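The throughput figure above allows a rough estimate of how long a conversion of this size takes. The sketch below uses the report's figures of about nine million pages and 1,250 pages per day per scanner. The number of scanners in production and the number of working days per year are not stated in the report, so the values used here (4 scanners, 250 days) are assumptions chosen only for illustration.

```python
def years_to_convert(total_pages, pages_per_day_per_scanner, scanners, working_days_per_year):
    """Estimate conversion time in years at a constant scanning rate."""
    pages_per_year = pages_per_day_per_scanner * scanners * working_days_per_year
    return total_pages / pages_per_year


TOTAL_PAGES = 9_000_000          # "approximately nine million pages" (from the report)
PAGES_PER_DAY_PER_SCANNER = 1250  # average throughput (from the report)
SCANNERS = 4                      # assumption: not stated in the report
WORKING_DAYS_PER_YEAR = 250       # assumption: not stated in the report

estimate = years_to_convert(TOTAL_PAGES, PAGES_PER_DAY_PER_SCANNER, SCANNERS, WORKING_DAYS_PER_YEAR)
print(f"Estimated conversion time: {estimate:.1f} years")
```

Under these assumed values the estimate is 7.2 years; halving the number of scanners doubles it, which shows how sensitive the schedule is to the equipment available.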
Quality Control: Quality control workstations utilize IBM PC compatible 80386 computers with nineteen inch high resolution (150 dots per inch) display monitors. The contractor conducts a 100 percent quality inspection of all images. The BLM Quality Assurance personnel select images for inspection using a statistical sampling technique, with image quality judgments based on visual comparisons of the digital images to the original documents. Scanned image data is stored on a Novell file server (1.4 GB) until quality control is completed. When an image fails the quality inspection, the document is rescanned. Staff training and supervision provided by the contractor results in a claimed 99 percent scanning accuracy rate. Acceptable quality is defined as the ability to capture and display legible document images.
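The report does not describe which statistical sampling technique BLM Quality Assurance uses. As one possible illustration, the sketch below draws a simple random sample of image identifiers from a volume. The function name, the 5 percent sampling fraction, and the identifiers are all hypothetical.

```python
import random


def select_qa_sample(image_ids, sample_fraction, seed=None):
    """Pick a simple random sample of image IDs for visual inspection.

    This is simple random sampling without replacement; it is an assumed
    stand-in, since the report does not name the technique actually used.
    """
    if not image_ids:
        return []
    rng = random.Random(seed)
    sample_size = max(1, round(len(image_ids) * sample_fraction))
    return sorted(rng.sample(image_ids, sample_size))


# Example: inspect 5 percent of a hypothetical 1,000-image volume.
volume_images = list(range(1, 1001))
to_inspect = select_qa_sample(volume_images, 0.05, seed=1994)
print(len(to_inspect), "images selected for inspection")
```

Fixing the seed makes the selection repeatable, which can help when an inspection needs to be audited later.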
Scanning Resolution: The Ricoh scanners capture images at 300 dots per inch (DPI). Images are subsequently displayed at 150 DPI and printed at 300 DPI.
Color and Gray Scale: The Ricoh IS400 scanners do not provide color or gray scale capability.
Image Enhancement: Enhancement capabilities exist through seven levels of pixel density.
Compression/Decompression: Proprietary LaserData image compression and decompression algorithms are programmed into the system's workstation video boards. The algorithms conform to CCITT Group 3 standards, and a typical image is approximately 150KB after compression. BLM system administrators recommend open system technology as soon as industry/government standards are available to avoid becoming restricted to a single vendor's product line over the information system's life.
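The scanning resolution and compressed image size reported above imply an approximate compression ratio. The sketch below computes the uncompressed size of a one-bit-per-pixel (black and white) scan at 300 DPI and compares it with the roughly 150KB compressed size. The page sizes are assumptions (letter size and the scanner's 11 X 17 inch maximum), and "150KB" is read here as 150,000 bytes.

```python
def raw_bitonal_bytes(width_in, height_in, dpi):
    """Uncompressed size in bytes of a 1-bit-per-pixel scan."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px
    return (total_bits + 7) // 8  # round up to whole bytes


def compression_ratio(raw_bytes, compressed_bytes):
    """Ratio of uncompressed size to compressed size."""
    return raw_bytes / compressed_bytes


COMPRESSED_BYTES = 150_000  # "approximately 150KB" (from the report), read as decimal KB

letter = raw_bitonal_bytes(8.5, 11, 300)  # assumed letter-size page
ledger = raw_bitonal_bytes(11, 17, 300)   # scanner's stated maximum page size

print(f"Letter page: {letter:,} bytes raw, about {compression_ratio(letter, COMPRESSED_BYTES):.1f}:1")
print(f"11 x 17 page: {ledger:,} bytes raw, about {compression_ratio(ledger, COMPRESSED_BYTES):.1f}:1")
```

For a letter-size page this works out to roughly 7:1. Actual ratios depend heavily on page content, so handwritten or stained originals may compress less well than clean typed text.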
DOCUMENT INDEXING
Electronic images are immediately available for indexing following document scanning. Images retrieved for indexing are temporarily stored on a Novell file server via a Local Area Network. Indexing workstations feature IBM PC compatible 80386 computers with nineteen inch high resolution (150 dots per inch) display monitors.
Creation of Index Database: Index data is key entered using the digital screen images. The index information is verified by quality assurance specialists using special quality control workflow software.
Location of Index Database: An IBM PC/AT compatible 486 server with a 1.4GB hard disk uses ORACLE software to manage and control index data. SCO-UNIX provides processing power and multi-user support.
Index Structures: Each land patent is fully indexed in 35 distinct fields. In cooperation with the National Archives (NARA), information missing from the original records due to physical deterioration is recovered using other NARA holdings. The BLM claims an indexing accuracy rate of 99.5%. Machine assisted indexing reduces keystrokes by automatically completing certain pre-defined fields (e.g., volume number, accession number, document number and state code). Key index fields include patentee name, warrantee name, and legal land descriptions.
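Machine assisted indexing of the kind described above can be sketched as pre-filling the fields that stay constant within a volume, leaving the operator to key only the remaining fields. This is an illustrative sketch, not the BLM software: the field names are hypothetical, and the automatic increment of the document number is an assumption about how that field might be completed.

```python
# Fields assumed to be constant for every document in a volume (hypothetical names).
VOLUME_LEVEL_FIELDS = ("volume_number", "accession_number", "state_code")


def start_index_record(volume_defaults, previous_document_number):
    """Create a record with the pre-defined fields already completed."""
    record = {name: volume_defaults[name] for name in VOLUME_LEVEL_FIELDS}
    # Assumption: documents in a volume are numbered consecutively.
    record["document_number"] = previous_document_number + 1
    return record


def complete_index_record(prefilled, keyed_fields):
    """Merge operator-keyed fields into the pre-filled record."""
    record = dict(prefilled)
    record.update(keyed_fields)
    return record


# Placeholder values for illustration only.
defaults = {"volume_number": 12, "accession_number": "EXAMPLE-0001", "state_code": "AR"}
prefilled = start_index_record(defaults, previous_document_number=41)
record = complete_index_record(prefilled, {"patentee_name": "DOE, JOHN"})
print(record)
```

In this sketch the operator keys one field while four are completed automatically, which is the kind of keystroke reduction the text describes.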
OPTICAL DIGITAL DATA DISK STORAGE:
The scanned pages are committed to optical digital data disk only when accepted by the indexing specialists, under the control of proprietary system software.
Image File Headers: The LaserData proprietary structure is not compatible with the Tagged Image File Format (TIFF).
Error Detection/Correction: Operation is transparent to the user. No disk failures encountered.
Recording Process: 12-inch, WORM, double density (Sony WDD-600).
Optical Digital Data Disk Composition: Polycarbonate substrate disk material.
Capacity: Approximately 3.3 gigabytes per side (6.55 GB total storage per disk).
Number of Optical Digital Data Disks in Use: 40
Jukebox: Two Sony WDA-610 50-Platter DSDD 2-Drive Optical Disk Autochangers ("jukeboxes"), each with a 50-disk capacity.
Storage Environment: Typical computer room environment with supplemental heating/ventilation/air conditioning (HVAC) maintains constant temperature (74 degrees Fahrenheit) and relative humidity (55-60%). No system operational problems due to the storage environment were noted.
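The disk capacity and typical compressed image size reported in this appendix give a rough sense of how many disks a full conversion requires. The sketch below is back-of-the-envelope arithmetic only: it uses the 6.55 GB per-disk capacity, the roughly 150KB typical image size, and the nine million page total from the report, treats GB and KB as decimal units, and ignores index data, file system overhead, and backup copies.

```python
DISK_BYTES = 6_550_000_000  # 6.55 GB per disk, both sides (from the report), decimal GB assumed
IMAGE_BYTES = 150_000       # typical compressed image (from the report), decimal KB assumed
TOTAL_PAGES = 9_000_000     # approximate size of the collection (from the report)
DISKS_PER_JUKEBOX = 50      # platter capacity of each autochanger (from the report)


def images_per_disk(disk_bytes, image_bytes):
    """Whole images that fit on one disk, ignoring overhead."""
    return disk_bytes // image_bytes


def disks_needed(total_images, per_disk):
    """Disks required to hold all images (ceiling division)."""
    return -(-total_images // per_disk)


per_disk = images_per_disk(DISK_BYTES, IMAGE_BYTES)
disks = disks_needed(TOTAL_PAGES, per_disk)
jukeboxes = -(-disks // DISKS_PER_JUKEBOX)

print(f"About {per_disk:,} images per disk")
print(f"About {disks} disks for {TOTAL_PAGES:,} pages")
print(f"About {jukeboxes} jukeboxes to keep every disk on-line")
```

Under these assumptions one disk holds roughly 43,000 images, so the complete collection would not fit in two 50-platter jukeboxes at once.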
RETRIEVAL AND OUTPUT:
The GLOARS images are searched and retrieved using the key entered index data (e.g. land patent descriptive data and patentee names). System users having access to the SONY jukeboxes via the LAN view the images on high resolution display monitors or print hard copies using laser printers.
Primary System Users: Clerks/Administrative Staff/Public
Display Output: The windows-like environment simultaneously displays index and image data. Images are displayed on nineteen-inch high resolution monitors on PC-based workstations.
Laser Printing: Hewlett-Packard IIID for images and Hewlett-Packard II for hard copy text reports.
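A Boolean search over key entered index fields, as described in this section, can be sketched with a small relational table. The production system uses ORACLE; the example below substitutes Python's built-in sqlite3 module purely so that it runs anywhere. The table layout, column names, and sample rows are invented for illustration and are not taken from the GLOARS database.

```python
import sqlite3


def build_demo_index():
    """Create an in-memory index table with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patent_index ("
        " document_number INTEGER PRIMARY KEY,"
        " state_code TEXT,"
        " patentee_name TEXT,"
        " warrantee_name TEXT,"
        " legal_description TEXT)"
    )
    rows = [  # placeholder records, not real patents
        (1, "AR", "DOE, JOHN", "", "T1N R2W S3"),
        (2, "AR", "ROE, JANE", "DOE, JOHN", "T1N R2W S4"),
        (3, "MS", "DOE, JOHN", "", "T5S R1E S9"),
    ]
    conn.executemany("INSERT INTO patent_index VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def search_by_state_and_surname(conn, state_code, surname):
    """Boolean query: state AND (patentee matches OR warrantee matches)."""
    pattern = surname.upper() + ",%"
    cursor = conn.execute(
        "SELECT document_number FROM patent_index"
        " WHERE state_code = ?"
        " AND (patentee_name LIKE ? OR warrantee_name LIKE ?)"
        " ORDER BY document_number",
        (state_code, pattern, pattern),
    )
    return [row[0] for row in cursor.fetchall()]


conn = build_demo_index()
print(search_by_state_and_surname(conn, "AR", "Doe"))
```

The document numbers returned by such a query are what a retrieval workstation would then use to request the matching images from the jukebox.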
DATA MIGRATION POLICY ISSUES:
Providing access to data beyond the confines of the immediate system, and long-term data retention extending beyond the expected life of the existing system should be design goals when the value of the information warrants such concerns. Data migration strategies should be viewed as a continuum, beginning with the universal capability of systems to display images or print them on paper.
Linkages with Other BLM ADP Applications: The imaging system has no direct linkages with other BLM information systems. An overall BLM agency ADP modernization effort is underway, with a goal of achieving inter-system compatibility. (However, all efforts have been made to use pre-existing BLM information system casetype authority codes and other codes where possible.)
Network Transmission: Users can access the data from a remote PC equipped with a 9600 baud modem, Kermit communications software, and BLM communications software. A BLM query session charge is applied for searching the document attribute database (index data). This access system responds to remote requests for FAXes of document images and query search results while supporting a BLM cost recovery accounting system. The costs of the initial records conversion process will not be recouped through user fees. The Records Administrator/System Administrator expects that cost recovery fees received from users in the future will fund system access and maintenance. CD-ROM distribution of the attribute database has been implemented.
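The modem speed quoted above gives a sense of why remote users search the index on-line and receive document images by FAX. The sketch below estimates the best-case time to send one typical compressed image over such a link. It treats 9600 baud as 9,600 bits per second, reads the report's roughly 150KB image size as 150,000 bytes, and ignores protocol overhead and modem compression, all of which are simplifying assumptions.

```python
def transfer_seconds(size_bytes, bits_per_second):
    """Best-case transfer time, ignoring protocol overhead."""
    return size_bytes * 8 / bits_per_second


IMAGE_BYTES = 150_000  # typical compressed image (from the report), decimal KB assumed
MODEM_BPS = 9600       # 9600 baud modem, treated as 9,600 bits per second (assumption)

seconds = transfer_seconds(IMAGE_BYTES, MODEM_BPS)
print(f"One image over the modem link: about {seconds:.0f} seconds")
```

At about two minutes per image before any overhead, dial-up access is practical for index queries, which return small amounts of text, but slow for browsing images.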
Backup of Image and Index Data: A mirror-image backup copy of the optical media data is created as image and index data is completed for each state. Backup optical digital data disks are now stored on-site in a climate-controlled vault, although BLM expects to send them to the National Archives in the future. Index data is backed up regularly onto magnetic tape and floppy disks.
Technical Support and Documentation: Technical support is currently provided by the integrator under terms of the development contract. LaserData provided technical and administrative manuals as a contract deliverable. Detailed documentation describing the proprietary compression algorithms remains the exclusive property of LaserData.
Interoperability: LaserData has a proprietary approach to writing image data to the Sony optical digital data disks, meaning the disks can only be read by an identical Sony optical drive with compatible LaserData software.
Migration Plans: As part of its current mission statement, BLM has a long-term commitment to make image and index data available to the general public. Current planning for migration to future technologies largely consists of assuring that the system under development functions as specified and that technical and administrative documentation is adequate for ongoing maintenance and periodic equipment upgrades. Concerns over legal admissibility of optical digital data disk images have prompted the development of detailed management and operator procedural manuals.
OVERVIEW OF SIGNIFICANT ISSUES:
Business Process Re-engineering: Performed to obtain greater benefits from the new imaging system: Yes
The mission of the Eastern States is to provide prompt, professional, and courteous service to all customers. The Legal Clerks in the Branch of Records and Public Service research the GLO records in response to walk-in and written requests for information. A recent study determined that more than 10,000 requests are received each year. Using the automated records system to conduct the research will result in more timely request processing and the processing of more requests. Patent queries are no longer restricted to land description information. For the first time in history searches can be conducted by patentee name, making the records accessible to a greater number of clientele. A search that could take hours is being reduced to minutes. This is a savings to the Bureau and the customer.
Historical Value of Records: The BLM system contains digital images of important, permanently valuable historical records. These records support one of the BLM's central missions, and have value to a wide variety of outside users. Image enhancement of the sometimes badly deteriorated records increases their legibility, and laser hard-copy output is of sufficient quality for most users. The system's potential to become the foundation of a nationwide land patent records databank increases its value to both the BLM and to the National Archives. The BLM system is one possible model for a production system that could support the conversion, preservation, and use of archival materials currently held by NARA and other Federal agencies.
Access: The BLM system enhances access to land patent records by providing a more powerful retrieval system. The previous manual storage system permitted retrieval through a single access point, namely the legal land description. The searching capabilities of the computerized index database, when combined with simultaneous display of index and image data, provides significant new opportunities for users. This includes rapid retrieval, easy comparison of adjacent land tracts, and statistical analysis of land transfer trends.
Information Technology Standards: BLM may face obstacles in future data migrations due to the absence of industry standards for the physical or logical formats of 12-inch optical media. These problems are likely to be exacerbated by the continued use of proprietary image compression algorithms and header file formats. The BLM plans to move to a non-proprietary imaging component as soon as standards are adopted.
Role of the Integrator: BLM administrators are pleased with the third-party integrator's performance due to: a proven track record in developing systems with similar functions; intensive interviews with BLM staff in the early design stages to ensure the system met BLM needs; and, a willingness to closely inspect physical records and processing procedures. The contractor's design team posed many questions that led to a re-thinking of the fundamental assumptions underlying traditional access procedures and customer services. BLM staff were especially pleased with the quality and readability of the system design documents.
Cost Recovery: BLM has implemented an automated accounting, fee-based cost recovery public access system. The basic concept includes: an on-line tutorial describing user search techniques; the ability to order FAX copies or print copies by mail; and, acceptance of a credit card as payment for services. The BLM Records Administrator/System Administrator expects that the fees will recover the costs of system access and maintenance.
SITE VISIT REPORT #4
AGENCY: Commodity Futures Trading Commission
SYSTEM: Document Management System
CONTACT: Hunton G. Oliver, Office of Information Resources Management, Commodity Futures Trading Commission, Washington, DC
SUMMARY DESCRIPTION:
The Commodity Futures Trading Commission (CFTC) staff needs direct access to up-to-the-minute market information to effectively fulfill their mission as commodity trade regulators. To achieve this goal, a Document Management System was installed in 1992 that effectively integrates several agency applications. This system provides on-line access to daily newspaper clippings and related financial wire service reports, replacing labor-intensive photocopy distribution of pertinent commodities industry data. Other CFTC agency applications, including legal dockets files and correspondence, are also digitally scanned, indexed, stored, and retrieved in the Document Management System. Imaging technology may eventually assume an even greater role in processing the CFTC's document-based information.
The CFTC's imaging system is more than a pilot or test system with reference capabilities; it is a fully functional production operation. Special features include optical character recognition technology to automatically capture textual information. Additionally, the system software provides users with full text retrieval capability. This imaging system was obtained through an 8A contract, and integrated as a new application into the agency's existing local area network. The Document Management System uses conventional and multifunction optical digital data disk equipment to store digital image data on write once, read many (WORM) and rewritable optical media.
The CFTC's optical imaging system is important to this study because of: the successful implementation of imaging technology into an agency's daily operations; the integration of WORM and rewritable optical digital data disk technologies; and, the imaging system's user interface flexibility to support several unrelated applications.
BACKGROUND:
The Commodity Futures Trading Commission promotes healthy economic growth, protects the rights of customers, and ensures fairness and integrity in the marketplace through regulation of futures trading. To this end, it also engages in the analysis of economic issues affected by or affecting futures trading. The Commodity Futures Trading Commission, the Federal regulatory agency for futures trading, was established by the Commodity Futures Trading Commission Act of 1974. The Commission began operation in April 1975, and its authority to regulate futures trading was renewed by Congress in 1978, 1982, and 1986. The Commission consists of five Commissioners appointed by the President with the advice and consent of the Senate. The Commission has five major operating components: the divisions of enforcement, economic analysis, trading and markets, and the offices of the executive director and the general counsel.
The Commission regulates trading on the 13 US futures exchanges, which offer active futures and options contracts. It also regulates the activities of numerous commodity exchange members, public brokerage houses, Commission-registered futures industry salespeople and associated persons, commodity trading advisors, and commodity pool operators. Some off-exchange transactions involving instruments similar in nature to futures contracts fall under Commission jurisdiction. The Commission's regulatory and enforcement efforts are designed to ensure that the futures trading process is fair and that it protects both the rights of customers and the financial integrity of the marketplace. It approves the rules under which an exchange proposes to operate, and monitors exchange enforcement of those rules. It reviews the terms of proposed futures contracts, and registers companies and individuals who handle customer funds or give trading advice. The Commission also protects the public by enforcing rules that require that customer funds be kept in bank accounts separate from accounts maintained by firms for their own use, and that such customer accounts be marked to present market value at the close of trading each day.
Futures contracts for agricultural commodities were traded in the United States for more than 100 years before futures trading was diversified to include trading in contracts for precious metals, raw materials, foreign currencies, commercial interest rates, and US Government and mortgage securities. Contract diversification has grown in exchange trading in both traditional and newer commodities. Large regional offices are maintained in Chicago, IL, and New York, NY where many of the Nation's futures exchanges are located. Smaller regional offices are located in Kansas City, MO, and Los Angeles, CA. A suboffice of the Kansas City regional office is located in Minneapolis, MN.
Origins: The CFTC's management decision to obtain the Document Management System was finalized in June 1991. The imaging system was installed in February 1992, through an 8A procurement with Westco Automated Systems and Sales, Inc. This system was originally designed to support several agency imaging applications, including the distribution of the agency's daily newsclips files and managing the legal docket case records. The system's primary application is digital scanning and dissemination of daily newsclips, previously distributed to staff in a hard copy "read file" format. These newsclips contain pertinent commodities information published in daily newspapers and on-line financial wire services. The Document Management System can also capture, index, store, and retrieve the agency's legal dockets and published Commodity Exchange rules. Researcher access to the imaging system's database is currently limited to CFTC staff (approximately 500 employees). Although public access is not possible at this time due to concerns over data security, a CFTC public access bulletin board is under consideration.
SYSTEM CONFIGURATION:
Date system installed: 1992
The CFTC system operates as a series of interconnected servers using an existing Banyan Vines network. The Scan Station captures raster images of the original documents and verifies image quality. An OCR Server converts the bit mapped images to ASCII text files. The Network Server controls user network access to the document management system. The Image Database Server maintains the physical addresses of the scanned image records. The Optical Server is the permanent storage facility for the scanned digital images. The Retrieval Stations support user access to the full text and image data. The Print Server provides hard copies of requested images received via the network.
Scan Station: Fujitsu 3096 11 x 17-Inch Document Scanner with Auto Feeder; Everex 386/25 MHz Computer; 4MB RAM; 300MB Disk; 19-Inch Cornerstone High Resolution Monitor; Xionics Compression/Decompression Board.
OCR Server: Calera MM600 Optical Character Recognition System; Everex 386/25 MHz Computer; 4MB RAM; 100MB Disk.
Network Server: 386/20 MHz Banyan File Server; 80MB & 300MB Disks.
Image Database Server: Everex 386/33 MHz; 4MB RAM; 600MB Disk.
Object Server: LMSI LF 4500 Auto Changer; 28GB WORM Capacity; Everex 486/33 MHz Computer; 8MB RAM; Two Each 600MB Magnetic Hard Disks.
Retrieval Station: Everex 386/25 MHz Computer; 4MB RAM; 100MB Disk; 19-Inch Cornerstone High Resolution Monitor.
Print Server: HP LaserJet III Laser Printer with Video Control; Everex 386/25 MHz Computer; 4MB RAM; 100MB Disk; Xionics Compression/Decompression Board.
The CFTC's Document Management System utilizes Advanced Information Management Systems Plus software for PC-based applications. CFTC expects to implement a systems-wide Windows environment, and several existing workstations are Windows-equipped locally. This conversion will best be accomplished with an agency-wide upgrade to 486/50 DX-2 workstations. The existing imaging system uses 10Base2 and 10Base5 Ethernet communications.
System Installed by: Intrafed Corporation
System Configuration Changed Since Installation? No
DIGITAL IMAGE CAPTURE:
Scan Station: This station is the system's input device for scanning documents and converting the images into TIFF format bitmap files. The digital files may be directed to spool disk on the object server, or stored locally and processed in batch mode.
Document Preparation: Newsclippings, wire service reports, and agency legal documents require manual preparation prior to scanning. National daily newspapers are perused each day by CFTC staff, and a clippings file is created that contains financial articles of interest. The newsclips are photocopied prior to scanning to improve automated feeding. Wire service financial reports, and CFTC legal documents may also require preparation for scanning.
Image Capture: The desktop Fujitsu 3096 document scanner offers an auto feeder and two-sided scanning. The scanner is controlled by a 386/25 PC with high resolution display and Xionics Corporation image compression hardware.
Disposition of Original Records: The newsclippings are not considered permanent records for transfer to the National Archives. The CFTC's legal staff use the imaging system for documents to be retained for the long term.
Scanning Resolution: 300 dpi.
Quality Control: The scanned images are inspected by the conversion operator to verify legibility.
Color and Gray Scale: No color or gray scale images are captured.
Image Enhancement: An image enhancement board was installed in the Fujitsu scanner to improve image quality.
Compression/Decompression: Group 4 compression. Software compression provides digital image files of 50KB to 100KB per newsclip image. Image decompression at the user workstations is under software control.
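The reported file sizes can be related to the scanning resolution with some simple arithmetic. The following is a minimal sketch, assuming a US letter page (8.5 x 11 inches) captured at 1 bit per pixel and 1 KB = 1,000 bytes; the page size is an assumption, since the report does not give the dimensions of a typical newsclip photocopy.

```python
# Assumptions (not from the report): a US letter page, 8.5 x 11 inches,
# scanned as a bitonal image at 1 bit per pixel; 1 KB = 1,000 bytes.
PAGE_WIDTH_IN = 8.5
PAGE_HEIGHT_IN = 11.0
DPI = 300  # scanning resolution stated in the report


def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=1):
    """Raw bitmap size in bytes for a page scanned at the given resolution."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel / 8


raw = uncompressed_bytes(PAGE_WIDTH_IN, PAGE_HEIGHT_IN, DPI)
for compressed_kb in (50, 100):  # file sizes reported for a newsclip image
    ratio = raw / (compressed_kb * 1000)
    print(f"{compressed_kb} KB file: about {ratio:.0f}:1 compression")
```

Under these assumptions the uncompressed page is a little over 1 MB, so the reported 50KB to 100KB files would correspond to compression ratios of very roughly 10:1 to 20:1.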
DOCUMENT INDEXING:
OCR Server: The OCR server converts TIFF bitmap images to ASCII text. The TIFF image files are retrieved from spool disk on the object server. ASCII text files are directed to the network server for indexing by Personal Library Software. A 386/25 PC is the workstation controller for the Calera OCR system. The documents are scanned, OCR processed, and indexed during the data capture operation. Tagged index items, including the newspaper headlines, originating news source, and important page topics (approximately 50 categories), are entered as database keywords and later serve as user-searchable fields. The Calera OCR processor is able to accurately decipher a significant portion of the newspaper clipping's text, despite difficulty with font characteristics and photocopy qualities. The CFTC input staff performs no manual cleanup or corrections of the OCR files. The converted ASCII text data permanently resides on magnetic hard disks. Documents from other CFTC applications (e.g., legal dockets) are indexed by other criteria, and retained in WordPerfect format.
Index Database Server: A 386/33 MHz PC with 600MB hard disk maintains physical addresses of the image records located on the object server. The database maintains image folder and page information. The GUPTA Corporation structured query language (SQL) database permits manual indexing of scanned documents. The fully scalable database system also provides security access control over the indexed information.
Creation of Index Database: Index data is captured by OCR technology from the scanned newsclip images. Manual tagging is also performed for headlines, source, and topics of interest. The separate legal documents "Proceedings Court System" uses the document number, complainant, respondent, and document type.
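The legal document index fields named above (document number, complainant, respondent, document type) map naturally onto a relational table that points to image folders and pages. The following is a minimal sketch using Python's built-in sqlite3 module in place of the GUPTA database; the table name, column names, and sample rows are all hypothetical and are not taken from the CFTC system.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE docket_index (
        document_number TEXT PRIMARY KEY,
        complainant     TEXT,
        respondent      TEXT,
        document_type   TEXT,
        folder          TEXT,     -- image folder on the object server
        first_page      INTEGER   -- starting page within the folder
    )
    """
)
# Made-up sample rows for illustration.
conn.executemany(
    "INSERT INTO docket_index VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("93-001", "Smith", "Acme Futures", "Complaint", "F0001", 1),
        ("93-002", "Jones", "Acme Futures", "Order", "F0002", 1),
    ],
)


def find_by_respondent(connection, respondent):
    """Return (document_number, folder, first_page) rows for one respondent."""
    cursor = connection.execute(
        "SELECT document_number, folder, first_page "
        "FROM docket_index WHERE respondent = ? ORDER BY document_number",
        (respondent,),
    )
    return cursor.fetchall()


print(find_by_respondent(conn, "Acme Futures"))
```

Keeping only folder and page addresses in the index, with the images themselves on optical media, mirrors the separation of index and image storage described in this report.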
Location of Index Database: Index data resides on magnetic storage disks on a dedicated database server.
Index Structures: Full text search available through Personal Librarian software. OCR index data errors as captured are not "cleaned up".
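The practical effect of leaving OCR errors uncorrected can be illustrated with a toy full-text index. This is a minimal sketch of a generic inverted index, not the Personal Librarian implementation; the sample text and the misread word "c0rn" are made up for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word.strip(".,;:\"'()")].add(doc_id)
    return index


# Made-up OCR output; "c0rn" stands in for a misread of "corn".
ocr_text = {
    "clip-1": "Corn futures rose sharply in Chicago.",
    "clip-2": "Traders said c0rn prices may fall.",
}
index = build_index(ocr_text)

print(sorted(index["corn"]))   # only the correctly recognized clipping
print(sorted(index["c0rn"]))   # the misread word is indexed as-is
```

A keyword search for "corn" would miss the second clipping in this sketch, which is the kind of retrieval gap that manually tagged headlines, sources, and topics can help offset.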
DATA COMMUNICATIONS:
Network Server: This 386/20 MHz BANYAN file server controls network access for the Document Management System. This server maintains the Personal Library Software, the text representation of images, and indices. This server also stores images for DOS retrieval.
Magnetic Image Cache: Sufficient magnetic cache storage for images is not available at the local workstations; rather, images are retrieved and displayed directly from the optical digital data disks. Although this limitation is not a major problem for users of the newsclip files, access to sizable CFTC legal case file dockets could be enhanced with sufficient local cache memory.
OPTICAL DIGITAL DATA DISK STORAGE:
Optical Server: This optical digital data disk subsystem is the permanent storage facility for the scanned digital images. The two 600 MB magnetic hard disks on the PC workstation store in-process digital images awaiting recording onto the LMSI optical digital data disks.
Image File Headers: Xionics Corporation version of tagged image file format (TIFF) header.
Error Detection/Correction: Supplied by manufacturer (operational specifics unknown).
Recording Process: Two different optical storage systems are used: 1) LMSI 12-inch WORM drive and media; 2) Micro Design International (MDI) 5.25-inch multifunction optical drive with WORM and rewritable capability.
Optical Digital Data Disk Composition: LMSI 12-inch disks are a two-sided glass substrate. The LMSI drives also accept Maxell optical media as a second source.
Capacity: LMSI 12-inch optical media = 5.6 GB each; Panasonic 5.25-inch rewritable optical media = 1 GB each; Panasonic 5.25-inch WORM optical media = 940 MB each.
Number of Optical Digital Data Disks in Use: Five LMSI optical digital data disks are currently used to store CFTC data.
Jukebox: LMSI LF4500 five-cartridge Auto Changer offers a total of 28 GB WORM capacity.
Storage Environment: Computer room with security controls.
RETRIEVAL AND OUTPUT:
Primary System Users: Agency Staff.
Retrieval Workstations: The retrieval subsystem responds to keyword searching, and displays the OCR text of the scanned news article. The 386/25 workstation provides full text index searching and image retrievals. Image retrievals are conducted by folder, by page, or by SQL field values.
Image Display: 19-Inch Cornerstone High Resolution Monitor.
Screen Formats: Newsclip images are displayed on left side of the user screen; OCR text is displayed on right side (with key words highlighted).
Print Server: A 386/25 functions as print server using a Xionics compression/decompression processing board. The system uses a HP LaserJet III Laser Printer to print from compressed image files received via the Document Management System network.
DATA MIGRATION POLICY ISSUES:
Linkages with Other Agency ADP Applications: The imaging system was integrated into an existing BANYAN network with DOS-based workstations.
Network Transmission: The Document Management System is sized to support 400 workstations accessed by 16 concurrent on-line users. CFTC expects to double this to 32 concurrent users.
Backup of Image and Index Data: Backups of the newsclip image files are performed on a routine schedule using 5.25-inch Panasonic WORM and rewritable optical media (multifunction optical drive).
Technical Support and Documentation: Ongoing system maintenance is performed by a contractor, Westco Automated Systems and Sales. Intrafed Corporation supplied the CFTC with complete manufacturer's documentation for each system component to the circuit board level. A system description document was also provided that details the overall integrated configuration.
Interoperability: Intrafed Corporation specified the systems architecture. The CFTC's imaging system uses standard DOS and network configurations, and each workstation can operate independently.
Migration Plans: Although no specific upgrade plans exist now, the CFTC agency users (legal staff and others) are pleased with the imaging system and expect to continue using imaging/optical digital data disk technology.
OVERVIEW OF SIGNIFICANT ISSUES:
Business Process Re-Engineering: Were any existing agency procedures changed following the installation of the imaging system? No
User Access: CFTC management plans to double the number of retrieval workstations, eventually connecting Regional Office users. Telecommunications may become a bottleneck with the large image file sizes.
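The telecommunications concern can be made concrete with an idealized transfer-time estimate for the 50KB to 100KB newsclip images. This is a rough sketch only: the report does not state what link speed would connect the Regional Offices, so the 9600 bit/s dial-up rate is an assumption chosen for illustration, shown next to the nominal 10 Mbit/s rate of the local Ethernet. Protocol overhead is ignored and 1 KB is taken as 1,000 bytes.

```python
# Assumptions (not from the report): 1 KB = 1,000 bytes, 8 bits per byte,
# and no protocol overhead, so real transfers would be slower.
LINK_RATES_BPS = {
    "9600 bit/s modem": 9_600,          # assumed remote link, for illustration
    "10 Mbit/s Ethernet": 10_000_000,   # nominal local network rate
}


def transfer_seconds(file_kb, bits_per_second):
    """Idealized time to move one file across a link."""
    return file_kb * 1000 * 8 / bits_per_second


for label, rate in LINK_RATES_BPS.items():
    for size_kb in (50, 100):  # newsclip image sizes given in the report
        secs = transfer_seconds(size_kb, rate)
        print(f"{size_kb} KB over {label}: {secs:.2f} s")
```

Under these assumptions a single image takes well under a second on the local network but on the order of a minute over a dial-up link, which is consistent with the bottleneck anticipated here.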
Records Management: The LMSI optical digital data disk autochanger's five-disk capacity is adequate for storing several years of CFTC data (2 disks filled/year), and additional devices can be added as needed.
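The capacity figures reported for the autochanger can be cross-checked with a short calculation. This is a minimal sketch using the disk capacity, slot count, and fill rate given in this report; the images-per-disk estimate additionally assumes 1 GB = 1,000,000 KB and no file system or index overhead, which is not stated in the report.

```python
DISK_CAPACITY_GB = 5.6     # LMSI 12-inch disk, from the report
JUKEBOX_SLOTS = 5          # LF4500 Auto Changer, from the report
DISKS_FILLED_PER_YEAR = 2  # fill rate, from the report

total_gb = DISK_CAPACITY_GB * JUKEBOX_SLOTS
years_of_capacity = JUKEBOX_SLOTS / DISKS_FILLED_PER_YEAR


def images_per_disk(image_kb, disk_gb=DISK_CAPACITY_GB):
    """Images per disk, assuming 1 GB = 1,000,000 KB and no overhead."""
    return round(disk_gb * 1_000_000) // image_kb


print(f"Jukebox capacity: {total_gb:.0f} GB")
print(f"Years until full at the current rate: {years_of_capacity}")
print(f"Images per disk: {images_per_disk(100):,} to {images_per_disk(50):,}")
```

The five 5.6 GB disks account for the stated 28 GB total, and at two disks per year a single autochanger holds about two and a half years of data on-line.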
SITE VISIT REPORT #5
AGENCY: Department of the Army (Office of Chief of Staff)
SYSTEM: Captured Gulf War Document Exploitation (DOCEX) System
CONTACT: Major Perkins,
USA-DSMA,
The Pentagon,
Washington, DC
SUMMARY DESCRIPTION:
The Department of the Army developed a special purpose, stand-alone digital imaging system to scan, index, store on optical digital data disks, and retrieve images of captured Persian Gulf War documents. The Army Department's Decision Systems Management Agency (DSMA) specifically developed the Document Exploitation (DOCEX) system for recording the contents of Iraqi filing cabinets and other documentation captured by Allied Forces. Conversion scanning of the more than twelve million pages was performed in several locations including Kuwait City and the Saudi Arabian city of Dhahran. The operations staff encountered difficult environmental field conditions, including excessive heat and abrasive airborne dust particles. The scanned images and ASCII index data were initially written to magnetic hard disk, and then downloaded to digital audio tape (DAT). The DAT media was subsequently returned to the United States for data transfer onto 5.25-inch rewritable optical digital data disks at a Defense Intelligence Agency (DIA) computer system facility. Operating under an electronic filing cabinet concept, the image retrieval system uses the optically stored document images in continued support of document searching, retrievals, and language translations.
The DOCEX system is considered important for this study due to: the imaging system's conformance to existing information technology standards, where possible; the short time frames available to DSMA for developing and implementing complex systems; the use of DAT technology for interim data storage; and, the selection of rewritable optical media as the primary data storage technology.
BACKGROUND:
The DOCEX system was developed under the auspices of the Decision Systems Management Agency (DSMA). The DSMA reports to the Army's Office of Chief of Staff, and provides a full range of automation services upon request to the Army Staff, Secretariats, and the Army's Joint Staff. The DSMA is primarily a task oriented organization, geared for solving specific Army information system needs. The DSMA is also responsible for evaluating emerging information technologies and developing specialized prototype applications for possible use throughout the Department of the Army.
The DOCEX system contains digital images of over 12 million documents. The records were removed from offices, bunkers, storage depots, and other locations seized in January 1991 by the combined Allied troops as they advanced through Iraq and Kuwait. The conversion project was made more complex because most of the documents were in non-English languages (Arabic and Farsi). The DOCEX system was initially designed with a multimedia capability. This concept included digitized audio and video recording of captured objects, and the surrounding physical spaces from which the objects were originally removed. Time constraints resulted in implementing a document-only scanning capability.
DOCEX supports the document translation efforts of the Defense Intelligence Agency's (DIA) analytical staff. A major DIA goal was to identify important captured intelligence documents and related materials for possible translation from the original Farsi to English. DSMA staff envisioned the DOCEX system to serve as a possible prototype or model for future systems, although no immediate secondary uses of the scanned documents or the imaging system were anticipated when the original system design criteria were developed. The Army's Litigation Center is developing an automated case file tracking and access system that will utilize similar optical digital data disk storage technology.
Origins: DOCEX is an excellent example of an imaging system successfully implemented in spite of extreme developmental time constraints. A Military Order to develop an imaging system for captured Gulf War documents was issued on the morning of January 26, 1991. Within 24 hours, the DSMA staff developed system specifications and briefed senior officers, who authorized procurement. The decision to utilize digital imaging technology was driven in part by the Army's previous unsatisfactory experience in 1989 with a microfiche-based system for documents captured in Panama during Operation Just Cause. For this project, the Army required the imaging of over six million Spanish and English-language documents. Due to several factors, however, only 10 percent of these documents were converted after six months. The intelligence value of captured documents is frequently based on the ability to immediately retrieve and analyze key information. The useful life of similar captured documents is often only two to three years. Once the initial analysis is completed, interest in the records themselves often declines dramatically. Records determined to have long term retention value may be digitally scanned and also microfilmed to obtain an archival backup copy.
DSMA staff received authorization up front to move quickly with full procurement of necessary DOCEX system components. This strategy was based on one of the Army's primary lessons learned from the Panama experience: "never go into a project with partial funding." The DOCEX system design criteria ensured image and index data portability through several ensuing formats: data downloaded from magnetic hard disks to DAT tape, followed by data transferred to rewritable optical digital data disks. The DSMA system designers sought commercial off-the-shelf (COTS) hardware and software to minimize specialized, time consuming system modifications and customized features. The high-pressure, rapid development conditions resulted in equipment that met two requirements: conformance to existing information technology standards; and, access to comprehensive vendor support. Total DOCEX system costs were approximately $500,000 in a systems integration contract with Intrafed Inc. of Washington, DC.
Date system installed: 1991
SYSTEM CONFIGURATION:
- Document Scanners--Two (2) Kodak Imagelink 900D scanners equipped with:
8MB scanner memory, automatic document numbering system, OCR
capability, and bar code recognition.
- Imaging Platform--IBM Corp. PS/2 486 class, Model 95, IBM database
management software, OS/2 driver with 16MB memory, and an 8514 XGA
monitor.
- Storage Media--Magnetic hard disks for in-process data; 1.3 gigabyte capacity
digital audio tapes (DAT) (total of 18 tapes); 5.25-inch rewritable optical
digital data disks for image storage.
- Document Retrieval--Index: Knowledge-Based Management System software
from AI Corp. Images: Optical digital data disks stored in an imaging system
running OS/2 from Imara Research Corp.
DIGITAL IMAGE CAPTURE:
The initial digital image conversion of approximately 12 million pages was performed in Kuwait City and Dhahran, Saudi Arabia. Conversion equipment included two Kodak document scanners linked to IBM PS/2 microcomputers and database software. DOCEX systems administrators were particularly pleased with the consistent high quality performance of the scanning equipment and the large memory capacity (16MB) of the IBM OS/2 drivers. This was achieved even with the persistent desert dust and other unavoidable harsh environmental conditions.
Document Scanning: Document conversion and initial manual keyword indexing took one month to complete. During that time, two daily work shifts containing nine production people each prepared the documents for scanning, followed by four teams of two people each who operated the scanners. Ten translators completed the index function by referring to printed key word listings in various languages to assign documents to specific categories.
Quality Control: Conversion staff used display screens to evaluate image quality, adjusting equipment as needed to meet image quality guidelines. DOCEX system administrators highly recommend test targets for calibrating equipment and maintaining consistent image legibility. This was qualified with the following statement reflecting the reality of high-pressure production: "A test target image is almost always better than a piece of paper with a footprint across it."
Scanning Resolution: 200 dots per inch, selected as a compromise between achieving maximum scanning throughput rates and acceptable image legibility.
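The throughput side of this compromise follows from how pixel counts scale with resolution. The sketch below compares 200 dpi with 300 dpi for a single page, assuming a US letter page (8.5 x 11 inches) in binary mode at 1 bit per pixel; the page size and the 300 dpi comparison point are assumptions for illustration, since the captured documents varied and the report does not name an alternative resolution.

```python
# Assumption (not from the report): a US letter page, 8.5 x 11 inches,
# captured in binary mode at 1 bit per pixel.
def raw_page_bytes(dpi, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of a bitonal page at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi) / 8


at_200 = raw_page_bytes(200)
at_300 = raw_page_bytes(300)
print(f"200 dpi: {at_200:,.0f} bytes uncompressed")
print(f"300 dpi: {at_300:,.0f} bytes uncompressed")
print(f"200 dpi carries {at_200 / at_300:.0%} of the 300 dpi pixel data")
```

Because pixel count grows with the square of the resolution, a 200 dpi scan handles well under half the raw data of a 300 dpi scan, which reduces the load on scanner memory, compression, and storage at the cost of fine detail.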
Color and Gray Scale: Although the document scanners offered color and gray scale capabilities, maximum throughput speeds were achieved in the binary mode (black & white). Due to the scanner's CCD color sensitivity and color drop out, red inks on the original documents were difficult to capture.
Image Enhancement: The Kodak scanner's standard contrast controls provided acceptable image quality, and no special add-on image enhancement technology was needed. Other DSMA imaging applications have employed image enhancement technology successfully.
Compression/Decompression: The DOCEX imaging system's compression algorithms conform to CCITT Group 4 standards. DSMA prefers hardware assisted compression and RAM cache memory to achieve high speed image display.
DOCUMENT INDEXING
Conversion staff translators perused each document and annotated work sheets with keyword information, such as the date of capture, location where the document was found, and type of document. The completed worksheets were scanned along with each document folder and subsequently used to create the index database. The indexing operators consulted key word listings pre-printed in various languages. DOCEX system administrators note that, due to its labor-intensive, time-consuming nature, "indexing is the dark side of imaging." They recommend creation of index batches and automated indexing capability.
Creation of Index Database: Basic index information on document content and structure was compiled in Kuwait and Saudi Arabia using the scanned worksheets. Additional indexing of selected images was performed to enhance user access upon return of the system to the United States.
Location of Index Database: The DOCEX images and index data are not stored together on the optical media. Rather, the index database is stored on magnetic hard disks for improved retrieval speeds and ease of update.
OPTICAL DIGITAL DATA DISK STORAGE:
During conversion, magnetic hard disks stored all scanned image and key entered index data. The data was then copied to digital audio tape (DAT) prior to shipment to Washington, DC. The DIA computer center transferred the data from DAT to rewritable optical digital data disks. The rewritable optical media serviced the reference needs of Defense Department intelligence staff.
Image File Headers: The DOCEX system image file headers adhere to the tagged image file format (TIFF). A system administrator noted that the TIFF convention is "thin but effective." Although the IBM-supplied processing software, MODCA:IOCA (mixed object document content architecture:image object content architecture), is proprietary, detailed documentation is available.
Error Detection/Correction: DOCEX system administrators experimented with the small computer system interface (SCSI) firmware to evaluate the system's error reporting capability. No optical digital data disk failures or non-retrievable images were experienced. DOCEX managers encountered fewer problems with the optical digital data disk subsystem than with other system components, notably basic computer hardware settings (e.g., dip switches) and incompatible computer software.
Recording Process: 5.25-inch rewritable magneto-optical media.
Optical Digital Data Disk Composition: Polycarbonate substrate, dual-sided media manufactured by Philips DuPont Optical (PDO).
Capacity: 940 megabytes of data storage per optical digital data disk.
Jukebox: Procurement difficulties and contracting delays ruled out acquisition of an optical disk jukebox. As an alternative, a multi-drive, direct access optical storage device (DASD) in a tower configuration was acquired.
Storage Environment: Document scanning was performed under field conditions, while the retrieval system operates in a normal office environment. A DSMA goal is to eventually install an enterprise-wide system located in a raised floor, computer room environment.
RETRIEVAL AND OUTPUT:
DOCEX users search index data and perform image retrievals assisted by AI Corporation's Knowledge-Based Management System.
Search Techniques: The DOCEX system uses a hierarchical searching scheme supported by folder application software. Users adopt a card catalog or broad subject approach for initial search entry, later refined to a narrower scope as reference needs dictate.
Index Structures: The indexing database software included special features: automatic document numbering; acceptance of bar coded data; and a "patch coding" technique that preserved the structure of complex, multi-page documents. DOCEX system end users requested that a broad subject thesaurus be included that could be refined at a later date. Information concerning this thesaurus, especially the document form and genre terms, is unavailable. Because inter-indexing inconsistencies were problematic, system designers added an intelligent interface (Knowledge-Based Management System) to increase retrieval effectiveness.
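As a rough illustration of how a patch-code separator sheet can preserve the structure of multi-page documents during batch scanning, the sketch below groups a stream of scanned pages into automatically numbered documents. The report does not describe the DOCEX implementation, so the `PATCH` marker, the page labels, and the `DOC-` numbering format are all assumptions made for this example.

```python
from dataclasses import dataclass, field

PATCH = "PATCH"  # hypothetical marker for a scanned patch-code separator sheet


@dataclass
class Document:
    number: str                                # automatically assigned document number
    pages: list = field(default_factory=list)  # page labels in scan order


def group_pages(scanned_pages, prefix="DOC"):
    """Split a scan batch into documents at each patch-code sheet."""
    documents = []
    current = None
    for page in scanned_pages:
        if page == PATCH:
            # A separator sheet closes the current document; the sheet
            # itself is not kept as a page.
            current = None
            continue
        if current is None:
            # First page after a separator starts a new numbered document.
            current = Document(number=f"{prefix}-{len(documents) + 1:06d}")
            documents.append(current)
        current.pages.append(page)
    return documents


batch = [PATCH, "p1", "p2", "p3", PATCH, "p4", PATCH, "p5", "p6"]
docs = group_pages(batch)
for doc in docs:
    print(doc.number, doc.pages)
```

Because document boundaries come from the separator sheets rather than from operator keying, a three-page memorandum stays together as one indexed unit regardless of how many pages precede or follow it in the batch.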
Image Display: Nineteen-inch high resolution monitors provide dual display capability, and laser printers provide hard copy prints on demand.
Primary System Users: US Army Intelligence Staff.
DATA MIGRATION POLICY ISSUES:
Open Systems: Open systems architecture is an on-going DSMA strategic design goal. For the present time, however, efforts to achieve an Army enterprise-wide optical systems development approach are on hold. This is due in part to the often unavoidable bureaucratic complexities involved in implementing inter-organizational information systems. Another contributing factor is that the DSMA typically does not "own" the installed computer system equipment or software, and is only responsible for providing design assistance.
Linkages with Other Agency ADP Applications: DIA analysts use Knowledge-Based Management System software to query the existing index database and retrieve images. The DOCEX system was designed for stand-alone operation, and there are no plans for linking it with any other Defense Department imaging or database applications.
Standards: The DOCEX system was based (to the extent feasible) on existing information technology standards. This reliance was not due to the specific functional requirements of the system per se, but because the DSMA views each system as one step closer to reaching a goal of a generic application model. DSMA expects that this model will eventually meet a variety of mission needs, while easily adopting emerging technology standards.
System Output: The DOCEX system utilized 19-inch diagonal image display screens. Priority was placed primarily on full page display capability, rather than display resolution. DOCEX system administrators consider the term "high resolution" display to be something of a misnomer, given the current state of screen display technology. Hard copy laser printing remains a significant factor in system developments. This is due to Defense Department staff preferences for paper copies rather than viewing electronic display screens. Persistent attempts by DSMA staff to reorient users away from paper records have had mixed success.
Network Transmission: Off-site information requests by FAX.
Backup of Image and Index Data: No ongoing backup procedures are in place. Currently, the optical digital data disks, and a backup copy on DAT, are the only existing copies. The status of the original paper records is unknown.
Technical Support and Documentation: Each system component has manufacturer supplied technical documentation, supplemented with quick reference sheets to aid in routine maintenance and troubleshooting. Documentation describing the DOCEX system and its capabilities was not complete due to the rapid system development response to the Gulf War situation. The DOCEX system administrator relied to a significant degree on the original equipment manufacturers to maintain system operations.
Interoperability: The system uses Small Computer System Interface (SCSI-1) boards (synchronous transfer: off).
Migration Plans: Due to the short term intelligence value of the DOCEX records, no need exists to migrate the image and/or index data.
OVERVIEW OF SIGNIFICANT ISSUES:
Rapid System Development: Sufficient time is needed to complete critical project milestones such as equipment design, development, installation, calibration, staff training, and technical documentation. DSMA staff consider system development time as a rare luxury, given the short time frames they must operate under. A favorite staff expression is "given enough time, you can always succeed." Imaging systems with off-the-shelf, generic database software often have inherent limitations and limited flexibility. System developers may need to develop custom software, requiring additional time for procurement and integration. DSMA often uses "beta code" in its computer systems, believing that the payoff of system performance justifies any risks of software "bugs." They emphasize that for this approach to be successful, vendor support must be strong. The Army's rapid development model relies on constant input from end users. DSMA strongly believes that lessons learned from developing one system can and should be applied in developing the next one.
Information Technology Standards: DSMA staff are aware of the vicious circle of technology developments and the inherent limitations of off-the-shelf approaches, often forcing system developers to seek out custom integrated solutions. Under these conditions, vendors often develop obscure, proprietary approaches to meet the customer's unique performance requirements. In the defense agency arena in particular, a history of large system contracts under low-bid procurement rules has led to a de facto collection of incompatible systems. If the imaging industry is to reach its full potential in the Federal government domain, the rapid adoption of industry standards supporting data portability and inter-system compatibility is essential. The increased use of a distributed workstation architecture for enterprise-wide imaging, as opposed to reliance on mainframe-based central indexing, is another factor forcing the standards issue. DSMA is seeing increasing acceptance of imaging technology within the Department of the Army despite the lack of adequate industry standards.
New Technology: DSMA systems use "cutting edge" hardware and software as much as possible. The DSMA administrators emphasize that adopting this approach requires ongoing monitoring of industry and vendor trends, and there are advantages to using new technology rather than proven imaging "solutions". Adopting the latest developments increases the likelihood that such technology will conform to relevant existing standards and vendor support will be stronger.
Technology Trends: DSMA staff continually monitor the imaging industry for technology trends that may have an impact on future systems. They note a resurgence of demand for WORM optical digital data disks even as rewritable technology is increasing its market share. The large storage capacity of WORM media makes them ideal for smaller imaging systems where all image data may fit on a single platter. DSMA analysts note that digital imaging technology appears to be following a development cycle similar to that of the database industry, experiencing increased compatibility and sophistication with maturity.
SITE VISIT REPORT #6
AGENCY: Department of the Army (PERMS)
SYSTEM: Personnel Electronic Records Management System (PERMS)
CONTACT: Ms. Gail Martin, PERMS Program Manager, Ft. Belvoir, VA
SUMMARY DESCRIPTION:
The United States of America's Armed Forces depend on rapid deployment of troops around the globe to fulfill their missions. The US Army recognizes the importance of accurate, up-to-date personnel records in meeting this objective. The Personnel Electronic Records Management System (PERMS) enhances the Army's ability to store and access tens of millions of document images, replacing a labor intensive records management system based on paper documents and updatable microfiche. An overall PERMS goal is to improve records management using new but proven commercially available technologies. The Army expects to receive other tangible benefits from PERMS, such as the ability to respond faster during troop deployments, promotions, school assignments, and benefits processing.
The PERMS data storage architecture dedicates a specific area for document images from an individual soldier's personnel record on each twelve-inch write once, read many (WORM) optical digital data disk. This data recording strategy improves productivity and file integrity, requiring the retrieval of only a single optical digital data disk to access a soldier's complete personnel record. Unlike the existing microfiche system, PERMS can supply information to multiple users simultaneously, within twenty seconds of request. Output is available in hard copy, microfiche, or in digital form. PERMS sites are linked through the Defense Services Network (DSN), and future technology will eventually enable the acquisition and distribution of personnel data through this network.
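The idea of dedicating an area of one disk to each soldier can be sketched as a small directory that maps a soldier to the single disk holding that soldier's reserved area. This is a toy model only: the class names, soldier identifiers, file names, and the area and slot counts are hypothetical, and the report does not say how PERMS sizes its reserved areas or handles an area that fills up.

```python
class WormDisk:
    """Toy model of a write-once disk divided into per-soldier areas."""

    def __init__(self, disk_id, areas_per_disk, slots_per_area):
        self.disk_id = disk_id
        self.areas_per_disk = areas_per_disk
        self.slots_per_area = slots_per_area
        self.areas = {}  # soldier_id -> image names written so far

    def has_free_area(self):
        return len(self.areas) < self.areas_per_disk

    def append(self, soldier_id, image):
        area = self.areas.setdefault(soldier_id, [])
        if len(area) >= self.slots_per_area:
            # Overflow handling is outside the scope of this sketch.
            raise ValueError("reserved area is full")
        area.append(image)  # write-once: images are only ever appended


class Archive:
    """Directory that keeps every soldier's images on exactly one disk."""

    def __init__(self, areas_per_disk=2, slots_per_area=4):
        self.areas_per_disk = areas_per_disk
        self.slots_per_area = slots_per_area
        self.disks = []
        self.directory = {}  # soldier_id -> disk holding that soldier's area

    def store(self, soldier_id, image):
        disk = self.directory.get(soldier_id)
        if disk is None:
            # First image for this soldier: reserve an area, starting a
            # new disk when the newest one has no unreserved areas left.
            if not self.disks or not self.disks[-1].has_free_area():
                self.disks.append(
                    WormDisk(len(self.disks) + 1,
                             self.areas_per_disk, self.slots_per_area))
            disk = self.disks[-1]
            self.directory[soldier_id] = disk
        disk.append(soldier_id, image)

    def retrieve(self, soldier_id):
        # A single directory lookup identifies the one disk to mount.
        disk = self.directory[soldier_id]
        return disk.disk_id, list(disk.areas[soldier_id])


archive = Archive()
archive.store("A001", "enlistment.tif")
archive.store("B002", "award.tif")
archive.store("A001", "evaluation.tif")
archive.store("C003", "orders.tif")
print(archive.retrieve("A001"))
```

Documents that arrive years apart still land in the same reserved area, which is why a complete record can be read back from one disk rather than gathered from several.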
PERMS is important to the study of digital imaging and optical media systems because of: the multiple site conversion effort underway to load the system with existing personnel information; the system's ability to support high volume scanning of paper documents and microfiche; the use of PERMS as a primary source for mission critical information; and the system's ability to output to paper, microfiche, or magnetic tape.
BACKGROUND:
An Official Military Personnel File (OMPF) is maintained for each soldier, and is useful to the Army in formulating, managing, and evaluating manpower and personnel policies, plans, and programs. The Army's soldier records identification system began in 1917. The Army's records repositories need to maintain the official files accurately, since they contain the official historical, performance, legal record of service, and other information pertaining to an individual both during and after active duty. This information is used by the Headquarters, Department of the Army (HQDA), to support military decisions such as selecting personnel for promotion, retention, schooling, and command assignments. The US Army, under Congressional Mandate, maintains an official personnel record on every soldier regardless of status (active, reserve, discharged, or retired). These records are retired to the National Archives when they are no longer useful to the Army. These records provide information on physical condition, dependency status, military qualifications, civilian occupation skills, availability for service, and other such information as the service secretary concerned may prescribe. Such records are used to manage the troop strength and the careers/employment of the individual soldier. The alternatives for the PERMS project have centered around the physical media on which the records are maintained: paper, microfiche, and optical media.
Prior to the early 1970's, the Army Official Military Personnel Files were a paper-based records management system. The paper file system suffered from extensive storage space needs, labor intensive file integrity and records management processes, the absence of any backup (only the original records existed), and the time consuming effort needed to service Selection Boards. The 1973 fire at the St. Louis records center consumed a majority of the Army's personnel records created between 1917 and 1959. Since no backup copies existed, this resulted in massive administrative problems in processing Army veterans' benefits programs. A special Records Administration in Microform Mode (RAM2) task force studied the Army's existing records processes, analyzed the military records systems used by the Navy and Air Force, and evaluated the existing technological marketplace for alternative solutions.
Based on RAM2 task force recommendations, A.B. Dick System 200 updatable microfiche camera/processor systems were selected to convert the paper records to microfiche for storage in automated Access-M retrieval equipment. These micrographic systems still operate at the four personnel records management centers, although only 17 percent of the US Army Reserve Personnel Center (ARPERCEN) records were converted to microfiche. The microfiche Official Military Personnel File format is based on:
- Performance Fiche (P-Fiche) used for evaluation and selection boards.
- Service Fiche (S-Fiche) used by career managers for general information.
- Restricted Fiche (R-Fiche) used to store historical information which may be improper for viewing by selection boards and career managers.
In 1983, the consulting firm Austin Associates studied the Army's overall personnel records operations, and made recommendations for improving the existing micrographics processes. The Austin Study, as it is often referred to, also contained recommendations pertaining to digital image technology. Subsequently, a pilot imaging project was tested at the US Army Enlisted Records Evaluation Center (EREC) in 1986, a mission needs statement was developed, and imaging system funding requirements were established at several personnel records centers. In 1986, the Secretary of the Army directed that records management problems at ARPERCEN be corrected, funding requirements were established, and program efforts were initiated to implement Austin Study recommendations. Approval to begin PERMS with ARPERCEN records was obtained in 1989, and the program was transferred to PEO STAMIS for management and oversight. ARPERCEN was selected as the initial digital conversion site, to be followed by the remaining Army personnel records centers.
Origins: ARPERCEN is the designated custodian for the personnel records of Active Reserve members and retired Army members. The combined holdings of 2.7 million records consist of approximately 156 million paper documents and 175 million microfiche images. The holdings require ever-increasing amounts of expensive storage space, with an annual growth rate of more than 15 million documents. A filing backlog continually exists due to record jackets being out of file, with an average ten day wait time to obtain a requested record. Almost 400,000 microfiche duplicates are produced annually, supporting functions such as selection boards, personnel transfers, and other purposes.
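The scale implied by these holdings figures can be checked with simple arithmetic. The short calculation below uses only the numbers quoted above. Adding paper documents and microfiche images into one count of "items" is a simplifying assumption made for this illustration, and because the report gives growth as "more than 15 million", the growth figure is a lower bound.

```python
records = 2_700_000
paper_documents = 156_000_000
microfiche_images = 175_000_000
annual_growth = 15_000_000  # "more than 15 million" per year, so a lower bound

# Assumption: count one paper document and one microfiche image as one item.
total_items = paper_documents + microfiche_images
items_per_record = total_items / records
growth_fraction = annual_growth / total_items

print(f"{total_items:,} items in total")                    # 331,000,000
print(f"about {items_per_record:.0f} items per record")     # about 123
print(f"at least {growth_fraction:.1%} growth per year")    # at least 4.5%
```

Under that assumption the average personnel record holds roughly 120 items, and the collection grows by at least 4.5 percent of its current size each year.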
The Army's existing updatable microform system was recognized as being technologically outmoded, and was considered to be unresponsive in meeting today's dynamic personnel management needs. Paper records created during the soldier's entry process are converted directly to microfiche. Ensuing documents are stored as a temporary paper file for up to one year before the microfiche official record is updated, unless the soldier's record is scheduled for a HQDA Selection Board review. In this case, the documents are converted to microfiche as soon as possible. A study indicated that up to forty percent of the microfiche images are of poor legibility, and ten percent are unreadable. Contributing to this was th