The National Archives Catalog

Contribution Type: Optical Character Recognition (OCR) Transcription

Mandatory: No
Repeatable: No
Edit Type: Read Only
Data Type: Variable Character Length (2 GB)
Source: AI/Machine Generated, NARA, NARA Partner
Level Available: File Unit, Item, Item AV, Digital Object
Public Element: Yes, Index Only

 

Definition:

An optical character recognition (OCR) transcription is the machine-generated text of a typewritten or handwritten document, produced by NARA or a NARA partner. OCR tools transcribe the words as they are written or typed in or on the document. OCR text is not always accurate, but it increases access and supports indexing and other search functionality.

 

Purpose: To allow users to view the OCR transcription of a document generated by NARA or a NARA partner, thereby enhancing the searchability and discoverability of digital objects in the Catalog.

 

Relationship: OCR transcriptions have an attribution type modifier of AI/Machine Generated and additionally have a NARA Partner Name.

 

Guidance:

OCR transcriptions are generated automatically for textual digital objects during processing and preparation for online publication. The OCR transcription serves as the original source for the related OCR Transcription Validation contribution type.
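
The constraints described above can be pictured as a small record type. The following is only an illustrative sketch, not the Catalog's actual data model: the class name, field names, and the reading of "2 GB" as a byte limit on UTF-8 text are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: "Variable Character Length (2 GB)" is read here as a
# 2 GiB cap on the UTF-8 encoded text.
MAX_OCR_BYTES = 2 * 1024**3

# Levels at which the element is available, as listed in this entry.
LEVELS = {"File Unit", "Item", "Item AV", "Digital Object"}


@dataclass(frozen=True)  # frozen loosely mirrors the "Read Only" edit type
class OcrTranscription:
    """Hypothetical model of one OCR Transcription contribution."""

    text: str
    level: str
    # Attribution type modifier named in the Relationship statement.
    attribution_type: str = "AI/Machine Generated"
    # Hypothetical field for the NARA Partner Name, when one applies.
    nara_partner_name: Optional[str] = None

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unsupported level: {self.level!r}")
        if len(self.text.encode("utf-8")) > MAX_OCR_BYTES:
            raise ValueError("OCR text exceeds the 2 GB limit")


# Usage: a single (non-repeatable) transcription attached to an Item.
ocr = OcrTranscription(text="Dear Sir, I have the honor to report", level="Item")
```

Because the record is frozen, any correction would be modeled as a separate object, which loosely parallels the OCR transcription serving as the source for an OCR Transcription Validation contribution.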

 


