Appendix A: Tables of File Formats
Computer Aided Design (CAD)
Computer-aided design (CAD) formats are vector graphics files that rely on mathematical expressions to create multi-dimensional computer graphics intended for use in engineering and manufacturing design. CAD programs can generate representations and animations of two- and three-dimensional surface projections of objects.
Preferred Formats
Acceptable Formats
Digital Audio
The digital audio category encompasses formats used to encode recorded sound as machine-readable files by converting acoustic sound waves into digital signals. Digital audio formats are generally composed of both a wrapper format, usually the common name associated with the file extension, and an encoding method or codec.
General requirements for digital audio records:
- Digitize to standards appropriate for the accurate preservation of the original audio, when converting analog material (e.g., audio cassettes, record albums, and reel-to-reel audio tapes). Examples of appropriate methods and formats are available on NARA’s Digitization Services Products and Services page;
- Transfer digital audio at a minimum of 16 bits per sample, but 24 bits per sample is encouraged; and
- Transfer digital audio at a sample rate of at least 44.1 kHz, but sampling at 96 kHz is encouraged.
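The two numeric minimums above can be checked before transfer. The following is a minimal sketch, assuming the audio is an uncompressed PCM WAV file readable by Python's standard `wave` module; the function name `check_wav` and the constant names are illustrative and not part of this Bulletin.

```python
import wave

MIN_BITS_PER_SAMPLE = 16      # minimum bit depth stated in the requirements above
MIN_SAMPLE_RATE_HZ = 44_100   # 44.1 kHz minimum sample rate stated above


def check_wav(path):
    """Return a list of problems; an empty list means both minimums are met."""
    problems = []
    with wave.open(path, "rb") as wav:
        bits = wav.getsampwidth() * 8   # getsampwidth() reports bytes per sample
        rate = wav.getframerate()       # samples per second
    if bits < MIN_BITS_PER_SAMPLE:
        problems.append(f"{bits}-bit samples are below the 16-bit minimum")
    if rate < MIN_SAMPLE_RATE_HZ:
        problems.append(f"{rate} Hz is below the 44.1 kHz minimum")
    return problems
```

A check of this kind covers only the numeric thresholds; it does not assess whether an analog source was digitized accurately.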
Preferred Formats
Acceptable Formats
Digital Moving Images
Digital moving images consist of bitmap digital images or “frames” displayed in rapid succession at a constant rate, giving the appearance of movement. This category includes two subcategories: digital cinema, which encompasses digitized film, and digital video (including both video digitized from analog sources and born-digital video).
General requirements for digital moving image records:
- Agencies must digitize to standards appropriate for accurate preservation of the original video and audio components, when converting analog material. Examples of appropriate methods and formats are available on NARA’s Digitization Services Products and Services page; and
- For reformatted video, 8-bit is acceptable but 10-bit is preferred.
Digital Cinema
Preferred Formats
Digital Video
Preferred Formats
Acceptable Formats
Born-Digital Photographs
Born-digital photographs are raster images of natural, real-world scenes or subjects captured by digital cameras.
Additional special requirements for digital photographs are described in 36 CFR 1237.28.
Preferred Formats
Acceptable Formats
Digitized Paper and Photographic Prints
Digitized paper and photographic prints are digital records created according to the requirements in 36 CFR 1236 E—Digitizing Permanent Federal Records by converting paper or other paper-based formats using reflective techniques to a digital form that is of sufficient authenticity, reliability, usability, and integrity to serve in place of the source record.
The format and compression codec requirements are in § 1236.48-File format requirements. Specifically, the regulation requires agencies to:
(a) Encode, retain, and transfer digitized records in one of the following file formats, either uncompressed or using one of the specified compression codecs in the tables below.
(1) Agencies that combine multiple uncompressed TIFF images into PDF/A files using JPEG2000 compression must perform the quality inspection step specified in 1236.46(d) against the resulting PDF/A files.
(2) When using JPEG 2000 visually lossless compression, agencies must determine the amount of compression to apply, not to exceed 20:1, by performing tests and visually evaluating for compression artifacts that obscure or alter the information content.
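The 20:1 ceiling in paragraph (2) is a ratio of uncompressed image size to compressed file size. The following is a minimal sketch of that arithmetic, assuming the uncompressed size is computed from pixel dimensions, channel count, and bit depth; the function and parameter names are illustrative only.

```python
MAX_RATIO = 20.0  # "not to exceed 20:1" from paragraph (2) above


def compression_ratio(width, height, channels, bits_per_channel, compressed_bytes):
    """Ratio of the uncompressed raster size to the compressed file size."""
    uncompressed_bytes = width * height * channels * bits_per_channel / 8
    return uncompressed_bytes / compressed_bytes


def within_limit(width, height, channels, bits_per_channel, compressed_bytes):
    """True when the ratio does not exceed the 20:1 ceiling."""
    ratio = compression_ratio(width, height, channels, bits_per_channel, compressed_bytes)
    return ratio <= MAX_RATIO
```

For example, a 3000 x 2000 pixel image with three 8-bit channels occupies 18,000,000 bytes uncompressed, so a 1,000,000-byte JPEG 2000 file corresponds to 18:1. Staying under the ceiling does not replace the visual evaluation for compression artifacts that the regulation requires.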
Optical Character Recognition
NARA understands that the ability to embed OCR'd text in PDF records enhances access to the records. NARA will accept PDF records that have been OCR'd using processes that do not substitute generated or modified content for the original bit-mapped image.
While it will accept PDF records with uncorrected OCR'd text, NARA will not accept PDF records resulting from OCR processes that alter the visible content, degrade the quality of the original bit-mapped image, or replace the original bit-mapped image with OCR'd text.
NARA will not accept digitized records in PDF that have been saved with lossy compression to reduce file size (e.g., JPEG, JBIG2). Such processes degrade the quality of the original image and may make such images unsuitable for archival preservation.
Applications use different terms for similar output results. Some examples of terms indicating prohibited outputs include:
- “Searchable Image - Compact”
- “Editable text”
- “Editable text and images”
- “PDF Normal”
- “Formatted Text & Graphics”
- “TruePage”
- JBIG1 or JBIG2
- The use of “lossy” compression
Terms indicating acceptable output include, but are not limited to:
- Searchable Image - Exact
- Searchable Image Text (original image)
- Searchable Image - Best Quality
Acceptable Formats for Digitized Permanent Paper Records
Requirements for digitized permanent photographic print records include the following:
- The agency must encode, retain, and transfer digitized photographic print records in one of the following file formats, either uncompressed or with one of the compression codecs specified in the table below.
- For a series of predominantly textual records with interspersed photographic prints, use the formats in the table for paper records. For a series of predominantly printed photographs, including those with paper records interspersed, use the file formats in the table below for photographic print records.
Acceptable Formats for Digitized Permanent Photographic Print Records
Born-Digital Posters
Born-digital posters are posters created on a computer using graphics software. Posters are generally large in format and usually printed and displayed for advertising and publicizing purposes. Paper posters must be digitized according to the requirements in 36 CFR 1236 E.
Preferred Formats
Acceptable Formats
Geospatial Formats
Geospatial records include digital cartographic data files and aerial photography that are created and processed in Geographic Information Systems (GIS) or other software applications for spatial analysis.
Preferred Formats
Acceptable Formats
Acceptable for Imminent Transfer Formats
Presentation Formats
Presentation formats are used to convey graphical information to audiences in the form of a slide show. Presentation formats are not acceptable for use as transfer containers for permanent digital still images.
Preferred Formats
Acceptable Formats
Born-Digital Textual Data
The born-digital textual data category refers to two general content types: unformatted (plain text) or formatted. Unformatted plain text (defined in MIME as text/plain) contains basic character information and control or non-printing characters but lacks styling information. Formatted text files include all of the attributes of plain text files but have extended formatting capabilities, for “stylized” or “rich” text features including italics, bold, colors, hyper-linking, etc.
Agencies must identify the character encoding method used with each text file.
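Where the encoding of a text file was not recorded at creation, a first-pass check can narrow the possibilities. The following is a minimal sketch, assuming only US-ASCII and UTF-8 need to be distinguished automatically; any other encoding is reported as unknown and must be identified by other means. The function name `describe_encoding` is illustrative.

```python
def describe_encoding(data: bytes) -> str:
    """Best-effort label for the character encoding of raw file bytes."""
    if data.startswith(b"\xef\xbb\xbf"):
        return "UTF-8 (with BOM)"
    try:
        data.decode("ascii")
        return "US-ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8"
    except UnicodeDecodeError:
        # Legacy single-byte encodings cannot be told apart reliably from bytes alone.
        return "unknown (not ASCII or UTF-8; identify the encoding from system documentation)"
```

A successful decode shows only that the bytes are consistent with an encoding, so the result should be confirmed against what is known about the originating system.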
Preferred Formats
Acceptable Formats
PDF Collections
PDF collections, commonly known as PDF Portfolios or containers, are a special type of PDF file that allows additional files to be embedded. Collections provide a useful way to package together case files, for example, documents related to court cases or other investigations, or any other logical grouping of records in different file formats.
Embedded files are stored in their original formats, and are displayed in the PDF in a manner similar to that of file system explorers or zip containers. PDF collections are also conventional PDF files and can include pages of text, scanned text, forms, or any other features supported by the PDF format.
Files in any other format, including other PDF files, may be embedded in PDF collections although the software used to create or view them may prohibit embedding or extracting certain file types, such as executable files, for security reasons.
Unless directly supported by the PDF viewing software, an embedded file must be extracted from the PDF and opened by an application capable of rendering its file format.
As they facilitate distribution of potentially large and diverse sets of content, collections also elevate the risk that personally identifiable information (PII) or other sensitive information could inadvertently be made public if contained in an embedded file. To mitigate that risk the following requirements must be met for transfers of PDF collection files.
General Requirements:
- PDF collections and their embedded files must relate to one another as components of a record. For example, a PDF collection might contain the files that make up a medical case file or a court proceeding.
- Agencies may include any number of email or text messages that are components of a case file as embedded files. However, PDF collections must not be used to transfer aggregations of email records under a Capstone schedule. Acceptable formats for use when transferring email records are listed in the Email section of this table.
- All PDF files including both the PDF collection itself and any embedded PDF file must have all fonts, including the base 14 fonts, embedded within them. This requirement is met by embedding subsets of all fonts used in the document, or by saving as a PDF/A-3 or PDF/A-4f file.
- Agencies must deactivate any security settings (for example, self-sign security, user passwords, and/or permissions) on the PDF collection that prevent NARA from opening, viewing, or printing the record.
- Agencies must deactivate any security settings (for example, encryption, passwords, and/or permissions) on any embedded files that prevent NARA from opening, viewing, or printing the embedded file.
- Embedded files must be in a format identified on this page (NARA Bulletin 2014-04, Appendix A) as either preferred or acceptable for their record type.
- All data must be plainly viewable and not hidden. Many formats have the ability to hide sections of data, for example, a hidden column in a Microsoft Excel spreadsheet.
- Embedded files in PDF Collections must be visible. For example, pages in PDF Collections should not contain hidden files such as file attachment annotations.
- PDF collections must not use folders to store or organize embedded files.
- Embedded files should not contain additional embedded files. PDF collections should only contain a total of two hierarchical levels, the PDF collection itself, and any embedded files.
PDF Collection Index Metadata Fields for embedded files:
- In addition to the CSV file required by NARA Bulletin 2015-04, PDF collections must be accompanied by an external index listing all embedded files. The index must be either a Microsoft Excel file or a CSV file. The index file should also be contained within the collection.
- Indexes must have the same name as the PDF collection files they describe. Example: Case-File-001.pdf and Case-File-001.csv.
Index of Required Metadata Fields:
Metadata Element | Label | Definition | Usage |
---|---|---|---|
File Name | Identifier:FileName | The complete name of the embedded file including its file extension. File names should be unique and indicate the type of content the file contains. For example: Deposition-case-file-001.docx. [Category-case file number-sequential number.file_extension] | Mandatory |
Message Digest | MessageDigest | The MD5 Checksum value of the embedded file | Mandatory |
Title | Title | The descriptive name given to the embedded file. For example: Docket, Notes, Correspondence, Evidence | Optional. Mandatory if available |
Creation Date | CreationDate | The date the embedded file was originally created | Mandatory |
Modified Date | ModifiedDate | The date the embedded file was last changed | Mandatory |
Access Restrictions | AccessRestrictions | Any PII, CUI, or National Security Restrictions that apply to the embedded file. If no restrictions apply label: No Access Restrictions | Mandatory |
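
The index described above can be produced as a CSV file whose column headers are the labels in the table. The following is a minimal sketch using Python's standard `csv` and `hashlib` modules. The helper names `index_row` and `write_index` are illustrative, and the date format shown in the usage note is an assumption, since this appendix does not specify one.

```python
import csv
import hashlib
import io

# Column headers taken from the Label column of the table above.
FIELDS = [
    "Identifier:FileName",
    "MessageDigest",
    "Title",
    "CreationDate",
    "ModifiedDate",
    "AccessRestrictions",
]


def index_row(file_name, content, creation_date, modified_date,
              title="", access_restrictions="No Access Restrictions"):
    """Build one index row for an embedded file given its raw bytes."""
    return {
        "Identifier:FileName": file_name,
        "MessageDigest": hashlib.md5(content).hexdigest(),  # MD5 checksum of the file
        "Title": title,
        "CreationDate": creation_date,
        "ModifiedDate": modified_date,
        "AccessRestrictions": access_restrictions,
    }


def write_index(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

A row for an embedded deposition might be built with `index_row("Deposition-case-file-001.docx", data, "2024-01-15", "2024-02-01", title="Deposition")`, and the resulting text saved under the same base name as the collection, for example Case-File-001.csv.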
Preferred Formats
Acceptable Formats
Structured Data Formats
Structured data comprises the broad category of data that is stored in defined fields. Categories for structured data are as follows:
- Database formats are organized collections of associated data that conform to a logical structure. Database formats are determined by “data models” that describe specific data structures used to model an application and generally include navigational, relational, and hybrid models;
- Spreadsheets are tables made up of columns and rows that contain cells of data. Relationships between cells can be pre-defined as mathematical formulas;
- Statistical data is the result of quantitative research and analysis. Statistical data formats contain collections of data presented in both tabular and non-tabular form; and
- Scientific data refers to research data collected by instrumentation tools during the scientific process. Scientific data formats are either domain specific within a single field of study, or are multi-domain formats used for transfer of scientific data between domains.
General requirements for structured data include the following:
- Agencies must transfer structured data that is both well-formed according to the syntactical conventions of the format, and valid according to the structural rules defined in any associated schemas or document type definitions (DTDs);
- Value-separated files, e.g., CSV or comma-separated value files, may use a character other than the comma. The pipe and the caret are recommended delimiters because they are not commonly found in free-text fields. Alternatively, text files encoded with ASCII characters in which each field has a fixed width are also an acceptable transfer format for structured data, even though ASCII is technically a data encoding type. ASCII text files must be accompanied by complete documentation of the record lengths and field widths;
- Data files and databases shall be transferred as flat files or as rectangular tables, that is, as two-dimensional arrays, lists or tables. All records in a database, or rows (tuples) in a relational database, should have the same logical format. Each data element within a record should contain only one data value. A record should not contain nested repeating groups of data items; and
- Structured data must be transferred together with any associated files necessary to verify the validity of the data, e.g., DTDs, schemas, and data dictionaries.
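Two of the requirements above lend themselves to a simple automated check: that a delimited file is rectangular (every record has the same number of fields), and that a fixed-width record can be split using its documented field widths. The following is a minimal sketch using Python's standard `csv` module; the function names and the sample field widths are illustrative only.

```python
import csv
import io


def is_rectangular(text, delimiter="|"):
    """True when every non-empty record has the same number of fields."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    field_counts = {len(row) for row in reader if row}
    return len(field_counts) <= 1


def parse_fixed_width(line, widths):
    """Split one fixed-width record using the documented field widths."""
    fields = []
    position = 0
    for width in widths:
        fields.append(line[position:position + width].strip())
        position += width
    return fields
```

For instance, `is_rectangular("id|name\n1|Ann\n2|Bob\n")` holds because each record has two fields, while a record with an extra delimiter would fail the check. The widths passed to `parse_fixed_width` would come from the documentation that must accompany the transfer.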
Preferred Formats
Acceptable Formats
Acceptable for Imminent Transfer Formats
Email
Email is defined as discrete electronic communications transmitted over the Simple Mail Transfer Protocol (SMTP), between two or more people or entities, in compliance with the applicable IETF Request for Comments (RFC) specifications. Email does not include other functions commonly available via email programs such as calendars, tasks, appointments, newsgroups, or instant messaging. In order for information in a calendar, contact list, address book, etc., to be transferred to NARA, it must be scheduled as a separate item.
Please note that NARA considers email attachments to be a component of the email record and does not require that unseparated email attachments meet the transfer standards specified by the format category under which the attachment alone would fall.
General requirements for email:
- Transfers of email records must consist of an identifiable, organized body of records (not necessarily a traditional series);
- Email messages should include delimiters that indicate the beginning and end of each message and the beginning and end of each attachment, if any. Each attachment must be differentiated from the body of the message, and uniquely identified;
- Email messages transferred as XML files must be accompanied by any associated document type definitions (DTDs), schemas, and/or data dictionaries;
- Email messages should include labels that identify each part of the message (Date, To [all recipients, including cc: and bcc: copies], From, Subject, Body, and Attachment), including transmission and receipt information (Time Sent, Time Opened, Message Size, File Name, and similar information, if available). To ensure identification of the sender and addressee(s), agencies that use an email system that identifies users by codes or nicknames, or that identifies addressees only by the name of a distribution list, should include the identifying information with the transfer-level documentation; and
- Email converted to a format that is not natively used by the email program and that does not maintain header information (such as RTF or Word documents) is not accepted. Printouts of emails are also not accepted under this Bulletin.
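The labeled parts listed above correspond to header fields and body parts that remain recoverable when a message is kept in its native Internet message form. The following is a minimal sketch using Python's standard `email` package, assuming the message is available as RFC 5322 text (for example, an EML file); the function name `label_parts` is illustrative.

```python
from email import message_from_string, policy

HEADER_LABELS = ["Date", "From", "To", "Cc", "Bcc", "Subject"]


def label_parts(raw_message):
    """Return the labeled parts of one message as a dictionary."""
    message = message_from_string(raw_message, policy=policy.default)
    labeled = {}
    for name in HEADER_LABELS:
        if message[name] is not None:      # skip headers the message does not carry
            labeled[name] = str(message[name])
    body = message.get_body(preferencelist=("plain",))
    labeled["Body"] = body.get_content() if body is not None else ""
    # Each attachment is identified separately from the body by its file name.
    labeled["Attachment"] = [part.get_filename() for part in message.iter_attachments()]
    return labeled
```

A conversion that discards header information, such as saving the message as a word-processing document, leaves nothing for a routine like this to read, which is why such conversions are not accepted.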
Preferred Formats for Individual Messages
Acceptable Formats for Individual Messages
Preferred Formats for Aggregations of Email
Web Records
Web records consist of web sites and social media sites created and maintained to provide information and services of the United States Government via the World Wide Web. This Bulletin applies to web records managed by an agency that have been appraised and scheduled for permanent retention by NARA. Agencies should harvest websites using a utility that will package component files in a manner that meets the following general requirements.
General requirements for web content records:
- Web records must be accessible via Hypertext Transfer Protocol (HTTP) from a server to a client browser when a URL has been activated;
- Web content records that share a domain name including content managed under formal agreement and residing on another site must be transferred together;
- All component parts of web content records that have been appraised as permanent including image, audio, video and all other proprietary formats, must be transferred in a manner that maintains all of the original links, functionality and data integrity;
- Dynamic content such as calendars or databases either must be transferred in an acceptable format, or be made accessible as static content;
- All internally referenced URLs must be included with the transfer set; and
- All control information from the harvesting protocol must be maintained.
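The requirement that all internally referenced URLs be included can be spot-checked by listing the links on a harvested page that resolve to the same host. The following is a minimal sketch using Python's standard `html.parser` and `urllib.parse` modules. It treats "internal" as an exact host match, which is a simplifying assumption: content managed under formal agreement on another site would need to be added by hand. The class and function names are illustrative.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the values of href and src attributes from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def internal_urls(html, page_url):
    """Return the sorted absolute URLs on the page that share the page's host."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc.lower()
    found = set()
    for link in collector.links:
        absolute = urljoin(page_url, link)   # resolve relative references
        if urlparse(absolute).netloc.lower() == host:
            found.add(absolute)
    return sorted(found)
```

Comparing this list against the contents of the transfer set would show which internally referenced resources were missed by the harvest.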
The following will not be accepted for transfer under this Bulletin:
- Program or administrative records documenting the management of web sites;
- Externally referenced content (e.g., accessed via hyperlink) that resides in a different domain and is not managed for an agency under a formal agreement;
- Static images (such as screenshots) of web content records, because they do not retain hypertext functionality.
Preferred Formats
Acceptable Formats
Calendars
Electronic calendars allow users or groups to create events that can be exchanged between applications or systems. Users can manage and view events across defined time periods, such as hours, days, weeks, months, and years. The format, iCalendar or iCal, contains a variety of functions: events, to-do lists, journal entries, time zone, availability designated as free or busy, and notifications. The format can be exchanged through a variety of methods including Simple Mail Transfer Protocol (SMTP), HyperText Transfer Protocol (HTTP), a file system, and protocols, such as memory-based clipboard or drag and drop functions.
Calendar events can include attachments in multiple file formats, such as documents or spreadsheets or other file formats not specified within this appendix. Attachments are not required to be transferred separately. As components of the calendar record, attachments are transferred within the calendar file.
General requirements for calendar files:
- Transfers of calendar records must consist of an identifiable, organized body of records (not necessarily a traditional series);
- Calendars must include labels to identify each component. Delimiters must indicate the beginning and end of each event and the beginning and end of each attachment, if any. Each attachment must be differentiated from the body of the calendar. Each attachment must be uniquely identified; and
- NARA will not accept transfers of calendars converted to PDF files.
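In the iCalendar format, the beginning and end of each component are marked by paired `BEGIN:` and `END:` lines, for example `BEGIN:VEVENT` and `END:VEVENT`. The following is a minimal sketch that verifies the delimiters are balanced and counts the events; it is not a full iCalendar parser, and the function name `count_events` is illustrative.

```python
def count_events(ics_text):
    """Check that BEGIN/END delimiters pair up and return the number of events."""
    open_components = []
    events = 0
    for raw_line in ics_text.splitlines():
        # Lines starting with a space or tab continue the previous (folded) line.
        if raw_line[:1] in (" ", "\t"):
            continue
        line = raw_line.strip().upper()
        if line.startswith("BEGIN:"):
            open_components.append(line[len("BEGIN:"):])
        elif line.startswith("END:"):
            name = line[len("END:"):]
            if not open_components or open_components.pop() != name:
                raise ValueError(f"unmatched END:{name}")
            if name == "VEVENT":
                events += 1
    if open_components:
        raise ValueError(f"missing END:{open_components[-1]}")
    return events
```

A calendar file that raises an error here has an event or other component whose boundaries cannot be determined from its delimiters.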
Preferred Formats
Navigational Charts
Electronic navigational charts are tools used to provide safe navigation for all classes of vessels. Official charts, issued by a government, hydrographic office, or other government institution, comply with standards developed by the International Hydrographic Organization (IHO). These charts show detailed configurations of the seabed, characteristics of the coast, routes, and aids to navigation.
There are two types of electronic navigational charts—raster charts and vector charts:
- Electronic Navigational Charts (ENC) are vector data sets of all the objects (points, lines, and areas) represented on a chart in a digital database.
- Raster Navigational Charts (RNC) are digitized raster copies of official paper charts.
Preferred Formats
Seismic Data
Seismic data formats capture data traces of seismic responses at specific surface recording locations. Transfers of seismic data in the SEG-Y format should include all associated metadata and support documentation needed to comprehensively understand the record. Additionally, metadata documenting the process of creating the data (if available) should be included. Examples of associated files and support documentation that should be included with transfers include: Load Sheets, Observer’s Reports, Readme files, and Velocity Files.
Preferred Formats
Acceptable Formats
Updated October 2024