National Archives Catalog on the AWS Registry of Open Data

The National Archives and Records Administration (NARA), in partnership with Amazon Web Services (AWS), published the National Archives Catalog dataset to the AWS Registry of Open Data. This documentation explains how to access the data.

About the Dataset

The National Archives Catalog dataset on the AWS Registry of Open Data - over 261 gigabytes of data - contains the archival descriptions and authority records from the National Archives Catalog (as of April 5, 2025), including the URLs for over 148 million digital objects and data from citizen archivist contributions.

The National Archives Catalog is the primary public access site for NARA's archival records, including both archival descriptions of records and digital copies of records. Archival descriptions in the National Archives Catalog dataset are organized according to record groups and collections, and authority records are organized by authority type - organizations, people, topical subjects, geographic references, and specific records types. This dataset will be updated twice per year.

The AWS Registry of Open Data, started in 2008, is a service provided by AWS to store open, public datasets for free so that they can be accessed and analyzed on AWS.

Access Methods

Users can access the full National Archives Catalog dataset using its Amazon Resource Name (ARN), an identifier that uniquely names a resource on AWS so that users can locate the dataset.

Additionally, users can access both the full dataset and specific portions of it using the AWS Command Line Interface (CLI), an open source tool that lets users interact with AWS services from the command line. Documentation for the AWS CLI is available from AWS.

Accessing the Full Dataset

The dataset can be downloaded as zip files at the following locations:

          https://nara-national-archives-catalog.s3.amazonaws.com/zip/nac_export_authorities_2025-04-08.zip (43 MB)
          https://nara-national-archives-catalog.s3.amazonaws.com/zip/nac_export_descriptions_2025-04-08.zip (87 GB)
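For scripted downloads, the zip files can also be fetched over HTTPS without the AWS CLI. The following is a minimal sketch using only the Python standard library; the URL is copied from the list above, while the helper names local_name and download are hypothetical and chosen for illustration. The file is streamed to disk rather than read into memory, which matters mainly for the 87 GB descriptions export.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# URL copied from the list above (the smaller, 43 MB authorities export).
AUTHORITIES_ZIP = (
    "https://nara-national-archives-catalog.s3.amazonaws.com"
    "/zip/nac_export_authorities_2025-04-08.zip"
)


def local_name(url):
    """Return the file name at the end of a download URL."""
    return Path(urlparse(url).path).name


def download(url, dest_dir="."):
    """Stream a remote file into dest_dir and return the local path.

    Hypothetical helper: copyfileobj copies in chunks, so the whole
    archive is never held in memory at once.
    """
    target = Path(dest_dir) / local_name(url)
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)
    return target


# Example call (performs a real 43 MB download, so it is left commented out):
# download(AUTHORITIES_ZIP, dest_dir=".")
```

The sketch does not resume interrupted transfers; for the larger file, the AWS CLI commands below may be the more practical route.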

The full dataset can be accessed with the following ARN:

          arn:aws:s3:::nara-national-archives-catalog

To list the full dataset using AWS CLI, use the following command:

          aws s3 ls s3://nara-national-archives-catalog/ --no-sign-request

To pull the full dataset using AWS CLI, use the following command:

          aws s3 sync s3://nara-national-archives-catalog/ [destination] --no-sign-request
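Where the AWS CLI is not installed, a listing similar to the ls command above can be approximated with an unsigned request to the S3 ListObjectsV2 REST operation. The sketch below is an illustration under assumptions, not NARA's documented method: it assumes the bucket answers anonymous HTTPS list requests in the same way it answers the CLI with --no-sign-request, and the function names are hypothetical. It reads only the first response, and S3 returns at most 1,000 entries per response, so longer listings would need pagination that is not shown here.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BUCKET_URL = "https://nara-national-archives-catalog.s3.amazonaws.com/"
# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_url(prefix=""):
    """Build an unsigned ListObjectsV2 URL for one 'directory' level."""
    query = urlencode({"list-type": "2", "delimiter": "/", "prefix": prefix})
    return BUCKET_URL + "?" + query


def parse_listing(xml_data):
    """Return (sub-prefixes, object keys) from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_data)
    prefixes = [e.text for e in root.findall("s3:CommonPrefixes/s3:Prefix", S3_NS)]
    keys = [e.text for e in root.findall("s3:Contents/s3:Key", S3_NS)]
    return prefixes, keys


def list_prefix(prefix=""):
    """Fetch and parse the first page of a listing (network access required)."""
    with urllib.request.urlopen(listing_url(prefix)) as response:
        return parse_listing(response.read())


# Example call (requires network access):
# prefixes, keys = list_prefix("descriptions/")
```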

Accessing Portions of the Dataset

To pull portions of the dataset, i.e. only the descriptions, only the authority records, or only data for record groups or collections, the following AWS CLI commands can be used:

Each entry below gives the dataset portion, its description, and the corresponding AWS CLI command.

Descriptions - All
Data created to identify and represent archival records.

          aws s3 sync s3://nara-national-archives-catalog/descriptions/ [destination] --no-sign-request

Descriptions - Record Groups
Data for a group of records that share the same provenance or were created in the same administrative unit, e.g. a federal agency.

          aws s3 sync s3://nara-national-archives-catalog/descriptions/record-groups/ [destination] --no-sign-request

Descriptions - Collections
Records assembled by a person, organization, or repository often from a variety of sources, e.g. donated records.

          aws s3 sync s3://nara-national-archives-catalog/descriptions/collections/ [destination] --no-sign-request

Authority Records - All
Entries about the preferred forms of terms, such as for organizations, people, topical subjects, geographic references, and specific records types.

          aws s3 sync s3://nara-national-archives-catalog/authority-records/ [destination] --no-sign-request

Authority Records - Organizations
Corporate bodies that created, maintained, or are referenced in the archival materials.

          aws s3 sync s3://nara-national-archives-catalog/authority-records/organizations/ [destination] --no-sign-request

Authority Records - People
Individuals who created, maintained, or are referenced in the archival materials.

          aws s3 sync s3://nara-national-archives-catalog/authority-records/people/ [destination] --no-sign-request

Authority Records - Topical Subjects
Topics represented in the archival materials.

          aws s3 sync s3://nara-national-archives-catalog/authority-records/topical-subjects/ [destination] --no-sign-request

Authority Records - Geographic References
Geographic areas represented in the archival materials.

          aws s3 sync s3://nara-national-archives-catalog/authority-records/geographic-references/ [destination] --no-sign-request

Authority Records - Specific Records Types
Formats of the archival materials.

          aws s3 sync s3://nara-national-archives-catalog/authority-records/specific-records-types/ [destination] --no-sign-request
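Because the commands above differ only in the prefix that follows the bucket name, they can be generated from a small lookup table when several portions need to be pulled in one script. The following sketch assumes the AWS CLI is installed and on the PATH; the short portion names, PORTIONS, sync_command, and sync are hypothetical names introduced for illustration, while the prefixes themselves are copied from the commands above. The optional --dryrun flag of aws s3 sync prints what would be transferred without downloading anything.

```python
import subprocess

BUCKET = "s3://nara-national-archives-catalog/"

# Prefixes copied from the commands listed above; the dictionary keys are
# hypothetical short names used only by this sketch.
PORTIONS = {
    "descriptions": "descriptions/",
    "record-groups": "descriptions/record-groups/",
    "collections": "descriptions/collections/",
    "authority-records": "authority-records/",
    "organizations": "authority-records/organizations/",
    "people": "authority-records/people/",
    "topical-subjects": "authority-records/topical-subjects/",
    "geographic-references": "authority-records/geographic-references/",
    "specific-records-types": "authority-records/specific-records-types/",
}


def sync_command(portion, destination, dry_run=False):
    """Return the aws s3 sync argument list for one dataset portion."""
    cmd = ["aws", "s3", "sync", BUCKET + PORTIONS[portion], destination,
           "--no-sign-request"]
    if dry_run:
        cmd.append("--dryrun")  # list the transfers without performing them
    return cmd


def sync(portion, destination, dry_run=False):
    """Run the sync with the AWS CLI; raises CalledProcessError on failure."""
    subprocess.run(sync_command(portion, destination, dry_run), check=True)


# Example call (requires the AWS CLI and network access):
# sync("people", "./people", dry_run=True)
```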

Accessing Record Groups or Collections

To pull descriptions for specific record groups or collections, use the following AWS CLI commands.

Record Groups

For pulling data for record groups (sorted numerically by Record Group Number):

Collections

For pulling data for collections (sorted alphabetically by Collection Identifier):

Structure of JSON Files within Record Group and Collection Directories

Each record group or collection directory contains a sequence of JSON files that together hold all of the archival descriptions in that record group or collection: the record group or collection description, series descriptions, file unit descriptions, and item descriptions. Each JSON file contains data for up to 10,000 descriptions; once that limit is reached, a new file continues where the previous one left off. The parent/child relationship of series to file units and items is conveyed for each record through the parentSeries, parentFileUnit, etc. elements within the JSON. The files appear within the directories as in the example below:

.../descriptions/record-groups/rg_021/

          rg_021-0.json

          rg_021-1.json

          rg_021-2.json
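The directory naming and the parent elements described above can be explored with a short script. The sketch below rests on two loudly stated assumptions: first, that every record group directory follows the single example shown (rg_ plus the Record Group Number zero-padded to three digits), which this documentation does not confirm; second, that parent links appear as JSON keys whose names begin with "parent", since only parentSeries and parentFileUnit are named here and the full schema is not described. Both function names are hypothetical.

```python
def record_group_prefix(number):
    """Return the assumed bucket prefix for a record group number.

    Assumption: the rg_021 example generalizes to three-digit zero padding.
    """
    return f"descriptions/record-groups/rg_{number:03d}/"


def parent_references(node, path=""):
    """Yield (path, value) for every key whose name starts with 'parent'.

    The search walks nested dictionaries and lists. It does not descend
    into a parent value once found, so a parentSeries nested inside a
    parentFileUnit is returned as part of that parentFileUnit value.
    """
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{path}.{key}" if path else key
            if key.startswith("parent"):
                yield here, value
            else:
                yield from parent_references(value, here)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from parent_references(item, f"{path}[{index}]")
```

Running parent_references over one parsed description lists where its parent elements sit in the structure, which is a reasonable first step before writing code that depends on a specific nesting.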

Structure of JSON Files within Authority Record Directories

Each authority record directory contains a sequence of JSON files that hold the data for its authority record type. Each JSON file contains data for up to 10,000 authority records; once that limit is reached, a new file continues where the previous one left off. The files appear within the directories as in the example below:

.../authority-records/organizations/

          organizations-0.json

          organizations-1.json

          organizations-2.json
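Since each file continues where the previous one left off, the files are best read in numeric order of their suffix; a plain alphabetical sort would place organizations-10.json before organizations-2.json. The following sketch reads a locally synced directory in that order. It assumes the naming pattern shown in the example above (a name, a hyphen, a number, then .json) and makes no assumption about the structure inside each file; the function names are hypothetical.

```python
import json
import re
from pathlib import Path

# Matches the numeric suffix in names such as organizations-2.json.
CHUNK_RE = re.compile(r"-(\d+)\.json$")


def chunk_index(name):
    """Return the sequence number embedded in a chunk file name."""
    match = CHUNK_RE.search(name)
    if match is None:
        raise ValueError(f"not a numbered chunk file: {name}")
    return int(match.group(1))


def ordered_chunks(directory):
    """Return the numbered JSON files in a local directory, in numeric order."""
    files = [p for p in Path(directory).glob("*.json") if CHUNK_RE.search(p.name)]
    return sorted(files, key=lambda p: chunk_index(p.name))


def load_chunks(directory):
    """Yield (path, parsed JSON) one file at a time to limit memory use."""
    for path in ordered_chunks(directory):
        with open(path, encoding="utf-8") as handle:
            yield path, json.load(handle)
```

The same helpers apply to the record group and collection directories, whose files follow the same numbered pattern.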

 
