Federal Records Management

NARA Guidance on Managing Web Records

January 2005



MANAGING WEB RECORDS


Introduction

  1. What are trustworthy records?
    1.1 What are the characteristics of trustworthy records?
    1.2 How can I maintain a trustworthy web site?
  2. Risk and risk assessment
    2.1 What are the records management risks associated with web sites?
    2.2 How can I conduct a records management risk assessment?
    2.3 How do I determine the unit of analysis for doing a risk assessment?
    2.4 Who is involved in assessing risk?
    2.5 What do I do with the results of my risk assessment?
  3. Mitigating risk
    3.1 What steps should I follow to help mitigate risk in managing web records?
    3.2 How should an agency manage web site content in order to mitigate risk?
    3.3 How frequently should I capture a snapshot of my site's web content records?
    3.4 How do I track changes to web site content pages between snapshots?
    3.5 When preserving long-term web content records, how can I treat hyperlinks?
  4. Roles and responsibilities
    4.1 Who is responsible for managing web content records?
    4.2 Who is responsible for managing web management and operations records?

Introduction

This guide is intended to assist agency staff in managing their web records. It is particularly geared to the needs of program officials, who provide the information posted on web sites, and those staff who manage agency web sites, including webmasters and IT staff.


1. What are trustworthy records?

Trustworthy records are essential for an agency to meet its legal and internal business needs. Reliability, authenticity, integrity, and usability are the characteristics used to describe trustworthy records from a records management perspective. How these terms apply to web sites and web records is discussed more fully in section 1.1.

Creating and maintaining trustworthy records requires resources. Agencies need to conduct a risk analysis to balance the level of trustworthiness of records against costs and risks. The level of resources used to ensure these characteristics depends on the agency's business needs and perception of risk. (See section 2 for a discussion of risk assessment.) Web site operations that are critical to agency business need a greater assurance level that they are reliable and authentic, maintain integrity, and are usable over a longer period of time than less critical operations (see note 1).


1.1 What are the characteristics of trustworthy records?

Reliability. A reliable web site is one whose content can be trusted as a full and accurate representation of the transactions, activities, or facts to which it attests and therefore can be depended upon in the course of subsequent transactions or activities.

Authenticity. An authentic web site is one that is proven to be what it purports to be and to have been created by the agency with which it is identified.

Web site-related records should be created by individuals who have direct knowledge of the facts or by instruments routinely used within the business to conduct the transaction.

To demonstrate the authenticity of a web site, agencies should implement and document policies and procedures that control the creation, transmission, receipt, and maintenance of web site records to ensure that records creators are authorized and identified and that records are protected against unauthorized addition, deletion, and alteration (e.g., via hacking).

Integrity. The integrity of a web content record refers to it being complete and unaltered.

The agency's web management policies and procedures for routinely updating and modifying its web sites help ensure integrity. As stated in ISO Technical Report 15489-2 (see note 2), sec. 7.2.4, "records systems should maintain audit trails or other elements sufficient to demonstrate that records were effectively protected from unauthorized alteration or destruction." The web management policies should prescribe how changes to the web site are to be documented.

Another aspect of integrity is the structural integrity of a web site's content-related records. The structure of a web site, that is, its physical and logical format and the relationships between the pages and content elements composing the site, should remain physically or logically intact. Failure to maintain the web site's structural integrity may impair its reliability and authenticity.

Usability. A usable web site is one that can be located, retrieved, presented, and interpreted. In retrieval and use, you should be able to directly connect the web site to the business activity or transaction that produced it. You should be able to identify both the site and its content within the context of broader business activities and functions. The links between content, contextual, and structural web site-related records that document agency web site activities should be maintained. These contextual linkages should provide an understanding of the transactions that created and used them.


1.2 How can I maintain a trustworthy web site?

For web site records to have integrity and remain reliable, authentic, and usable for as long as they are needed, you must maintain the content, context, and sometimes structure of the site. A trustworthy web site includes not only the content pages but also information about the web site that relates to the context in which it was created and used. Specific contextual information varies depending upon the business, legal, and regulatory requirements of the business activity. Structural information on the organization of the web site supports its long-term integrity.

Federal web site-related records that support content, context, and structure are:

Content: The actual HTML-encoded pages themselves and additional content files referenced therein or content created by end users interacting with the web site. Maintenance of these web content records is necessary to support all of the characteristics of trustworthiness: reliability, authenticity, integrity, and usability.

Context: Administrative and technical records necessary for or produced during the management of an agency web site. Maintenance of these records provides a context for web operations, which attests to the reliability, authenticity, and integrity of an agency's web site.

Structure: For those web sites (or portions) that have been appraised as permanent and for high-risk temporary sites, a site map indicating the arrangement of a web site's content pages and software configuration files of content management systems. Maintenance of this record provides a structure for content records and thereby enables the integrity and usability of both current and preserved versions of an agency web site.

Records in all of these categories contribute to the adequate documentation of agency web site operations. A risk assessment of web site operations advises which records are necessary to ensure the operation's trustworthiness, how the records should be maintained appropriately, and how long those records are to be retained (see section 3).


2. Risk and risk assessment

Typically, agencies conduct risk assessments in order to establish appropriate levels of management controls prior to undertaking new program initiatives. NARA assumes that such risk assessments have been conducted for development of agency web site operations. These risk assessments can also be used to establish records management controls.

Agency records management practices are based on operational needs and perceptions of risks. Operational needs (e.g., providing public information, documenting transactions with the public) determine the way agencies address the trustworthiness of web site operations (see section 1.1). Risk assessment and risk mitigation, along with other techniques, are used to establish both management controls for and documentation requirements of agency activities. The emphasis in this guidance on risk assessment relates to Clinger-Cohen requirements for incorporation of risk management into program activities, particularly for those that are dependent upon information technology (e.g., web site operations).


2.1 What are the records management risks associated with web sites?

From a records management perspective, risk relates to (1) challenges to the trustworthiness of the records (e.g., legal challenges) that can be expected over the life of the record and (2) unauthorized loss or destruction of records. Consequences are measured by the degree of loss that the agency or citizens would suffer if the trustworthiness of the web site-related records could not be verified or if there were unauthorized loss or destruction.

The records management-related risks associated with agency web sites are mainly technical. For example, loss of information could result from:

  • an inability to document or validate transactions that occur via an agency web site front end;

  • an inability to reconstruct views of web content that were created dynamically and existed only virtually for the time that they were viewed;

  • compromise of e-Government transactions; and

  • an inability to track web-assisted policy development or document agency decisions relating to agency web operations.

A variety of negative programmatic consequences can result from any of these technical risks:

  • litigation or liability if an agency is unable to verify what was on its site at a given point in time;

  • impairment of program operations or an inability to detect or punish fraud, false statements, or other illegal behavior because of a lack of valid or probative records;

  • an inability to produce records that document accountability and stewardship of materials posted to the agency web site;

  • dissemination of misinformation;

  • financial losses due to compromising the citizens' or government's rights;

  • compromise of the agency's mission;

  • negative reactions of agency stakeholders (e.g., the Executive or Legislative branch); and

  • unfavorable media attention.


2.2 How can I conduct a records management risk assessment?

A risk assessment should address the possible consequences of untrustworthy, lost, or unrecoverable records, including the legal risk and financial costs of losses, the likelihood that a damaging event will occur, and the costs of taking corrective actions. Agencies may have formal risk assessment procedures that may be applied to agency web site operations.

The assessment factors may include records management threats, visibility, consequences, and sensitivity.

Records management threats relate to the likelihood of experiencing technical risks discussed in section 2.1 (e.g., risks of unauthorized destruction of web site-related records, litigation risks associated with inability to reconstruct views of web sites at specific points in time, risks associated with inability to document web site policy decisions, etc.).

Visibility is the level of active public awareness of an agency's web site operations.

Consequences describes the level of negative organizational, economic, or programmatic impact if web records are untrustworthy, lost, or unrecoverable.

Sensitivity characterizes the agency's assessment of the importance of web site operations.

The results of an assessment will support agency programs by providing a basis for determining what types of web site records should be created, how they should be maintained, and how long they should be maintained. The assessment will help agencies ensure that the level of risk is tolerable and that resources are properly allocated. Assessment results can also aid in the development of web site records schedules.


2.3 How do I determine the unit of analysis for doing a risk assessment?

One key aspect of conducting a risk assessment is determining the appropriate unit of analysis; i.e., whether the web site will be assessed as a single entity (see note 3) or whether assessments will be conducted for different portions of the site. This is important because it affects the choice of management controls (see section 3.1) and scheduling (see SCHEDULING WEB RECORDS, section 5). The concept of the appropriate unit is flexible to allow you to adapt it for your particular site and management needs. Possible units of analysis include the entire site, portions of the site related to specific functions or organizations, clusters of pages on a specific subject, etc.

Basic options for analysis

  • Evaluate the web site in toto. Note that this option is not advisable if the web site has multiple types of content (e.g., e-commerce transactions and static publications) or serves multiple functions. Records management risk and required management controls vary for those different portions of the web site.

  • Evaluate groupings of web sites referenced by an agency's main portal entry page

  • Evaluate the web site basically as a whole, minus one or two portions that exhibit substantially different characteristics

  • Substantially break out clusters or groups of web site pages based on function or other characteristics. Note that this option does not anticipate a page-by-page risk analysis of your web site.

First consider whether the site has a single level of risk or varying levels of risk. Use the risk assessment factors. If the level you have chosen for analysis has more than one answer to any of the factors, you may need to consider breaking out those portions. Note that changes in any of the four factors could affect the risk level.
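
As a rough illustration of this check, the sketch below records the four factors from section 2.2 for each cluster of pages in a proposed unit and flags any factor that has more than one answer. The three-level rating scale, the cluster names, and the function name are assumptions made for illustration only; this guidance does not prescribe a scoring scheme.

```python
from collections import defaultdict

# The four assessment factors named in section 2.2.
FACTORS = ("threats", "visibility", "consequences", "sensitivity")

def factors_with_mixed_ratings(unit):
    """Return {factor: set of ratings} for factors rated differently across clusters.

    `unit` maps a cluster name to a dict of factor ratings. The rating
    labels ("low", "medium", "high") are a hypothetical scale.
    """
    seen = defaultdict(set)
    for ratings in unit.values():
        for factor in FACTORS:
            seen[factor].add(ratings[factor])
    return {factor: found for factor, found in seen.items() if len(found) > 1}

# Hypothetical proposed unit made up of two clusters of pages.
proposed_unit = {
    "static publications": {"threats": "low", "visibility": "medium",
                            "consequences": "low", "sensitivity": "low"},
    "e-commerce transactions": {"threats": "high", "visibility": "medium",
                                "consequences": "high", "sensitivity": "high"},
}

mixed = factors_with_mixed_ratings(proposed_unit)
for factor, found in sorted(mixed.items()):
    print(f"{factor}: more than one answer {sorted(found)}; consider breaking out portions")
```

An empty result would suggest that the proposed unit has a single level of risk and can be assessed as one unit.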


Example:

One means of portioning the NARA web site for risk assessment is dividing it by program areas; for example, the Records Management portion of the NARA web site (see http://www.archives.gov/records-mgmt/index.html).

A final example of portioning the NARA web site would be on the basis of the nature of the content pages; for example, those pages composing the National Archives Catalog database (see /research/search/) or unique, one-time exhibits such as those on the Charters of Freedom (see http://www.archives.gov/exhibits/charters/index.html).


Determine the unit of risk assessment in consultation with other agency staff associated with the web site (see GENERAL BACKGROUND, RESPONSIBILITIES, AND REQUIREMENTS, Section 2).

If you decide, for operational reasons, to evaluate the web site as a single unit, all components will be treated the same in terms of risk. You will need to manage all parts of the site in accordance with the highest level of risk determined for any portion of the site. When applying this guidance to portals that are primarily federations of an agency's web sites, you must manage all of the agency web sites at the highest level of risk encountered in the aggregation.
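
The "highest level of risk" rule can be pictured with a small sketch. The portion names and the low/medium/high ordering below are hypothetical; the only point shown is that a site assessed as a single unit takes on the highest risk level found among its portions.

```python
# Hypothetical ordering of risk levels, lowest to highest.
RISK_ORDER = ["low", "medium", "high"]

def governing_risk(portion_risks):
    """Return the highest risk level found among the portions of a site."""
    return max(portion_risks.values(), key=RISK_ORDER.index)

# Hypothetical portions of a site (or sites federated under a portal).
portal = {
    "press releases": "low",
    "online forms": "high",
    "exhibits": "medium",
}

site_level = governing_risk(portal)
print(f"Manage every portion at risk level: {site_level}")
```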


2.4 What types of duties/functions should be involved in assessing risk?

Staff in various roles throughout your agency should contribute to your risk assessment. They will bring knowledge and experience about these aspects of a web site:

  • The nature of the information on the site, who uses it, and what problems might arise if information on a site is incorrect, out-of-date, or lost. An example is the program staff that produce content to be posted to the web site.

  • How information is placed on the site, revised, and removed, and what records are created or should be created when these actions take place. An example is the webmaster.

  • Relating business processes to the records that result from those processes, developing procedures for ensuring the trustworthiness of those resulting records, determining appropriate retention periods for those records, and obtaining approved retention schedules reflecting such. An example is a records manager.

  • Expertise in computer technology and the risks that its use can cause or mitigate. An example is the IT staff associated with web site operations.

  • The legal requirements that the agency must follow, unique legal risk that might arise from web site operations, and an understanding of those types of records that may be required in legal proceedings. An example is the legal staff.


2.5 What do I do with the results of my risk assessment?

After you have determined the level of records management risks for the site or portions of the site, you will need to protect the records appropriately. Review any web management policies and procedures that you already have in place to determine whether additional steps are needed. Develop a plan to address records issues (e.g., types of records needed to document the web-based activity, length of time they are needed to support the business purposes), as well as IT issues (e.g., security of the site and information exchanged over the site) and management/internal controls on the processes. The agency's program, web, and IT staff, the agency records officer, and the General Counsel should contribute to developing the plan.


3. Mitigating risk

Risk mitigation issues are of particular relevance to program staff responsible for web content and to webmasters. These issues include how to mitigate risk by producing a web snapshot and other means of documenting web site content, how changes to sites between snapshots can be tracked, and how hyperlinks may be treated when preserving long-term web content pages. This section also addresses the roles of the web management and agency program staff in schedule implementation.


3.1 What steps should I follow to help mitigate risk in managing web records?

Some of the steps outlined here are the same as for other kinds of records. You should address each of these steps:

  • Document the systems used to create and maintain your web records.

  • Ensure that your web records are created and maintained in a secure environment that protects the records from unauthorized alteration or destruction.

  • Implement standard operating procedures for the creation, use, and management of your web records and maintain adequate written documentation of those procedures.

  • Create and maintain your web records according to these documented standard operating procedures.

  • Train agency staff in the standard operating procedures.

  • Develop a retention schedule for your web records and obtain official NARA approval of that retention schedule. (See SCHEDULING WEB RECORDS.) You will need to cite the official disposition authorities found in your schedule if your agency is faced with legal challenges to produce records that have been destroyed.

The results of your risk assessment will indicate the level of effort necessary to mitigate your risks.


3.2 How should an agency manage web site content in order to mitigate risk?

You must preserve the records as long as they are needed for business operations. Traditional records management techniques apply fairly easily to relatively stable contextual and structural web site records. Managing web content pages is more complex. Web content pages may be frequently changed or updated, and when updates or redesign of web site maps change the relation/organization of web content, it may be deemed necessary to set aside a new recordkeeping copy of web site content.

Agencies may preserve web content records by (1) producing a stand-alone copy or snapshot (see note 4) of all content pages on the site at a particular time and (2) accompanying this snapshot with a site map that shows the relationship (i.e., directory structure) of those pages to each other. If your agency decides to take snapshots, you must decide:

  • how frequently a new snapshot should be captured;

  • if it is necessary to track changes in both the content pages and the site map that occur between snapshots; and

  • if it is, how to track these changes (see section 3.4).

The answers to these questions depend on your risk assessment of web site operations.
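
For a site served from static files, the snapshot-plus-site-map idea described above might be sketched as follows. This is a minimal illustration, not a NARA-endorsed tool: the function name, the folder layout, and the `sitemap.json` file name are assumptions, and a plain file copy does not capture dynamically generated content.

```python
import hashlib
import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

def take_snapshot(site_root, archive_root):
    """Copy every file under site_root into a new timestamped folder and
    write a site map recording each file's relative path and SHA-256 digest."""
    site_root = Path(site_root)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    snapshot_dir = Path(archive_root) / f"snapshot-{stamp}"
    content_dir = snapshot_dir / "content"
    # copytree creates the destination (and any missing parents) itself.
    shutil.copytree(site_root, content_dir)

    # The site map mirrors the directory structure of the copied pages.
    site_map = {}
    for path in sorted(content_dir.rglob("*")):
        if path.is_file():
            rel = path.relative_to(content_dir).as_posix()
            site_map[rel] = hashlib.sha256(path.read_bytes()).hexdigest()

    (snapshot_dir / "sitemap.json").write_text(json.dumps(site_map, indent=2))
    return snapshot_dir, site_map
```

Calling `take_snapshot("/path/to/site", "/path/to/archive")` (both paths are placeholders) would produce one dated folder per capture. Storing a digest next to each path is a design choice that makes later comparison of two snapshots straightforward.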

Content management systems (CMS) can be used to manage the content of a web site. The system consists of a content management application (CMA) and a content delivery application (CDA). The CMA can relieve the webmaster of many of the decisions and actions required to manage the creation, modification, and removal of content from a web site. A CDA uses and compiles the content management information to update the web site. CMSs can be used to create audit trails associated with content that is created on-the-fly.

To ensure availability of current web content, you may use web server back-up software or an Internet-based service to preserve copies of files or databases to restore the content in case of equipment failure or other catastrophe.

Instead of snapshots or preservation of web content in a records management application (RMA), agencies may decide to manage the live versions of web site content pages while the pages are up on the web site. For low-risk web sites, the current posted version of a site, plus the standard operating procedures used to manage the site and a log of changes, may be sufficient for business purposes. Please note that because this option does not set aside recordkeeping copies, it may not be appropriate for medium- and high-risk sites.


3.3 How frequently should I capture a snapshot of my site's web content records?

Determine the frequency of snapshots of a site's web content records and site map by using the risk-profiling factors described in section 2.2. The unit(s) of analysis for the risk assessment would correspond to the unit considered for the snapshot. Portions of a web site considered of higher records management risk are likely to require more frequent snapshots. The stakeholders discussed in GENERAL BACKGROUND, RESPONSIBILITIES, AND REQUIREMENTS, section 2 should cooperate in deciding how frequently snapshots should be taken.


3.4 How do I track changes to web site content pages between snapshots?

Four types of changes can occur to a web site's content between snapshots:

  1. Changes to the content of an individual page without changing its placement in the overall organization of the web site

  2. Wholesale replacement of an individual page (or sections of pages) without changing its placement in the overall organization of the web site

  3. Changes in location of a page (or groups of pages)

  4. Combinations of changes of these first three types.

Changes of the first two types (i.e., changes to content without changing the page's placement in the overall organization of the web site) can be treated as a version-control issue. You must decide how to best keep track of the versions of content pages.
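
To make the change types concrete, the sketch below compares two inventories of a site, each mapping a page's relative path to a digest of its content, and sorts the differences into groups. The inventory format and the function name are assumptions for illustration. A digest comparison cannot tell an edit (type 1) from a wholesale replacement (type 2), so both appear as "changed in place."

```python
def classify_changes(old_map, new_map):
    """Compare two {relative_path: content_digest} inventories of a site.

    Returns lists of paths whose content changed in place (types 1 and 2),
    (old_path, new_path) pairs for pages that moved (type 3), and pages
    that were added or removed outright.
    """
    old_paths, new_paths = set(old_map), set(new_map)
    changed = sorted(p for p in old_paths & new_paths if old_map[p] != new_map[p])

    gone = old_paths - new_paths
    fresh = new_paths - old_paths
    # A page counts as moved when identical content reappears under a new path.
    new_by_digest = {}
    for path in sorted(fresh):
        new_by_digest.setdefault(new_map[path], path)
    moved = []
    for path in sorted(gone):
        target = new_by_digest.pop(old_map[path], None)
        if target is not None:
            moved.append((path, target))

    moved_from = {src for src, _ in moved}
    moved_to = {dst for _, dst in moved}
    return {
        "changed_in_place": changed,
        "moved": moved,
        "added": sorted(fresh - moved_to),
        "removed": sorted(gone - moved_from),
    }

# Hypothetical inventories; the short digests stand in for real hash values.
before = {"index.html": "a1", "about.html": "b2", "old/faq.html": "c3", "gone.html": "d4"}
after = {"index.html": "a9", "about.html": "b2", "help/faq.html": "c3", "new.html": "e5"}
report = classify_changes(before, after)
```

A page that is both moved and edited (type 4) would show up here as one removal plus one addition, so combinations still need human review.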

The most fundamental, non-automated approach to tracking web site content, particularly for relatively stable sites, is to "print and file" a recordkeeping copy in the manual recordkeeping system. Another non-automated approach to version control is to annotate changes of content pages as a comment in the HTML coding. The comment, which will not appear when the page is displayed in a browser, could indicate when the page was changed (e.g., <!--Updated by MDG on 03/02/03-->) or could reference the page that it wholesale replaced (e.g., <!--This page replaced content page Introduction_1.html on 09/10/02-->). Another manual approach would be to maintain a log file of content changes of the first two types. (Keep in mind that neither the HTML comment approach nor the log file approach would allow you to actually reconstruct views presented at a particular time. Your risk assessment may find this acceptable.)
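
If pages are annotated with HTML comments in this way, the annotations can later be gathered into a change log. The sketch below uses Python's standard `html.parser` module to pull out such comments. The two prefixes it looks for are modeled on the example comments in the text and are an assumption, not a required convention.

```python
from html.parser import HTMLParser

class ChangeCommentCollector(HTMLParser):
    """Collect HTML comments that look like change annotations."""

    # Hypothetical prefixes, modeled on the two example comments in the text.
    PREFIXES = ("Updated by", "This page replaced")

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_comment(self, data):
        # `data` is the text between "<!--" and "-->".
        text = data.strip()
        if text.startswith(self.PREFIXES):
            self.annotations.append(text)

def change_annotations(html_text):
    """Return the change-annotation comments found in one page."""
    collector = ChangeCommentCollector()
    collector.feed(html_text)
    collector.close()
    return collector.annotations

page = """<html><body>
<!--Updated by MDG on 03/02/03-->
<!--This page replaced content page Introduction_1.html on 09/10/02-->
<!-- layout note, not a change annotation -->
<p>Introduction</p>
</body></html>"""

annotations = change_annotations(page)
```

Running this over every content page and writing the results to a file would yield a simple log of annotated changes. It records that a change happened, not what the earlier page looked like.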

Alternatively, you may use a content management system (CMS) to track versions of web content in the first two cases. A CMS would also offer limited page view reconstruction capabilities; the default settings for the databases that support most CMS software would retain only recent changes.

You can handle major changes to the site's directory structure by producing a new site map at the time of major revision. This could be accomplished in a manual or automated manner.

One automated way to track changes is to manage the web content records with a DoD 5015.2-certified records management application (RMA). A DoD 5015.2-certified RMA allows you to impose version control over changed copies of documents. If you use the RMA to store iterative copies of individual web pages as they are changed, you will be able to see how many times and when each page was changed. Web content may be added to an RMA's repository manually or via any of the automated tools discussed below. Please note that DoD-certified RMAs have been endorsed by NARA for civilian agency use because they comply with records management regulations. However, none of the other tools described below were designed for records management.

Another tool is a type of search engine software called a "web harvester." Also called a "spider" or "crawler," a harvester is a program that visits web sites and reads their pages and other information in order to create entries for a search engine index. You can use harvester software to identify changes to web site content and to gather content related to specific site (sub)units.

When justified by risk assessment, you may want to be able to closely reconstruct the content and structure of a site by combining records of updates to web content pages with snapshots of web sites. The degree of exactness to which a web site may be reproduced depends on whether changes to all static and dynamic files referenced within HTML-encoded content pages were also tracked between snapshots.


3.5 When preserving long-term web content records, how can I treat hyperlinks?

Web content pages use hyperlinks to: (1) jump to another location within the page, (2) jump to a location on other pages within the web site, or (3) jump to a page on another web site. Depending on the preservation strategy chosen, it is possible, and in many cases likely, that these hyperlinks will not continue to function in the preservation copy of the web content records. If the site does not follow external-link-liability-transference policies such as those employing pop-up window notifications, agencies might want to use the following suggestions to enhance the usability of preservation copies of long-term web content records. For hyperlinks within web content records appraised as permanent, agencies must adhere to NARA's Transfer Instructions for Permanent Web Content Records when transferring the records to NARA.

Suggestions for Managing Hyperlinks in Web Content for Long-Term Preservation

Internal target hyperlinks: For hyperlinks that simply send the user to a different location within the same page (aka internal target), no additional work is required, as the link will continue to function when the content page is interpreted by a browser application.

Hyperlinks not under local records management control: For hyperlinks that send the user to either a different page or another web site that is not under the agency's records management control, NARA suggests that agencies consider requiring web site content developers to modify the HTML syntax of web content pages containing such hyperlinks on a day-forward basis. This modification would include the insertion of an HTML comment after the hyperlink that describes, in plain English, the name of the site (and perhaps portion of site) or page to which the hyperlink transfers. For example, the hyperlink in the records management portion of the NARA web site discussing DoD 5015.2-STD that links to the Joint Interoperability Test Command's web site, expressed in HTML as

  <a href="http://jitc.fhu.disa.mil/recmgt/#standard">DoD Standard 5015.2</a>

would be modified, inserting an appropriate title attribute, per accessibility requirements, that describes it, as follows:

  <a href="/global_pages/exit.html?link=http://jitc.fhu.disa.mil/recmgt/#standard" title="Joint Interoperability Test Command's 5015.2 - STD Records Management Application Design Criteria Standard">DoD Standard 5015.2</a>

Hyperlink to new page within same web site: When a page includes a hyperlink that sends the user to another page in the same web site, it would be necessary to insert comments describing the hyperlink only when the site was not being scheduled in toto for the same retention (and those comments could reference the series containing the destination of the hyperlink).


Another alternative would be to produce what is in effect a bibliography for all of the hyperlinks referenced within the content pages composing a site. List all of the URLs referenced by hyperlinks, along with a description of the hyperlinked page (much as in the comment used in the previously suggested method).
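
A hyperlink bibliography of this kind could be compiled automatically. The sketch below, which uses only Python's standard library, lists every hyperlink on a page, resolves it to a full URL, labels it with one of the three hyperlink types described at the start of this section, and uses the link's title attribute (when present) as the description. The class and function names and the sample page are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkCollector(HTMLParser):
    """Collect (href, title) pairs from the anchor tags of one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            if attributes.get("href"):
                self.links.append((attributes["href"], attributes.get("title") or ""))

def link_bibliography(html_text, page_url):
    """Return one entry per hyperlink: resolved URL, link type, description."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    site_host = urlparse(page_url).netloc
    entries = []
    for href, title in collector.links:
        full_url = urljoin(page_url, href)
        if href.startswith("#"):
            kind = "internal target"   # jumps within the same page
        elif urlparse(full_url).netloc == site_host:
            kind = "same site"         # another page on this web site
        else:
            kind = "external"          # a page on another web site
        entries.append({"url": full_url, "kind": kind, "description": title})
    return entries

# Sample page for illustration; the URLs are taken from examples in this guidance.
sample = """
<a href="#top">Top</a>
<a href="/exhibits/charters/index.html">Charters of Freedom</a>
<a href="http://jitc.fhu.disa.mil/recmgt/#standard" title="Joint Interoperability Test Command's 5015.2 - STD Records Management Application Design Criteria Standard">DoD Standard 5015.2</a>
"""
entries = link_bibliography(sample, "http://www.archives.gov/records-mgmt/index.html")
```

Note that a link routed through a local exit page (such as the `/global_pages/exit.html?link=...` form) would be labeled "same site" by this sketch, so such links would need extra handling to record their true destination.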


4. Roles and responsibilities

4.1 Who is responsible for managing web content records?

Content pages on the web site may originate in many program areas within the agency and may be created by agency staff or contractors. The agency should establish clear guidelines for managing records on web sites. The guidelines should specify whether the program office or the webmaster's office (or other office responsible for the web site) is responsible for implementing the agency's records management policies for these records. The program office and the personnel responsible for agency web operations may each have specific responsibilities in this area. Ideally, agencies should have a team of individuals, including program staff, web management staff, and records management staff, who develop records management plans for the web site. Among their responsibilities are both the development of records schedules and managing the retention and disposition of the web records.

Web content pages may be scheduled as records of the (program office) content owner or of the web services providers. The schedule decision should be based on which office is assigned the responsibility for keeping content current, setting security levels, and identifying access requirements. (See SCHEDULING WEB RECORDS for additional information on scheduling web records.)

Each agency decides which office will be responsible for implementing records schedules. If the web content pages are scheduled as records of the relevant program office, that office also implements the schedule. The program office will need to establish procedures to ensure that the schedule is properly implemented, including notification to the web operations staff when web content records need to be destroyed.


4.2 Who is responsible for managing web management and operations records?

Agency personnel who manage the web site are responsible for managing the contextual and structural records necessary to adequately document agency web site operations. Web management records are managed the same as other program records in the agency. Web management records provide context and structure for web sites and do not present the same complexities as content records, which are frequently revised or replaced. Hence, standard records management techniques should be sufficient.


1 For guidance on whether records are trustworthy for legal purposes, consult your Office of General Counsel.

2 ISO/TR 15489-2:2001, Information and documentation - Records management - Part 2: Guidelines. See http://webstore.ansi.org/ansidocstore/product.asp?sku=ISO%2FTR+15489%2D2%3A2001.

3 When a portal web site is assessed as a single entity, it is necessary to manage all sites in the portal to the highest level of risk encountered by any individual site in that portal.

4 NOTE: A snapshot captures a web site as it existed at a particular point in time (e.g., by harvesting, exporting to an image format, simple device backup). For web content records appraised as permanent, agencies must use capture method(s) that retain hypertext functionality (e.g., harvesting) as described in NARA's Transfer Instructions for Permanent Web Content Records.
