Federal Records Management

Implications of Recent Web Technologies for NARA Web Guidance

ATTENTION! This product is no longer current. For the most recent NARA guidance, please visit our Records Management Policy page.

Purpose

Federal agencies are already required by 44 U.S.C. ch. 31 and 35, Office of Management and Budget (OMB) Circular A-130, and NARA regulations in 36 CFR ch. XII, subchapter B to have an effective and comprehensive records management program for all of their records.

This document explores some of the applications that characterize the emerging web and their impact on records management. In its early days, the web was seen largely as a place to post static documents that were Internet-accessible. More recently, it is seen as a tool for facilitating collaboration across geographic and institutional boundaries. This document examines four applications that create content likely to exist only on the web. Agencies must continue to manage content created via these applications in compliance with NARA's records management guidance, including its Web Management and Transfer Policies.

Introduction

Section 1.2 of Part 2 of the NARA Guidance on Managing Web Records (hereafter NARA Web Guidance) defines web content as information with a Uniform Resource Identifier (URI) 1 over an internet-based protocol. By that definition, all web applications create, deliver, or manage web content even though those applications may look very different. Keep in mind, content on government websites is owned by the government, not individual creators, and is likely to be agency record material.

Web Portals, Really Simple Syndication (RSS), Web Logs (Blogs) and Wikis are the four increasingly popular web applications discussed here. These web applications underscore growing sophistication in how the web is used. New uses imply different types of content, possibly with different records management considerations. Regardless of the differences in the timeliness, presentation, context, or completeness of information yielded by the various web applications, web content may be a record and should be managed as such.

This document includes:

  1. A very brief description of each technology (for further description please see the glossary).
  2. Implications in managing the records produced by these technologies.
  3. Applying the existing NARA Web Guidance and, for that content appraised as permanent, the Transfer Instructions for Permanent Electronic Records guidance, to content created, delivered, or managed by those applications.
  4. Glossary of terms.

This is a preliminary examination of NARA's Web Guidance. A more thorough analysis will follow.

 


1. What are web portals, RSS, blogs and wikis?
Web managers use these applications to meet information management needs and to connect to people in new ways. As a result these applications facilitate a business process that both creates and manages content. This section provides brief descriptions of each of the applications.

1.1. Web Portals
Portals began as "super" web sites but are technically no different from traditional web sites. While they may contain original content, web portals generally function as web sites that provide a starting point or gateway to other existing web resources. They are generally created in Hypertext Markup Language (HTML) and read by web browsers. A portal is a single Uniform Resource Locator (URL) that points to a variety of other existing URLs.

Portals provide:
  • Easy access to a collection of web resources focused on a particular type of information or subject from a variety of sources and,
  • Easy access to different types of web services, such as search engines.

For example, USA.gov is described as "The U.S. government's official web portal" (USA.gov). The portal's intent is to give users a gateway to all government content on the web.

1.2. Really Simple Syndication (RSS)
RSS is most often interpreted as Really Simple Syndication alluding to its major function, that of content web syndication. It is an application that provides a mechanism for "pushing" or "feeding" content (a "feed") to subscribing consumers on the web and can be written in any one of a family of eXtensible Markup Language (XML) "syndication" schemas.

RSS uses include:

  • Automatically integrating content from other web sources into web sites;
  • Updating desired information automatically based on posting date (e.g., updated weather reports linked on an agency site);
  • Aggregating desired content into one web site.

A weather report, such as can be found at NOAA's weather site http://www.nws.noaa.gov/, is a good example of the type of information often managed by RSS.
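
As a minimal sketch of what a consumer of such a feed does, the Python example below reads the items from an RSS 2.0 document using only the standard library. The feed content, the example.gov URLs, and the function name read_feed_items are assumptions made for illustration; they do not describe any actual agency feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed used only for illustration; not a real agency feed.
SAMPLE_FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Agency Weather Updates</title>
    <link>https://www.example.gov/weather</link>
    <description>Illustrative feed</description>
    <item>
      <title>Forecast updated</title>
      <link>https://www.example.gov/weather/1</link>
      <pubDate>Thu, 10 Aug 2006 12:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Storm advisory issued</title>
      <link>https://www.example.gov/weather/2</link>
      <pubDate>Thu, 10 Aug 2006 15:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>
"""


def read_feed_items(feed_xml):
    """Return (title, link, pubDate) tuples for each <item> in an RSS 2.0 feed."""
    # Encode to bytes so the XML declaration's encoding is honored by the parser.
    root = ET.fromstring(feed_xml.encode("utf-8"))
    items = []
    for item in root.iter("item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("pubDate", default=""),
        ))
    return items


for title, link, published in read_feed_items(SAMPLE_FEED):
    print(published, title, link)
```

Because each item carries its own title, link, and posting date, a subscribing site can integrate or aggregate the items without human intervention, which is why the same content may end up in several locations.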

1.3. Web Logs (Blogs)
Blogs are web sites that consist of periodic postings that often focus on a particular subject. Each "blogpost" usually consists of a "Title," "Body," and URL to an article or other media content, and a post date. Blogs also are easily created and uploaded to the web via a browser.

A blog's uses include:

  • A forum or "diary" for the creator;
  • A method to collect comments on postings (if allowed).

The blog for NASA's Launch of Shuttle Mission STS-121, http://www.nasa.gov/mission_pages/shuttle/launch/sts-121/launch-vlcc.html, is an example of a government blog application. It provides a multi-media record of the mission.

1.4. Wikis
A wiki brings together a "community of interest" around a simple web content management application to post and edit web content. Unlike blogs, anyone (with authorization if the site has been restricted in some way) can edit content. Each posting is versioned so that postings can be compared. The wiki feature, in which all past entries are kept in a log as versions of the evolving discussion, makes it a powerful tool for contributing to, and collaborating on, a project in a geographically distributed environment.

More than any other of the applications, a wiki promotes direct and active participation by a distributed network of collaborators in one of the uses listed below. That community of interest is not necessarily limited by Federal agency or even private or public community affiliation.

Wiki uses include:

  • A method of collaborative writing;
  • A method of collaborating on projects;
  • A method of finding consensus around an issue or concept;
  • Virtual meetings;
  • Vocabulary development.

An example of government wiki involvement is http://wiki.na-mic.org/Wiki/index.php/Main_Page. The wiki, administered by the National Alliance for Medical Image Computing and funded by the National Institutes of Health, states "This system is meant to encourage quick and efficient communication among the participating investigators and the interested users." The site encourages use of this wiki for a number of projects of health-related matters, which are all listed and managed separately on the site.
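
The versioning described in this section, in which each posting is kept so that postings can be compared, can be illustrated with a short Python sketch. It uses the standard library's difflib module to compare two versions of a page. The sample text and the function name compare_versions are hypothetical, and real wiki software stores and compares versions in its own way.

```python
import difflib


def compare_versions(old_text, new_text):
    """Return unified-diff lines describing how new_text differs from old_text."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="version 1",
        tofile="version 2",
        lineterm="",
    ))


# Hypothetical page content before and after a collaborator's edit.
v1 = "Records are scheduled annually.\nContact the records officer."
v2 = "Records are scheduled quarterly.\nContact the records officer."

for line in compare_versions(v1, v2):
    print(line)
```

Lines prefixed with "-" appear only in the earlier version and lines prefixed with "+" appear only in the later one, so the sequence of versions documents how the content evolved.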

 


2. Implications for Managing Records
When evaluating these applications the significance of their content cannot be predetermined by the technology any more than it can be by the media used for distribution. However, they do suggest certain types of content that have implications for records management. Applications often have more than one function and those functions may be employed differently by different users. As a result, classifying applications by what they do in general is imprecise. Associating an application to the business process it supports is easier to do and may have the added benefit of connecting traditional record inventories with the Government's evolving Federal Enterprise Architecture (http://www.whitehouse.gov/omb/egov/a-1-fea.html) Lines of Business (LOB) if LOBs are in fact used to describe the business process.

While there are standard ways to describe protocols and interfaces, there is no standard vocabulary to describe functional aspects that affect content. This makes characterizing content even more problematic. In general, and to varying degrees, applications discussed here link community participation (processes), put content in new contextual patterns, or actively deliver content. The following terms are not exact descriptions but are used here to loosely identify characteristics that can impact records management.

Those terms are:

  • Interactive aspects;
  • Collaboration;
  • Aggregation;
  • Incremental content;
  • Content replication.

The following table associates these terms with the specific applications:

| Application | Interactive Aspects | Collaboration | Aggregation | Incremental Content | Content Replication |
| --- | --- | --- | --- | --- | --- |
| Portal | N/A | Brings together different web sites for a common purpose. | Brings links of previously disparate sites and resources to a single location. | N/A | Web sites already exist on their own elsewhere on the web. |
| RSS | Keeps those registered with the feed "updated" on an issue. | Feeds assume the recipient trusts what the creator is providing. | Can take a number of "feeds" and "aggregate" them on a single web site. | Main functional objective is to feed "updated" content. | The "feeds" are syndicated to a list of web sites and may become content in multiple locations. |
| Blog | Comments are allowed. | Assumes a common interest and may allow comments. | Brings together comments about a posting. | New postings are often updates of an issue. | Original content: comments made on blog postings. |
| Wiki | Content can actually be edited by a community. | Allows active participation in a process. | Brings together input from a community of interest. | Edits are tracked and logged sequentially. | Original content: editing content and maintaining those edits as sequential versions. |


2.1. Interactive aspects
The applications have interactive functions in that each assumes a community of interest, but the extent to which that community may participate varies. Wikis suggest greater participation and an ongoing process of creation leading to a product which may or may not be part of the wiki. Blogs, on the other hand, suggest a series of statements or observations which may or may not lead to something specific. They, like an RSS feed, may only be a series of notifications about an event but have the potential for influencing an agenda. In the private sector they are given the credit for being able to "socially construct an agenda or interpretive frame" ("The Power and Politics of Blogs"). A blog's impact in the public sector is not yet as clear; however, both wikis and blogs have been seen as tools that can change the way the government operates ("The Wiki and the Blog: Toward a Complex Adaptive Intelligence Community").

2.2. Collaboration
Collaboration is a major feature of wikis but other applications may also have collaborative aspects. Wikis allow a degree of community involvement not previously experienced on the web. The ease with which participants with web access can contribute means that geography is no longer a hindrance to project participation or document creation.

2.3. Aggregation
All applications have some sort of aggregative nature by bringing together often otherwise available but disparate resources, possibly in a new context. Moving content into a new contextual environment may change the significance of the original content.

2.4. Incremental content
All web content is, in a sense, incremental because eventually it gets updated. However, RSS, wikis and blogs all assume a more fluid approach to content management.

The content of all three applications is described as "fragile," in that it is intended to have a short "shelf life" either because it becomes outdated by events or is continuously changed by collaborators.

2.5. Content Replication
The web often presents the same content in different locations or formats or replicates content that has already been made available on other media. This may be to highlight certain content in different ways, make it available to different audiences, or to provide a choice in how the content is downloaded. Portals, for instance, bring existing web sites together in different ways, thus providing a new context. Wikis and blogs, by their nature, contain completely original content.

 


3. Some Example Questions for Content Managers

  • Does the interactive nature of these applications affect the management of their content/records? The interactive/collaborative nature of these applications broadens the range of authorship involved in content creation. This involvement can extend beyond the traditional realm of an agency's records management policies. This content may be managed according to protocols used to manage inter-agency case files.
     
  • Does the fact that these applications may re-use content from other sources have any impact on the ability to effectively manage these records? All content residing in these applications, even that originating outside of the agency, should be treated as unique. For example, external data layers incorporated into an agency's Geospatial Information System should be managed by an agency's records management program.
     
  • What are the management implications resulting from the frequent update of the content in these applications? These applications may be used either for finite projects or indefinite, on-going collaborations. Agencies should conduct a risk assessment of the nature of the project to determine the records management procedures for setting aside recordkeeping copies of this content, including how frequently this is to be accomplished.
     
  • How does an agency ensure the trustworthiness of information maintained in these applications? Content derived from external sources may lack sufficient information to establish the integrity, authenticity, reliability and usability of the information maintained in these applications. The managing entity of each application, in conjunction with advice from records management staff, needs to develop procedures to ensure the application is configured to capture such information. These include specification of appropriate metadata.
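
The last question above mentions configuring an application to capture metadata. The Python sketch below shows one possible shape for such a capture: it records where a piece of content came from, who contributed it, when it was set aside, and a SHA-256 digest that can later show whether the bytes have changed. Every field name and function name here is hypothetical, and a digest addresses only integrity; it does not by itself establish authenticity, reliability, or usability.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureMetadata:
    # All field names are hypothetical; an agency would define its own elements.
    source_url: str    # where the content was published
    contributor: str   # who posted or edited the content
    captured_at: str   # ISO 8601 timestamp supplied by the caller
    sha256: str        # digest of the captured bytes


def describe_capture(content, source_url, contributor, captured_at):
    """Build a metadata record for content set aside as a recordkeeping copy."""
    digest = hashlib.sha256(content).hexdigest()
    return CaptureMetadata(source_url, contributor, captured_at, digest)


def is_unchanged(content, metadata):
    """Return True if content still matches the digest recorded at capture time."""
    return hashlib.sha256(content).hexdigest() == metadata.sha256


# Hypothetical usage with made-up values.
meta = describe_capture(
    b"Draft policy text",
    "https://wiki.example.gov/page",
    "jdoe",
    "2006-08-10T12:00:00Z",
)
print(is_unchanged(b"Draft policy text", meta))
```

Recomputing the digest at each scheduled capture is also one simple way to notice that frequently updated content has changed since the previous recordkeeping copy.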

 


4. Implications for Applying NARA Web Records Management and Transfer Guidance

4.1. NARA Web Guidance
The applications discussed here are all examples of the changing nature of the web and were not necessarily widely used in the Federal Government when the NARA Web Guidance was created. That guidance still applies to web content but NARA recognizes that there are some areas in that guidance that require special consideration and interpretation. This section focuses on some of those areas.

The NARA Web Guidance discusses general background information and responsibilities for web content and the management and scheduling of web content. A discussion of some records management issues follows.

4.1.1. Background and Responsibilities
Any one of the discussed applications falls under the Federal web usage categories described in the NARA Web Guidance as a "fluid repository" or a "communication tool", and all may be a "customer service window" (Part 1, Table 1). In the case of wikis or RSS feeds, the scheduling agency may have no control over how often the content is changed. That puts the emphasis on "fluid" and becomes one more records management concern.

The responsibility for wikis, blogs, RSS feeds and portals is no different than that outlined in the NARA Web Guidance. The recommended "team approach" may have increased significance for these applications (especially in the case of wikis) because of their interactive and collaborative nature. All contributors to any of those applications on an agency's web site must be aware of the potential record value of that content in accordance with 44 U.S.C. 3103 and understand it may need to be managed as record material. In addition, content may originate outside the agency.

4.1.2. Risk Assessment
As with any content, the management of web records is essentially a risk management consideration for the agency creating and/or taking responsibility for that content.

"Managing web records properly is essential to effective web site operations, especially the mitigation of the risks an agency faces by using the web to carry out agency business." ( NARA Web Guidance Cover Page)

The NARA Web Guidance suggests that "risk" should be evaluated in light of the characteristics inherent in any particular web application.

The considerations of a record's trustworthiness, reliability, authenticity, integrity and usability cited in NARA Web Guidance are all applicable to content created from these four applications. The five distinguishing characteristics of these applications identified in the preceding table (interactive aspects, collaboration, aggregation, incremental content and content replication) are issues for risk assessment and enter into those considerations.

As delineated in the NARA Web Guidance, agencies must determine whether the loss of content within these applications may result in litigation, increased litigation risk, or liability; impairment of program operations; or an inability to detect fraud, false statements, or other illegal behavior, or to account for the stewardship of Government information, property, or finances, any of which might result in the compromise of citizens' rights or the agency's mission. One approach to addressing that concern is to estimate whether the application documents some aspect of the agency's business process that is not fully captured in record form elsewhere. That business process would include the environment and/or cultural climate surrounding the process, or the context of the process, and not just the facts of the process.

4.1.3. Scheduling
Responsibilities for scheduling are as stated in the NARA Web Guidance.
"The head of the agency... (see U.S.C. Chapter 31) has responsibility for the agency's records management program. The agency Records Officer is responsible for ensuring adequate management and control over agency records, including agency web site-related records.

Other records management-related responsibilities for web records are diffused throughout the agency to programs and functions that create web content..."
( NARA Web Guidance Part 1, Section 2)

The contributions of others outside of the agency may be considered in scheduling. In the case of a portal, where content has been linked from other sources, that same content may not need to be rescheduled. Content created with interactive software is probably unique and should be managed as records.

Comments to postings on a blog may have to be considered for scheduling in addition to the actual postings. In the case of a wiki it must be determined whether the collaborative process documented by the wiki and leading to a finished product should be scheduled along with the product. In addition, because a wiki depends on a collaborative community to provide content, how much content is required to make the wiki significant or "authoritative" from a record perspective has to be determined on a case-by-case basis.

The guidance states there are a number of unit levels for scheduling content from an entire web site to any particular page.
"One key aspect of conducting a risk assessment is determining the appropriate unit of analysis; i.e., whether the web site will be assessed as a single entity or whether assessments will be conducted for different portions of the site." ( NARA Web Guidance, Part 2, Section 2.3) Agencies incorporating these applications should reassess their existing web schedules to identify the appropriate "unit of analysis." For example a wiki may be scheduled as a finite project.

4.2. Transfer
Transfer Instructions for Permanent Electronic Records (NARA Transfer Guidance) is appropriate in all its recommendations for permanent web content arising from use of these applications, including metadata requirements. The Transfer Instructions state that NARA will accept the source documents from a server as either HTML and readable by a standard browser or as XML (with schema and style sheet).

The Transfer Guidance further suggests that one efficient method of capturing the browser-readable content is to harvest it. In the case of wikis, blogs and portals that have been scheduled as permanent, harvesting is an effective capture mechanism. For RSS feeds, however, unless they can be harvested with the help of an "aggregator", the content of a feed should be retrieved from a server as XML (with schema and style sheet) as described in the transfer guidance.
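
Harvesting browser-readable content generally begins by following the links on a page. As a rough sketch of that first step only, the Python example below collects the link targets from an HTML page using the standard library. The class name LinkCollector, the function collect_links, and the example.gov addresses are assumptions for illustration; this is not a description of any harvesting tool NARA or an agency uses, and a real harvest would also retrieve and store the linked pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own address.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html_text, base_url):
    """Return the absolute link targets found in html_text, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html_text)
    return collector.links


# Hypothetical blog index page.
page = (
    '<html><body>'
    '<a href="/posts/1.html">Post 1</a> '
    '<a href="https://www.example.gov/about">About</a>'
    '<a name="x">anchor without a link</a>'
    '</body></html>'
)
print(collect_links(page, "https://blog.example.gov/index.html"))
```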

4.3. Implications For Both Management and Transfer
One issue that impacts both the management and transfer guidance is ownership of the content. As a practical matter the NARA Transfer Guidance assumes that the domain of a portal, web site or application determines the ownership of that content.

"A domain name defines the administrative boundaries and content of an agency's web site unless a formal web management agreement specifically allows agency content to reside on a non-agency domain" ( Transfer Instructions for Permanent Electronic Records section 3.3)

In some collaborative situations, however, one agency may host an application like a wiki or portal because it has the resources available, while another agency assumes responsibility for the project and content. This may occur without a "formal web management agreement." If this is the case, ownership must not be assumed to be explicitly defined by the domain but must be documented in the scheduling and appraisal process with NARA. The collaborative nature of portals and wikis, especially, underscores the increasing need for records managers to document the responsibilities of stakeholders involved in projects driven by these technologies.
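
The domain-based assumption quoted above can be expressed as a simple test on a URL's host name. The Python sketch below is one hedged illustration: the function name within_domain and the example.gov and partner.org domains are hypothetical. As this section notes, a passing or failing result is only a starting point, since hosting arrangements between agencies may place content on a domain that does not reflect who is responsible for it.

```python
from urllib.parse import urlsplit


def within_domain(url, agency_domain):
    """Return True if url's host is agency_domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    domain = agency_domain.lower().lstrip(".")
    # Require an exact match or a dot boundary so "notexample.gov" is rejected.
    return host == domain or host.endswith("." + domain)


# Hypothetical URLs checked against a hypothetical agency domain.
for candidate in (
    "https://www.example.gov/wiki/Main_Page",
    "https://wiki.partner.org/project",
):
    print(candidate, within_domain(candidate, "example.gov"))
```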

 


5. NARA Assistance in Applying this Guidance
NARA's Life Cycle Management Division provides assistance and advice to agency records officers of agencies headquartered in the Washington, DC, area. The Records Management staff in NARA's regional offices provides assistance and advice to agency records officers of agencies headquartered in the field and - in consultation with NWML agency liaison staff - of field offices subordinate to agencies headquartered in Washington. Your agency's records officer may contact the NARA appraiser or records analyst with whom your agency normally works. A list of the appraisal and scheduling work group and regional contacts is posted on the NARA web site at http://www.archives.gov/records-mgmt/appraisal/. The Records Management staff in NARA's regional offices provides assistance to agency records officers across the country. A complete list of NARA regional facilities may be found at http://www.archives.gov/locations/index.html.

 

Glossary

Aggregator
"RSS-aware programs called news aggregators [aggregators search for updates of specified information] are popular in the weblogging community. Many weblogs make content available in RSS. A news aggregator can help you keep up with all your favorite weblogs by checking their RSS feeds and displaying new items from each of them." ( http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html, accessed on 08/10/06)

Blog
"A weblog, which is usually shortened to blog, is a type of web site where entries are made (such as in a journal or diary), displayed in a reverse chronological order. Blogs often provide commentary or news on a particular subject, such as food, politics, or local news; some function as more personal online diaries. A typical blog combines text, images, and links to other blogs, web pages, and other media related to its topic." ( http://en.wikipedia.org/wiki/Blog accessed on 08/10/06)

"The major difference between a blog and a wiki is that a blog is more directly under the control of the owner(s) and the primary objective of a blog is for the owner(s) to express themselves to their target audience. A wiki on the other hand is about collaboration (in a general sense) rather than expressing views." ( http://www.samizdata.net/blog/glossary.html accessed on 08/10/06)

Blogpost
An entry to a blog. ("to blog," meaning "to edit one's weblog or to post to one's weblog"). ( http://en.wikipedia.org/wiki/Blog accessed on 08/10/06)

Feeds
"A web feed is a document (often XML-based) which contains content items, often summaries of stories or weblog posts with web links to longer versions." ( http://en.wikipedia.org/wiki/Web_feed accessed on 08/10/06)

HTML
"The Hypertext Markup Language (HTML) is a simple markup language used to create hypertext documents that are platform independent." ( http://ftp.ics.uci.edu/pub/ietf/html/rfc1866.txt accessed on 08/10/06)

Internet-based protocol
Applications that can run on the internet. "The Internet ... can be briefly understood as "a network of networks". Specifically, it is the worldwide, publicly accessible network of interconnected computer networks that transmit data by packet switching using the standard Internet Protocol (IP). It consists of millions of smaller domestic, academic, business, and governmental networks, which together carry various information and services, such as electronic mail, online chat, file transfer, and the interlinked Web pages and other documents of the World Wide Web." ( http://en.wikipedia.org/wiki/Internet accessed on 08/10/06)

Online Diaries
"Online diaries started in 1995 and were the precursor to the modern blog (online diaries are sometimes referred to as personal blogs). They were also known as online journals. The running updates of online diarists combined with links inspired the term "web logs" which was eventually contracted into the word blog." ( http://en.wikipedia.org/wiki/Online_diary accessed on 08/10/06)

RSS
"RSS is a format for syndicating news and the content of news-like sites, including major news sites, news-oriented community sites , and personal weblogs. But it's not just for news. Pretty much anything that can be broken down into discrete items can be syndicated. Once information about each item is in RSS format, an RSS-aware program can check the feed for changes and react to the changes in an appropriate way. ... The name "RSS" is an umbrella term for a format that spans several different versions of at least two different (but parallel) formats." ( http://www.xml.com/pub/a/2002/12/18/dive-into-xml.html accessed on 08/10/06)

Schema
"An XML Schema describes the structure of an XML document." ( http://www.w3schools.com/schema/default.asp) "XML Schemas express shared vocabularies and allow machines to carry out rules made by people. They provide a means for defining the structure, content and semantics of XML documents in more detail." ( http://www.w3.org/XML/Schema accessed on 08/10/06)

Style Sheet
"Style sheets describe how documents are presented on screens, in print, or perhaps how they are pronounced." ( http://www.w3.org/Style/ accessed on 08/10/06)

URI
"A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource." ( http://www.gbiv.com/protocols/uri/rfc/rfc3986.html accessed on 08/10/06)

URL
"A Uniform Resource Locator (URL) is a compact string representation of the location for a resource that is available via the Internet." ( http://www.ietf.org/rfc/rfc2718.txt accessed on 08/10/06)

W3C
"The World Wide Web Consortium (W3C) develops interoperable technologies (specifications, guidelines, software, and tools) to lead the Web to its full potential. W3C is a forum for information, commerce, communication, and collective understanding." ( http://www.w3.org/ accessed on 08/10/06)

Web Portals
"Commonly referred to as simply a portal, a Web site or service that offers a broad array of resources and services, such as e-mail, forums, search engines, and on-line shopping malls. The first Web portals were online services, such as AOL, that provided access to the Web, but by now most of the traditional search engines have transformed themselves into Web portals to attract and keep a larger audience." ( http://www.webopedia.com/TERM/W/Web_portal.html accessed on 08/10/06)

Web Syndication
"Web syndication is a form of syndication in which a section of a web site is made available for other sites to use. ...(I)n general, web syndication refers to making Web feeds available from a site in order to provide other people an updated list of content from it (for example one's latest forum postings, etc.)." ( http://en.wikipedia.org/wiki/Web_syndication accessed on 08/10/06)


Wiki
"Wiki is a piece of server software that allows users to freely create and edit Web page content using any Web browser. Wiki supports hyperlinks and has a simple text syntax for creating new pages and crosslinks between internal pages on the fly." ( http://wiki.org/wiki.cgi?WhatIsWiki) Referring to the speed with which a web site can be created, "wiki-wiki" means "hurry quick" in Hawaiian. ( http://en.wikipedia.org/wiki/Wiki accessed on 08/10/06)

"The major difference between a blog and a wiki is that a blog is more directly under the control of the owner(s) and the primary objective of a blog is for the owner(s) to express themselves to their target audience. A wiki on the other hand is about collaboration (in a general sense) rather than expressing views." ( http://www.samizdata.net/blog/glossary.html accessed on 08/10/06)

XML
"Extensible Markup Language (XML) is a simple, very flexible text format derived from SGML (ISO 8879). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere." ( http://www.w3.org/XML/ accessed on 08/10/06)


1 Terms shown in italics the first time they are used are defined in the Glossary at the end of this document.

 
