Oversight Hearing on the National Archives and Records Administration, October 20, 1999
Subcommittee Chairman: Congressman Stephen Horn
H. Thomas Hickerson is President of the Society of American Archivists and Associate University Librarian for Information Technology and Special Collections at Cornell University
It is an honor and a pleasure to have this opportunity to appear here today to provide testimony regarding issues critical to the National Archives and Records Administration. While I will address several general issues, I will, as requested, focus my remarks on the application of digital technologies and particularly on the management of electronic records. My comments will reflect three areas of experience:
(1) My experience of nearly thirty years' involvement in archival practice and extensive professional leadership, including my present service as President of the Society of American Archivists;
(2) My direction at Cornell University of the principal archival and rare book programs, my development over the last eight years of an institute at Cornell dedicated to the building of digital collections based on cultural and scientific holdings, and my present responsibility for library information technologies at Cornell; and
(3) My United States citizenship. While the first two of these provide the basis for my authority and expertise in these proceedings, my citizenship also provides a strong incentive for my interest in the successful operation of the National Archives and Records Administration (NARA).
In recognizing the importance of NARA for every U.S. citizen, we have only to look at recent experience in Kosovo to see that invading forces sought to systematically destroy land, financial, citizenship, and genealogical records in an effort to destroy economic and political rights and community and cultural identity. In the United States, responsibility for maintaining the archival record is broadly distributed among state and municipal archives, university, corporate, and religious repositories, research libraries, and historical societies and museums, but no institution other than the National Archives is so central and fundamental to the rights of every citizen and to the process of democratic governance. That we dedicate only $200 million annually to this large, complex, and vital undertaking is, on one hand, regrettable, and on the other, a rather remarkable bargain. For I do feel that in spite of the extent of the challenge, overall our National Archives has served us well. But it could have done more in the past, and it must do more in the coming years. The future of the archival record is at a critical juncture. Congressman Horn, I urge that you and your colleagues vigorously assist in this critical transition.
We are living in a time of dramatic and continual change, both large and small. Against a seemingly constantly evolving tableau, the dominant transformation of our time, the information revolution, is taking place. This revolution focuses on information creation, dissemination, use, management, storage, and preservation. As a result, archivists are facing some of the most vexing challenges of modern-day information management, confronting issues essential to government, corporations, institutions, and organizations in operating effectively and in fulfilling their legal requirements. At the same time, archivists continue to be dedicated to preserving and supporting the use of large existing collections documenting historic achievements, social and industrial development, and the experience of everyday life.
As we enter the new century, the pace of change and a growing sense of the seeming fragility of the human record have served to significantly expand the appeal of original materials, as well as the interest in access to digital facsimiles. Museums and other cultural repositories are enjoying extraordinary growth in attendance. In Texas, for example, the Johnson Presidential Library is second in number of visitors only to the Alamo, itself an historic site and museum. Thus, we are faced with the paradox of society's increased interest in historical documents, images, and objects, both in artifactual form and in digital representations, while current records are increasingly generated in electronic form, and e-mail, e-commerce, and Web sites are the predominant means of written communication. In reviewing the mandate and priorities of the National Archives, we must keep in mind this confluence of pervasive interest in our documentary heritage and of transformative changes taking place worldwide. Ideally, an integrated continuum will be established between the records and services of this century and those of the next.
In my specific comments, I will concentrate on three aspects of NARA's mission that I feel are central to their success in the next decade. The first is managing records generated in electronic form. The second is leadership in innovative applications of new technologies. The third is the need to extend services to users and broaden the value of the nation's archives for the American public. Although this third topic is more general, it is related to the other two. While I will express significant concern regarding progress on these issues, I will also emphasize my belief that significant change is underway in all three areas.
Managing Electronic Records
In a report of the House Committee on Government Operations, "Taking a Byte out of History: The Archival Preservation of Federal Computer Records," submitted November 6, 1990, many of the difficulties inherent in the selection, preservation, and use of electronic records over time were clearly identified. Though the nature of the problem and its importance were perceptively stated, recommended actions were explorative in approach rather than action-oriented. No new research was funded; no new programs inaugurated. It is now nearly a decade later, and there is not yet a scalable working model in place for realistically addressing these issues.
While I feel strongly that NARA has been slow to dedicate the necessary resources to this challenge, others have also lagged. We are all a decade behind, and we are only now beginning to confront the issues of long-term preservation and use of digital information in a serious fashion. It is not surprising that the technology industry has not focused attention on the impermanence of digital information, and when they have, they have talked principally about the lifespan of particular media, such as magnetic tape or CD-ROMs. While media permanence is important, it is not the principal challenge. If the bits survive, will we continue to be able to read them when hardware and software generations come and go with increasing speed? We must decide at the point of data generation which information should be retained and usable over time and design a path for migrating those records from one software and hardware generation to the next. At this point, we should not feel secure that the necessary procedures are in place.
The need for research and testing of methods of migrating information from one technology generation to the next or the development of other means of retaining the capacity to use today's information tomorrow is urgent. Government-funded research has not yet made this issue a priority.
Of the six projects selected four years ago in the first phase of the National Science Foundation's (NSF) Digital Library Initiative, none highlighted this issue. In the recently completed Digital Library Initiative Phase 2 competition, in which NSF was joined by several additional federal partners, including active participation by the National Endowment for the Humanities (NEH) and the Library of Congress, greater attention was directed to long-term viability. Nonetheless, out of thirty-three projects funded, it appears that only two projects, those at the University of Michigan and at Cornell University, are focused on preservation issues. In describing the funding of the Cornell project, a joint project of the Cornell Library and the Computer Science Department, William Ferris, Chairman of the National Endowment for the Humanities, reported, "NEH, the National Science Foundation and other federal agencies have begun the process by funding a pioneering, $2.3 million preservation project at Cornell University. This project will develop a standard way of organizing computerized collections, preventing data loss in these collections by alerting managers to the periodic need to upgrade aging CD-ROMs and tapes, and making the collections fully accessible on the Internet. All Americans will benefit because the project will ensure that computerized materials important for the study of America will be preserved and accessible for generations to come." I would like to make two responses to these inspiring remarks. First, while I appreciate Bill Ferris' words of confidence, Cornell's project will only contribute to generating a viable solution to this momentous problem, and second, while he is correct when he says that it is a pioneering effort, it should not be. We are all way behind the curve on this issue.
Regrettably, the corporate and institutional sectors do not yet seem to have made significant steps forward either. In part, this is due to the reluctance of the technology industry to bring attention to this issue. Perhaps more importantly, however, it is because of a division of responsibility between those responsible for paper records, frequently corporate and institutional archivists, and those responsible for computing systems, data processing professionals. System designers and programmers have seldom reflected an archival viewpoint, and now that records are frequently available in digital form only, this division of management and perspective will have significant repercussions. I foresee the potential for a 500,000-person sub-industry developing around this issue, and a significant number of those will be archivists, equipped with new skills but embodying traditional archival knowledge and values. The Society of American Archivists has been offering electronic records workshops since the 1980s, and a new distance learning course is so heavily subscribed that we are now taking applications for next year.
While NARA has a long record of active involvement with the management of electronic records, this responsibility must now become a priority in the allocation of resources within the agency. This change and others basic to the new digital environment may be traumatic, but they are necessary. Applied research will be important within NARA, but viable solutions will only be developed and implemented through partnerships with other agencies and with academic, corporate, and professional partners. I am very impressed by recent NARA initiatives of this nature. The Collection-Based Long-Term Preservation Project at the San Diego Supercomputer Center is an outstanding example. Scientists at the Center are working with several federal agencies to develop and test means of preserving the organization of digital collections simultaneously with the digital objects that comprise the collection. NARA is also actively involved in the InterPares Project (International Research on Permanent Authentic Records in Electronic Systems), an international theoretical and methodological research project. This project highlights the global nature of this issue. It is supported by funds from agencies in several different countries, including the National Historical Publications and Records Commission, a small funding agency located within NARA that has proven vital to research and development in this area. NARA's involvement in collaborative efforts is essential, and the National Archives should maintain an open professional dialogue regarding successes and failures. Knowledge of their experience will benefit archivists worldwide.
In closing my comments on this issue, I want to emphasize that this issue is not just a technological one. It is also a political, social, organizational, and economic one. And it doesn't just require more and differently allocated resources for NARA. An example is the planned retention of data in the individual responses generated by the Year 2000 Census. My understanding is that current plans are for the transfer through optical character recognition (OCR) of the information on the forms at a 98% accuracy rate and storage of that information in ASCII (a basic standard for recording electronic data) on magnetic tapes for retention by NARA. Alternatives would be to create digital image copies of each form and/or create microform copies. It seems that the present choice is based on the project contractor's projected costs. While the technical suitability of each option is deserving of professional investigation, we must also factor usability by those citizens most affected by our choices into such decisions.
I have raised the issue of the Census only as an example of the need for various criteria to be considered in making these technical decisions. In developing solutions to this critical challenge, we must balance our interest in technical efficiency with the requirements necessary for effective governmental and citizen use over time. Our models must incorporate these factors effectively, and we are already a decade late in implementing them. We are losing valuable information today, and more will be lost tomorrow. This is the Y2K that will not go away next year.
Innovative Use of Technology
In the 1970s, when I first began to experiment in the use of computing technologies for archival management, I employed a software package called SPINDEX II (Selective Permutation INDEXing). This software, though based on a precursor created at the Library of Congress, was developed and maintained by the National Archives. A later version was used by Cornell to build a database describing archives and manuscripts held by some 1,100 repositories across New York state. In the mid-80s, this information was transferred into the Research Libraries Information Network, an online network that is now the principal international catalog for archival holdings.
In developing and testing SPINDEX, NARA established a partnership of ten institutions, including state, federal, university, and corporate repositories. For those of us who began with SPINDEX, this experience and NARA's leadership were very important. In the early 1980s, however, NARA turned inward in its systems development efforts. At a time when many repositories were adopting common cataloging standards that facilitated the use of existing systems and online access to research information, NARA chose not to adopt these standards. Asserting the unique requirements of the National Archives and refusing to modify existing practice, NARA developed multiple, mutually incompatible systems in-house. To my knowledge, none of these systems have survived, and the goal of SPINDEX development twenty-five years earlier, providing automated access to summary descriptions of all NARA holdings, is yet to be realized.
The National Archives has now again embarked on an initiative to provide comprehensive access to cataloging for all NARA holdings. While I still applaud this goal, I am concerned that they have chosen a British system not widely employed in this country. I am not presently able to evaluate the basis of their selection. I have, however, just completed directing the last stages of a four-year selection process to choose a new management system for the Cornell Library. It happens that the system chosen by Cornell has also recently been chosen in exhaustive competitions at the Library of Congress and the National Library of Medicine. I am not suggesting that this system would be ideal for NARA, and I readily acknowledge their differences from these other institutions. Nonetheless, in this age of system interoperability and Internet access, I fear that unique internal needs may be guiding their choices when the ability to provide easy access for agency staff, researchers, and the public should be paramount. The intent of my comments is not to urge use of particular software, but to emphasize that common standards and solutions developed and applied in cooperation with other agencies and institutions are required for success in today's information environment.
My fear that paper-based management and service procedures are still dominating strategic policy is further heightened by the present decision to delay further production of digital copies of NARA holdings for public access via the Internet. As John Carlin has explained to me personally, NARA has chosen to focus on the complexities of electronic records, those originally generated in electronic form, rather than to devote present resources to the creation of digital facsimiles of existing materials via scanners or digital cameras. While I understand the basis of his choice, I must emphasize that the fundamental nature of access to information is changing, and that users expect the availability of both information created in digital form and distinctive holdings copied digitally.
I am not suggesting that NARA will ever convert the majority of its existing holdings to digital form, but the effort by the Library of Congress' American Memory Project to build a virtual collection of 5,000,000 images is broadly perceived as an outstanding success. The Cornell University Library has nearly 2,000,000 images from historical, artistic, and scientific collections available, and a recent survey conducted by the Association of Research Libraries, some 120 of the largest research libraries in North America, found that over 90% of their members were presently conducting or planning projects to digitally convert unique holdings. I do not believe that NARA's decision to suspend conversion efforts at roughly 122,000 documents and visual images is in the best interests of the National Archives or its global public. Our archives should be available in classrooms, lecture halls, libraries, offices, and homes, as well as everywhere else in a wireless world.
As the closing paragraph in the introduction of Newsweek's October 11, 1999 section on "e-Life" explains, "We're at the beginning of a new way of working, shopping, playing, and communicating. At Newsweek, we're calling this phenomenon e-life, and it's just in time. Because the day is approaching when no one will describe the digital, Net-based, computer-connected gestalt with such a transitory term. We'll just call it life." Our nation's archives have to be part of that life.
Extending Services and Broadening the Value of the Nation's Archives
This third topic is very closely related to the preceding discussion of broadening use of NARA's holdings and services via new technologies. A similar interest must be employed to make the experience of the National Archives compelling for those onsite. I am absolutely thrilled by Congressional support for the renovation of the National Archives Building on the Capitol Mall. The planned renovation will dramatically improve storage conditions for records housed there, and it will provide a state-of-the-art technology infrastructure for both staff and researchers. Both of these improvements are critical and long, long overdue, but I must admit that I am most excited by the plans to implement a new concept in the display of the Charters of Freedom, the Declaration of Independence, the Constitution, and the Bill of Rights, and in other exhibition areas throughout the building. As designed, these displays will heighten the significance and level of engagement of viewers through carefully realized pictorial presentations. The planned theater offers the opportunity for new multi-media presentations, similar to those that can be made available through the Web. Although I am aware that private fundraising is required to complete this effort, I urge your fullest support. I applaud the imagination and vision of John Carlin and his colleagues in designing such a wondrous home for these remarkable documents. I think that it will generate a new spirit among visitors, researchers, and staff.
In conclusion, I would like to express on behalf of the Society of American Archivists my warm appreciation to John Carlin for his efforts to develop a cordial and synergistic relationship between NARA and the Society. Mutually beneficial collaborations have developed, and I am confident that our cooperation will grow. The archival profession needs a strong National Archives. I believe that John Carlin is providing effective leadership in confronting current challenges. I urge your support of his efforts and greatly appreciate the opportunity to address you today regarding the future of this distinguished institution so important to each of us.