The U.S. Library of Congress is well known as the world’s largest library, at least in the traditional paper format. Now the library is on its way to hosting the largest digital collection in the world, with more than 700 terabytes of data.
Converting holdings to a digital format
This is the new look of the U.S. Library of Congress: blinking lights, lots of cables and an ocean of digital information with more than 50 million individual files. This fancy tower is one of several Web servers that bring most of the information to the Internet.
Jane Mandelbaum manages information technology services at the library. “All the data on our website is here,” she explains.
So far, the library has a total of 700 terabytes of data. But because of copyright issues, only 200 terabytes of that are available on the Web.
“A terabyte is about 1,600 CDs or about 330 hours of TV or about 2,000 books and we have about 500 terabytes that we keep in our long term preservation systems,” she adds.
At the Library of Congress, the numbers can be mind-boggling. Experts estimate it holds more than 120 million books, 36,000 feature films, hundreds of thousands of music sheets and recordings, and large collections of manuscripts, Web sites, posters and photography. Yet only one percent of these holdings has been digitized.
Thomas Youkel is the senior systems engineer. “We have a scan lab here that scans anywhere from four to six million items a year,” he says. “I don’t guarantee that all those are put on the Web, but a lot of it is.”
Technology used for preservation
Most of the library’s digital collection is kept for preservation. But it is the one percent of the collection digitized for the Web that serves most of the library’s customers: 85 million a year.