The Web’s Cluttered Attic is About to Get a Little Neater

The Internet Archive has been tracking the Web’s shifting landscape for nearly twenty years now (since 1996, to be exact). Anyone interested in B&N's earliest recorded homepage or forgotten early ebook startups merely has to type the desired URL into the Wayback Machine's search box, and it will (usually) handle the rest.

It has filled this attic of forgotten websites by periodically crawling the web and making copies (with permission) of the sites it found, and as of Wednesday it had captured more than 439 billion webpages, videos, and images.

And that is just the beginning.

As the Wayback Machine approaches its twentieth anniversary, the Internet Archive has set out to rebuild its aging architecture so it can better preserve the modern web. The IA announced last week that it had received a grant from the Laura and John Arnold Foundation (LJAF), which it plans to use to develop the next generation of the Wayback Machine.

When the new Wayback Machine is up and running in 2017, users will find a faster service that is easier to search and that supports keyword searches of archived homepages (a huge improvement on the existing search box, which only lets you look up a URL).

The IA is also planning to rewrite the codebase so that the service is faster and so that users can identify who selected a website or webpage for collection by the Internet Archive. Other goals include better support for interactive and rich-media websites (a great improvement, given that many of the earliest archived pages don't even have their original images).

And last but not least, the Internet Archive is going to extend the Wayback Machine's reach beyond the IA.

The Wayback Machine is great as an archive but its current iteration has a major limitation: you have to go to it to find the page you're looking for.  But with the Wayback Machine 2.0, the IA will be working with partner websites to help those partners identify when a source link has gone dead so that the link can be redirected to a page in the Wayback Machine.

To name one example, the IA is working with the Wikimedia Foundation to identify broken links in Wikipedia articles so they can be replaced with links leading to archived pages in the Wayback Machine. (I for one would also like to see funds used to develop a WP plugin that could serve the same purpose.)
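For the curious, here is a rough idea of what swapping a dead link for an archived copy could look like. This is only a minimal Python sketch, not the IA's or Wikimedia's actual code: the function names are made up for illustration, and the `is_dead` check is a hypothetical stand-in that you would supply yourself. The only thing it relies on is the Wayback Machine's public address pattern, where the original URL is appended to `https://web.archive.org/web/` (optionally after a timestamp).

```python
from typing import Callable, Dict, Iterable

# Public address prefix for archived pages in the Wayback Machine.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def wayback_url(original_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for original_url.

    timestamp is an optional YYYYMMDDhhmmss string (or a prefix of one,
    such as "2015"); without it, the plain prefix form is returned.
    """
    if timestamp:
        return f"{WAYBACK_PREFIX}{timestamp}/{original_url}"
    return f"{WAYBACK_PREFIX}{original_url}"


def replace_dead_links(
    links: Iterable[str],
    is_dead: Callable[[str], bool],
) -> Dict[str, str]:
    """Map each link to itself, or to an archive address if it is dead.

    is_dead is a hypothetical, caller-supplied check (an assumption of
    this sketch); nothing here touches the network.
    """
    return {
        link: wayback_url(link) if is_dead(link) else link
        for link in links
    }


if __name__ == "__main__":
    # Pretend that one of the two example links has gone dead.
    dead = {"http://gone.example/page"}
    links = ["http://gone.example/page", "http://alive.example/"]
    for old, new in replace_dead_links(links, lambda u: u in dead).items():
        print(old, "->", new)
```

In a real plugin the `is_dead` check would probably request the page and treat a 404 or a timeout as dead, and it would be wise to confirm that an archived copy actually exists before rewriting the link.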

To put it simply, when it goes live some time in 2017, the Wayback Machine 2.0 will resemble less the musty attic we've all used and more a modern archive like Google Books or HathiTrust.

And given the central role the web plays in so many lives, this change is long past due.

"Today, people’s work, and to some extent their lives, are conducted and shared largely online,” said Wendy Hanamura, director of partnerships at the Internet Archive. “That means a portion of the world’s cultural heritage now resides only on the Web. And we estimate the average life of a Web page is only one hundred days before it is either altered or deleted."

VentureBeat, Internet Archive

image by Matt From London

About Nate Hoffelder (11466 Articles)
Nate Hoffelder is the founder and editor of The Digital Reader: "I've been into reading ebooks since forever, but I only got my first ereader in July 2007. Everything quickly spiraled out of control from there. Before I started this blog in January 2010 I covered ebooks, ebook readers, and digital publishing for about 2 years as a part of MobileRead Forums. It's a great community, and being a member is a joy. But I thought I could make something out of how I covered the news for MobileRead, so I started this blog."
