Web Archiving

CJ-Moki here. Here's a post about a topic that I find really interesting:

Some of you might have heard of something called web archiving. According to Wikipedia (and as its name implies), web archiving is the process of collecting portions of the World Wide Web to ensure the information is preserved in an archive for future researchers, historians, and the public. Archiving is usually done with "crawlers": bots that automatically visit certain pages and save copies of them.

The two most well-known web archive services are as follows:

  • The first is the Wayback Machine, which has been archiving pages since 1996. The Wayback Machine stores "snapshots" of webpages from certain dates and times, preserving each page for future viewing. The Wayback Machine is a fun way to look at old versions of well-known webpages (such as the Nintendo website, circa 1996).
  • The second is Archive.is, run by a former Russian figure skater. In contrast to the Wayback Machine, Archive.is only archives a page at user (or bot) request, and it does not obey robots.txt. Archive.is can archive some webpages that the Wayback Machine cannot, and vice versa.
  1. According to Archive.is' FAQ page, "Pages which violate our hoster's rules (cracks, porn, etc) may be deleted. Also, completely empty pages (or pages which have nothing but text like “502 Server Timeout”) may be deleted."
  2. An example of a page archived using Archive.is is a copy of this blog, dated 1 March (for any future generations interested in obscure blog posts written by young teenagers):

CJ-Moki Live — archived 1 Mar 2017 17:55:35 UTC
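To give a sense of how these archives can be used programmatically, here is a minimal sketch of querying the Wayback Machine's public availability API (https://archive.org/wayback/available), which returns JSON describing the snapshot closest to a requested timestamp. The function names here are my own, and the sample JSON is only shaped like the API's documented reply, not a live response:

```python
import json
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"

def availability_url(page_url, timestamp=None):
    """Build a query URL for the Wayback Machine availability API."""
    params = {"url": page_url}
    if timestamp:
        # Timestamps use the form YYYYMMDDhhmmss (prefixes allowed).
        params["timestamp"] = timestamp
    return WAYBACK_API + "?" + urlencode(params)

def closest_snapshot(api_response):
    """Pull the closest available snapshot URL out of the API's JSON reply."""
    snap = api_response.get("archived_snapshots", {}).get("closest")
    return snap["url"] if snap and snap.get("available") else None

# A sample reply shaped like the API's documented JSON (not fetched live):
sample = json.loads(
    '{"archived_snapshots": {"closest": {"available": true,'
    ' "url": "http://web.archive.org/web/19961221000000/http://www.nintendo.com/",'
    ' "timestamp": "19961221000000", "status": "200"}}}'
)

print(availability_url("nintendo.com", "1996"))
print(closest_snapshot(sample))
```

Fetching the built URL with any HTTP client and feeding the parsed JSON to `closest_snapshot` would return the snapshot nearest to the requested date, or `None` if the page has never been archived.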

I think that web archiving is really interesting.
CJ-Moki, signing off.
