[SCC_Active_Members] Capturing information from the WWW
Paul McJones
paul at mcjones.org
Tue Jan 23 07:21:49 PST 2007
Here's a "random example": look for www.bobbemer.com at www.archive.org.
It says the site's robots.txt blocked their spider (more on that below).
Bob Bemer had a very interesting web site that went offline a few months
after he died in 2004. Perhaps the Internet Archive actually has a copy,
but who's to know; Al and I and various other people made copies. The
reasons the Museum can and should maintain archives of web sites include:
Having specific goals of what's worth keeping (curation);
Having a commitment for long-term preservation;
Having a commitment to work with authors of historically relevant
server-based web sites to mirror their underlying databases, not just to
crawl their content;
Etc.
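
(Incidentally, for anyone curious what such a block looks like: a
robots.txt along the lines of the sketch below is enough to keep the
Internet Archive's crawler out. This is a guess at the general form; I
haven't seen the actual file that was on www.bobbemer.com. "ia_archiver"
is the user-agent the Wayback Machine's crawler has historically
honored.)

    # Hypothetical robots.txt; excludes the Internet Archive's crawler
    # from the entire site.
    User-agent: ia_archiver
    Disallow: /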
Paul
Larry Masinter wrote:
> This is really an area where there is already an organization
> that archives public web sites and makes the archives available,
>
> http://www.archive.org/web/web.php
>
> Why should CHM do its own spidering? What am I missing?
>
>
> Larry