[SCC_Active_Members] Capturing information from the WWW
Paul McJones
paul at mcjones.org
Tue Jan 23 07:21:49 PST 2007
Here's a "random example": look for www.bobbemer.com at www.archive.org.
It says the site's robots.txt blocked their spider (more on that below).
Bob Bemer had a very interesting web site that went offline a few months
after he died in 2004. Perhaps the Internet Archive actually has a copy,
but who's to know; Al and I and various other people made copies. The
reasons the Museum can and should maintain archives of web sites include:
Having specific goals of what's worth keeping (curation);
Having a commitment for long-term preservation;
Having a commitment to work with authors of historically relevant
server-based web sites to mirror their underlying databases, not just to
crawl their content;
Etc.
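
(Incidentally, for anyone curious what such a block looks like: a
robots.txt along the lines of the sketch below is enough to keep the
Internet Archive's crawler out. This is a guess at the general form; I
haven't seen the actual file that was on www.bobbemer.com. "ia_archiver"
is the user-agent the Wayback Machine's crawler has historically
honored.)

    # Hypothetical robots.txt; excludes the Internet Archive's crawler
    # from the entire site.
    User-agent: ia_archiver
    Disallow: /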
Paul
Larry Masinter wrote:
> This is really an area where there is already an organization
> that archives public web sites and makes the archives available,
>
> http://www.archive.org/web/web.php
>
> Why should CHM do its own spidering? What am I missing?
>
>
> Larry