[SCC_Active_Members] Capturing information from the WWW

Tue Jan 23 13:22:43 PST 2007

I was looking at this to solve my own problem.
My own Microsoft site is at Archive.org.
I was really being lazy, wanting the museum to offer a service to host dusty old and deceased artifacts and people... like museums do.
I was just trying to solve my own problem, so I will either get myself a site to maintain or not bother with it on leaving Microsoft.
I have several domains, including MyLifeBits.com, that I can use.

g

-----Original Message-----
From: scc_active-bounces at computerhistory.org [mailto:scc_active-bounces at computerhistory.org] On Behalf Of Len Shustek
Sent: Tuesday, January 23, 2007 8:19 AM
To: Paul McJones; Larry Masinter
Cc: 'SCC at CHM'
Subject: Re: [SCC_Active_Members] Capturing information from the WWW

Paul's points are excellent.  Archive.org is doing a great service,
and they've gotten considerably better at making acceptable copies of
some pretty fancy websites.  But it breaks on some sophisticated
scripts (particularly Java-based) and most dynamically-generated
pages.  We have most of our collections catalog online, for example,
and archive.org captures none of it.
-- Len

At 07:21 AM 1/23/2007, Paul McJones wrote:
>Here's a "random example": look for www.bobbemer.com at
>www.archive.org. It says the site's robots.txt blocked their spider.
>Bob Bemer had a very interesting web site that went offline a few
>months after he died, in 2004. Perhaps the Internet Archive actually
>has a copy, but who's to know; Al and I and various other people
>made copies. The reasons the Museum can and should maintain archives
>of web sites include:
>
>    Having specific goals of what's worth keeping (curation);
>    Having a commitment for long-term preservation;
>    Having a commitment to work with authors of
>      historically-relevant server-based web sites to mirror their
>      underlying databases, not just to crawl their content;
>    Etc.
>
>
>Paul
>
>Larry Masinter wrote:
>>This is really an area where there is already an organization
>>that archives public web sites and makes the archives available,
>>
>>http://www.archive.org/web/web.php
>>
>>Why should CHM do its own spidering? What am I missing?
>>
>>
>>Larry
>>
>>
>>_______________________________________________
>>SCC_active mailing list
>>SCC_active at computerhistory.org
>>http://mail.computerhistory.org/mailman/listinfo/scc_active
>>
>
