[SCC_Active_Members] Capturing information from the WWW

Al Kossow aek at bitsavers.org
Mon Jan 22 09:27:14 PST 2007


H.M. Gladney wrote:
> Relative to the SPG emphasis on capturing stuff, doing so with a view to 
> classifying, accessioning, obtaining authorization later, etc., it will 
> from time to time be of interest to capture a large set of files in the 
> directory tree hanging from some interesting Web Page.
> 
> There probably are several available tools to accomplish this.

I normally use some variant of the Unix 'wget' command.

Whatever you use, make sure it preserves the dates of the original
files.
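
For example, a minimal invocation along these lines (the URL is a
placeholder) handles both points:

    wget --mirror --no-parent --wait=1 http://example.org/some/dir/

The --mirror option turns on recursive retrieval plus -N timestamping,
and wget stamps each saved file with the server's Last-Modified time,
so the original dates are preserved on disk. --no-parent keeps the
crawl from climbing above the starting directory, and --wait=1 pauses
a second between requests to be polite to the server.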

I now have about 100 site snapshots on the RAID at CHM containing about
2 million files.




