[SCC_Active_Members] Subversion as a basis for software archive

H.M. Gladney hgladney at pacbell.net
Thu Apr 12 11:27:49 PDT 2007


As other notes have suggested, the e-mail conversations within which the
attachment occurs hide assumptions that might differ among authors.

I would like to draw the participants' attention to one such apparent
unspoken assumption: that the pace of collecting "digital stuff" should be
coupled to the pace of accession.  Particularly if one has formal accession
into a CHM collection in mind, close coupling of these paces seems to me
impractical.  Formal accession might need to be a painstaking,
resource-intensive process (which might be part of why Al is thinking about
requirements right now), whereas "getting our hands on digital stuff"
includes the objective of acquiring such stuff before some of it disappears
forever.  (Since we cannot predict which of the things that still exist today
will be lost in the next few years, a reasonable tactic is to collect without
excessive concern for quality, sorting out the best stuff years from now.) (**)

A crude quantitative estimate might help to illustrate the point, using
the Snobol stuff that we currently hold as an example.  This contains
roughly 30,000 unique files (out of the 65,000 files in total that we
received from Arizona).  We already know that this collection is an
incomplete representation of the family of programs closely related to
Snobol.  One reason for poking into this collection (as Bob Goldberg and
Paul McJones have started to do) is to identify major omissions, which we
might then seek.  If we guess that we hold roughly 30% of what a "complete"
collection might contain, the eventual Snobol-related collection would
comprise about 100,000 files.

As a very crude estimate, a virtual software museum might be interested in
between 100 and 500 collection topics of a scale similar to the Snobol
family, with a total file count of somewhere between 10,000,000 and
50,000,000.  So far, SPG has a start on about a half dozen such collections.
Is it reasonable to plan that it will take SPG between 20 and 100 years to
acquire the rest? (#)

The eventual Snobol collection might be describable by between 10 and 50
subtopics.  For formal accession, each subtopic might require the time and
effort of a curator for between a week and two months.  (Creating metadata,
investigating provenance, selecting files for which the provenance is
relatively well known or discoverable, providing access paths for historical
scholars, etc.)  If this is correct, accessioning the Snobol collection
might cost between 10 man-weeks and 8 man-years. (#)  Accessioning the whole
potential body of collections would take between 20 man-years and 4000
man-years. (##)
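The effort figures above follow from multiplying subtopic counts by per-subtopic curator time.  The sketch below reproduces them under two assumed unit conversions that are not stated in the note: a curator-month is taken as 4 working weeks and a man-year as 50 working weeks.  Those round numbers happen to reproduce the quoted ranges; other conversions would shift the results slightly.

```python
# Rough accessioning-effort model. The unit conversions are assumptions:
# a curator-month is taken as 4 working weeks and a man-year as
# 50 working weeks. Subtopic counts and per-subtopic effort come from the note.

WEEKS_PER_MONTH = 4   # assumed working weeks per curator-month
WEEKS_PER_YEAR = 50   # assumed working weeks per man-year

def effort_weeks(subtopics: int, weeks_per_subtopic: float) -> float:
    """Curator effort in man-weeks for one topic."""
    return subtopics * weeks_per_subtopic

# Snobol range: 10 subtopics at 1 week each, up to 50 subtopics at 2 months each.
snobol_low_weeks = effort_weeks(10, 1)
snobol_high_weeks = effort_weeks(50, 2 * WEEKS_PER_MONTH)

# Whole potential body: 100 to 500 topics of Snobol-like scale.
body_low_years = 100 * snobol_low_weeks / WEEKS_PER_YEAR
body_high_years = 500 * snobol_high_weeks / WEEKS_PER_YEAR

print(f"Snobol: {snobol_low_weeks:g} man-weeks to "
      f"{snobol_high_weeks / WEEKS_PER_YEAR:g} man-years")
print(f"Whole body: {body_low_years:g} to {body_high_years:g} man-years")
```

Run as written, this prints "Snobol: 10 man-weeks to 8 man-years" and "Whole body: 20 to 4000 man-years", matching the estimates in the text.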

To me, this line of reasoning suggests that we should avoid close coupling
between the pace of collection activities and that of accessioning work.
The readers of this note might want to infer additional or different
tactical implications.

Cheerio, Henry

(**) I do not intend to imply undiscriminating collection.  Furthermore, I
believe that some affordable level of provenance annotation should be part
of current "getting our hands on digital stuff" activities.

(#) The qualitative points that emerge are unchanged even if I am
over-estimating by an order of magnitude!

(##) Obviously, this will not be accomplished in the foreseeable future.
Instead, people will pick their favorite topics for resource expenditures.

-----Original Message-----
From: scc_active-bounces at computerhistory.org
[mailto:scc_active-bounces at computerhistory.org] On Behalf Of Al Kossow
Sent: Friday, April 06, 2007 7:38 PM
Cc: scc_active at computerhistory.org
Subject: Re: [SCC_Active_Members] Subversion as a basis for software archive

Van Snyder wrote:

 > How about CVS or SCCS?  I think these are based on plain files.
 >

Would the collective 'we' on this list please refrain from suggesting
solutions to this 'problem' until the museum staff has time to actually
generate a REQUIREMENTS document?

I'm sorry about being so blunt, but as an engineer I cannot understand how
any sort of rational discussion on this subject can be made until the actual
problem to solve is presented.


_______________________________________________
SCC_active mailing list
SCC_active at computerhistory.org
http://mail.computerhistory.org/mailman/listinfo/scc_active