[SCC_Active_Members] Software Archive Problem Statements?

Sun Apr 8 11:50:10 PDT 2007

Hello,

  Here are my 2 cents on the issues. As has been stated, when digital material
is restored, we don't always know what is on the media. I have recovered
several versions of IBSYS from various tape images, some with names like
1D. So after the media has been read, someone needs to go through it and
see what is on it. Sometimes this can be very hard. I have one tape image
that I recovered a version of IBSYS from, but there are 4 or 5 other
files on it that I have not been able to figure out. All I know is that,
judging from the record lengths, I think they are card deck images. I point
this out because the researcher in 2048 has to be able to figure out what we
write today. If we have trouble today figuring out what is on the media, the
researcher in 2048, who has only seen pictures of mag tape, will be in a worse
situation than we are. So all storage systems have to save data in both
human-readable and machine-readable formats.
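
  As a rough illustration of the record-length clue mentioned above, here is
a minimal Python sketch. It assumes the tape file has already been split into
a list of record lengths, and that a card image is stored as one 80-byte
record. The function names, the 80-byte figure and the 90% threshold are all
assumptions made for illustration; real tape images may use other lengths.

```python
from collections import Counter

# Assumption: one punched card is stored as one 80-byte record.
CARD_RECORD_LENGTH = 80


def looks_like_card_deck(record_lengths, card_length=CARD_RECORD_LENGTH,
                         threshold=0.9):
    """Guess whether a tape file is a card deck image from its record lengths.

    Returns True when at least `threshold` of the records are exactly
    `card_length` bytes long. An empty file is never treated as a deck.
    """
    if not record_lengths:
        return False
    counts = Counter(record_lengths)
    return counts[card_length] / len(record_lengths) >= threshold


def summarize(record_lengths):
    """Return (length, count) pairs, most common first, for a person to review."""
    return Counter(record_lengths).most_common()
```

  A guess like this only narrows things down; someone still has to look at
the contents to say what the deck actually is.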

  Generally, the system we need will allow us to archive many documents and
media together. Typically, mainframe software was distributed with a short
document explaining how to load it. We also need to track the source of the
software and any possible licensing restrictions on its use. It is also
my opinion that information should be stored in an uncompressed format. Disk
space is cheap now, and a dropped bit in an uncompressed stream will not render
it totally unusable. Access to the system will be primarily read; once a
document or piece of software is added, I would not want to see it changed.
But it would be nice to be able to append to the file/archive as new
information is recovered. Most source control systems store only the current
file and the changes back to previous versions. If the current version gets
corrupted, all versions may be unrecoverable.
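
  To make the point about dropped bits concrete, here is a small Python
sketch using the standard zlib module. It flips one bit in the middle of an
uncompressed copy and one bit in the middle of a compressed copy, then
reports how many bytes end up wrong. The helper names are made up for this
example, and zlib is only a stand-in for whatever compression might be used.

```python
import zlib


def flip_bit(data, byte_index, bit=0):
    """Return a copy of `data` with one bit inverted at `byte_index`."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def count_differences(a, b):
    """Count byte positions that differ, plus any difference in length."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


def damage_report(original):
    """Flip one bit in a raw copy and in a compressed copy of `original`.

    Returns (raw_bad, packed_bad). Each value is the number of bytes that no
    longer match the original; packed_bad is None when the damaged compressed
    stream cannot be decompressed at all.
    """
    raw_damaged = flip_bit(original, len(original) // 2)
    raw_bad = count_differences(original, raw_damaged)

    packed = zlib.compress(original)
    packed_damaged = flip_bit(packed, len(packed) // 2)
    try:
        recovered = zlib.decompress(packed_damaged)
        packed_bad = count_differences(original, recovered)
    except zlib.error:
        packed_bad = None
    return raw_bad, packed_bad
```

  For the raw copy exactly one byte is wrong and everything around it is
still readable. For the compressed copy the usual outcome is that
decompression fails outright, so nothing after the damage is recovered
without extra repair work.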

  I recommend storing recovered documents/software in XML-based archives. These
could be automatically converted into PDF (or whatever will replace it), or
zip files, tar files, etc. as needed. Information about origin, dates, etc.
could also be placed in the file. This information could then be loaded into
an SQL database. Since the database is only a copy of the information, it can
be changed as our usage develops, and it can also be upgraded to include any
other information that we may find useful in searching the data in the
future. I have been working on a program to combine recovered source,
listings, documents and objects into one PDF file. It would not be a hard
change to modify it to read and generate from an XML file on the fly.
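
  Here is a minimal sketch of the XML-to-database idea in Python, using only
the standard library. The element names (archive, item, title, origin, date,
license), the table layout and the sample record are all invented for
illustration; they are not a proposed schema. The point is that the database
is a throwaway index that can be rebuilt from the XML at any time.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Invented sample record; every value here is a placeholder.
SAMPLE_XML = """\
<archive>
  <item id="1">
    <title>Example tape image</title>
    <origin>placeholder donor</origin>
    <date>unknown</date>
    <license>unknown</license>
  </item>
</archive>
"""

# Hypothetical metadata fields copied from each <item> into the index.
FIELDS = ("title", "origin", "date", "license")


def load_index(xml_text, connection):
    """Rebuild the search index from the XML and return the number of rows.

    The table is dropped and recreated on every call, because the XML file
    is the master copy and the database is only a searchable copy of it.
    """
    connection.execute("DROP TABLE IF EXISTS items")
    connection.execute(
        "CREATE TABLE items (id TEXT PRIMARY KEY, title TEXT, "
        "origin TEXT, date TEXT, license TEXT)"
    )
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        row = [item.get("id")]
        row += [item.findtext(field, default="") for field in FIELDS]
        connection.execute("INSERT INTO items VALUES (?, ?, ?, ?, ?)", row)
    connection.commit()
    return connection.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

  Calling load_index(SAMPLE_XML, sqlite3.connect(":memory:")) builds the
index in memory. If we later decide to search on more fields, only the table
layout and FIELDS change; the archived XML stays untouched.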

  I am not sure of a good medium for storing the digital data so that it can be
read at any time in the future. Maybe the Smithsonian can provide us with some
long-term stable storage media. I also recommend multiple backups, both online
and offline, in different parts of the world. This way, if a natural disaster
should hit the primary archive site, other copies will still be accessible.

Rich

-- 
==========================================================================
Richard Cornwell
skyvis at skyvis.best.vwh.net
http://skyvis.best.vwh.net
==========================================================================