[SCC_Active_Members] Digital Data Format problems
H.M. Gladney
hgladney at pacbell.net
Tue Jul 10 18:25:05 PDT 2007
Randall's general point is well taken, but I cannot endorse his specifics as
close to adequate. (Randall does not assert that they are!)
Note that in the June meeting I did not represent what's been done so far
with GSDL as preservation-worthy. GSDL is an adequate content management
base on top of which preservation methodology is feasible. I have in mind
doing that starting in 2008, following the only technical scheme that I am
confident is workable in the face of all known risks (including the
disappearance of institutions such as CHM).
To achieve this, once we have a conventional content management
infrastructure reliably in place, we face at least three challenges.
(1) Some software needs to be written for durable coding and trustworthy
packaging of content that is reliably bound to adequate provenance metadata.
This is not much software from the point of view of any moderate size
development shop, but I believe it to be more than I can write, test, and
deploy all on my own with suitable quality. However, I believe that SPG
could accomplish it if two or three volunteers would join me in such a
subproject.
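The packaging step in (1) can be illustrated with a minimal Python sketch. It assumes that "reliably bound" means a SHA-256 digest computed over the content together with a canonical serialization of its metadata; the function names and metadata fields are hypothetical and are not part of GSDL, METS, or any existing SPG tool. A bare digest only reveals that something has changed; resisting deliberate tampering would also require digitally signing the digest, which this sketch omits.

```python
import hashlib
import json


def _binding_digest(content: bytes, metadata: dict) -> str:
    # Canonical JSON (sorted keys, no extra whitespace) so identical metadata
    # always serializes to identical bytes, regardless of key order.
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    h = hashlib.sha256()
    # The content digest has a fixed length (32 bytes), so concatenating it
    # with the metadata bytes is unambiguous.
    h.update(hashlib.sha256(content).digest())
    h.update(canonical)
    return h.hexdigest()


def make_package(content: bytes, metadata: dict) -> dict:
    """Return a package binding the content to its provenance metadata (hypothetical layout)."""
    return {
        "content": content,
        "metadata": metadata,
        "binding_sha256": _binding_digest(content, metadata),
    }


def verify_package(package: dict) -> bool:
    """True if neither the content nor the metadata has changed since packaging."""
    expected = _binding_digest(package["content"], package["metadata"])
    return expected == package["binding_sha256"]


if __name__ == "__main__":
    pkg = make_package(b"slide deck bytes", {"creator": "H.M. Gladney", "source": "SPG meeting"})
    print(verify_package(pkg))  # True
    pkg["metadata"]["creator"] = "someone else"
    print(verify_package(pkg))  # False: the metadata no longer matches the digest
```

Binding the metadata into the same digest as the content, rather than hashing the content alone, is what makes a later edit to the provenance record detectable.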
(2) SPG and CHM would have to work out the details of a metadata schema with a
target of eventual endorsement as "sufficient and official" for the purposes
and mission of CHM. This is not a matter of broad outlines (the METS
standard provides that), but instead requires specification of very detailed
design (and validation programs to test input), perhaps for several levels
of certification as adequate. (For instance, CHM might require that an
archivally-trained employee provide or endorse the final packaging for a
formal accession into a CHM collection, whereas packaging under less stringent
rules would almost certainly be the most we could expect from the many
volunteers that it will take to create a broadly representative software
collection.)
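The "validation programs" with "several levels of certification" in (2) might look something like the following minimal Python sketch. The level names ("volunteer", "accession") and the required fields are purely hypothetical placeholders; the real schema would be whatever SPG and CHM eventually agree on, presumably expressed within METS.

```python
from typing import Dict, Optional, Set

# Hypothetical certification levels, ordered from least to most stringent.
LEVELS = ("volunteer", "accession")

# Hypothetical required fields per level; not an endorsed CHM schema.
REQUIRED_FIELDS: Dict[str, Set[str]] = {
    "volunteer": {"title", "creator", "date", "format"},
    "accession": {"title", "creator", "date", "format", "format_version",
                  "provenance", "rights", "endorsed_by"},
}


def missing_fields(record: dict, level: str) -> Set[str]:
    """Return the required fields that are absent or empty for the given level."""
    return {name for name in REQUIRED_FIELDS[level] if not record.get(name)}


def highest_level(record: dict) -> Optional[str]:
    """Return the most stringent level the record satisfies, or None if it meets none."""
    for level in reversed(LEVELS):
        if not missing_fields(record, level):
            return level
    return None


if __name__ == "__main__":
    volunteer_record = {"title": "Demo deck", "creator": "H.M. Gladney",
                        "date": "2007-06", "format": "PowerPoint"}
    print(highest_level(volunteer_record))  # volunteer
```

A record packaged by a volunteer would pass the lighter check, while a formal accession would additionally need fields such as an archivist's endorsement.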
(3) The biggest challenge is that proper representation of digital objects
for long-term value involves work that people find tedious. For instance,
creation of adequate metadata for preservation is not today an institutional
practice in any archive that I know of. (Instead, people fritter away time
and resources with local variants of the Dublin Core metadata scheme.) So
the challenge for SPG is to figure out how significant numbers of
contributors can be persuaded to do this work with sufficient quality.
To some extent the task can be eased by software that makes it easy to
package closely related information with reliable links and with adequate
metadata (ease of use extensions to the work alluded to as (1) above).
However, I think it will take a great deal of persuasion even after we have
such software in hand.
References:
(1) My slides from the February 2007 SPG meeting
(http://www.hgladney.net/PDI.pdf; also http://www.hgladney.net/ARCHpres.htm)
(2) My slides from the June 2007 meeting
(http://www.hgladney.net/CHMpres.pdf)
(3) H.M. Gladney, Preserving Digital Information, Springer-Verlag, 2007.
ISBN 978-3-540-37886-0 (http://home.pacbell.net/hgladney/PDIf.pdf)
My Message:
(1) Tailoring GSDL to be an adequate CM base for SPG collecting on an
expanded scale will take us at least until year-end 2007. There is little
point in tackling long-term preservation until we have a reliable CM
infrastructure that people are comfortable with. The slides in reference (2)
sketch some specifics of what's needed.
(2) I'm seeking volunteers to work with me on creating the software tools
and institutional methodology for long-term preservation of software. This
can be an amusing task.
Cheerio, Henry
-----Original Message-----
From: scc_active-bounces at mail.computerhistory.org
[mailto:scc_active-bounces at mail.computerhistory.org] On Behalf Of Randall
Neff
Sent: Thursday, July 05, 2007 9:53 PM
To: SCC_active at computerhistory.org
Subject: [SCC_Active_Members] Digital Data Format problems
At the last meeting, Henry was demonstrating saving PowerPoint files into
his demo content management system. I was concerned about saving files in
secret, proprietary formats. Also, the metadata did not record WHICH
version of the PowerPoint format was used.
The British National Archives has the same sort of problems, and Microsoft
is going to save the day through virtualization technology.
Of course, you need a copy of every Microsoft OS and application version to
make it work.
The article doesn't mention different floppy disk sizes, and it ignores other
operating systems and non-Microsoft applications.
Warning of data ticking time bomb
http://news.bbc.co.uk/1/hi/technology/6265976.stm
My suggestion for SPG content is we store it in multiple forms, while we
have access to the proprietary software application.
So we save the original proprietary binary format, an image sequence format
such as .png, and a text format like .txt or .rtf (for searching).
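The multiple-form suggestion above, together with the concern about not recording which PowerPoint version produced a file, could be captured in a per-object manifest. This is a minimal Python sketch; the class names, role labels, and the format_version field are hypothetical, and only the media types are standard registered types.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical role labels for the three forms suggested above.
REQUIRED_ROLES = ("original", "page-images", "search-text")


@dataclass
class Rendition:
    role: str                 # e.g. "original", "page-images", "search-text"
    media_type: str           # e.g. "application/vnd.ms-powerpoint", "image/png"
    files: List[str]          # file names within the package
    format_version: str = ""  # e.g. which PowerPoint version wrote the original


@dataclass
class Manifest:
    title: str
    renditions: List[Rendition] = field(default_factory=list)

    def problems(self) -> List[str]:
        """List gaps relative to the three-form convention and the version-recording concern."""
        issues = []
        present = {r.role for r in self.renditions}
        for role in REQUIRED_ROLES:
            if role not in present:
                issues.append(f"missing rendition: {role}")
        for r in self.renditions:
            if r.role == "original" and not r.format_version:
                issues.append("original rendition lacks format_version")
        return issues


if __name__ == "__main__":
    deck = Manifest("Demo deck", [
        Rendition("original", "application/vnd.ms-powerpoint", ["deck.ppt"], "PowerPoint 97-2003"),
        Rendition("page-images", "image/png", ["slide1.png", "slide2.png"]),
        Rendition("search-text", "text/plain", ["deck.txt"]),
    ])
    print(deck.problems())  # []
```

Keeping the version string on the original rendition, rather than only on the object as a whole, records exactly which file depends on which application release.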
Another problem for future display is that some document formats assume
specific fonts and font metrics are available on the computer executing the
application.
Randall.