<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.5.7036.0">
<TITLE>RE: [SCC_Active_Members] Digital Data Format problems</TITLE>
</HEAD>
<BODY>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Randall's general point</FONT> <FONT SIZE=2 FACE="Arial">is well taken, but I cannot endorse his specifics as close to adequate. (Randall does not assert that they are!)</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Notice that in the June meeting I did not represent what's been done so far with GSDL to be preservation-worthy. GSDL is an adequate content management base on top of which preservation methodology is feasible. I have in mind doing that starting in 2008, following the only technical scheme that I am confident is workable in the face of all known risks (including the disappearance of institutions such as CHM). </FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">To achieve it once we have a conventional content management infrastructure reliably in place, there are at least 3 challenges.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(1) Some software needs to be written for durable coding and trustworthy packaging of content that is reliably bound to adequate provenance metadata. This is not much software from the point of view of any moderate-size development shop, but I believe it to be more than I can write, test, and deploy on my own with suitable quality. However, I believe that SPG could accomplish it if two or three volunteers would join me in such a subproject.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(2) SPG and CHM would have to work out the details of a metadata schema with a target of eventual endorsement as "sufficient and official" for the purposes and mission of CHM. This is not a matter of broad outlines (the METS standard provides that), but instead requires specification of very detailed design (and validation programs to test input), perhaps for several levels of certification as adequate. (For instance, CHM might require that an archivally trained employee provide or endorse the final packaging for a formal accession into a CHM collection, whereas packaging under less stringent rules would almost certainly be the most we could expect from the many volunteers that it will take to create a broadly representative software collection.)</FONT></SPAN></P>
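<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">As a minimal sketch of what a validation program of this kind might look like, the Python fragment below checks a metadata record against two certification levels. The level names, field names, and function names are hypothetical placeholders, not taken from METS or from any CHM policy; the real schema is exactly what SPG and CHM would have to work out.</FONT></SPAN></P>
<PRE>
```python
# Hypothetical certification levels and the fields each one requires.
# None of these names come from METS or from any CHM policy.
REQUIRED_FIELDS = {
    "volunteer": {"title", "contributor", "date_packaged", "source_description"},
    "archival": {"title", "contributor", "date_packaged", "source_description",
                 "endorsed_by", "accession_id"},
}


def validate_record(record, level):
    """Return a list of problems; an empty list means the record passes."""
    if level not in REQUIRED_FIELDS:
        raise ValueError(f"unknown certification level: {level}")
    problems = []
    # Sort the field names so the report order is stable between runs.
    for field in sorted(REQUIRED_FIELDS[level]):
        value = record.get(field)
        if value is None or not str(value).strip():
            problems.append(f"missing or empty field: {field}")
    return problems


def highest_level(record):
    """Return the most stringent level the record satisfies, or None."""
    for level in ("archival", "volunteer"):
        if not validate_record(record, level):
            return level
    return None
```
</PRE>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">A volunteer's package would be run through validate_record(record, "volunteer"), and only a record that also carries the archivist's endorsement fields would pass at the "archival" level.</FONT></SPAN></P>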
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(3) The biggest challenge is that proper representation of digital objects for long-term value involves work that people find tedious. For instance, creation of adequate metadata for preservation is not today an institutional practice in any archive that I know of. (Instead, people fritter away time and resources with local variants of the Dublin Core metadata scheme.) So the challenge for SPG is to figure out how significant numbers of contributors can be persuaded to do this work with sufficient quality. </FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">To some extent the task can be eased by software that makes it easy to package closely related information with reliable links and with adequate metadata (ease of use extensions to the work alluded to as (1) above). However, I think it will take a great deal of persuasion even after we have such software in hand.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><B><FONT SIZE=2 FACE="Arial">References: </FONT></B></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(1) My slides from the February 2007 SPG meeting (</FONT></SPAN><A HREF="http://www.hgladney.net/PDI.pdf"><SPAN LANG="en-us"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://www.hgladney.net/PDI.pdf</FONT></U></SPAN></A><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">; also </FONT></SPAN><A HREF="http://www.hgladney.net/ARCHpres.htm"><SPAN LANG="en-us"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://www.hgladney.net/ARCHpres.htm</FONT></U></SPAN></A><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">)</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(2) My slides from the June 2007 meeting (</FONT></SPAN><A HREF="http://www.hgladney.net/CHMpres.pdf"><SPAN LANG="en-us"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://www.hgladney.net/CHMpres.pdf</FONT></U></SPAN></A><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">)</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(3) H.M. Gladney,<I> Preserving Digital Information,</I> Springer Verlag, 2007. ISBN 978-3-540-37886-0 (</FONT></SPAN><A HREF="http://home.pacbell.net/hgladney/PDIf.pdf"><SPAN LANG="en-us"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://home.pacbell.net/hgladney/PDIf.pdf</FONT></U></SPAN></A><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">)</FONT></SPAN></P>
<P><SPAN LANG="en-us"><B><FONT SIZE=2 FACE="Arial">My Message:</FONT></B></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(1) Tailoring GSDL to be an adequate CM base for SPG collecting on an expanded scale will take us at least until year-end 2007. There is little point in tackling long-term preservation until we have a reliable CM infrastructure that people are comfortable with. The slides (2) sketch some specifics of what's needed.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">(2)</FONT><B> <FONT SIZE=2 FACE="Arial">I'm seeking volunteers</FONT></B> <FONT SIZE=2 FACE="Arial">to work with me on creating the software tools and institutional methodology for long-term preservation of software. This can be</FONT><B> <FONT SIZE=2 FACE="Arial">an amusing task</FONT></B><FONT SIZE=2 FACE="Arial">.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Cheerio, Henry</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">-----Original Message-----</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">From: scc_active-bounces@mail.computerhistory.org [</FONT></SPAN><A HREF="mailto:scc_active-bounces@mail.computerhistory.org"><SPAN LANG="en-us"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">mailto:scc_active-bounces@mail.computerhistory.org</FONT></U></SPAN></A><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">] On Behalf Of Randall Neff</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Sent: Thursday, July 05, 2007 9:53 PM</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">To: SCC_active@computerhistory.org</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Subject: [SCC_Active_Members] Digital Data Format problems</FONT></SPAN>
</P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">At the last meeting, Henry was demonstrating saving PowerPoint files into his demo content management system. I was concerned about saving files in secret, proprietary formats. Also, the metadata did not record WHICH version of the PowerPoint format was used.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">The British National Archives has the same sort of problems, and Microsoft is going to save the day through virtualization technology.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Of course, you need a copy of every Microsoft OS and application version to make it work.</FONT></SPAN>
<BR><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">The article doesn't mention different floppy disk sizes, and ignores other operating systems and non-Microsoft applications.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial"> Warning of data ticking time bomb</FONT></SPAN>
<BR><A HREF="http://news.bbc.co.uk/1/hi/technology/6265976.stm"><SPAN LANG="en-us"><U><FONT COLOR="#0000FF" SIZE=2 FACE="Arial">http://news.bbc.co.uk/1/hi/technology/6265976.stm</FONT></U></SPAN></A>
</P>
<BR>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">My suggestion for SPG content is that we store it in multiple forms while we still have access to the proprietary software application.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">So we save the original proprietary binary format, an image sequence format such as .png, and a text format like .txt or .rtf (for searching).</FONT></SPAN></P>
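<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">A minimal sketch of this idea in Python: one manifest per item lists each stored form with its role, format name, format version (the detail missing from the demo metadata), size, and SHA-256 checksum. The role names, field names, and function names are assumptions made for illustration, not part of GSDL or of any existing SPG tool.</FONT></SPAN></P>
<PRE>
```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a bytes object as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(title, renditions):
    """Build a manifest for one item stored in several forms.

    `renditions` is a list of (role, filename, format_name, format_version,
    data) tuples; format_version may be None when no version applies.
    """
    entries = []
    for role, filename, format_name, format_version, data in renditions:
        entries.append({
            "role": role,                      # hypothetical: original, image, text
            "filename": filename,
            "format": format_name,
            "format_version": format_version,  # records WHICH version was used
            "size_bytes": len(data),
            "sha256": sha256_hex(data),        # lets a later reader detect corruption
        })
    return {"title": title, "renditions": entries}


def missing_roles(manifest, required=("original", "image", "text")):
    """List the required roles that have no rendition in the manifest."""
    present = {entry["role"] for entry in manifest["renditions"]}
    return [role for role in required if role not in present]
```
</PRE>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Calling missing_roles on a manifest before accepting a contribution would flag an item that was saved only in its proprietary form.</FONT></SPAN></P>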
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Another problem for future display is that some document formats assume specific fonts and font metrics are available in the computer executing the application.</FONT></SPAN></P>
<P><SPAN LANG="en-us"><FONT SIZE=2 FACE="Arial">Randall.</FONT></SPAN>
</P>
</BODY>
</HTML>