[SCC_Active_Members] Content Management pilot for CHM Software Preservation Group

H.M. Gladney hgladney at gmail.com
Sun Jun 24 19:10:24 PDT 2007


We have had many discussions about digital content management to support
SPG work.  At our June 26 meeting I will introduce you to a pilot Greenstone
Digital Library (GSDL) installation.  It is a first step toward
infrastructure that can serve us well for 3-5 years in collecting,
organizing, and showing off whatever software collections SPG can assemble.

A draft of the presentation slides I plan to use is available at
http://www.hgladney.net/CHMpres.pdf.  The diagrams sketch what is in place
today and what tailoring is needed to make this GSDL instance an
attractive tool for SPG volunteers.

Features of this open-source package that attracted me, but that are not
obvious from its documentation, include the following:
	-- a GSDL server can scale to about 100 collections each holding up
to about 100,000 files within 10Gbyte;
	-- SPG volunteers will be able to create and tailor GSDL collections
from anywhere in the world;
	-- GSDL creates search indices automatically for files of any format
for which an information extraction plugin is provided;
	-- Web presentations for museum visitors can exploit the content of
a GSDL (as they can with any other CM offering);
	-- GSDL supports arbitrary metadata schemas (Dublin Core and METS are
available today); and
	-- GSDL can export its collections in a standard format that other
CM offerings can import.

I have ingested the Snobol collection from Arizona, and Paul McJones'
Fortran, Lisp, and C++ collections. GSDL visitors can poke at those.  No
curatorial work has been done on these collections yet.  Moreover, you will
quickly discover that searching finds text documents and images, but not
programs.  As far as we know, no one in the world has created subroutines to
extract useful search indices from program collections.  Doing this well
might be a nontrivial challenge.  I hope to find volunteers to help tackle
it, starting with obvious heuristics.  (If we can invent useful algorithms,
programming them as GSDL plugins will be neither difficult nor laborious.)
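To illustrate the kind of "obvious heuristic" meant here, the following is a minimal Python sketch.  It is not a GSDL plugin: Greenstone plugins are Perl modules with their own interface, which this does not attempt to reproduce.  It assumes a file's language can be guessed from its extension, takes words from line comments, and splits identifiers such as getFileName or lookup_name into words.  The extension table, stop-word list, and function names (index_terms, split_identifier) are hypothetical choices for illustration only.

```python
import re
from collections import Counter
from pathlib import PurePath

# Hypothetical table of line-comment markers by file extension.
# Block comments, string literals, and fixed-form Fortran or SNOBOL
# column-1 comments are NOT handled by this sketch.
COMMENT_MARKERS = {".cpp": "//", ".cc": "//", ".h": "//",
                   ".lisp": ";", ".lsp": ";", ".f90": "!"}

# Illustrative stop words: common English fillers and language keywords.
STOPWORDS = {"the", "and", "int", "void", "return", "if", "else",
             "for", "while", "defun", "let", "end"}

IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
WORD = re.compile(r"[A-Za-z]{3,}")        # comment words of 3+ letters
CAMEL_PART = re.compile(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])")


def split_identifier(name):
    """Split snake_case and camelCase identifiers into lowercase words."""
    parts = []
    for chunk in name.split("_"):
        parts.extend(CAMEL_PART.findall(chunk))
    # Drop one-letter fragments such as loop variables.
    return [p.lower() for p in parts if len(p) > 1]


def index_terms(filename, source):
    """Return a Counter of candidate search terms from one program file."""
    marker = COMMENT_MARKERS.get(PurePath(filename).suffix.lower())
    terms = Counter()
    for line in source.splitlines():
        code, comment = line, ""
        if marker and marker in line:
            # Heuristic: everything after the first marker is comment text.
            code, _, comment = line.partition(marker)
        words = [w.lower() for w in WORD.findall(comment)]
        for ident in IDENTIFIER.findall(code):
            words.extend(split_identifier(ident))
        terms.update(w for w in words if w not in STOPWORDS)
    return terms


if __name__ == "__main__":
    sample = ("int getFileName(int fd) {  // look up the file name\n"
              "  return lookup_name(fd);\n"
              "}\n")
    counts = index_terms("io.cpp", sample)
    print(counts["name"], counts["file"])  # prints: 3 2
```

The design choice is to treat comments and identifier names as the "text" of a program, since those are where programmers tend to put domain vocabulary; a real plugin would need per-language parsing to do better.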

The current GSDL instance is a pilot, and will remain one at least until
year-end.  I intend to grant library update privileges to only a few SPG
volunteers until we are confident that enough GSDL functionality works
reliably, attracts its users, and makes them productive without much
technical help.  The tailoring required is outlined at
http://www.hgladney.net/SPGprojects.htm.

The server machine for the current SPG GSDL (called HMG3) is a Linux PC running
in my home.  If the pilot proves as useful as I expect, we will probably need
to move the service to a more robust, better controlled, and faster platform
before calling it a production service.  Learning the practicalities of such
a move is part of the purpose of the pilot.  A few SPG members have been
discussing this quietly and will continue to think about it.  This is not an
urgent matter, because tailoring the pilot for CHM-SPG work will not reach
sufficient refinement before 2008.  During 2007, your comments, questions, and
criticism of the pilot will be invaluable.  Please don't hesitate to send them.

Cheerio, Henry
H.M. Gladney, Ph.D.  http://home.pacbell.net/hgladney

Web pages from which you can examine this activity are:

Pilot CHM-SPG DL: http://www.hgladney.net/gsdl/cgi-bin/library
Help for SPG librarians: http://www.hgladney.net/GSDLibrarian.htm
GSDL bibliography: http://www.hgladney.net/GSDLbib.htm
GSDL examples: http://www.hgladney.net/GSDLsites.htm
Projects to refine the CHM-SPG DL: http://www.hgladney.net/SPGprojects.htm
An overview with links to the resources above and more is available at
http://www.hgladney.net/
