[SCC_Active_Members] Trip Report - 2005 Workshop on Mining Software Repositories

Van Snyder van.snyder at jpl.nasa.gov
Mon May 23 15:55:43 PDT 2005


In addition to LAMP, it might be useful to investigate
http://www.netlib.org, an archive of mathematical software, mostly in
Fortran 77.  Very little of this, however, includes defect reports or the like.

An interesting use of defect reports and other software developers'
diaries can be found in "Comparing Development Costs of C and Ada" by
Stephen F. Zeigler at http://www.adaic.com/whyada/ada-vs-c/cada_art.html

-- 
Van Snyder                    |  What fraction of Americans believe 
Van.Snyder at jpl.nasa.gov       |  Wrestling is real and NASA is fake?
Any alleged opinions are my own and have not been approved or disapproved
by JPL, CalTech, NASA, Frederick Gregory, George Bush, or anybody else.
On Mon, 2005-05-23 at 13:55 -0700, Lee Courtney wrote:

> Hi all,
> 
> Last week I attended the 2005 Workshop on Mining Software Repositories (MSR)
> (http://msr.uwaterloo.ca/msr2005/) held in conjunction with the ICSE. My
> purpose for attending the Workshop was to:
> 
> 1) explore use of a CHM Museum archive by the MSR researcher/user community,
> 
> 2) meet and establish contacts with attendees at this Workshop,
> 
> 3) learn about other software repositories and tools already in place
> that could be used by the Museum, and
> 
> 4) assess state of the art and practice, mapping these onto our
> potential repository of historic software artifacts.
> 
> Bottom line: interested in this group as a significant user of the Museum's
> Software Collection and source of guidance in developing our archive and
> community.
> 
> Notes
> =====
> Mining Software Repositories (MSR) is a fast-growing area of interest in the
> Computer Science community. It focuses on developing tools and techniques
> that, when applied to software repositories, can be used to increase the
> efficiency and quality of the software development product and process. A
> Workshop overview is at
> http://msr.uwaterloo.ca/msr2005/slides/msrConcl2005.pdf.
> 
> This group is very focused on Open Source (OS) LAMP (Linux, Apache, Mozilla,
> PHP) repositories, as this is the only corpus of "data" easily available to
> the community. LAMP repositories contain not only source code, but defect,
> author, change control, and project information. This data is examined both
> along a temporal dimension and at individual data points.
> 
> [SCC take-away: if possible it would be a Good Thing for our archive to have
> multiple versions of a software system artifact, along with associated
> information. For example, having the defect database associated with a
> software system documents a great deal about not only the software itself,
> but the project, development methodologies used, etc. Recently there was a
> question of whether the Museum should accept a paper version of the defect
> database for the XDS CP-V operating system. In light of the discussion at
> MSR the obvious answer is "Yes!": we want to collect this "ancillary"
> information, which would otherwise be lost.
> 
> Our challenge is going to be building tools to 'parse' languages and
> structures that are no longer current or are not part of the (much)
> narrower open source software stack.]
> 
> In lunch and break conversations the message I got was that researchers
> would be very interested in having available a body of material based on the
> Museum's Software Collection.
> 
> [SCC takeaway: There is interest in having a dataset other than current open
> source. This group would LOVE to have access to significant repository data
> from commercial and government organizations. The more data we can collect
> around software artifacts, e.g. multiple versions, change logs, defect data,
> the more useful the data will be to this group. At a minimum we should take
> this into consideration when designing the archive.]
> 
> My impression was that attendance at this year's workshop was significantly
> larger (2x?) than at last year's meeting. Out of the 80-90 attendees at the
> Workshop, all but 5-6 were academics. Other organizations registered
> included Microsoft Research, IBM Research, Barclays Bank, NASA (and
> Computer History Museum). Unfortunately, next year's workshop is scheduled
> for the ICSE meeting in Shanghai. :-(
> 
> At the Workshop wrap-up there was lots of feedback to make the meeting two
> days. One day was much too short to dive into the subject matter. The
> schedule enforced at the Workshop allowed a SINGLE question for each
> presenter, with more in-depth discussion vectored off-line.
> 
> 
> Papers and presentations are available at the Workshop website
> (http://msr.uwaterloo.ca/msr2005/program.html). Presentations possibly
> relevant to SCC work:
> 
> - Recovering System Specific Rules from Software Repositories
> http://msr.uwaterloo.ca/msr2005/slides/4.pdf
> 
> - Collaboration Using OSSmole: A repository of FLOSS data and analyses
> http://msr.uwaterloo.ca/msr2005/slides/25.pdf
> Megan Conklin of Elon University et al. are doing very interesting work.
> Although her group is tracking a 'live' body of code, versus our 'dead' body
> of artifacts, it is possibly relevant to our work.
> 
> Jonathan Maletic (Kent State University), the Integration and Collaboration
> session chair, is also keen on a repository of historic software.
> 
> All the Session 4D: Taxonomies & Formal Representations talks were of
> interest and likely applicable to us.
> 
> Towards a Taxonomy of Approaches for Mining of Source Code Repositories
> http://msr.uwaterloo.ca/msr2005/slides/26.pdf
> 
> A Framework for Describing and Understanding Mining Tools in Software
> Development http://msr.uwaterloo.ca/msr2005/slides/36.pdf
> 
> SCQL: A formal model and a query language for source control repositories
> http://msr.uwaterloo.ca/msr2005/slides/13.pdf
> 
> I see this community being a very interested user of a fully populated and
> rich repository of software artifacts, especially if we can leverage some of
> the commercial repositories (e.g. IBM, HP, DoD) that have been discussed. On
> the flip side I suspect if we can engage with the MSR community, the Museum
> can benefit from technical knowledge, resources, and tools gleaned from
> current FLOSS work.
> 
> Mary and I have identified "Academics" as one of the five major users of
> the Museum's software archive, and this group is at ground zero of a lot of
> the work that would/could use our data.
> 
> How will we measure the success of our efforts? Number of artifacts
> preserved is obviously one. But MSR-2005 made me also think we could add
> number of users and number of citations in other papers.
> 
> 
> Notes for our CHM May 2006 Workshop on Software Preservation
> ============================================================
> I also used this meeting to think about logistics and flow of the workshop
> we are planning next May. Some quick thoughts:
> 
> - one day was way too short a time to present and discuss the amount of
> material in the MSR Workshop. I suspect we'll have the same issue and 2 days
> is the right length.
> 
> - MSR had a session devoted to "Lightning Talks": quick 5-minute
> presentations with questions held until a discussion period following.
> The purpose of this session was to present 'position papers' versus
> 'research results'. Some talks worked really well, others not at all. One
> suggestion for next year is to have a set of 2-minute lightning talks at the
> beginning of the day, followed by concurrent, more detailed presentations
> later. People could pick several to attend after getting an overview in the
> morning.
> 
> - We need to dry run all the logistics (sound, AV) before the meeting to
> ensure glitches don't bite us. Not a problem at MSR, but things were so
> fast-paced that any AV glitches could have derailed the meeting.
> 
> - This has not been explicitly stated, but I strongly believe we must ask
> for written papers and slide presentations on a date that allows us to have
> the material available for consumption before the meeting.
> 
> - MSR proceedings (and larger ICSE) are only available on-line. No paper to
> follow up on. My impression is that this is becoming the norm and not the
> exception. It would be great to publish results of our meeting on the web
> site immediately.
> 
> - Lunches, breaks, and late afternoon/early evening get-togethers greatly
> facilitated
> 
> - A couple of times during the MSR Workshop, time was set aside for guided
> discussion. For example, after the lightning talks (where no questions were
> entertained) there was a discussion period. I think lumping some of these
> topics together facilitated cross-pollination and illuminated relationships
> between the different topics. For example,
> 
> - At the end of the Workshop the organizers presented a summary and thoughts
> for future directions, followed by discussion. Again excellent. Sorry I
> don't have the wrap-up slides.
> 
> - We should include an informal social occasion. This can be as simple as
> meeting for dinner at a location where people can easily interact. Note -
> someplace where the atmosphere (e.g. music) doesn't drown out community
> building.
> 
> As always, call/email with questions, comments, or if I can be of assistance.
> 
> Lee Courtney
> 
> MontaVista Software
> 1237 East Arques Avenue
> Sunnyvale, California 94085
> (408) 328-9238	voice
> (408) 328-9204	fax
> Yahoo IM: charlesleecourtney
> 
> _______________________________________________
> SCC_active mailing list
> SCC_active at computerhistory.org
> http://mail.computerhistory.org/mailman/listinfo/scc_active



