[SCC_Active_Members] Trip Report - 2005 Workshop on Mining Software Repositories

Lee Courtney lcourtney at mvista.com
Mon May 23 13:55:35 PDT 2005


Hi all,

Last week I attended the 2005 Workshop on Mining Software Repositories (MSR)
(http://msr.uwaterloo.ca/msr2005/) held in conjunction with the ICSE. My
purpose for attending the Workshop was to:

1) explore use of a CHM Museum archive by the MSR researcher/user community,

2) meet and establish contacts with attendees at this Workshop,

3) learn about other software repositories and tools already in place
that could be used by the Museum, and

4) assess the state of the art and practice, mapping these onto our
potential repository of historic software artifacts.

Bottom line: I am interested in this group as a significant user of the
Museum's Software Collection and as a source of guidance in developing our
archive and community.

Notes
=====
Mining Software Repositories (MSR) is a fast-growing area of interest in the
Computer Science community. It focuses on developing tools and techniques
that, when applied to software repositories, can be used to increase the
efficiency and quality of the software development product and process. A
Workshop overview is at http://msr.uwaterloo.ca/msr2005/slides/msrConcl2005.pdf.

This group is very focused on Open Source (OS) LAMP (Linux, Apache, Mozilla,
PHP) repositories, as this is the only corpus of "data" easily available to
the community. LAMP repositories contain not only source code, but also defect,
author, change control, and project information. This data is examined both
along a temporal dimension and at individual data points.

[SCC take-away: if possible it would be a Good Thing for our archive to have
multiple versions of a software system artifact, along with associated
information. For example, having the defect database associated with a
software system documents a great deal about not only the software itself,
but the project, development methodologies used, etc. Recently there was a
question of whether the Museum should accept a paper version of the defect
database for the XDS CP-V operating system. In light of the discussion at MSR
the obvious answer is "Yes!": we want to collect this "ancillary" information,
which would otherwise be lost.

Our challenge is going to be building tools to 'parse' languages and
structures that are not current or part of the (much) more narrow open
source software stack.]

In lunch and break conversations the message I got was that researchers
would be very interested in having available a body of material based on the
Museum's Software Collection.

[SCC takeaway: There is interest in having a dataset other than current open
source. This group would LOVE to have access to significant repository data
from commercial and government organizations. The more data we can collect
around software artifacts, e.g. multiple versions, change logs, defect data,
the more useful the data will be to this group. At a minimum we should take
this into consideration when designing the archive.]
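
A minimal sketch, assuming a hypothetical record layout (none of these class
or field names come from the Museum or from any MSR tool), of how an archive
entry could keep versions, change logs, and defect data together so that they
can be examined over time as well as at individual points:

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class DefectReport:
    # Hypothetical fields; a real defect database would carry many more.
    identifier: str
    opened: date
    summary: str


@dataclass
class Version:
    label: str
    released: date
    change_log: list[str] = field(default_factory=list)


@dataclass
class ArtifactRecord:
    """One software system, with the ancillary data kept alongside it."""
    name: str
    versions: list[Version] = field(default_factory=list)
    defects: list[DefectReport] = field(default_factory=list)

    def timeline(self) -> list[Version]:
        # Temporal view: versions ordered by release date.
        return sorted(self.versions, key=lambda v: v.released)

    def defects_between(self, start: date, end: date) -> list[DefectReport]:
        # Point-in-time view: defects opened within a date window (inclusive).
        return [d for d in self.defects if start <= d.opened <= end]
```

All names and dates used with this sketch would be placeholders; the point is
only that versions and defect data live in the same record, so one cannot be
collected without leaving room for the other.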

My impression was that attendance at this year's workshop was significantly
larger (2x?) than at last year's meeting. Out of the 80-90 attendees at the
Workshop, all but 5-6 were academics. Other organizations registered
included Microsoft Research, IBM Research, Barclays Bank, NASA (and
Computer History Museum). Unfortunately next year's workshop is scheduled
for the ICSE meeting in Shanghai. :-(

At the Workshop wrap-up there was lots of feedback to make the meeting two
days. One day was much too short to dive into the subject matter. The
schedule enforced at the Workshop allowed a SINGLE question for each
presenter, with more in-depth discussion vectored off-line.


Papers and presentations are available at the Workshop website
(http://msr.uwaterloo.ca/msr2005/program.html). Presentations possibly
relevant to SCC work:

- Recovering System Specific Rules from Software Repositories
http://msr.uwaterloo.ca/msr2005/slides/4.pdf

- Collaboration Using OSSmole: A repository of FLOSS data and analyses
http://msr.uwaterloo.ca/msr2005/slides/25.pdf
Megan Conklin of Elon University et al. are doing very interesting work.
Although her group is tracking a 'live' body of code, versus our 'dead' body
of artifacts, it is possibly relevant to our work.

Jonathan Maletic (Kent State University), the Integration and Collaboration
session chair, is also keen on a repository of historic software.

All the Session 4D: Taxonomies & Formal Representations talks were of
interest and likely applicable to us.

Towards a Taxonomy of Approaches for Mining of Source Code Repositories
http://msr.uwaterloo.ca/msr2005/slides/26.pdf

A Framework for Describing and Understanding Mining Tools in Software
Development http://msr.uwaterloo.ca/msr2005/slides/36.pdf

SCQL: A formal model and a query language for source control repositories
http://msr.uwaterloo.ca/msr2005/slides/13.pdf

I see this community being a very interested user of a fully populated and
rich repository of software artifacts. Especially if we can leverage some of
the commercial repositories (e.g. IBM, HP, DoD) that have been discussed. On
the flip side I suspect if we can engage with the MSR community, the Museum
can benefit from technical knowledge, resources, and tools gleaned from
current FLOSS work.

Mary and I have identified "Academics" as one of the five major users of
the Museum's software archive, and this group is at ground zero of a lot of
the work that would/could use our data.

How will we measure the success of our efforts? Number of artifacts
preserved is obviously one measure. But MSR-2005 made me also think we could
add number of users and number of citations in other papers.


Notes for our CHM May 2006 Workshop on Software Preservation
============================================================
I also used this meeting to think about logistics and flow of the workshop
we are planning next May. Some quick thoughts:

- One day was way too short a time to present and discuss the amount of
material in the MSR Workshop. I suspect we'll have the same issue and that
2 days is the right length.

- MSR had a session devoted to "Lightning Talks": quick 5-minute
presentations with questions held until a discussion period following. The
purpose of this session was to present 'position papers' versus 'research
results'. Some talks worked really well, others not at all. One suggestion
for next year is to have a set of 2-minute lightning talks at the beginning
of the day, followed by concurrent, more detailed presentations later. People
could pick several to attend after getting an overview in the morning.

- We need to dry run all the logistics (sound, AV) before the meeting to
ensure glitches don't bite us. Not a problem at MSR, but things were so
fast-paced that any AV glitches could have derailed the meeting.

- This has not been explicitly stated, but I strongly believe we must ask
for written papers and slide presentations on a date that allows us to have
the material available for consumption before the meeting.

- MSR proceedings (and those of the larger ICSE) are only available on-line.
There is no paper to follow up on. My impression is that this is becoming
the norm and not the exception. It would be great to publish the results of
our meeting on the web site immediately.

- Lunches, breaks, and late afternoon/early evening get-togethers greatly
facilitated

- A couple of times during the MSR Workshop, time was set aside for guided
discussion. For example, after the lightning talks (where no questions were
entertained)
there was a discussion period. I think lumping some of these topics together
facilitated cross-pollination and illuminated relationships between the
different topics. For example,

- At the end of the Workshop the organizers presented a summary and thoughts
for future directions, followed by discussion. Again excellent. Sorry I don't
have the wrap-up slides.

- We should include an informal social occasion. This can be as simple as
meeting for dinner at a location where people can easily interact. Note -
someplace where the atmosphere (e.g. music) doesn't drown out community
building.

As always, call/email with questions, comments, or if I can be of assistance.

Lee Courtney

MontaVista Software
1237 East Arques Avenue
Sunnyvale, California 94085
(408) 328-9238	voice
(408) 328-9204	fax
Yahoo IM: charlesleecourtney



