[SCC_Active_Members] Trip Report - 2005 Workshop on Mining Software Repositories

Ike Nassi nassi at nassi.com
Mon May 23 18:11:54 PDT 2005


Ada was certainly under-appreciated.  (In the interest of full disclosure, 
I helped design it.)
---
Ike

At 03:55 PM 5/23/2005, Van Snyder wrote:
>In addition to LAMP, it might be useful to investigate
>http://www.netlib.org, an archive of mathematical software, mostly in
>Fortran 77.  Very little of this, however, includes defect etc. reports.
>
>An interesting use of defect reports and other software developers'
>diaries can be found in "Comparing Development Costs of C and Ada" by
>Stephen F. Zeigler at http://www.adaic.com/whyada/ada-vs-c/cada_art.html
>
>--
>Van Snyder                    |  What fraction of Americans believe
>Van.Snyder at jpl.nasa.gov       |  Wrestling is real and NASA is fake?
>Any alleged opinions are my own and have not been approved or disapproved
>by JPL, CalTech, NASA, Frederick Gregory, George Bush, or anybody else.
>On Mon, 2005-05-23 at 13:55 -0700, Lee Courtney wrote:
>
> > Hi all,
> >
> > Last week I attended the 2005 Workshop on Mining Software Repositories (MSR)
> > (http://msr.uwaterloo.ca/msr2005/) held in conjunction with the ICSE. My
> > purpose for attending the Workshop was to:
> >
> > 1) explore use of a CHM Museum archive by the MSR researcher/user community,
> >
> > 2) meet and establish contacts with attendees at this Workshop,
> >
> > 3) learn about other software repositories and tools already in place
> > that could be used by the Museum, and
> >
> > 4) assess state of the art and practice, mapping these onto our
> > potential repository of historic software artifacts.
> >
> > Bottom line: interested in this group as a significant user of the Museum's
> > Software Collection and source of guidance in developing our archive and
> > community.
> >
> > Notes
> > =====
> > Mining Software Repositories (MSR) is a fast-growing area of interest in the
> > Computer Science community. It focuses on developing tools and techniques
> > that, when applied to software repositories, can be used to increase the
> > efficiency and quality of software development product and process. Workshop
> > overview at http://msr.uwaterloo.ca/msr2005/slides/msrConcl2005.pdf.
> >
> > This group is very focused on Open Source (OS) LAMP (Linux, Apache, Mozilla,
> > PHP) repositories, as this is the only corpus of "data" easily available to
> > the community. LAMP repositories contain not only source code, but defect,
> > author, change control, and project information. This data is examined both
> > along a temporal dimension and at individual data points.
> >
> > [SCC take-away: if possible it would be a Good Thing for our archive to have
> > multiple versions of a software system artifact, along with associated
> > information. For example, having the defect database associated with a
> > software system documents a great deal about not only the software itself,
> > but the project, development methodologies used, etc. Recently there was a
> > question of whether the Museum should accept a paper version of the defect
> > database for the XDS CP-V operating system. In light of the discussion from
> > MSR the obvious answer is "Yes!"; we want to collect this "ancillary"
> > information, which would otherwise be lost.
> >
> > Our challenge is going to be building tools to 'parse' languages and
> > structures that are not current or part of the (much) more narrow open
> > source software stack.]
> >
> > In lunch and break conversations the message I got was that researchers
> > would be very interested in having available a body of material based on the
> > Museum's Software Collection.
> >
> > [SCC takeaway: There is interest in having a dataset other than current open
> > source. This group would LOVE to have access to significant repository data
> > from commercial and government organizations. The more data we can collect
> > around software artifacts, e.g. multiple versions, change logs, defect data,
> > the more useful the data will be to this group. At a minimum we should take
> > this into consideration when designing the archive.]
> >
> > My impression was that attendance at this year's workshop was significantly
> > larger (2x?) than last year's meeting. Out of the 80-90 attendees at the
> > Workshop, all but 5-6 were academics. Other organizations registered
> > included Microsoft Research, IBM Research, Barclays Bank, NASA (and
> > Computer History Museum). Unfortunately next year's meeting is scheduled
> > with the ICSE meeting in Shanghai. :-(
> >
> > At the Workshop wrap-up there was lots of feedback to make the meeting two
> > days. One day was much too short to dive into the subject matter. The
> > schedule enforced at the Workshop allowed a SINGLE question for each
> > presenter, with more in-depth discussion vectored off-line.
> >
> >
> > Papers and presentations are available at the Workshop website
> > (http://msr.uwaterloo.ca/msr2005/program.html). Presentations possibly
> > relevant to SCC work:
> >
> > - Recovering System Specific Rules from Software Repositories
> > http://msr.uwaterloo.ca/msr2005/slides/4.pdf
> >
> > - Collaboration Using OSSmole: A repository of FLOSS data and analyses
> > http://msr.uwaterloo.ca/msr2005/slides/25.pdf
> > Megan Conklin of Elon University et al. are doing very interesting work.
> > Although her group is tracking a 'live' body of code, versus our 'dead' body
> > of artifacts, it is possibly relevant to our work.
> >
> > Jonathan Maletic (Kent State University), Integration and Collaboration
> > session chair, is also keen on a repository of historic software.
> >
> > All the Session 4D: Taxonomies & Formal Representations talks were of
> > interest and likely applicable to us.
> >
> > Towards a Taxonomy of Approaches for Mining of Source Code Repositories
> > http://msr.uwaterloo.ca/msr2005/slides/26.pdf
> >
> > A Framework for Describing and Understanding Mining Tools in Software
> > Development http://msr.uwaterloo.ca/msr2005/slides/36.pdf
> >
> > SCQL: A formal model and a query language for source control repositories
> > http://msr.uwaterloo.ca/msr2005/slides/13.pdf
> >
> > I see this community being a very interested user of a fully populated and
> > rich repository of software artifacts. Especially if we can leverage some of
> > the commercial repositories (e.g. IBM, HP, DoD) that have been discussed. On
> > the flip side I suspect if we can engage with the MSR community, the Museum
> > can benefit from technical knowledge, resources, and tools gleaned from
> > current FLOSS work.
> >
> > Mary and I have identified "Academics" as one of the five major users of
> > the Museum's software archive, and this group is at ground zero of a lot of
> > the work that would/could use our data.
> >
> > How will we measure the success of our efforts? Number of artifacts
> > preserved is obviously one. But MSR-2005 made me also think we could add
> > number of users and number of citations in other papers.
> >
> >
> > Notes for our CHM May 2006 Workshop on Software Preservation
> > ============================================================
> > I also used this meeting to think about logistics and flow of the workshop
> > we are planning next May. Some quick thoughts:
> >
> > - one day was way too short a time to present and discuss the amount of
> > material in the MSR Workshop. I suspect we'll have the same issue and 2 days
> > is the right length.
> >
> > - MSR had a session devoted to "Lightning Talks": quick 5-minute
> > presentations with questions held until a discussion period following.
> > Purpose of this session was to present 'position papers' versus 'research
> > results'. Some talks worked really well, others not at all. One suggestion
> > for next year is to have a set of 2-minute lightning talks at the beginning
> > of the day followed by concurrent more detailed presentations later. People
> > could pick several to attend after getting an overview in the morning.
> >
> > - We need to dry run all the logistics (sound, AV) before the meeting to
> > ensure glitches don't bite us. Not a problem at MSR, but things were so fast
> > paced that any AV glitches could have derailed the meeting.
> >
> > - This has not been explicitly stated, but I strongly believe we must ask
> > for written papers and slide presentations on a date that allows us to have
> > the material available for consumption before the meeting.
> >
> > - MSR proceedings (and larger ICSE) are only available on-line. No paper to
> > follow-up on. My impression is that this is becoming the norm and not the
> > exception. Would be great to publish results of our meeting on the web site
> > immediately.
> >
> > - Lunches, breaks, and late afternoon/early evening get-togethers greatly
> > facilitated
> >
> > - A couple of times during the MSR Workshop, time was set aside for guided
> > discussion. For example, after the lightning talks (where no questions were
> > entertained) there was a discussion period. I think lumping some of these
> > topics together facilitated cross-pollination and illuminated
> > relationships between the different topics. For example,
> >
> > - At end of Workshop organizers presented a summary and thoughts for future
> > directions, followed by discussion. Again excellent. Sorry I don't have the
> > wrap-up slides.
> >
> > - We should include an informal social occasion. This can be as simple as
> > meeting for dinner at a location where people can easily interact. Note -
> > someplace where the atmosphere (e.g. music) doesn't drown out community
> > building.
> >
> > As always call/email with questions, comments, or if I can be of assistance.
> >
> > Lee Courtney
> >
> > MontaVista Software
> > 1237 East Arques Avenue
> > Sunnyvale, California 94085
> > (408) 328-9238        voice
> > (408) 328-9204        fax
> > Yahoo IM: charlesleecourtney
> >
> > _______________________________________________
> > SCC_active mailing list
> > SCC_active at computerhistory.org
> > http://mail.computerhistory.org/mailman/listinfo/scc_active
>

-=-=-=-=-=-=-=-=-=-=-=-=-
Ike Nassi, Ph.D.
+1-408-390-8281 (mobile)
Skype me: inassi
nassi at nassi.com
www.nassi.com 8-)



