Saturday, February 1, 2014

Making life easier for future software archaeologists

Yesterday I went to "The First International Conference on Software Archaeology" run by Robert Chatley and Duncan McGregor - it was excellent. There were "lightning talks" run by Tim Mackinnon - here is a blog version of my talk.


If you have worked on a piece of software that is running in production, but hasn't been changed for a while, you may have had to do some software archaeology to work out how to make changes to it.

In this article, I list some problems that I've encountered when doing software archaeology, and some suggestions for making life easier for future software archaeologists.

My suggestions are not always applicable - but please consider them carefully. It is valuable to your client to make it easier for future software archaeologists to work with your systems. If your systems are any good they will probably be used for much longer than you think.

Where's the source?

Sometimes source code is lost, e.g. when a VCS migration skips some repositories because nobody thinks they are needed any more. For Java projects there is a simple way to avoid losing the source code - include the source in your binary jars.
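As a rough sketch of the idea (not a description of any particular build tool): a jar is just a zip archive, so the sources can be appended to it after the build. The function name and the directory layout below are assumptions made up for illustration; in a real project you would more likely configure your build tool to include the source directory when it creates the jar.

```python
import zipfile
from pathlib import Path


def add_sources_to_jar(jar_path, source_dir):
    """Append every .java file under source_dir to an existing jar.

    Hypothetical helper: a jar is a zip archive, so the standard
    zipfile module can extend it. Returns the archive names added.
    """
    source_root = Path(source_dir)
    added = []
    with zipfile.ZipFile(jar_path, "a") as jar:
        existing = set(jar.namelist())
        for java_file in sorted(source_root.rglob("*.java")):
            # Store each source file at its package path, so it sits
            # next to the corresponding .class file inside the jar.
            name = java_file.relative_to(source_root).as_posix()
            if name not in existing:
                jar.write(java_file, arcname=name)
                added.append(name)
    return added
```

Running it a second time on the same jar adds nothing, because entries that are already present are skipped.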

Where is the documentation?

Although it is possible that the source code will be lost, more commonly, source code repositories do survive. However, documentation systems (for example wikis) are likely to be decommissioned sooner.

Even if a documentation system isn't decommissioned, the information related to old projects can get deleted, become out of date or inconsistent with the version actually running.

In order to keep documentation consistent with the system, please commit it to the same VCS repository as the code. Depending on the VCS used, you might be able to serve documentation to users directly from your source control system.

Where are the dependencies?

Sometimes artifact repositories are decommissioned. For Java projects, instead of using an artifact repository and ivy/maven/gradle etc, commit your dependencies into a "lib" folder and refer to them there. I know this is a very controversial approach - it goes against current trends - but it is actually very practical. It is likely that the source code repository will outlive the artifact repository.
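To show how little is needed to "refer to them there", here is a minimal sketch that builds a Java classpath from whatever jars are committed in the "lib" folder. The function name is hypothetical, and it assumes all the jars sit directly in one folder.

```python
import os
from pathlib import Path


def classpath_from_lib(lib_dir="lib"):
    """Build a Java classpath string from the jars committed in lib_dir.

    Hypothetical helper: assumes a flat folder of jars, no subfolders.
    """
    jars = sorted(str(p) for p in Path(lib_dir).glob("*.jar"))
    # os.pathsep is ':' on Unix-like systems and ';' on Windows,
    # which matches what the java and javac commands expect.
    return os.pathsep.join(jars)
```

The result can be passed to "javac -cp" or "java -cp" from a build script, with no artifact repository involved.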

How do I build the software?

Sometimes build tools go out of fashion and it is difficult to set up a working build for archaeological code. Therefore, at the very least, include instructions about how to build the code in the VCS repository. Even better for the future archaeologist, commit the build tools and any setup scripts.

How do I work on the software?

In addition to being able to build the software, there may be development tools needed to work on it. For example, the software might be partially generated by some (usually hideous) tool. In such cases, some of the source code isn't really what the developer works on (e.g. GUI builder generated code).

Include (at the very least) instructions for how to set up a suitable development environment. Even better, commit the development tools and any setup scripts.

How do I run the code in the production environment?

For a large system, it can be difficult to work out how the production servers are meant to be set up. Therefore, include instructions, or even better, scripts (e.g. for Puppet or Chef), for setting up any servers etc.

How did it get to be like it is?

When looking at an old system, it can be useful to see the history of decisions about how it got to be like it is. A changelog checked into the source code repository helps with that. In my lightning talk, Nat Pryce said that for a home project, he committed the complete bug tracker system; that could be very useful for a future archaeologist.

In conclusion

Fashions change (e.g. tools become obsolete), reorganizations happen and systems get migrated (and sometimes things get lost in the process). If you want to do the best for your client, remember that successful software can last a really long time, so you should leave better clues for future software archaeologists.

Copyright © 2014 Ivan Moore

Sunday, August 18, 2013

FFS! Learn how to use source control properly

This article was prompted by a tweet by Nat Pryce:
I thought - similarly - how about source control?

I have seen flaws in how source control is used at almost every company I have worked at. This article is about the ones that make me think "FFS! Learn how to use source control properly". The terminology I have used is from Subversion - but the ideas are applicable to all source control systems.

What everyone agrees on (right?)

All files needed to run the build should be in source control, or generated from files (e.g. build files) that are in source control. That is, having done a clean check out from source control, on a new development machine, I should be able to run the build and it should work. Everything needed for the build to work, e.g. libraries or things that need to be installed, should either be in source control or copied/installed on the machine as a result of running the build. (That is the ideal - sometimes it is tolerable to expect some things to already be installed - but that is not ideal).

Don't make me think about which files should or shouldn't be committed

All files generated by your build should be ignored by source control. That is, having run a build, if I haven't added any files myself, I shouldn't see any files available for adding to source control. That is so I don't accidentally commit files that shouldn't be committed.

Running the build should not modify any files that are in source control. That is, having run a build, if I haven't modified any files myself, I shouldn't see any file modifications available for committing to source control. That is so I can see only the changes I've actually made and commit all of them.
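Both rules can be checked mechanically. The sketch below is only an illustration: it takes the text printed by "svn status" after a clean checkout and a build, and reports anything the build left behind. The function name is made up, and it only looks at the first status column for the codes "?", "M", "A", "D" and "!".

```python
def unexpected_after_build(svn_status_output):
    """Split `svn status` output into unversioned and changed paths.

    Hypothetical helper. After a clean checkout and a build, both
    lists should be empty.
    """
    unversioned, changed = [], []
    for line in svn_status_output.splitlines():
        if not line.strip():
            continue
        code = line[0]            # first status column
        path = line[7:].strip()   # path follows the seven status columns
        if code == "?":
            unversioned.append(path)  # build output that should be ignored
        elif code in "MAD!":
            changed.append(path)      # the build touched a versioned file
    return unversioned, changed
```

A continuous integration job could run this after the build and fail if either list is non-empty.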

Every-day use of source control

Before committing, a developer should update and run the build to check that it is safe to commit. (There are some valid exceptions - I won't go into that now.) Some source control systems allow a commit to the central repository even when the working copy is not up-to-date ("svn commit" allows this in some cases, "git push" doesn't allow this at all). If you commit without updating, then the first time the code has been integrated is in source control and so you can't be sure it's going to work.
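The routine is simple enough to script. This is a minimal sketch under some assumptions: the build is started by a script called "./build.sh" (a made-up name), and the function name and its "run" parameter are hypothetical too. The "run" parameter exists only so that the sequence can be exercised without a real repository.

```python
import subprocess


def safe_commit(message, run=subprocess.run, build_command=("./build.sh",)):
    """Update, build, and commit only if every earlier step succeeded.

    Hypothetical helper; ./build.sh stands in for your real build.
    """
    steps = [
        ["svn", "update"],
        list(build_command),
        ["svn", "commit", "-m", message],
    ]
    for step in steps:
        # check=True raises CalledProcessError on a non-zero exit,
        # so a failed update or build stops before the commit.
        run(step, check=True)
    return steps
```

A real script would also need to look for conflicts reported by the update, which this sketch does not do.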

Commit everything - all modifications, additions and deletions. If you don't commit everything then you might have committed a set of files that is not self-consistent and doesn't work. If you have multiple logical changes that need to be committed, you might want to commit different sets of files in different commits in order to describe each change. That can be OK, but it is better to only have one logical change in your workspace at a time and commit that in its entirety. You will probably benefit from working in smaller steps, committing each logical change separately from the others. (When I say "one logical change" that might be as small as a one character change to fix a typo.) If you commit logically separate changes one at a time in their entirety you will end up committing many times a day, and that is a good thing. Having a mechanism to shelve/unshelve changes can be useful for allowing you to make logically separate changes one at a time, safely.


There are many different ways to use source control that are perfectly acceptable. Please comment if you disagree with the things I have written in this article.

Copyright © 2013 Ivan Moore

Monday, January 24, 2011

Do what works for you

I've received literally no emails complaining that my "do the right thing" methodology is too prescriptive, so I've come up with a new methodology called "do what works for you" which I hope will work for those people who find the "do the right thing" methodology too prescriptive.

What you do in this methodology is whatever works for you.

Copyright © 2011 Ivan Moore

Sunday, January 16, 2011

SPA 2011

My favourite conference (SPA) is open for registrations and session proposals.

This year I'm honoured to be conference co-chair with Mike Hill. The programme will be organised by Willem Van Den Ende and Rob Bowley.

Please book your place, or propose a session (do that real soon - you'll have some time to improve your session once it is submitted).

Copyright © 2011 Ivan Moore

Saturday, September 18, 2010

The plural of anecdote is not data

I've just finished reading "Bad Science" and it made me think of how little science has been done about software development.

The only two areas I can think of (off the top of my head - without doing any research about research) where there has been some research that seems very relevant to my day-to-day work are pair programming and test driven development.

While I think it is commendable that people have done some research on these topics - it's just not enough (in particular, not by enough different groups or people).

So - what good science is there about software development? Comments welcome.

Copyright © 2010 Ivan Moore

Saturday, September 4, 2010

IDE affecting code

I don't write Java code exactly the same as I used to. Some of the ways my code has changed are due to using an excellent IDE.

One of these, which I mentioned in a previous article, is making fields public in some circumstances - I won't repeat the arguments here, I just want to mention that one reason I'd now make a field public when before I would have had a getter/setter is that it is trivial to convert a public field into a getter/setter when you need to (a refactoring called "encapsulate field"** in IDEA). (I still prefer not to have the internals of a class public whether accessed by fields or getters/setters though - tell don't ask.)

Another example is that now I only introduce an interface when it's really needed and not in anticipation of it being needed. Again, it's trivial to introduce the interface when it's needed using the "extract interface" refactoring of the IDE.

I was chatting to Nat Pryce about this and he was agreeing that using an excellent IDE has also changed the way he writes Java. I hope he and others will add comments to mention other ways their programming style has changed as a result of better IDEs.

I know for some people the idea of changing your programming style as a result of what a tool supports is heresy - but I think good development practice means using the tools and language so they work well together.

The examples given above refer to Java development where the team owns all the code rather than when writing an API - I'm not addressing API design here.

** BTW, I think calling the refactoring "encapsulate" is a bit of an exaggeration - really it is just replacing one form of non-encapsulation with another.

Copyright © 2010 Ivan Moore

Wednesday, August 18, 2010

miniSPA 2010 - Friday September 10th - BCS London

Mike Hill and I have been volunteered to co-chair the miniSPA conference.

SPA is a fantastic conference - miniSPA is a condensed (and free!) version - it'll be great - all the places will go, so book now.

Here's the announcement and registration link (sorry the registration system is really horrible):

Experience some of the most popular sessions from this year's BCS SPA conference, for free, at miniSPA2010 on Friday September 10th at BCS London (near Covent Garden).

The miniSPA2010 one-day programme features five sessions, in two streams, that give an excellent guide to the variety and quality you'll find at every SPA conference.

We hope that attending miniSPA will encourage you to submit a session proposal for SPA2011, which will be taking place from June 12-15 (also at BCS London).

Booking is essential. Places are limited so reserve yours now.

©2010 BCS SPA | 5 Southampton Street | London | WC2E 7HA

Copyright © 2010 Ivan Moore