There was one topic in the talk that I think is so important that I decided to write an article about it. Suprisingly, it's a topic that many users of continuous integration (CI) servers (and even the developers of them in some cases!) don't seem to have thought much about.
Continuous integration
This article assumes you already know what continuous integration is.
The problem with conventional continuous integration servers
Imagine there are three developers, Tom, Dick and Harriet, and a continuous integration server. There is some code in a source code repository and these three developers start with the code cleanly checked out.
- Tom makes some changes and commits his changes.
- The CI server starts running a build.
- Harriet makes some changes and commits her changes.
- Dick makes some changes and commits his changes.
- The CI server reports that the build is OK.
- The CI server starts running a build (because there are new changes for it to run the build on).
- Tom makes some changes and commits his changes.
- The CI server reports that the build is broken.
The question is - who broke the build? (Left as an exercise for the reader). The problem with conventional continuous integration servers is that they can't tell you. A situation like this is inevitable with the vast majority of continuous integration server installations (that actually exist - I know you could install your favourite CI server differently if you had enough money - read on ...).
BTW - Step 7 is a bit of a red herring. It is not necessary for the purpose of the main point of this article, but a common enough situation. Tom thinks he's committed on a green build, but really the build is already broken in terms of the committed code, just not in terms of what the CI server is saying.
Why it is a problem
The problem with not knowing which commit broke the build is that it takes longer to work out who should look at the problem and how they can fix it. If you know which commit broke the build you know who should look at it and they can review their changes to work out why it broke. Furthermore, if you know which commit broke the build (even if you can't work out why), you can revert that change set while the problem is fixed "off line" from the rest of the team.
I am convinced (from years of using CI servers on many teams) that not knowing which commit broke the build is a major contributor to sloppy CI practice - builds staying red for ages, nobody taking responsibility, reduction in commit frequency etc etc. Just knowing which test failed, or "why" the build broke isn't enough. The symptoms don't always tell you the cause. Knowing the cause - i.e. which commit - is what you need to know.
When do you get this problem?
You suffer this problem more as the team gets larger, commits become more frequent, and the length of the build goes up. (Oh, and if developers get sloppier).
You often can't do anything about the size of the team, you'd like to encourage people to commit more frequently and making the build faster can be really hard. (And you might be able to get rid of sloppy developers, but that's quite a different blog post). Ideally the build should be fast, but that is often easier said than done, and even if the build is fast, it doesn't completely eliminate the problem described in this article.
Solutions
Most continuous integration servers don't solve this problem, but some do. These are the solutions that I know about. The CI server can:
a) run the build for the commits (revisions) between the last known good and the first known bad (provided that there is enough capacity in the build farm, e.g. people stop committing while the build is broken).
b) check that the build passes (on a build agent) before committing changes.
c) run multiple (preferably all) commits in parallel on different build agents in a build farm.
build-o-matic does (a) automatically - using a binary search. TeamCity (version 4 EAP) and Pulse (and maybe others) allow running of a previous revision so you can do the equivalent manually (maybe someone will write a plug-in for TeamCity to do what build-o-matic does? There is a precedent).
TeamCity and Pulse do (b) (build-o-matic doesn't - it's a cool feature; I've used it in TeamCity and it works well but you do need a lot of build agents).
build-o-matic does (c). I think TeamCity and possibly some others will too if you have enough build agents. The problem with this is approach is that you really might need a lot of build agents, particularly if the team is large, commit frequently and the build is long.
My preferred solution is to buy enough build agents to do (c) - computers are very cheap. Note however that just because a CI server supports a build farm, even if the build farm is infinitely large it doesn't necessarily mean it'll run the build for all the commits - check your CI server documentation for details. I believe Bamboo has something up it's sleeve on this topic - but I'm not sure if it's public yet (I'll find out and add a comment as appropriate).
There is another solution which is not to use a CI server at all, but instead have a "build token" or an "integration machine" - i.e. serialize all commits, that way you never commit while the build is running. That only works well for small co-located teams with a fast build. But when conditions are suitable, it really works well!
Conclusion
There are lots of CI servers to choose from. I consider working out which commit broke the build as a basic minimum feature of a CI server installation but suprisingly not all CI servers are capable of telling you - and you really need to understand this before you choose a CI server that will leave you with a broken build and not knowing which commit caused it.
Copyright © 2008 Ivan Moore