Putting the tea into team: May 2018

These days, everyone says they are doing "continuous integration" - but are you really?

Do you build your pull requests and feature branches on a CI server? Do you have individually built company-internal libraries each in their own repository (or built as separately versioned artifacts)? If so, you probably aren't doing "as soon as you can" integration (i.e. what "continuous integration" originally meant).

I ran a session at CITCON with Thierry de Pauw about "as soon as you can" integration. It's a name for a style of "trunk based development", but that term can also be taken to have a less extreme meaning than what I'm describing here.

The following practices all go together to support "as soon as you can" integration:

"trunk based development" (an extreme version - just one branch; not even short lived branches)
"monorepos" or "multi-app repos" (building against all your source, i.e. no separately versioned, company specific, shared libraries/code)
pair programming or mob programming (and/or after-the-fact code reviews if you must, but not code reviews that delay integration)
separation of deployment and release of software, i.e. your software should always be deployable even if it isn't all releasable, e.g. using feature toggles

It is the combination of these which enables "as soon as you can" integration.
The most effective teams I've worked on have used this combination of practices.

Limitations

What I'm suggesting here isn't suitable for every team, but it has been suitable for almost every team I've worked on. Maybe your situation is special (in particular, the problems identified here are smaller for smaller teams), but please consider whether you could integrate sooner and what benefits that would give you.

Symptoms

Here are some problems that I've seen in teams that think they are doing continuous integration but aren't really:

merge conflicts

which is a problem because resolving merge conflicts requires human intervention, which leads to mistakes

people holding off doing a refactoring in order to avoid causing anyone else merge conflicts
large refactorings causing other people merge conflicts
waste from working on code that has been refactored by someone else but you don't know it yet
waste making the same improvement as someone else in parallel
not making small improvements to the code unrelated to what you are "working on" due to overheads
not suggesting small improvements to a pull request because it would delay merging of an otherwise good change
suggesting small improvements to a pull request which the author considers annoying nit-picking
having to chase someone to review, or merge, a pull request, delaying its integration
having to update version numbers (in several places) when modifying a company-internal library
lack of refactoring tool support when modifying a company-internal library (for the code that depends on it)
the "version update dance" for company-internal library code:

change library code
wait until it has built to get a new version number
update code that uses library to refer to new version number
manually update code to match changes made to library code

multiple versions of company-internal library code being used by different projects
diamond dependencies and difficulty getting a good set of versions of company-internal libraries
having to have a special "release" build - so being unable to deploy code for every commit

Causes

feature branches
pull requests
code reviews that block integration
multiple repositories for different parts of the same application, e.g. a company-internal library kept in a different repository from the code that uses it

using fully qualified versions of company-internal libraries in order to have a deterministic build:

the "version update dance" described earlier
delayed integration of library code
multiple versions of library code
diamond dependencies

using "snapshot" dependencies or equivalent ("semantic versioning" is a better equivalent but is still not fully deterministic)

transient errors due to non-deterministic builds
need for a special "release build" meaning integration is only really tested then, hence not continuous integration
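
To make the "diamond dependencies" cause concrete, here is a minimal sketch in Python. The library names, version numbers and the `find_version_conflicts` helper are all hypothetical, invented purely for illustration; real build tools model dependencies in more detail (this sketch ignores the fact that each version of a library can have its own dependencies).

```python
from collections import defaultdict

# Hypothetical data: each artifact maps to the exact versions of the
# company-internal libraries it was built against.
DEPENDENCIES = {
    "app": {"lib-a": "1.4.0", "lib-b": "2.1.0"},
    "lib-a": {"lib-core": "1.0.0"},
    "lib-b": {"lib-core": "1.1.0"},
    "lib-core": {},
}


def find_version_conflicts(root, dependencies):
    """Return {library: sorted versions} for every library that is
    reachable from root at more than one version."""
    required = defaultdict(set)
    seen = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        for lib, version in dependencies.get(name, {}).items():
            required[lib].add(version)
            stack.append(lib)
    return {lib: sorted(vs) for lib, vs in required.items() if len(vs) > 1}


print(find_version_conflicts("app", DEPENDENCIES))
# prints {'lib-core': ['1.0.0', '1.1.0']}
```

Here "app" needs two different versions of "lib-core" via "lib-a" and "lib-b", and somebody has to pick one and hope it works with both. When everything is built from source in one repository there is only ever one version of each library, so the question never arises.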

Treatment

"trunk based development", or at least this version of it:

integrate any changes the rest of the team has made into the source on your machine, e.g. git pull -r
do some work (e.g. ./build.sh && git commit -am "done some work")
integrate any changes the rest of the team has made in the meantime, e.g. git pull -r
share your work with the rest of the team if it all works, e.g. ./build.sh && git push
i.e. everyone works on master all the time. No branches, not even short-lived ones. You already have the equivalent of a branch on your machine: any code that you haven't pushed yet.
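
The last two steps of that loop could be wrapped up in a small script. This is only a sketch, assuming a `./build.sh` that builds and runs the tests (as in the examples above) and that your work is already committed; the `integrate_asap` and `run_command` names are made up for illustration. The command runner is passed in as a parameter so that the logic can be tried out without touching a real repository.

```python
import subprocess


def run_command(command):
    """Run a command and report whether it exited successfully."""
    return subprocess.run(command).returncode == 0


def integrate_asap(run=run_command):
    """Pull, build and push. Return True only if the work was shared."""
    steps = [
        ["git", "pull", "-r"],  # integrate what the team has pushed meanwhile
        ["./build.sh"],         # build and test with their changes included
        ["git", "push"],        # share the work only if everything above passed
    ]
    for step in steps:
        if not run(step):
            return False  # stop at the first failing step; nothing is pushed
    return True
```

If someone else pushes between your pull and your push, the push is rejected and the function returns False; you simply run it again. The faster the build, the less often that happens.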

"monorepos", or at least this version of it:

put all code that has build-time dependencies on each other into the same repository, and build from source rather than against versioned artifacts

some people use the term monorepo to mean a single repo for an entire company - I don't know of an accepted term for what I mean. I tend to call them "multi-app repos". Suggestions in comments please.
other benefits beyond the scope of this article

pair programming, or mob programming (there are other reasons why pair programming is good; but I've limited this to what is relevant to "ASAP integration"):

constant code review
no need to create, review and merge a pull request, so reduced overhead
sooner integration, meaning less chance of people working on code that has already been changed by someone else
code review is in the context of the whole codebase rather than just the diff, so you have tool support for seeing why the code is the way it is
even the smallest improvement can be made with reduced chance of annoying someone and with no overhead
other benefits beyond the scope of this article

Separation of deployment and release

Push your code to master every time you've done any work that does not break your codebase. It doesn't have to be "finished", it just has to be deployable without breaking anything that has been released.
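
A feature toggle is one way to keep unfinished work deployable. The following is a minimal sketch rather than a recommendation of any particular toggle library; the `FEATURE_TOGGLES` dictionary, the "new_checkout" toggle and the discount rule are all hypothetical, and in a real system the toggle values would more likely come from configuration.

```python
# Hypothetical toggle store: the new code path is deployed but switched off.
FEATURE_TOGGLES = {"new_checkout": False}


def is_enabled(name, toggles=FEATURE_TOGGLES):
    """Unknown toggles count as off, so unreleased code stays dark."""
    return toggles.get(name, False)


def checkout_total(prices_in_pence, toggles=FEATURE_TOGGLES):
    total = sum(prices_in_pence)
    if is_enabled("new_checkout", toggles):
        # Work in progress: already on master and deployed, but not released.
        return total - total // 10
    # Released behaviour, unchanged until the toggle is switched on.
    return total


print(checkout_total([1000, 2000]))                          # prints 3000
print(checkout_total([1000, 2000], {"new_checkout": True}))  # prints 2700
```

The half-finished path can be pushed, built and deployed with every commit, while users keep seeing the released behaviour until the toggle is flipped.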

Side effects

Using this approach, it is possible to break the master branch (note that this doesn't mean you will push anything broken into production; it just means the master branch can contain some "bad" commits). For some people this is unacceptable. It is an "optimistic" approach. In almost all teams I've worked in, the benefits of this approach have outweighed this potential problem, but it does require discipline and an approach to working that not everyone is used to:

you have to work incrementally
you have to have good enough test coverage for it to be a sensible way to work
any code you have pushed must be deployable (but not necessarily released) without breaking anything currently released
you may have to have a mechanism to release changes independently of the code being deployed, e.g. feature flags
you have to try not to break the build
if you do push a commit which breaks the build, you have to fix it (or revert it) immediately
you should run your build locally before pushing, so your build (including tests) needs to be fast

Benefits

Much reduced chance of merge conflicts

merge conflicts are a problem because a human resolving a merge conflict is much more likely to make a mistake, e.g. accidentally undo someone else's change, than an automated merge

Better support for refactoring
Efficient way of working; less to do:

no creating a branch
no creating a pull request
no chasing someone to review your pull request
no doing a code review of some code you don't have the context of
no merging the pull request
no version update dance

Monday, May 7, 2018

"As soon as you can" integration
