Monday, May 7, 2018

"As soon as you can" integration

These days, everyone says they are doing "continuous integration" - but are you really?

Do you build your pull requests and feature branches on a CI server? Do you have individually built company-internal libraries each in their own repository (or built as separately versioned artifacts)? If so, you probably aren't doing "as soon as you can" integration (i.e. what "continuous integration" originally meant).

I ran a session at CITCON with Thierry de Pauw about "as soon as you can" integration. It's a name for a style of "trunk based development", but that term can also be taken to have a less extreme meaning than what I'm describing here.

The following practices all go together to support "as soon as you can" integration:
  • "trunk based development" (an extreme version - just one branch; not even short lived branches)
  • "monorepos" or "multi-app repos" (building against all your source, i.e. no separately versioned, company specific, shared libraries/code)
  • pair programming or mob programming (and/or after-the-fact code reviews if you must, but not code reviews that delay integration)
  • separation of deployment and release of software, i.e. your software should always be deployable even if it isn't all releasable, e.g. using feature toggles
It is the combination of these which enables "as soon as you can" integration.
The most effective teams I've worked on have used this combination of practices.

Limitations

What I'm suggesting here isn't suitable for every team, but it has been suitable for almost every team I've worked on. Maybe your situation is special (in particular, the problems identified here are smaller for smaller teams), but please consider whether you could integrate sooner and what benefits that would give you.

Symptoms

Here are some problems that I've seen in teams that think they are doing continuous integration but aren't really:
  • merge conflicts
    • which is a problem because resolving merge conflicts requires human intervention, which leads to mistakes
  • people holding off doing a refactoring in order to avoid causing anyone else merge conflicts
  • large refactorings causing other people merge conflicts
  • waste from working on code that has been refactored by someone else but you don't know it yet
  • waste making the same improvement as someone else in parallel
  • not making small improvements to the code unrelated to what you are "working on" due to overheads
  • not suggesting small improvements to a pull request because it would delay merging of an otherwise good change
  • suggesting small improvements to a pull request which the author considers annoying nit-picking
  • having to chase someone to review, or merge, a pull request, delaying its integration
  • having to update version numbers (in several places) when modifying a company-internal library
  • lack of refactoring tool support when modifying a company-internal library (for the code that depends on it)
  • the "version update dance" for company-internal library code:
    • change library code
    • wait until it has built to get a new version number
    • update code that uses library to refer to new version number
    • manually update code to match changes made to library code
  • multiple versions of company-internal library code being used by different projects
  • diamond dependencies and difficulty getting a good set of versions of company-internal libraries
  • having to have a special "release" build - so being unable to deploy code for every commit

Causes

  • feature branches
  • pull requests
  • code reviews that block integration
  • multiple repositories for different parts of the same application, e.g. one repository for a company-internal library and another for the code using it
    • using fully qualified versions of company-internal libraries in order to have a deterministic build:
      • the "version update dance" described earlier
      • delayed integration of library code
      • multiple versions of library code
      • diamond dependencies
    • using "snapshot" dependencies or equivalent ("semantic versioning" is a better equivalent but is still not fully deterministic)
      • transient errors due to non-deterministic builds
      • need for a special "release build" meaning integration is only really tested then, hence not continuous integration

Treatment


  • "trunk based development", or at least this version of it:
    • integrate any changes the rest of the team has made into the source on your machine, e.g. git pull -r
    • do some work (e.g. ./build.sh && git commit -am"done some work")
    • integrate any changes the rest of the team has made in the meantime, e.g. git pull -r
    • share your work with the rest of the team if it all works, e.g. ./build.sh && git push
    • i.e. everyone works on master all the time. No branches, not even short-lived ones. You already have the equivalent of a branch on your machine - any code that isn't already pushed.
  • "monorepos", or at least this version of it:
    • put all code that has build-time dependencies on each other into the same repository and build from source rather than against versioned artifacts
      • some people use the term monorepo to mean a single repo for an entire company - I don't know of an accepted term for what I mean. I tend to call them "multi-app repos". Suggestions in comments please.
      • other benefits beyond the scope of this article
  • pair programming, or mob programming (there are other reasons why pair programming is good; but I've limited this to what is relevant to "ASAP integration"):
    • constant code review
    • no need to create, review and merge a pull request, so reduced overhead
    • sooner integration, meaning less chance of people working on code that has already been changed by someone else
    • code review is in the context of the whole codebase rather than just the diff, so you have tool support for seeing why the code is the way it is
    • even the smallest improvement can be made with reduced chance of annoying someone and with no overhead
    • other benefits beyond the scope of this article
  • Separation of deployment and release
    • Push your code to master every time you've done any work that does not break your codebase. It doesn't have to be "finished", it just has to be deployable without breaking anything that has been released.
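The trunk based development steps above can be wrapped in a small shell function to run after each local commit. This is only a minimal sketch: the name sync_and_share and the BUILD_CMD variable are made up for illustration, and it assumes your build script (./build.sh in the examples above) also runs your tests.

```shell
#!/usr/bin/env bash
# Sketch of one "as soon as you can" integration cycle, run after committing.
# Assumptions (not part of the original steps): the function name
# sync_and_share and the BUILD_CMD override are hypothetical; ./build.sh is
# the example build script and is expected to run the tests as well.

sync_and_share() {
  # Pick up whatever the rest of the team has pushed, replaying any local
  # commits on top of it (-r is short for --rebase).
  git pull -r || return 1

  # Only share work that still builds together with everyone else's changes.
  "${BUILD_CMD:-./build.sh}" || return 1

  # Share the result with the rest of the team.
  git push
}
```

If git push is rejected because someone else pushed in the meantime, just run sync_and_share again.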

Side effects

Using this approach, it is possible to break the master branch (note that this doesn't mean you will push anything broken into production, just that the master branch can contain some "bad" commits); for some people this is unacceptable. It is an "optimistic" approach. In almost all teams I've worked in, the benefits of this approach have outweighed this potential problem, but it does require discipline and a way of working that not everyone is used to:
  • you have to work incrementally
  • you have to have good enough test coverage for it to be a sensible way to work
  • any code you have pushed must be deployable (but not necessarily released) without breaking anything currently released
  • you may have to have a mechanism to release changes independently of the code being deployed, e.g. feature flags
  • you have to try not to break the build
  • if you do push a commit which breaks the build, you have to fix it (or revert it) immediately
  • you should run your build locally before pushing, so your build (including tests) needs to be fast
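A mechanism for releasing changes independently of deploying them, as mentioned in the list above, can be very simple. Here is a minimal sketch of a feature flag read from an environment variable; the FEATURE_NEW_CHECKOUT variable and the function names are hypothetical, and a real system might read flags from configuration or a flag service instead.

```shell
#!/usr/bin/env bash
# Sketch of a feature flag: the new code path is deployed with every push
# but stays unreleased until the flag is switched on.
# Assumptions: FEATURE_NEW_CHECKOUT, feature_enabled and checkout are
# hypothetical names used only for illustration.

feature_enabled() {
  # A flag counts as on only when its value is exactly "on".
  [ "${1:-}" = "on" ]
}

checkout() {
  if feature_enabled "${FEATURE_NEW_CHECKOUT:-off}"; then
    # Deployed but unfinished work; only reachable once the flag is on.
    echo "new checkout flow"
  else
    # The behaviour that is currently released.
    echo "existing checkout flow"
  fi
}
```

With the flag unset, checkout keeps the released behaviour, so the unfinished code can sit on master and be deployed without anyone seeing it.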

Benefits

  • Much reduced chance of merge conflicts
    • merge conflicts are a problem because a human resolving a merge conflict is much more likely to make a mistake, e.g. accidentally undo someone else's change, than an automated merge
  • Better support for refactoring
  • Efficient way of working; less to do:
    • no creating a branch
    • no creating a pull request
    • no chasing someone to review your pull request
    • no doing a code review of some code you don't have the context of
    • no merging the pull request
    • no version update dance
It doesn't suit everyone or every team, but if your situation allows you to try it, give it a go and let me know how it goes.

Copyright © 2018 Ivan Moore
