Saturday, April 4, 2015

Example of automatically setting up pipelines

To explain automatically setting up pipelines, I've prepared example code (available here - see the license) for GoCD using gomatic and made a video showing the code running - which I will refer to throughout the article. I could possibly have done this all by narrating the video - maybe I will in the future.

Inception creation

The example includes a script (called create_inception_pipeline.py) which creates the pipeline that will create the pipelines (you run this only once):

from gomatic import GoCdConfigurator, HostRestClient, ExecTask, GitMaterial

configurator = GoCdConfigurator(HostRestClient("localhost:8153"))

pipeline = configurator\
    .ensure_pipeline_group("inception")\
    .ensure_replacement_of_pipeline("inception")\
    .set_git_material(GitMaterial("https://github.com/ivanmoore/inception.git",
                                  polling=False))\
    .set_timer("0 0 * * * ?")  # run on the hour, every hour
inception_job = pipeline.ensure_stage("inception").ensure_job("inception")
inception_job.ensure_task(ExecTask(["python", "inception.py"]))

configurator.save_updated_config()

This creates a pipeline in GoCD (here running on localhost) which runs inception.py on a timer.
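
GoCD timer specifications use Quartz-style cron expressions, which start with a seconds field (unlike classic Unix cron). As a rough illustration of how the spec above reads, here is a minimal sketch that labels each field; describe_timer is a hypothetical helper written for this article, not part of gomatic or GoCD.

```python
# Field names for a Quartz-style cron expression, in order.
# The optional seventh field (year) is listed for completeness.
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day_of_month",
                 "month", "day_of_week", "year"]


def describe_timer(spec):
    """Pair each part of a timer spec with its field name (hypothetical helper)."""
    parts = spec.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError("expected 6 or 7 fields, got %d" % len(parts))
    return dict(zip(QUARTZ_FIELDS, parts))


# The spec used by create_inception_pipeline.py:
# second 0 of minute 0 of every hour.
fields = describe_timer("0 0 * * * ?")
```

Reading the result field by field makes it easier to see that the first two zeros pin the seconds and minutes, and that the hour is the wildcard.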

The video starts with this script being run, and the "inception" pipeline has been created by time 0:19.

Inception

The inception.py script creates a pipeline for every repo of a particular github user. For a real system, you might want to do something more sophisticated; this example has been kept deliberately simple.

from gomatic import GoCdConfigurator, HostRestClient, ExecTask
from github import Github, GithubException

github = Github()
me = github.get_user("teamoptimization")
for repo in me.get_repos():
    try:
        print "configuring", repo.name
        configurator = GoCdConfigurator(HostRestClient("localhost:8153"))

        pipeline = configurator\
            .ensure_pipeline_group("auto-created")\
            .ensure_pipeline(repo.name)\
            .set_git_url(repo.clone_url)
        job = pipeline\
            .ensure_initial_stage("bootstrap")\
            .ensure_job("configure-pipeline")
        bootstrap_file_url = "https://raw.githubusercontent.com/ivanmoore/inception/master/bootstrap.py"
        job.ensure_task(ExecTask(["bash", "-c", "curl -fSs " + bootstrap_file_url + " | python - " + repo.name]))

        configurator.save_updated_config()
    except GithubException:
        print 'ignoring', repo.name

This script creates a pipeline with a "bootstrap" stage for each repo (unless it already exists). As long as nothing else creates pipelines with the same names, the "bootstrap" stage will end up being the first stage. The bootstrap stage runs the bootstrap.py script described later, passing it the name of the repo/pipeline as an argument.
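
The command that the bootstrap stage runs is assembled by string concatenation. A slightly more defensive way to build it might look like the following sketch (written for Python 3, where shlex.quote is available). The function name bootstrap_command is an assumption made for illustration; the URL is the one used in inception.py.

```python
import shlex

BOOTSTRAP_URL = ("https://raw.githubusercontent.com/"
                 "ivanmoore/inception/master/bootstrap.py")


def bootstrap_command(repo_name, bootstrap_url=BOOTSTRAP_URL):
    """Build the argument list for the bootstrap task (hypothetical helper).

    The script is downloaded with curl and piped into python, with the
    repo/pipeline name passed as the script's only argument.
    """
    shell_line = "curl -fSs {} | python - {}".format(
        shlex.quote(bootstrap_url), shlex.quote(repo_name))
    return ["bash", "-c", shell_line]


command = bootstrap_command("project1")
```

The resulting list could be passed to ExecTask in place of the concatenated string. Quoting makes no difference for simple names such as "project1"; it only matters if a name ever contains characters that the shell treats specially.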

In the video, the "inception" pipeline is triggered manually at time 0:20 (rather than waiting for the timer) and has finished by time 1:03 (and has no effect yet as the relevant user has no repositories).

The bootstrap stage

In this example, the bootstrap.py script creates a stage for every line of a file (called commands.txt); this makes it easy to demonstrate one of the key features of the approach - the ability to keep the pipeline in sync with the repo it is for. One of the subtleties is that the bootstrap has to make sure that it doesn't remove itself, but it does need to remove all other stages, so that if stages are removed from commands.txt then they will be removed from the pipeline. Note that because of how gomatic works, if removing and then re-adding the stages results in no difference, then no POST request will be sent to GoCD, i.e. GoCD would be entirely unaffected.

import sys
from gomatic import GoCdConfigurator, HostRestClient, ExecTask

configurator = GoCdConfigurator(HostRestClient("localhost:8153"))

pipeline_name = sys.argv[1]

pipeline = configurator\
    .ensure_pipeline_group("auto-created")\
    .find_pipeline(pipeline_name)

for stage in pipeline.stages()[1:]:
    pipeline.ensure_removal_of_stage(stage.name())

commands = open("commands.txt").readlines()
for command in commands:
    # split at the first '=' only, so the command itself may contain '='
    command_name, thing_to_execute = command.strip().split('=', 1)
    pipeline\
        .ensure_stage(command_name)\
        .ensure_job(command_name)\
        .ensure_task(ExecTask(thing_to_execute.split(" ")))

configurator.save_updated_config()

A real bootstrap script might be much more sophisticated, for example, creating a build stage automatically for any repo which contains a certain file (e.g. build.xml or pom.xml) and creating deployment stage(s) automatically. The example bootstrap.py script is as short as I could make it for the purposes of demonstrating the approach.
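
The commands.txt format read by bootstrap.py (one "name=command" entry per line) can be illustrated without a GoCD server. The sketch below is a stand-alone parser written for Python 3; parse_commands is a hypothetical name, and using shlex.split (so that quoted arguments stay together) and skipping blank lines are choices made for this sketch rather than what the example script does.

```python
import shlex


def parse_commands(lines):
    """Turn 'name=command' lines into (stage name, argument list) pairs."""
    stages = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # partition splits at the first '=' only
        name, separator, command = line.partition("=")
        if not separator or not name.strip() or not command.strip():
            raise ValueError("expected name=command, got %r" % line)
        stages.append((name.strip(), shlex.split(command)))
    return stages


stages = parse_commands([
    "build=echo building\n",
    "\n",
    "test=make test FOO=bar\n",
])
```

Each resulting pair corresponds to one stage: the name is used for both the stage and its job, and the argument list is what would be handed to ExecTask.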

In the video, the user creates a repository (from time 1:04 - 1:39) and then creates a commands.txt file, commits and pushes (up to time 2:18). Rather than waiting for the timer, the "inception" pipeline is manually triggered at time 2:22 and by 2:43 the pipeline is created for "project1". Rather than wait for GoCD to run the new pipeline (which it would after a minute or so) it is manually triggered at time 2:54, and when it runs, it creates the stage defined in commands.txt.

In the video, at time 3:54 the user adds another line to commands.txt and commits and pushes. Rather than wait for GoCD to run the pipeline (which it would after a minute or so) it is manually triggered at time 4:27, and when it runs, it adds the new stage defined in commands.txt.

Copyright © 2015 Ivan Moore

Saturday, February 7, 2015

Automatically setting up pipelines

Scripting the setup of your continuous integration (CI) server is better than clicky clicky, but it might be possible to do even better. If you have many pipelines that are very similar then you might be able to fully automate their setup. A bit like having your own, in-house version of Travis CI.

This article will use the GoCD terms "pipeline" and "stage" (a pipeline is somewhat like a "job" in Jenkins, and a pipeline comprises one or more stages).

This article describes (at a very high level) the system my colleague Hilverd Reker and I have set up to automatically create pipelines. This builds on experience I gained doing something similar with Ran Fan at a previous client, and being the "customer" of an automated CI configuration system at another previous client.

Inception

We have a pipeline in our CI server to automatically create the pipelines we want. We have called this "inception", after the film - I think Ran Fan came up with the name.

The inception pipeline looks for new things to build in new repositories, and in subdirectories within existing repositories, and creates pipelines as appropriate (using gomatic). (The inception process that Ran Fan and I wrote previously looked for new things to build within "one large repo" (maybe the subject of a future blog article), and in new branches of that repository.)

The advantage of having this fully automated, compared to having to run a script to get the pipeline set up, is that it ensures that all pipelines get set up: none are forgotten and no effort is required.

Our inception job sets up a pipeline with only one stage, the bootstrap stage, which configures the rest of the pipeline. This keeps the inception job simple.
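
To make the division of labour concrete, here is a toy model of what an inception run does, using plain dictionaries instead of a real CI server. The names run_inception and "auto-created" and the shape of the config are assumptions made for this sketch; it is not how gomatic or GoCD represent configuration.

```python
def run_inception(config, repo_names, group="auto-created"):
    """Ensure every repo has a pipeline; new ones get only a bootstrap stage.

    `config` maps group name -> {pipeline name -> list of stage names}.
    Pipelines that already exist are left alone. Returns the names of the
    pipelines created by this run.
    """
    pipelines = config.setdefault(group, {})
    created = []
    for name in repo_names:
        if name not in pipelines:
            pipelines[name] = ["bootstrap"]
            created.append(name)
    return created


config = {}
first_run = run_inception(config, ["project1"])
# Simulate the bootstrap stage having added a stage to project1 later on.
config["auto-created"]["project1"].append("build")
second_run = run_inception(config, ["project1", "project2"])
```

Running inception repeatedly is harmless in this model: the second run only creates the pipeline for the new repo and does not disturb the stages that the existing pipeline has gained since.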

The bootstrap stage

Some of the configuration of a pipeline depends upon the files in the repository that the pipeline is for. By making the first stage in the pipeline the bootstrap stage, it can configure the pipeline accurately for the files as they exist when the pipeline runs. If a pipeline is configured by the inception job, or by running a script, rather than a bootstrap stage, then its configuration will not reflect the files in the repository when they change, but rather how they were at the time the inception job, or script, ran. This would result in pipelines failing because they are not set up correctly for the files they are trying to use; hence we have the bootstrap as part of the pipeline itself to solve that problem.
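
The core of that idea can be shown with lists of stage names: the bootstrap stage stays first, and everything after it is replaced by whatever the repository currently asks for. This is only a toy model; sync_stages is a hypothetical function and the stage names are made up.

```python
def sync_stages(current_stages, desired_stages):
    """Return the stage list after a bootstrap run.

    The first stage (the bootstrap itself) is always kept; all later stages
    are replaced by the ones derived from the repository's files.
    """
    if not current_stages:
        raise ValueError("pipeline has no bootstrap stage")
    return [current_stages[0]] + list(desired_stages)


# "deploy" is no longer wanted and "test" is new.
after = sync_stages(["bootstrap", "build", "deploy"], ["build", "test"])
```

Because the desired stages are computed when the pipeline runs, a stage that has been removed from the repository disappears from the pipeline on the next run, and a newly added one appears.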

Implementation notes

Our bootstrap stage only alters the configuration of the pipeline if it needs to: it runs very quickly if no changes are needed. GoCD handles changes to the configuration of a pipeline well. After the bootstrap stage has run, the subsequent stages run in the newly configured, or reconfigured, pipeline as expected. GoCD also handles the history of a pipeline reasonably well (although it does not always get it right), even when its configuration changes over time.
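
One way to get the "only alter the configuration if needed" behaviour is to compare the configuration before and after the changes and only send it when the two differ. The sketch below shows the idea with a toy class; ConfigSession and save_if_changed are hypothetical names, and this is not a description of how gomatic is implemented.

```python
import copy


class ConfigSession(object):
    """Toy stand-in for a CI configuration client."""

    def __init__(self, server_config):
        self._original = copy.deepcopy(server_config)
        self.config = copy.deepcopy(server_config)
        self.posts = 0  # counts simulated requests to the server

    def save_if_changed(self):
        """Send the config only if it differs from what the server has."""
        if self.config == self._original:
            return False
        self.posts += 1  # a real client would send the new config here
        self._original = copy.deepcopy(self.config)
        return True


session = ConfigSession({"project1": ["bootstrap", "build"]})
stages = session.config["project1"]

# Remove and then re-add the same stage: the net effect is no change.
del stages[1:]
stages.append("build")
saved_without_changes = session.save_if_changed()

# Add a genuinely new stage: this time the config is sent.
stages.append("test")
saved_with_changes = session.save_if_changed()
```

With this approach the bootstrap can always rebuild the stage list from scratch, which keeps the script simple, without causing a configuration change on every run.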

Example

What would help right now would be an example - but that'll take time to prepare; watch this space (patiently) ...


Wednesday, January 14, 2015

Scripting the configuration of your CI server

How do you configure your CI server?

Most people configure their CI server using a web based UI. You can confirm this by searching for "setting up Jenkins job", "setting up TeamCity build configuration", "setup ThoughtWorks Go pipeline" etc. The results will tell you to configure the appropriate CI server through a web based UI, probably with no mention that this is not the only way.

One of my serial ex-colleagues, Nick Pomfret, describes using these web based UIs as "clicky-clicky". In this article I will use the Jenkins term "job" (aka "project") to also mean TeamCity build configuration or GoCD pipeline. In this article, I'm calling GoCD a CI server; get over it.

What is wrong with clicky-clicky? 

Clicky-clicky can be useful for quick experiments, or maybe if you only have one job to set up, but has some serious drawbacks. 

It works - don't change it

Once a job has been set up using clicky-clicky, one problem is that it is difficult to manage changes to it. It can be difficult to see who has changed what, and to restore a job to a previous configuration. Just version controlling the complete CI server configuration file (which some people do) does not do this well, because such files are difficult to diff, particularly when there are changes to other jobs.

Lovingly hand crafted, each one unique

Another problem with clicky-clicky arises when you have a lot of jobs that you would like to set up in the same way: it is time consuming, and it inevitably leads to unintended inconsistencies between jobs. These can cause jobs to behave in slightly different ways, which causes confusion and makes problems take longer to diagnose.

Can't see the wood for the tabs 

Furthermore, web UIs often don't make it easy to see everything about the configuration of a job in a compact format - some CI servers are better than others for that.

The right way - scripting

If you script the setup of jobs, then you can version control the scripts. You can then safely change jobs, knowing that you can recreate them in the current or previous states, and you can see who changed what. If you need to move the CI server to a new machine, you can just rerun the scripts.

In some cases a script for setting up a job can be much more readable than the UI because it is often more compact and everything is together rather than spread over one or more screens.
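
As an illustration of why scripted jobs end up consistent, here is a minimal sketch in which every job is produced by the same function. It does not talk to any CI server; job_definition, the example.com URL, the project names and the step commands are all hypothetical.

```python
def job_definition(project):
    """Describe the job for one project; every job has the same shape."""
    return {
        "name": project,
        # hypothetical repository location
        "git_url": "https://example.com/git/%s.git" % project,
        # hypothetical build steps, identical for every project
        "steps": [["./build.sh"], ["./run_tests.sh"]],
    }


jobs = [job_definition(project) for project in ["billing", "search", "web"]]
```

Changing the steps for every job is then a one-line change to the function, which can be reviewed and version controlled like any other code.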

Fully automated configuration of jobs

It can be very useful to script the setup of jobs so it is totally automatic; i.e. when a new project is created (e.g. a new repo is created, or a new directory containing a particular file, e.g. a build.gradle file, is created), then a job can be created automatically. If you take that approach, then it saves time because nobody needs to manually set up the jobs, it means that every project that needs a job gets one and none are forgotten, and it means that the jobs are consistent so it is easy to know what they do.
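
The discovery step can be sketched in a few lines: walk a checkout looking for directories that contain the marker file, and compare the result with the jobs that already exist. The function names find_projects and jobs_to_create are assumptions made for this sketch, and the directory layout is created in a temporary directory purely for demonstration.

```python
import os
import shutil
import tempfile


def find_projects(root, marker="build.gradle"):
    """Return directories under root (relative to it) containing the marker."""
    projects = []
    for directory, subdirs, files in os.walk(root):
        if marker in files:
            projects.append(os.path.relpath(directory, root))
    return sorted(projects)


def jobs_to_create(root, existing_jobs, marker="build.gradle"):
    """Return the projects that do not have a job yet."""
    return [p for p in find_projects(root, marker) if p not in existing_jobs]


# Demonstration with a throwaway directory tree.
root = tempfile.mkdtemp()
for relative in ["a", os.path.join("b", "c"), "docs"]:
    os.makedirs(os.path.join(root, relative))
for relative in ["a", os.path.join("b", "c")]:
    open(os.path.join(root, relative, "build.gradle"), "w").close()

found = find_projects(root)
missing = jobs_to_create(root, existing_jobs={"a"})
shutil.rmtree(root)
```

In a real setup the list of missing jobs would be fed to whatever script creates the jobs, so that a project gets a job as soon as its marker file is committed.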

There are some subtleties about setting up fully automated jobs which I won't go into here - maybe a future blog article.

Tools for scripting

For GoCD, see gomatic. For other CI servers, please add a comment if you know of anything that is any good!
