Friday, July 22, 2016

Automated pipeline support for Consumer Driven Contracts

When developing a system composed of services (maybe microservices) some services will depend on other services in order to work. In this article I use the terminology "consumer"[1] to mean a service which depends on another service, and "provider" to mean a service being depended upon. This article only addresses consumers and providers developed within the same company. I'm not considering external consumers here.

This article is about what we did at Springer Nature to make it easy to run CDCs - there is more written about CDCs elsewhere.

CDCs - the basics

CDCs (Consumer Driven Contracts) can show the developers of a provider service that they haven't broken any of their consumer services.

Consider a provider service called Users which has two consumers, called Accounting and Tea. Accounting sends bills to users, and Tea delivers cups of tea to users.

The Users service provides various endpoints, and over time the requirements of its consumers change. Each consumer team writes tests (called CDCs) which check whether the provider service (Users in this case) understands the message sent to it and responds with a message the consumer understands (e.g. using JSON over HTTP).

How we used to run CDCs

We used to have the consumer team send the provider team an executable (e.g. an executable jar) containing the consumer's CDCs. It was then up to the provider team to run those CDCs as necessary, e.g. by adding a stage to their CD (Continuous Delivery) pipeline. A problem with this was that it required manual effort required by the provider team to set up such a stage in the CD pipeline, and required effort each time a consumer team wanted to update its CDCs.

How we run them now

Our automated pipeline system allows consumers to define CDCs in their own repository and declare which providers they depend upon in their pipeline metadata file. Using this information, the automated pipeline system adds a stage to the consumer's pipeline to run its "CDCs" against its providers, and also in the provider's pipeline to run its consumers' CDCs against itself. In our simple example earlier this means the pipelines for Users, Accounting, Tea and Marketing will be something like this:

Users:

Accounting:

Tea:


i.e. Users runs Accounting and Tea CDCs against itself (in parallel) after it has been deployed. Accounting and Tea run their CDCs against Users before they deploy.

This means that:
  • when a change is made to a consumer (e.g. Tea), its pipeline checks that its providers are still providing what is needed. This is quite standard and easy to arrange.
  • when a change is made to a provider (e.g. Users), its pipeline checks that it still provides what its consumers require. This is the clever bit that is harder to arrange. This is the point of CDCs.

Benefits of automation

By automating this setup, providers don't need to do anything in order to incorporate their consumers' CDCs into their pipeline. The providers also don't have to do anything in order to get updated versions of their consumers' CDCs.

The effort of setting up CDCs rests with the teams who have the dependency, i.e. the consumers. The consumers need to declare their provider (dependency) in their metadata file and define and maintain their CDCs.

Subtleties

There are a few subtleties involved in this system as it is currently implemented.
  • the consumer runs its CDCs against the provider in the same environment it is about to deploy into. There may be different versions of providers in different environments and this approach checks that the provider works for the version of the consumer that is about to be deployed, and will prevent the deployment if it is incompatible.
  • the provider runs the version of each consumer's CDCs corresponding to the version of the consumer in the same environment that the provider has just deployed into. There may be different versions of consumers in different environments and this approach checks that the provider works for the versions of consumers that are using the provider.
  • the system deploys the provider before running the consumer CDCs because the consumer CDCs need to run against it. It would be better for the system to deploy a new version of the provider without replacing the current version, run its consumers' CDCs and then only switch over to the new version if the CDCs all pass.
  • because the consumer's CDCs need to run against the provider in the appropriate environment, the system sets an environment variable with the host name of the provider in that environment. Because we only have one executable per consumer for all its CDCs, if a consumer has multiple providers, it needs to use those environment variables in order to determine which of its CDCs to execute.

Implementation notes

The implementation of a consumer running its CDCs against its provider is relatively straightforward. The difficulties are when a provider runs its consumers' CDCs against itself.

In order for a provider to run its consumers' CDCs the system clones each consumer's repository at the appropriate commit and then runs the appropriate executable in a Docker container. (The implementation doesn't clone every time, just if the repository hasn't been cloned on that build agent before.) Using Docker for running the CDCs means that consumers can implement their CDCs using whatever technology they want, as long as it runs in Docker.

In our system, all services are required to implement an endpoint which returns the git hash of the commit that they are built from. This is used to work out which version of consumer's CDCs to run in the case when they are run in a provider's pipeline.

Automating the running of CDCs in our automated pipelines required either making providers know who their consumers are, or consumers know who their providers are. If a provider doesn't provide what a consumer requires, it causes more problems for the consumer than for the provider. Therefore we made it the responsibility of the consumer to define that it depends on the provider rather than the other way around.


1 Other terminology in use for consumer is "downstream" and for provider is "upstream". A consumer is a dependant of a provider. A provider is a dependency of a consumer. I sometimes use the word producer instead of provider.

Copyright © 2016 Ivan Moore