Integration testing your container images with Bazel

Photo by Teng Yuhong on Unsplash


Before Bazel, many teams were using Docker Compose to manage the workflow of running tests which need to do the following:

  1. Build a docker/OCI image containing your packaged application

  2. Launch a locally running container with that image, and maybe some others like a database

  3. Make sure the applications running inside the containers have the right network and filesystem mounts and are able to locate each other

  4. Execute the test runner to interact with the containers and make assertions about their behavior

  5. Tear everything down so you don't leak resources on your machine, for example shutting down containers and pruning the image data from the docker daemon

When migrating such tests to Bazel, I've seen a lot of developers struggle with how to model this. There are two high-level approaches.

1. Bazel on the "inside"

This is the simpler approach. We'll just take those same steps above, and keep that code approximately the same. Whatever is orchestrating that, whether it's Groovy code in a Jenkins pipeline, or a docker-compose.yml file, we leave it alone. If developers are expected to manually do these steps when debugging the test locally on their dev machine, they keep doing that.

We simply replace "Build a docker/OCI image" with the equivalent bazel run //my:image_target command, so that Bazel will build the image and also load that image into the container runtime (i.e. the docker daemon) with some tag like latest.

Then we replace "Execute the test runner" with the equivalent bazel test //my:tests command, possibly using something like --test_env=MY_SERVICE_PORT=9876 so that the test runner process is able to locate the services it is meant to interact with.
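On the test side, the runner then only has to read that variable. A minimal sketch, assuming the variable is named MY_SERVICE_PORT as in the command above and that the service listens on localhost (the helper name `service_base_url` is made up for illustration):

```python
import os


def service_base_url(env=None, host="localhost"):
    """Build the base URL of the service under test from MY_SERVICE_PORT."""
    env = os.environ if env is None else env
    port = env.get("MY_SERVICE_PORT")
    if port is None:
        # Fail loudly instead of guessing a port; the orchestration script owns this value.
        raise RuntimeError(
            "MY_SERVICE_PORT is not set; pass --test_env=MY_SERVICE_PORT=<port>"
        )
    # int() rejects values that are not numeric.
    return f"http://{host}:{int(port)}"
```

A test case would call `service_base_url()` once during setup and send its requests to the returned URL.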

I call this "Bazel on the inside" because the legacy testing scripts still govern the top-level execution flow, and Bazel is just invoked by that flow at a couple points.
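To make the shape of that flow concrete, here is a rough sketch of such a top-level script. It assumes the containers are described in a docker-compose.yml that refers to the tag Bazel loads, and it reuses the labels and port from the commands above; the function names are hypothetical, and in practice this logic might just as well live in a Jenkins pipeline.

```python
import subprocess


def plan(port=9876):
    """Return the (setup, test, teardown) commands for one integration run."""
    setup = [
        # Build the image and load it into the docker daemon.
        ["bazel", "run", "//my:image_target"],
        # Start the containers in the background.
        ["docker", "compose", "up", "--detach"],
    ]
    test = ["bazel", "test", "//my:tests", f"--test_env=MY_SERVICE_PORT={port}"]
    teardown = [["docker", "compose", "down"]]
    return setup, test, teardown


def run_all(run=subprocess.run, port=9876):
    """Run setup, then the tests, and always attempt teardown afterwards."""
    setup, test, teardown = plan(port)
    try:
        for cmd in setup:
            run(cmd, check=True)
        # Report the test result to the caller instead of raising.
        return run(test, check=False).returncode
    finally:
        for cmd in teardown:
            run(cmd, check=False)
```

Note that Bazel appears twice but never owns the fixture: the script does, which is exactly what the downsides below stem from.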

The benefit of this approach is that it reduces risk during a Bazel migration: we're changing fewer things at a time. But it has some downsides, so I generally recommend this only as an interim solution:

  • It's non-hermetic. Bazel can't guarantee that the code under test is actually built from HEAD; your scripting has to take care of that. Bazel also won't know when to invalidate the cache entry for the test, because the image is not one of the test's inputs. You could supply the image as an additional (unused) input to the test target to remediate this.

  • Even if the bazel test invocation gets 100% cache hits, so none of the tests actually execute, you've already spent the time of setting up and tearing down the test fixture. This means you can never get the 5-second "no-op" CI run.

  • It's less portable. If you used CI scripts to orchestrate the steps, it's hard for developers to run them identically on their own machines.

2. Bazel on the "outside"

This is the more "idiomatic" approach, and it fixes all of those downsides. However, it's a bigger refactoring project, and it changes the mental model for engineers.

In this approach, bazel test is the outermost layer and everything else happens inside it. The test has a normal dependency on the system-under-test to guarantee it's built from HEAD; in our case that means data = ["//my:image_target"]. Then, during test execution, the test runner is responsible for the "lifecycle" methods for setup and teardown, which means launching and stopping the containers.

This is less work than it sounds like, thanks to the excellent testcontainers library. It's available in most languages and there is plenty of documentation on how to interact with it from your setup lifecycle hook. It can also automatically perform the teardown steps so you don't have to worry about your careless coworker exhausting resources on the CI machine.
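As a rough illustration of such a lifecycle hook, here is a sketch using Python's unittest together with the testcontainers `DockerContainer` class. The image tag, the port, and the names `make_container` and `ContainerTestCase` are assumptions made for this example, not something the library prescribes.

```python
import unittest


def make_container():
    """Create (but do not start) the container under test."""
    # Imported lazily so this module still imports where testcontainers is absent.
    from testcontainers.core.container import DockerContainer

    # Hypothetical image tag and port, for illustration only.
    return DockerContainer("bazel/my_app:latest").with_exposed_ports(8080)


class ContainerTestCase(unittest.TestCase):
    """Starts one container per test class and always stops it afterwards."""

    container_factory = staticmethod(make_container)

    @classmethod
    def setUpClass(cls):
        cls.container = cls.container_factory()
        cls.container.start()
        # Registered right after start(), so the stop runs even if a test fails.
        cls.addClassCleanup(cls.container.stop)
```

Test classes would subclass `ContainerTestCase` and reach the running container through `self.container`. Sharing one container per class is a design choice that trades some isolation for speed; a per-test fixture would use `setUp` and `addCleanup` instead.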


I'll illustrate "Bazel on the outside" with an example in Python, though you can do this from most languages.


# Follow the rules_oci docs to instruct Bazel how to build an image from your application.
oci_image(
    name = image_target,
    base = "@distroless_base",
    tars = [layer_target],
)

# Package the image into a 'tar' format suitable for the 'docker load' command
oci_tarball(
    name = tarball_target,
    image = image_target,
    repotags = ["bazel/my_app:latest"],
)

# Our integration test target gains a `data` (runtime) dependency on the tar
py_test(
    name = "integration_test",
    srcs = [""],
    data = [tarball_target],
    main = "",
    tags = [...],
    deps = [...],
)

Now, inside our test, we can access that tar file. We could use the Bazel runfiles library to resolve its location, but here I just rely on the symlinks that Bazel creates relative to the test's working directory. Sadly, testcontainers doesn't know how to load the tar into the docker daemon, so we have to do that ourselves:

import json
import docker
import requests
from testcontainers.core.container import DockerContainer

TAR_PATH = "my_wksp/path/to/my.tarball/tarball.tar"
# Match the 'repotags' we applied from the BUILD file
IMAGE_NAME = "bazel/my_app:latest"

def _load_latest_tarball():
    client = docker.from_env()
    with open(TAR_PATH, "rb") as f:
        # Equivalent to 'docker load': registers the image under its repotags
        client.images.load(f)
With that bit out of the way, we can write our test case. This test is expecting that the container exposes an AWS lambda function, and we're just calling it with some dummy data:

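def test_thing():
    _load_latest_tarball()
    with DockerContainer(
        IMAGE_NAME,
    ).with_exposed_ports(8080) as container:
        # waits for the container to be ready
        port = container.get_exposed_port(8080)
        host = container.get_container_host_ip()
        data = json.dumps({})
        # Assumed URL: the invocation path of the AWS Lambda Runtime Interface Emulator
        res = requests.post(
            f"http://{host}:{port}/2015-03-31/functions/function/invocations",
            data=data,
        )
        assert res.json() == "ok"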

You can find a complete code listing in this PR:


We use this in several places. One of them is for testing rules_oci itself. There, we use Testcontainers Cloud so that the test fixture container doesn't have to run on the same machine as the test process. That allows the test to be declared as size="small", meaning only one local CPU and a small amount of RAM need to be reserved, so Bazel can schedule lots of these tests in parallel.

Another application that I'm really excited about uses localstack to provide a high-fidelity mock of AWS, so that we can run tests that want to interact with Amazon services. This means we don't need to wire users' real AWS credentials into our tests, which is a common source of cache misses because they differ between engineers. We also avoid the extra time and potential flakiness of tests that create real cloud resources, as well as the non-hermeticity and test isolation failures of tests that access existing cloud resources in an unknown state.
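To give a feel for what that looks like from the test's side, here is a small sketch that points a boto3 client at a localstack endpoint with dummy credentials. The helper names are hypothetical, the "test" credentials are placeholders that localstack does not validate, and the endpoint in the usage note assumes localstack's default port of 4566.

```python
def localstack_client_kwargs(endpoint_url, region="us-east-1"):
    """Keyword arguments that direct a boto3 client at localstack, not AWS."""
    return {
        "endpoint_url": endpoint_url,
        "region_name": region,
        # Dummy values: identical for every engineer, so they cannot cause cache misses.
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
    }


def make_s3_client(endpoint_url):
    """Create an S3 client bound to the given localstack endpoint."""
    # Imported lazily so the helper above stays usable without boto3 installed.
    import boto3

    return boto3.client("s3", **localstack_client_kwargs(endpoint_url))
```

A test would call something like `make_s3_client("http://localhost:4566")` after the localstack container has started, then use the client exactly as it would against real AWS.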

When I find some time, I hope to make this aws_localstack_test target available so it's trivial for you to adopt. If your team would benefit from such a thing, and could fund the engineering work, please reach out!