Running a multi-arch Kubernetes cluster

Mixed-architecture Kubernetes clusters are becoming much more common. There are several good reasons for this:

AWS provides ARM compute via its A1 instances at reduced cost relative to Intel virtual machines.
The latest M1 Macs use an ARM chipset.
Raspberry Pi devices use ARM chips and, with the more recent 4GB and 8GB RAM versions, they are viable nodes in a small or medium-sized Kubernetes cluster.

Having multiple CPU architectures across a cluster is not without challenges. Remember, a Docker image built for amd64 cannot be executed on an arm64 or arm node. Docker doesn’t solve the write-once, run-anywhere problem the way Java does. Just because something is running in Docker doesn’t mean the underlying CPU architecture is abstracted away.
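To see which architecture a given node has, you can look at the node object, for example with kubectl get node <name> -o yaml. The following is a trimmed sketch of the relevant fields rather than real output: the node name is hypothetical, while the kubernetes.io/arch label and status.nodeInfo.architecture are the standard places where Kubernetes reports the architecture.

```yaml
apiVersion: v1
kind: Node
metadata:
  name: pi-node-1              # hypothetical node name
  labels:
    kubernetes.io/arch: arm64  # set by the kubelet; amd64, arm64, arm, ...
    kubernetes.io/os: linux
status:
  nodeInfo:
    architecture: arm64        # the same information, as reported by the node
    operatingSystem: linux
```

An image that only contains amd64 binaries would fail to start on this node, typically with an "exec format error".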

The good news is that the Docker toolchain has advanced recently and makes it easier to build multi-arch images, where you effectively compile one version of the image for each architecture that you want to support. This makes it possible to run on every type of node, but it does mean you might be running different code on each architecture, so performance characteristics can differ between them. I have rarely found this to be an issue but in some circumstances it could be.

Running a Kubernetes cluster with different CPU architectures does require some planning. Here are a few tips:

Any DaemonSets which are deployed must only use images that support all architectures in your cluster. For example, if you have amd64 and arm64v8 nodes, the DaemonSets must run on both of these processor types; otherwise it is difficult to manage the cluster.
You can taint nodes if you have images that really only work on certain architectures. This needs to be factored into your high availability planning, as it means that those deployments or jobs can no longer run on every node in the cluster. You need to make sure you have sufficient capacity on the nodes they can use in the event of failures.
Your CI/CD platform needs to build images for every architecture to which you’re deploying. More on this below.
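As a rough illustration of the tainting tip, the sketch below pins a Deployment to arm64 nodes. It assumes the arm64 nodes were tainted with something like kubectl taint nodes <node> example.com/arch=arm64:NoSchedule; the taint key, the application name and the image are all hypothetical. The kubernetes.io/arch label is the standard one set on every node.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: arm64-only-app                 # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: arm64-only-app
  template:
    metadata:
      labels:
        app: arm64-only-app
    spec:
      # Only schedule onto nodes that report the arm64 architecture
      nodeSelector:
        kubernetes.io/arch: arm64
      # Allow scheduling onto nodes carrying the (hypothetical) arm64 taint
      tolerations:
      - key: example.com/arch
        operator: Equal
        value: arm64
        effect: NoSchedule
      containers:
      - name: app
        image: gcr.io/my-project/arm64-only-app:1.0.0   # hypothetical single-arch image
```

The nodeSelector keeps the pods on nodes that can run the image, while the taint keeps other workloads off those nodes unless they tolerate it. With two replicas restricted to arm64 nodes, you would want at least two such nodes to survive a node failure.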

Building your code for multi-arch deployment

When a Docker image is built using the default settings, the resulting image is targeted at the architecture of the build machine. For example, an image built with docker build on a typical amd64 GitLab runner will only run on amd64. Recent versions of Docker have options to cross-build the image for another target architecture. This means that Docker will use the appropriate version of any base images inherited by your image. You may also need to make sure that the artifacts you copy into your image, such as compiled binaries, target that architecture.

GitLab CI/CD

This is a generic CI/CD template for GitLab CI which builds Docker images for the amd64, arm and arm64 platforms. It then creates a manifest list which allows the appropriate version of the image to be pulled using the same tag: if you create a Kubernetes deployment in a cluster containing amd64, arm and arm64 nodes, the same tag will pull the correct version for the CPU architecture on which the container is executed.

image: docker:19.03.15

variables:
  # Tag each build with the commit SHA so every pipeline produces unique image tags
  VERSION: ${CI_COMMIT_SHA}
  IMAGE_NAME: gcr.io/${GCP_PROJECT_ID}/image-name
  
stages:
- docker-images
- docker-manifest

.login-and-configure-docker:
  image: 'google/cloud-sdk'
  services:
  - name: docker:19.03.15-dind
  before_script:
  - docker info
  # GCP_SERVICE_KEY holds a base64-encoded service account key
  - echo "${GCP_SERVICE_KEY}" | base64 -d > key.json
  - gcloud auth activate-service-account --key-file=key.json
  - gcloud auth print-access-token | docker login -u oauth2accesstoken --password-stdin https://gcr.io
  # --quiet suppresses the interactive confirmation prompt
  - gcloud auth configure-docker --quiet
  after_script:
  # Remove credentials so they do not linger on the runner
  - rm -f key.json
  - rm -f /root/.docker/config.json

.build-image:
  extends: .login-and-configure-docker
  stage: docker-images
  variables:
    DOCKER_HOST: tcp://docker:2375/
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
  script:
  # Register QEMU emulators so the runner can execute binaries for other architectures during the build
  - docker run --rm --privileged multiarch/qemu-user-static --reset -p yes
  # PLATFORM and BUILD_ARCH are supplied by the per-architecture jobs below
  - docker build --platform ${PLATFORM} -t ${IMAGE_NAME}:latest-${BUILD_ARCH} -t ${IMAGE_NAME}:${VERSION}-${BUILD_ARCH} .
  - docker push ${IMAGE_NAME}:${VERSION}-${BUILD_ARCH}
  - docker push ${IMAGE_NAME}:latest-${BUILD_ARCH}

build-amd64:
  extends: .build-image
  variables:
    BUILD_ARCH: amd64
    PLATFORM: linux/amd64

build-arm64:
  extends: .build-image
  variables:
    BUILD_ARCH: arm64
    PLATFORM: linux/arm64/v8

build-arm:
  extends: .build-image
  variables:
    BUILD_ARCH: arm
    PLATFORM: linux/arm/v7

build-multi-arch-manifest:
  extends: .login-and-configure-docker
  stage: docker-manifest
  services:
  - name: docker:19.03.15-dind
    command: [ "--experimental" ]
  variables:
    DOCKER_HOST: tcp://docker:2375/
    DOCKER_DRIVER: overlay2
    DOCKER_TLS_CERTDIR: ""
    # docker manifest is an experimental CLI feature in Docker versions before 20.10
    DOCKER_CLI_EXPERIMENTAL: enabled
  script:
  # Combine the per-architecture images into one manifest list under the plain version tag
  - docker manifest create --amend ${IMAGE_NAME}:${VERSION} ${IMAGE_NAME}:${VERSION}-arm ${IMAGE_NAME}:${VERSION}-arm64 ${IMAGE_NAME}:${VERSION}-amd64
  # Record the OS, architecture and variant of each entry
  - docker manifest annotate ${IMAGE_NAME}:${VERSION} ${IMAGE_NAME}:${VERSION}-arm64 --os linux --arch arm64 --variant v8
  - docker manifest annotate ${IMAGE_NAME}:${VERSION} ${IMAGE_NAME}:${VERSION}-arm --os linux --arch arm --variant v7
  - docker manifest annotate ${IMAGE_NAME}:${VERSION} ${IMAGE_NAME}:${VERSION}-amd64 --os linux --arch amd64
  - docker manifest push ${IMAGE_NAME}:${VERSION}
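
Once the pipeline has finished, running docker manifest inspect against the version tag should show one entry per architecture. The sketch below shows the rough shape of such a manifest list. It is written as YAML to match the rest of this post (the actual output is JSON), and the per-build size and digest fields are left out rather than invented.

```yaml
schemaVersion: 2
mediaType: application/vnd.docker.distribution.manifest.list.v2+json
manifests:
# Each entry also carries a size and a sha256 digest, omitted here
- mediaType: application/vnd.docker.distribution.manifest.v2+json
  platform:
    architecture: amd64
    os: linux
- mediaType: application/vnd.docker.distribution.manifest.v2+json
  platform:
    architecture: arm64
    os: linux
    variant: v8
- mediaType: application/vnd.docker.distribution.manifest.v2+json
  platform:
    architecture: arm
    os: linux
    variant: v7
```

When a node pulls the tag, its container runtime picks the entry whose platform matches the node and pulls that image, which is why a single tag works across the whole cluster.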