CI/CD for Kubernetes Operator Projects with GitHub Actions

Tech reviewed: Deepak Prasad

Building a Kubernetes Operator is only the first step. A controller that works with make run on one developer laptop still needs a repeatable CI/CD pipeline that proves every change is safe to merge, build, and release.

Most people searching for Kubernetes Operator CI/CD with GitHub Actions are not looking for a generic CI/CD definition. They usually want a working .github/workflows/operator-ci.yml file for a Kubebuilder or Operator SDK Go project, plus clear answers for envtest, kind, image publishing, OLM bundle validation, caching, and secrets.

This guide starts with a complete GitHub Actions workflow you can adapt, then explains each part so you know what belongs on pull requests, what belongs on main, and what should wait for release gates.

This article assumes a Go-based Kubebuilder or Operator SDK (go/v4) project with a Makefile exposing targets such as generate, manifests, test, docker-build, and optionally bundle. For local testing strategy and the difference between fake clients, envtest, and kind, read Testing Kubernetes Operators with envtest, fake client, and kind first.

Figure: GitHub Actions CI/CD pipeline for Kubernetes Operators, showing code commit, pull request checks, Go lint and tests, envtest validation, container image build, optional OLM bundle validation, and release artifact promotion.


Kubernetes Operator CI/CD in 60 seconds

  • Run fast checks on every pull request: gofmt, go vet, unit tests, generated manifests, and pinned envtest.
  • Use envtest to test controllers against a real Kubernetes API server and etcd without starting a full cluster.
  • Build operator images on pull requests, but push images only from trusted branches, tags, or release workflows.
  • Tag images with commit SHAs and promote immutable digests instead of relying on latest.
  • Run slower kind, scorecard, and full e2e suites on main, release branches, scheduled workflows, or manual gates.
  • Cache Go modules, Go build output, Docker layers, and envtest assets with explicit version keys.
  • Treat operator-sdk bundle validate as packaging validation, not runtime proof.

Complete GitHub Actions workflow for a Kubernetes Operator

Create .github/workflows/operator-ci.yml in your operator repository:

yaml
name: operator-ci

on:
  pull_request:
  push:
    branches:
      - main
    tags:
      - 'v*'
  workflow_dispatch:

concurrency:
  group: operator-ci-${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: true

env:
  GO_VERSION_FILE: go.mod
  ENVTEST_K8S_VERSION: '1.31.x'
  IMAGE_NAME: ghcr.io/${{ github.repository }}/manager

jobs:
  test:
    name: lint, unit, envtest
    runs-on: ubuntu-latest
    timeout-minutes: 15
    permissions:
      contents: read

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Go
        uses: actions/setup-go@v5
        with:
          go-version-file: ${{ env.GO_VERSION_FILE }}
          cache: true

      - name: Cache envtest assets
        uses: actions/cache@v4
        with:
          path: ~/.local/share/kubebuilder-envtest
          key: envtest-${{ runner.os }}-${{ env.ENVTEST_K8S_VERSION }}-${{ hashFiles('**/go.sum') }}
          restore-keys: |
            envtest-${{ runner.os }}-${{ env.ENVTEST_K8S_VERSION }}-

      - name: Download dependencies
        run: go mod download

      - name: Check formatting
        run: test -z "$(gofmt -l .)"

      - name: Vet
        run: go vet ./...

      - name: Check generated code and manifests
        run: |
          make generate
          make manifests
          git diff --exit-code

      - name: Run unit tests
        run: go test ./... -short -count=1

      - name: Run envtest integration tests
        env:
          ENVTEST_K8S_VERSION: ${{ env.ENVTEST_K8S_VERSION }}
        run: make test

  image:
    name: build operator image
    runs-on: ubuntu-latest
    timeout-minutes: 20
    needs: test
    permissions:
      contents: read
      packages: write
      attestations: write
      id-token: write

    steps:
      - name: Checkout
        uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to GHCR
        if: github.event_name != 'pull_request'
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and optionally push image
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: |
            ${{ env.IMAGE_NAME }}:sha-${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Attest image provenance
        if: startsWith(github.ref, 'refs/tags/v')
        uses: actions/attest-build-provenance@v2
        with:
          subject-name: ${{ env.IMAGE_NAME }}
          subject-digest: ${{ steps.build.outputs.digest }}
          push-to-registry: true

      - name: Print image digest
        if: github.event_name != 'pull_request'
        run: |
          echo "Image digest: ${{ steps.build.outputs.digest }}"

This workflow gives you the baseline most operator repositories need:

  • Pull requests prove the code compiles, generated files are current, and controller tests pass with envtest.
  • Pull requests build the image but do not push it.
  • Pushes to main, v* tags, and manual workflow_dispatch runs publish an image to GitHub Container Registry, because the push condition only excludes pull_request events.
  • The image is tagged by commit SHA, while the digest remains available for promotion.
  • concurrency cancels stale CI runs when a contributor pushes a newer commit.
  • Job-level permissions keep the test job read-only and grant registry/attestation permissions only to the image job.

I parsed this workflow locally with PyYAML before adding it. The first draft used an inline run: echo "Image digest: ..." line, and the parser rejected it because the colon made the scalar ambiguous. The version above uses a block scalar for that command.

If your repository lives in a monorepo, add a default working directory:

yaml
defaults:
  run:
    working-directory: operators/my-operator

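Place it at the workflow root. defaults.run applies only to run: steps, not to uses: steps, so also update context: in docker/build-push-action and go-version-file in actions/setup-go to point at the same path.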


Validate the workflow before committing

GitHub reports workflow syntax errors only after you push. Run a local parser first if your editor does not validate GitHub Actions YAML. I used Python and PyYAML for the workflow above:

bash
python3 - <<'PY'
from pathlib import Path
import yaml

workflow = Path('.github/workflows/operator-ci.yml')
data = yaml.safe_load(workflow.read_text())
print('name:', data['name'])
print('jobs:', ','.join(data['jobs'].keys()))
print('test permissions:', data['jobs']['test']['permissions'])
print('image permissions:', data['jobs']['image']['permissions'])
PY

Sample output from the validation run:

text
name: operator-ci
jobs: test,image
test permissions: {'contents': 'read'}
image permissions: {'contents': 'read', 'packages': 'write', 'attestations': 'write', 'id-token': 'write'}

This does not replace actionlint, but it catches indentation and YAML scalar mistakes before CI sees the file. If your team can install actionlint, use it as the stronger local and CI check.


What this pipeline is trying to prove

From pull request to merge-ready artifact

A practical Kubernetes Operator CI pipeline answers a few important questions before code reaches main:

  1. Does the code still compile and pass tests?
  2. Are generated artifacts such as CRDs, deepcopy code, RBAC, and manifests still in sync with the Go API types?
  3. Does the controller still reconcile correctly against a real Kubernetes API server through envtest?
  4. Can CI build the same manager image that will later be promoted?
  5. Does the repository avoid leaking release secrets into untrusted pull request workflows?
  6. If the project ships OLM metadata, does the bundle still validate?

A green pipeline does not prove the operator works in every cluster scenario, but it removes many common regressions before human review and release.

What you deliberately skip in early CI

Not every check belongs on every pull request. kind clusters, long-running end-to-end suites, scorecard tests, fuzzing, and soak tests often belong in scheduled workflows, main branch gates, release branches, or manual workflow_dispatch jobs.

Fast pull request CI encourages frequent commits and gives contributors feedback in minutes rather than hours.


Makefile targets CI expects

The workflow above assumes your project behaves like a normal Kubebuilder or Operator SDK repository.

A typical Makefile should provide:

| Target | Purpose in CI |
| --- | --- |
| make generate | Regenerates deepcopy code and generated Go artifacts |
| make manifests | Regenerates CRDs, RBAC, webhook, and manager manifests |
| make test | Runs controller tests, usually with envtest setup |
| make docker-build IMG=... | Builds the manager image when using Make instead of Buildx |
| make bundle | Generates an OLM bundle when the project publishes one |

The git diff --exit-code step after make generate and make manifests is important. It fails CI when a developer changes API types but forgets to commit updated CRDs or generated files.
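
One gap to be aware of: git diff --exit-code compares only tracked files, so a newly generated file that was never committed (for example, the CRD for a new API type) does not fail the step. Below is a minimal sketch of a stricter check, assuming it runs right after make generate and make manifests. The helper name report_drift is hypothetical and not part of Kubebuilder or Operator SDK:

```shell
# Hypothetical helper: report_drift reads `git status --porcelain` output on stdin.
# It prints every modified or untracked path and returns 1 when any exist,
# so the CI step fails for both changed and brand-new generated files.
report_drift() {
  drift=$(cat)
  if [ -n "$drift" ]; then
    echo "Generated files are out of date:" >&2
    printf '%s\n' "$drift" >&2
    return 1
  fi
  return 0
}
```

In the workflow step, pipe Git's machine-readable status into it: git status --porcelain | report_drift. Paths matched by .gitignore are not listed, so ignored build output does not trigger a failure.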

If your repository separates fast unit tests and envtest suites using build tags or package boundaries, split them into separate CI steps. Otherwise, a single make test target is usually simpler and less surprising for Kubebuilder-style repositories.


Workflow shape: pull request, main, and release

Pull request workflow

On pull requests, run checks that are safe for forks and fast enough for code review:

  • gofmt or gofumpt.
  • go vet.
  • go test ./... -short.
  • Generated artifact drift checks.
  • make test or a targeted envtest test command.
  • Docker image build without pushing.

Do not assume repository secrets are available on forked pull requests. Even when they are technically available in private repositories, it is better to design PR checks so they do not need production credentials.

For concurrent controllers, consider running go test -race ./... in a scheduled workflow or release gate. The race detector can catch shared-state bugs in controller-runtime code, but it increases runtime enough that many teams keep it out of the fastest PR path.

Push to main

On push to main, run the same checks and then publish a commit-SHA image:

yaml
ghcr.io/OWNER/REPO/manager:sha-${{ github.sha }}

You may also publish a convenience tag such as main, but use the immutable digest from the build output for promotion to staging or production.
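
To make digest-based promotion concrete, here is a small sketch that builds a digest-pinned image reference from the digest reported by the build step. digest_ref is a hypothetical helper, and the image name is expected without a tag:

```shell
# Hypothetical helper: digest_ref IMAGE DIGEST
# IMAGE is the repository name without a tag, DIGEST is the value from the
# build output. Prints IMAGE@DIGEST, or fails when DIGEST is not sha256:<64 hex>.
digest_ref() {
  image=$1
  digest=$2
  if ! printf '%s\n' "$digest" | grep -Eq '^sha256:[0-9a-f]{64}$'; then
    echo "not a sha256 digest: $digest" >&2
    return 1
  fi
  printf '%s@%s\n' "$image" "$digest"
}
```

A deployment that uses the printed reference keeps resolving to the same image content even if a convenience tag such as main moves later.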

Release tags

On release tags such as v1.4.2, publish semver image tags and release artifacts:

  • ghcr.io/OWNER/REPO/manager:v1.4.2
  • OLM bundle image or bundle directory.
  • SBOM, provenance, and signatures if your organization requires supply-chain metadata.

The workflow above includes actions/attest-build-provenance@v2 for tag builds. That step needs attestations: write and id-token: write, so keep those permissions on the publishing job only. Add signing with cosign if your registry, marketplace, or internal policy requires signatures in addition to GitHub artifact attestations.

Keep release publishing in a protected workflow or GitHub Environment when production credentials are involved.
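
The complete workflow earlier tags images only with sha-<commit>. As a rough sketch of how a release build could add the semver tag, the hypothetical helper below derives the tag list from the Git ref. It also lowercases the image name, since container registries reject repository names with uppercase letters and ${{ github.repository }} preserves the original casing:

```shell
# Hypothetical helper: image_tags IMAGE GIT_REF GIT_SHA
# Prints one tag per line: always IMAGE:sha-<commit>, plus IMAGE:<version>
# when GIT_REF is a tag starting with "v" (for example refs/tags/v1.4.2).
image_tags() {
  # Registries require lowercase repository names.
  image=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  ref=$2
  sha=$3
  printf '%s:sha-%s\n' "$image" "$sha"
  case "$ref" in
    refs/tags/v*) printf '%s:%s\n' "$image" "${ref#refs/tags/}" ;;
  esac
}
```

In a workflow step you could call it as image_tags "$IMAGE_NAME" "$GITHUB_REF" "$GITHUB_SHA" and feed the result to the tags: input. The docker/metadata-action action is a maintained alternative that covers the same ground.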


envtest in GitHub Actions

Why envtest belongs in operator CI

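envtest starts a real Kubernetes API server and etcd, but it does not run a kubelet, scheduler, kube-controller-manager, CNI, or DNS. Built-in controllers are therefore absent: a Deployment created in a test never produces Pods, and owner-reference garbage collection does not run. Within those limits, it is a strong fit for controller reconciliation tests: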

  • CRD schema and API validation.
  • Create, update, list, watch, status, and finalizer behavior.
  • Reconcile loops that depend on Kubernetes API state.
  • Webhook logic when configured in the envtest environment.

It is usually much faster than a kind job and more realistic than a fake client.

Pinning envtest assets

Pin the Kubernetes asset version so CI changes only when you choose to upgrade:

yaml
env:
  ENVTEST_K8S_VERSION: '1.31.x'

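If your generated Makefile already uses setup-envtest, make sure make test consumes that version. A value assigned with plain = in the Makefile takes precedence over the workflow's environment variable, so declare it with ?= or pass it on the make command line. For example, many Kubebuilder projects have a pattern similar to:

makefile
# ?= lets an ENVTEST_K8S_VERSION set in the CI environment override this default.
ENVTEST_K8S_VERSION ?= 1.31.x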
ENVTEST = $(LOCALBIN)/setup-envtest

.PHONY: test
test: manifests generate fmt vet envtest
	KUBEBUILDER_ASSETS="$$( $(ENVTEST) use $(ENVTEST_K8S_VERSION) --bin-dir $(LOCALBIN) -p path )" go test ./... -coverprofile cover.out

The exact Makefile may differ by Kubebuilder or Operator SDK version, but the CI principle is the same: pin the asset version, cache the download path, and run the same target developers run locally.

Matrix testing across Kubernetes versions

Use a Kubernetes version matrix only when your support policy needs it. A practical matrix usually tests the oldest and newest supported Kubernetes versions:

yaml
strategy:
  fail-fast: false
  matrix:
    envtest-k8s-version:
      - '1.28.x'
      - '1.30.x'

Then set:

yaml
ENVTEST_K8S_VERSION: ${{ matrix.envtest-k8s-version }}

Avoid a broad matrix on every pull request unless the project is small enough that CI remains fast.


Build and publish the manager image

For most operator repositories, use docker/build-push-action with BuildKit caching. It builds the same Dockerfile used for releases and can push only on trusted events.

yaml
- name: Build and optionally push image
  uses: docker/build-push-action@v6
  with:
    context: .
    push: ${{ github.event_name != 'pull_request' }}
    tags: |
      ghcr.io/${{ github.repository }}/manager:sha-${{ github.sha }}
    cache-from: type=gha
    cache-to: type=gha,mode=max

Use commit SHA tags for traceability and semver tags during releases. Mutable tags such as main or latest are convenient for humans, but they should not be the source of truth for environment promotion.

For GHCR, this workflow needs:

yaml
permissions:
  contents: read
  packages: write

For cloud registries, prefer OIDC federation over static secrets:

  • AWS: GitHub OIDC to an IAM role, then login to ECR.
  • GCP: GitHub OIDC to Workload Identity Federation, then login to Artifact Registry.
  • Azure: GitHub OIDC to a federated credential, then login to ACR.
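
As an illustration of the AWS path, here is a minimal sketch of the registry login that follows the OIDC exchange. It assumes an earlier step (for example the aws-actions/configure-aws-credentials action) has already assumed an IAM role, that the aws and docker CLIs are on the runner, and that the registry lives in the standard AWS partition. The function names, account ID, and region are placeholders:

```shell
# Hypothetical helper: ecr_registry ACCOUNT_ID REGION
# Prints the private ECR registry hostname for the standard AWS partition.
ecr_registry() {
  printf '%s.dkr.ecr.%s.amazonaws.com\n' "$1" "$2"
}

# Hypothetical helper: ecr_login ACCOUNT_ID REGION
# Uses the short-lived credentials already in the environment to obtain a
# temporary registry password. No static registry secret is stored.
ecr_login() {
  aws ecr get-login-password --region "$2" |
    docker login --username AWS --password-stdin "$(ecr_registry "$1" "$2")"
}
```

A publishing job would call ecr_login 123456789012 eu-west-1 before the build-and-push step, with id-token: write granted to that job only.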

Caching Go modules, Docker layers, and envtest

Caching is not just a speed optimization. It keeps pull request feedback fast enough that developers trust the pipeline.

Use actions/setup-go built-in caching for simple Go repositories:

yaml
- uses: actions/setup-go@v5
  with:
    go-version-file: go.mod
    cache: true

Use explicit cache keys when you need custom paths:

yaml
- uses: actions/cache@v4
  with:
    path: |
      ~/.cache/go-build
      ~/go/pkg/mod
    key: go-${{ runner.os }}-${{ hashFiles('**/go.sum') }}

Cache envtest assets separately because they change with the Kubernetes version, not only with go.sum:

yaml
- uses: actions/cache@v4
  with:
    path: ~/.local/share/kubebuilder-envtest
    key: envtest-${{ runner.os }}-${{ env.ENVTEST_K8S_VERSION }}-${{ hashFiles('**/go.sum') }}
    restore-keys: |
      envtest-${{ runner.os }}-${{ env.ENVTEST_K8S_VERSION }}-

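If your setup-envtest binary stores assets somewhere else, cache that actual path instead. For example, a Makefile that passes --bin-dir $(LOCALBIN) to setup-envtest, as in the pattern shown earlier, typically places assets under the project's bin/ directory rather than ~/.local/share/kubebuilder-envtest. The path must match your Makefile, or the cache step will report success while envtest still downloads assets on every run.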
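
A quick way to confirm the cached path is the right one is to look for the kube-apiserver binary that ships with the envtest assets. The sketch below is a hypothetical debugging helper you could run after the cache restore step; the directory you pass is whatever your Makefile actually uses:

```shell
# Hypothetical helper: envtest_assets_present DIR
# Succeeds when DIR exists and contains a kube-apiserver binary somewhere
# below it, which indicates that envtest assets were restored into DIR.
envtest_assets_present() {
  [ -d "$1" ] && find "$1" -type f -name kube-apiserver | grep -q .
}
```

For example, envtest_assets_present "$HOME/.local/share/kubebuilder-envtest" || echo "cache path holds no envtest assets" makes a mismatched path visible in the job log instead of hiding behind a green cache step.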

For Docker builds, prefer BuildKit cache:

yaml
cache-from: type=gha
cache-to: type=gha,mode=max

Private Go modules need extra authentication, such as .netrc, SSH deploy keys, GitHub App tokens, or a private Go proxy. Keep that setup isolated from normal build secrets and document it for contributors.
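
As one possible shape for the .netrc option, the sketch below writes a credentials file that Git and go mod download can use over HTTPS. write_netrc is a hypothetical helper, and the host, login, and token values are placeholders. It overwrites the target file, so it suits throwaway CI runners rather than developer machines:

```shell
# Hypothetical helper: write_netrc HOST LOGIN TOKEN [FILE]
# Writes a single-machine netrc entry to FILE (default: ~/.netrc).
# The umask in the subshell keeps a newly created file private to the user.
write_netrc() {
  file=${4:-$HOME/.netrc}
  (
    umask 077
    printf 'machine %s\nlogin %s\npassword %s\n' "$1" "$2" "$3" > "$file"
  )
}
```

In a workflow step, pass the token from a secret, for example write_netrc github.com x-access-token "$GO_MODULES_TOKEN", and set GOPRIVATE to your organization's module prefix so the Go toolchain skips the public proxy and checksum database for those modules. GO_MODULES_TOKEN is an assumed secret name.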


Optional: OLM bundle validation in CI

When bundle validation matters

Run operator-sdk bundle validate if your project publishes an OLM bundle, OperatorHub metadata, or a bundle image.

yaml
- name: Generate bundle
  run: make bundle

- name: Install operator-sdk
  run: |
    # -f makes curl fail on HTTP errors instead of saving an error page as the binary
    curl -fsSLo /usr/local/bin/operator-sdk \
      "https://github.com/operator-framework/operator-sdk/releases/download/v1.34.2/operator-sdk_linux_amd64"
    chmod +x /usr/local/bin/operator-sdk

- name: Validate bundle
  run: operator-sdk bundle validate ./bundle

Choose where validation runs based on how your repository manages bundles:

  • If bundle/ is committed to Git, validate it on every pull request.
  • If the bundle is generated only for releases, validate it in the release workflow.
  • If the bundle image reference is injected during release, validate after the final image reference is known.

Pin the operator-sdk version in CI rather than downloading latest, because validation rules and defaults can change across tool versions. The install step above pins v1.34.2 as an example; I validated related bundle commands locally with operator-sdk v1.42.2. If your project uses an older Kubebuilder or Operator SDK plugin layout, pin the version that matches the project.
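
Pinning pairs well with checksum verification. The sketch below follows the pattern in the Operator SDK installation guide, which publishes a checksums.txt file next to each release. The helper names and the destination directory are hypothetical, and the asset name assumes a Linux amd64 runner:

```shell
# Hypothetical helper: operator_sdk_url VERSION FILE
# Prints the download URL of a release asset for a pinned version.
operator_sdk_url() {
  printf 'https://github.com/operator-framework/operator-sdk/releases/download/%s/%s\n' "$1" "$2"
}

# Hypothetical helper: install_operator_sdk VERSION DEST_DIR
# Downloads the pinned binary and checksums.txt into the current directory,
# verifies the SHA-256 sum, then installs the binary into DEST_DIR.
install_operator_sdk() {
  version=$1
  dest_dir=$2
  asset=operator-sdk_linux_amd64
  curl -fsSLo "$asset" "$(operator_sdk_url "$version" "$asset")" &&
    curl -fsSLo checksums.txt "$(operator_sdk_url "$version" checksums.txt)" &&
    grep "$asset" checksums.txt | sha256sum -c - &&
    install -m 0755 "$asset" "$dest_dir/operator-sdk"
}
```

Calling install_operator_sdk v1.34.2 "$HOME/.local/bin" stops before installing anything if the download is truncated or the checksum does not match.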

For OperatorHub-style validation, prefer the current optional validators:

bash
operator-sdk bundle validate ./bundle \
  --select-optional name=operatorhub/v2 \
  --select-optional name=standardcapabilities \
  --select-optional name=standardcategories

Bundle validation does not replace tests

Bundle validation checks metadata and packaging. It does not prove that your reconciler behaves correctly, that webhooks are reachable, or that the operator installs successfully in a real cluster.

Keep runtime testing separate:

  • envtest for controller behavior.
  • kind for installation and cluster integration.
  • Staging clusters for release confidence.

Optional: kind-based jobs

What kind adds beyond envtest

Use kind when the test needs something envtest does not provide:

  • Kubelet behavior.
  • Service networking and DNS.
  • Admission webhooks with real TLS and webhook registration.
  • Multi-resource interactions that depend on a running cluster.
  • Helm, OLM, or Kustomize installation flows.
  • Smoke tests against the built operator image.

Because kind jobs are slower and consume more CI minutes, they often run on main, release branches, scheduled workflows, or manual workflow_dispatch triggers rather than every pull request.

Minimal kind smoke-test job

yaml
kind:
  name: kind smoke test
  runs-on: ubuntu-latest
  needs:
    - test
    - image
  if: github.event_name != 'pull_request'
  timeout-minutes: 30

  steps:
    - uses: actions/checkout@v4

    - uses: helm/kind-action@v1
      with:
        cluster_name: operator-ci

    - name: Load and install operator
      run: |
        make docker-build IMG=controller:ci
        kind load docker-image controller:ci --name operator-ci
        make deploy IMG=controller:ci
        kubectl rollout status deployment/controller-manager -n system --timeout=120s

Adjust the namespace, deployment name, and Makefile targets to match your project. Many Kubebuilder projects deploy into a namespace ending in -system, not literally system.
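
With the default Kubebuilder kustomize layout, the namespace is PROJECT-system and the manager Deployment is PROJECT-controller-manager, where PROJECT is the project name. A small sketch that derives both for the rollout check, assuming your config/default overlay still uses those defaults; the helper names are hypothetical:

```shell
# Hypothetical helpers: default Kubebuilder names for a given project name.
default_namespace() { printf '%s-system\n' "$1"; }
default_deployment() { printf '%s-controller-manager\n' "$1"; }

# Hypothetical helper: wait_for_manager PROJECT [TIMEOUT]
# Waits for the manager Deployment to finish rolling out (default 120s).
wait_for_manager() {
  kubectl rollout status "deployment/$(default_deployment "$1")" \
    -n "$(default_namespace "$1")" --timeout="${2:-120s}"
}
```

For a project named my-operator, wait_for_manager my-operator checks deployment/my-operator-controller-manager in the my-operator-system namespace.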

Upload debugging artifacts

When a kind job fails, upload enough cluster evidence to debug the run without reproducing it locally:

yaml
- name: Collect cluster diagnostics
  if: failure()
  run: |
    mkdir -p artifacts
    kubectl get all -A > artifacts/resources.txt
    kubectl get events -A --sort-by=.metadata.creationTimestamp > artifacts/events.txt
    kubectl describe pods -A > artifacts/pods.txt
    kubectl logs -n system deploy/controller-manager > artifacts/operator.log || true

- name: Upload diagnostics
  if: failure()
  uses: actions/upload-artifact@v4
  with:
    name: kind-diagnostics
    path: artifacts/

For deeper troubleshooting patterns, see Debugging Kubernetes Operators.


Secrets and least privilege in GitHub Actions

Keep permissions as narrow as possible:

yaml
permissions:
  contents: read
  packages: write

Add id-token: write only for workflows that use OIDC to authenticate to a cloud provider:

yaml
permissions:
  contents: read
  id-token: write

Separate pull request checks from release publishing:

  • Pull request jobs should avoid production secrets.
  • Image publishing should run from trusted branches or tags.
  • Production deployment should use protected GitHub Environments and required reviewers.
  • Long-lived credentials should be replaced with OIDC whenever your registry or cloud provider supports it.

This separation is especially important for public repositories, where forked pull requests have a different trust model.
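
The complete workflow earlier pushes whenever the event is not pull_request, which also admits a manual workflow_dispatch run started from any branch. If that is broader than you want, a stricter gate could look like the sketch below. should_publish is a hypothetical helper, and the trusted refs are assumptions to adapt to your branching model:

```shell
# Hypothetical helper: should_publish EVENT_NAME GIT_REF
# Succeeds only for push events on main or on tags that start with "v".
should_publish() {
  [ "$1" = "push" ] || return 1
  case "$2" in
    refs/heads/main | refs/tags/v*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A step can evaluate should_publish "$GITHUB_EVENT_NAME" "$GITHUB_REF" and write push=true or push=false to "$GITHUB_OUTPUT", so later steps read one explicit decision instead of repeating the condition.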


Common Kubernetes Operator CI failures

| Symptom | Likely cause | Fix |
| --- | --- | --- |
| git diff --exit-code fails after make manifests | CRDs, RBAC, or generated code were not committed | Run make generate manifests locally and commit the output |
| setup-envtest downloads assets on every run | Envtest path is not cached or the cache key changes too often | Cache the envtest directory with a key based on OS and Kubernetes version |
| Tests pass locally but fail in CI | Local cluster state hides missing test setup | Make tests create all required CRDs, namespaces, schemes, and fixtures |
| Pull request cannot push to GHCR | Forked PRs do not get write credentials | Build without pushing on PRs, push only from trusted branches or tags |
| Bundle validation fails after an image change | CSV or bundle image reference drifted | Regenerate the bundle after final image tag or digest selection |
| kind job times out | The test is doing too much for PR feedback | Move it to main, release, nightly, or workflow_dispatch and upload diagnostics |
| Webhook tests pass in envtest but fail in kind | TLS, webhook service, or admission registration differs in a real cluster | Add a kind smoke test for webhook installation |

Checklist: production-ready operator CI/CD

| Area | Recommended setup |
| --- | --- |
| Pull request checks | gofmt, go vet, unit tests, generated artifact drift check, envtest |
| Envtest | Pin Kubernetes version, cache assets, keep PR runtime short |
| Image build | Build on PRs, push only from trusted branches or release tags |
| Image tags | Use sha-${{ github.sha }} and semver tags; promote digests |
| OLM bundle | Validate with a pinned operator-sdk version if the project ships bundles |
| kind/e2e | Run on main, releases, schedules, or manual workflows when too slow for PRs |
| Secrets | Prefer OIDC; avoid production secrets in pull request jobs |
| Debugging | Upload test logs, coverage, events, pod descriptions, and operator logs |

Frequently Asked Questions

1. What should a Kubernetes Operator GitHub Actions workflow run?

A practical workflow should run formatting checks, go vet, unit tests, generated manifest checks, envtest integration tests, and image builds. Add OLM bundle validation only if the repository publishes bundles, and move slower kind or end-to-end tests to main, release, nightly, or manual workflows.

2. Should envtest run on every pull request?

Yes, for most small and medium operator repositories, as long as the job finishes in a few minutes. Pin the Kubernetes envtest asset version, cache the downloaded binaries, and move expensive kind or full-cluster tests to main branch, scheduled, or release workflows when PR feedback becomes too slow.

3. Do I need kind if I already run envtest?

Not always. envtest is enough for many controller reconciliation tests because it runs a real API server and etcd. Use kind when the test needs kubelet behavior, DNS, Services, webhook registration, admission TLS, or installation flows closer to a real cluster.

4. Where should I store registry credentials for operator image publishing?

Prefer OIDC federation to AWS, GCP, or Azure over long-lived cloud credentials. For GitHub Container Registry, use GITHUB_TOKEN with the packages: write permission when possible, or use a narrowly scoped fine-grained PAT stored as an encrypted secret.

5. Does operator-sdk bundle validate replace cluster testing?

No. operator-sdk bundle validate checks bundle metadata, CSV structure, and packaging rules. It does not prove the operator reconciles resources correctly at runtime. Keep envtest, kind, or staging-cluster tests for runtime behavior.

6. Should Kubernetes Operator CI publish images from pull requests?

Public repositories usually avoid pushing permanent images from forked pull requests because secrets are not available and the trust boundary is different. Build without pushing on pull requests, then push immutable commit-SHA images from main, release branches, or protected release workflows.

Bottom line: a Kubernetes Operator CI/CD workflow should give developers a fast answer on every pull request, run envtest before merge, build the same image that will be released, push only from trusted events, validate OLM bundles when you publish them, and reserve expensive full-cluster proof for main, releases, scheduled runs, or manual gates.
