From Commit to Cluster: End-to-End Operator Release Pipeline

Tech reviewed: Deepak Prasad

Releasing a Kubernetes Operator is not the same as deploying a stateless application. The release includes a controller image, CRDs, RBAC, webhooks, manifests or Helm charts, optional OLM bundles, and a rollback story that must respect Kubernetes API compatibility.

A production release pipeline for a Kubernetes Operator is a repeatable path from Git tag to running cluster:

Git tag -> image digest -> rendered manifests -> staging smoke test -> GitOps promotion -> production verification -> rollback plan

This guide turns that path into a practical release runbook. It builds on CI/CD with GitHub Actions, CRD version upgrades and conversion webhooks, Helm-based Operator vs Flux vs Argo CD, and Debugging Kubernetes Operators.


Operator release pipeline in 60 seconds

  • Treat the Git tag, manager image digest, CRD YAML, install manifests, Helm chart, and optional OLM bundle as one tested release unit.
  • Build and publish the manager image from a semver tag, but promote by immutable digest.
  • Render install artifacts from the same commit that produced the image.
  • Deploy to staging first, apply a sample CustomResource, and wait for meaningful status conditions.
  • Promote to production by changing GitOps state, not by running one-off kubectl set image commands.
  • Roll back the whole release tuple when CRDs, conversion webhooks, storage versions, or finalizers changed.
  • Publish OLM bundles only after the image digest and release metadata are known.

Release flow: Git tag to cluster

A practical operator release flow looks like this:

| Step | Output | Gate before continuing |
| --- | --- | --- |
| Create semver Git tag | v1.5.0 | Tag points to reviewed release commit |
| Build manager image | repo/operator@sha256:... | Image builds, scans, and optional signatures pass |
| Render install artifacts | Helm chart, Kustomize output, raw YAML | CRDs, RBAC, webhooks, and manager image digest match the tag |
| Validate API packaging | CRD validation, optional OLM bundle | Bundle and CRDs validate with pinned tooling |
| Deploy to staging | Running manager and CRDs | Rollout, health probes, metrics, and webhooks pass |
| Smoke-test sample CR | Real CustomResource reaches Ready | Status, events, finalizers, and cleanup behave correctly |
| Promote via GitOps | Environment overlay or Helm values updated | Approval, diff, and policy checks pass |
| Verify production | Same digest running in cluster | Post-deploy smoke and dashboards are clean |

This structure matters because an operator release is an API release. A broken controller can usually be rolled back quickly. A broken CRD storage version or conversion webhook can trap existing resources in a much harder failure mode.


The release artifact contract

Every production release should answer one question: which exact artifacts were tested together?

| Artifact | Source | Version identity | Promotion rule |
| --- | --- | --- | --- |
| Manager image | Dockerfile and Go code | Image digest plus semver tag | Promote by digest, not by mutable tag |
| CRDs | config/crd/bases or chart templates | Same Git tag as controller | Apply compatible CRDs before manager rollout |
| RBAC and webhooks | config/rbac, config/webhook, chart templates | Same Git tag as controller | Validate permissions and webhook reachability |
| Helm chart | Chart directory or OCI chart | Chart version and appVersion | Values should reference image digest or exact tag |
| Kustomize overlay | base plus environment overlays | Git commit in environment repo | Promotion is a reviewed Git change |
| OLM bundle | bundle/, CSV, metadata, bundle image | Bundle version and image digest | Validate before catalog publishing |
| Sample CR | Test fixture or release smoke test | Versioned with the release | Must reconcile to expected status |

If one artifact is owned by another team, document that ownership explicitly. For example, a platform team may own CRDs and webhooks while application teams own CustomResources. That can work, but only if the compatibility matrix and rollout order are clear.


Stage 1: Build, tag, and sign the manager image

Semver tags and immutable digests

Use semver tags such as v1.5.0 for releases. The tag should point to the commit that contains:

  • controller source code,
  • generated CRDs and RBAC,
  • install manifests or chart changes,
  • release notes or changelog entry,
  • OLM bundle changes if the bundle is committed.
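
A release job can refuse to run for anything that is not a release tag. The following is a minimal sketch, assuming tags of the form vMAJOR.MINOR.PATCH with an optional pre-release suffix; the function name is hypothetical, and `GITHUB_REF_NAME` only applies if the job runs in GitHub Actions.

```bash
# Return 0 when the argument looks like a release tag such as v1.5.0 or v1.5.0-rc.1.
# The leading "v" and the optional pre-release suffix are conventions assumed here.
is_release_tag() {
  printf '%s\n' "${1:-}" |
    grep -Eq '^v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z.-]+)?$'
}

# Example gate in CI: stop early if the pushed ref is not a release tag.
# is_release_tag "$GITHUB_REF_NAME" || { echo "not a release tag" >&2; exit 1; }
```

Rejecting malformed tags at the start keeps later stages from publishing artifacts under a version nobody intended.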

Build and push the manager image from that tag, then record the digest:

```text
ghcr.io/example/database-operator@sha256:3d2f...
```

Human-readable tags such as v1.5.0 are useful, but production promotion should use the digest. A mutable tag can be repushed; a digest identifies the exact image content.
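
One way to enforce promotion by digest is to validate the reference before any promotion step accepts it. This is a sketch with hypothetical helper names; it only checks the shape of the reference, not that the image exists in the registry.

```bash
# Return 0 only for references of the form <repository>@sha256:<64 hex chars>.
is_digest_ref() {
  printf '%s\n' "${1:-}" | grep -Eq '^[^@[:space:]]+@sha256:[0-9a-f]{64}$'
}

# Print the digest portion (sha256:...) of a digest-pinned reference.
digest_of() {
  is_digest_ref "${1:-}" || return 1
  printf '%s\n' "${1#*@}"
}
```

A tag-only reference such as `ghcr.io/example/database-operator:v1.5.0` fails the check, so a promotion script built on it cannot silently follow a repushed tag.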

Registry layout

Use a predictable registry layout:

```text
ghcr.io/example/database-operator:v1.5.0
ghcr.io/example/database-operator:v1.5
ghcr.io/example/database-operator:sha-a1b2c3d
ghcr.io/example/database-operator@sha256:...
```

Use per-commit tags for staging and semver tags for releases. Avoid latest in production manifests unless the cluster is intentionally disposable.
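
The tag layout above can be derived from the release tag and the commit SHA. The sketch below assumes a seven-character short SHA and skips the floating minor tag for pre-releases; both are choices made for this example, and `release_tags` is a hypothetical helper.

```bash
# Print the tags to push for one build: exact release, floating minor, per-commit.
# Arguments: repository, release tag (vMAJOR.MINOR.PATCH), full commit SHA.
release_tags() {
  repo=$1 tag=$2 sha=$3
  printf '%s:%s\n' "$repo" "$tag"
  case "$tag" in
    *-*) ;;                                      # no floating tag for pre-releases
    *) printf '%s:%s\n' "$repo" "${tag%.*}" ;;   # v1.5.0 -> v1.5
  esac
  printf '%s:sha-%.7s\n' "$repo" "$sha"
}

# Usage: release_tags ghcr.io/example/database-operator v1.5.0 "$(git rev-parse HEAD)"
```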

Signing, SBOM, and provenance

Add signing, SBOM, and provenance after the basic release path is reliable:

  • sign the image digest with Cosign (part of the Sigstore project),
  • publish an SBOM for the manager image,
  • attach provenance from CI,
  • verify signatures before promotion if your platform supports it.

These controls help consumers trust the digest, but they do not replace staging smoke tests.


Stage 2: Image to deployable manifests

Raw YAML, Helm, or Kustomize

Operator installs commonly use one of three packaging styles:

| Packaging | Best fit | Watch out for |
| --- | --- | --- |
| Raw YAML | Small internal operators | Harder environment-specific overrides |
| Kustomize | Platform-owned overlays and GitOps | Image and namespace substitutions must stay visible in Git |
| Helm | Teams already standardizing on charts | CRD upgrade behavior and hook ordering need discipline |

Kubebuilder's make deploy is a useful development starting point. Production installs usually add resource requests, Pod security settings, topology spread, affinity, image pull secrets, and tighter RBAC. See RBAC minimum permissions before broadening permissions.

Helm chart values

For a Helm-packaged operator, keep image settings explicit:

```yaml
image:
  repository: ghcr.io/example/database-operator
  tag: v1.5.0
  digest: sha256:3d2f...
```

If both tag and digest are supported, production templates should prefer the digest. The tag remains useful for humans reading values files.

Kustomize image substitution

For Kustomize, CI or the GitOps promotion job can update the image:

```bash
kustomize edit set image controller=ghcr.io/example/database-operator@sha256:3d2f...
```

Commit that change to the environment repository instead of applying it directly to the cluster. The Git commit becomes the promotion audit trail.
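
Whichever packaging style renders the manifests, a render gate can check that no image reference slipped through without a digest. This sketch is a plain text scan rather than a YAML parser, so treat it as a first-pass lint; the function names and the overlay path are hypothetical.

```bash
# Read rendered YAML on stdin and print every "image:" line that lacks a digest.
unpinned_images() {
  grep -E '^[[:space:]]*(-[[:space:]]+)?image:[[:space:]]+[^[:space:]]' |
    grep -v '@sha256:' || true
}

# Fail the render gate when anything is unpinned.
# Usage (the overlay path is hypothetical):
#   kustomize build overlays/production | require_pinned_images
require_pinned_images() {
  found=$(unpinned_images)
  if [ -z "$found" ]; then
    return 0
  fi
  printf 'unpinned image references:\n%s\n' "$found" >&2
  return 1
}
```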


Stage 3: Staging smoke test

Apply CRDs and webhooks before the manager

Order matters:

  1. Apply CRDs.
  2. Wait for CRDs to become established.
  3. Apply RBAC and service accounts.
  4. Ensure webhook service and certificates are ready if the operator uses webhooks.
  5. Apply webhook configurations.
  6. Roll out the manager Deployment.
  7. Apply the sample CustomResource.

Useful checks:

```bash
kubectl wait --for=condition=Established crd/databases.database.example.com --timeout=120s
kubectl rollout status deployment/database-operator-controller-manager -n database-operator-system --timeout=120s
kubectl get validatingwebhookconfiguration
kubectl get mutatingwebhookconfiguration
```

For GitOps, encode ordering with Argo CD sync waves or Flux dependencies when CRDs, webhooks, manager, and sample CRs are managed as separate units.
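
Outside GitOps, the same ordering can be scripted for a staging cluster. The sketch below assumes the release was rendered into one directory with one file per step; those file names are hypothetical, and the CRD name is inferred from the sample Database resource used in this guide. Setting `KUBECTL=echo` prints the sequence without touching a cluster.

```bash
# KUBECTL can be overridden (for example KUBECTL=echo) to dry-run the sequence.
KUBECTL="${KUBECTL:-kubectl}"

# Apply one rendered release directory in dependency order; stop at the first failure.
# File names under "$1" are hypothetical placeholders for your rendered artifacts.
deploy_staging() {
  dir=$1
  $KUBECTL apply -f "$dir/crds.yaml" &&
  $KUBECTL wait --for=condition=Established crd/databases.database.example.com --timeout=120s &&
  $KUBECTL apply -f "$dir/rbac.yaml" &&
  $KUBECTL apply -f "$dir/webhook-service.yaml" &&
  $KUBECTL apply -f "$dir/webhook-config.yaml" &&
  $KUBECTL apply -f "$dir/manager.yaml" &&
  $KUBECTL rollout status deployment/database-operator-controller-manager \
    -n database-operator-system --timeout=120s &&
  $KUBECTL apply -f "$dir/sample-cr.yaml"
}

# Usage: deploy_staging ./rendered/v1.5.0
```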

Apply a sample CustomResource

Use the smallest real CustomResource from your docs:

```yaml
apiVersion: database.example.com/v1
kind: Database
metadata:
  name: smoke-test
spec:
  size: small
```

Then assert a meaningful condition:

```bash
kubectl wait database smoke-test --for=condition=Ready --timeout=180s
```

If your CRD does not expose a Ready condition, poll the exact status field your users rely on:

```bash
kubectl get database smoke-test -o jsonpath='{.status.phase}'
```
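
A single `kubectl get` only samples the field once, so a smoke test usually wraps it in a bounded polling loop. The following is a minimal sketch; the phase value `Ready`, the `POLL_INTERVAL` variable, and both function names are assumptions for illustration.

```bash
# Read the phase of a Database object; prints nothing if the field is unset.
get_phase() {
  kubectl get database "$1" -o jsonpath='{.status.phase}'
}

# Poll until the phase equals the wanted value or the timeout (seconds) expires.
# "Ready" as a phase value and POLL_INTERVAL are assumptions for this sketch.
wait_for_phase() {
  name=$1 want=$2 timeout=${3:-180} interval=${POLL_INTERVAL:-5}
  elapsed=0
  while :; do
    phase=$(get_phase "$name" 2>/dev/null) || phase=""
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "timed out after ${elapsed}s; last phase: ${phase:-none}" >&2
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}

# Usage: wait_for_phase smoke-test Ready 180
```

Printing the last observed phase on timeout gives the release log something more useful than a bare failure.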

For condition design, see Kubernetes status and conditions.

Negative and cleanup checks

A release smoke test should prove more than "the Pod is running":

  • Apply an invalid spec and confirm CRD validation or admission webhook rejects it.
  • Delete the sample CR and confirm finalizers complete.
  • Check operator logs for reconcile errors.
  • Check Kubernetes events in the operator namespace.
  • Confirm /healthz, /readyz, and metrics endpoints behave as expected.
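
The negative check is easy to get backwards in a script, because the step must pass when the command fails. A small wrapper makes the intent explicit. This is a sketch with a hypothetical helper name and hypothetical file names; note that it treats any failure as a rejection, so a network error would also pass.

```bash
# Succeed only when the wrapped command fails, e.g. when the API server rejects an object.
expect_rejected() {
  if "$@" >/dev/null 2>&1; then
    echo "expected rejection, but command succeeded: $*" >&2
    return 1
  fi
  return 0
}

# Usage in a smoke test (file names are hypothetical):
#   expect_rejected kubectl apply -f testdata/invalid-database.yaml
#   kubectl delete database smoke-test --timeout=120s   # waits for finalizers to finish
```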

Related deep dives: health probes, Prometheus metrics, and finalizers.


Stage 4: GitOps promotion

CI produces the digest; CD consumes it

CI should export a single fact:

```text
IMAGE_DIGEST=ghcr.io/example/database-operator@sha256:3d2f...
```

CD or GitOps then consumes that digest by updating:

  • a Helm values file,
  • a Kustomize image patch,
  • a raw manifest,
  • an OLM bundle or catalog lane.

Avoid humans changing live Deployments without a matching Git change. If production drifts from Git, rollback and audit become much harder.
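
The handoff can be sketched as two small functions, one for each side. This assumes GitHub Actions on the CI side (where `GITHUB_OUTPUT` is the file that collects step outputs) and a Helm values file with a `digest:` line on the CD side; the output name `image_digest` and the function names are hypothetical.

```bash
# CI side: expose the digest as a step output. GITHUB_OUTPUT is the file GitHub
# Actions provides for step outputs; the output name image_digest is an assumption.
publish_digest() {
  printf 'image_digest=%s\n' "$1" >> "$GITHUB_OUTPUT"
}

# CD side: rewrite the "digest:" line of a Helm values file in the environment repo.
# A YAML-aware tool is more robust; sed keeps this sketch dependency-free.
set_values_digest() {
  file=$1 digest=$2
  tmp=$(mktemp) || return 1
  sed "s|^\([[:space:]]*digest:\).*|\1 $digest|" "$file" > "$tmp" &&
    cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage: set_values_digest envs/production/values.yaml "sha256:3d2f..."
```

The promotion job then commits the changed values file, so the Git history records which digest went to which environment.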

Argo CD and Flux ordering

Operator releases often need ordering beyond a normal app deployment.

For Argo CD, common ordering is:

| Sync wave | Resource |
| --- | --- |
| -2 | CRDs |
| -1 | namespace, service account, RBAC |
| 0 | webhook service and certificates |
| 1 | webhook configurations |
| 2 | manager Deployment |
| 3 | sample CR or smoke-test job |
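
Argo CD reads the wave from the `argocd.argoproj.io/sync-wave` annotation on each resource. The sketch below maps a resource kind to the annotation value from the table above, which can help when linting or generating manifests. The function name is hypothetical, and treating `Certificate` and `Issuer` (cert-manager kinds) as wave 0 is an assumption about how certificates are provisioned.

```bash
# Print the Argo CD sync-wave annotation for a resource kind, following the table above.
# Unlisted kinds (sample CRs, smoke-test Jobs) fall into the last wave.
sync_wave_annotation() {
  case "$1" in
    CustomResourceDefinition) wave=-2 ;;
    Namespace|ServiceAccount|Role|RoleBinding|ClusterRole|ClusterRoleBinding) wave=-1 ;;
    Service|Certificate|Issuer) wave=0 ;;
    ValidatingWebhookConfiguration|MutatingWebhookConfiguration) wave=1 ;;
    Deployment) wave=2 ;;
    *) wave=3 ;;
  esac
  printf 'argocd.argoproj.io/sync-wave: "%s"\n' "$wave"
}

# Usage: sync_wave_annotation Deployment
```

The value is quoted because Kubernetes annotation values must be strings.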

For Flux, split CRDs, manager, and sample CRs into separate Kustomization resources when you need dependency ordering. Use dependsOn so the manager does not start before the API exists.
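
A sketch of that split, written as a shell function that prints a Flux Kustomization manifest so the dependency chain can be generated consistently. The resource names, the paths, the `platform-config` GitRepository, and the `flux-system` namespace are all assumptions for illustration.

```bash
# Print a Flux Kustomization manifest. Arguments: name, path, optional dependency name.
# The GitRepository name "platform-config" and the flux-system namespace are assumptions.
flux_kustomization() {
  name=$1 path=$2 depends_on=${3:-}
  cat <<EOF
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: $name
  namespace: flux-system
spec:
  interval: 10m
  path: $path
  prune: true
  wait: true
  sourceRef:
    kind: GitRepository
    name: platform-config
EOF
  if [ -n "$depends_on" ]; then
    printf '  dependsOn:\n    - name: %s\n' "$depends_on"
  fi
}

# CRDs first, then the manager that depends on them:
#   flux_kustomization database-operator-crds ./operators/database/crds
#   flux_kustomization database-operator-manager ./operators/database/manager database-operator-crds
```

Setting `wait: true` on the CRD Kustomization matters here, because `dependsOn` only helps if the dependency reports ready after its resources are actually healthy.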

GitHub Environments and approvals

If GitHub Actions handles the promotion step, use GitHub Environments for production:

  • required reviewers,
  • environment-scoped secrets,
  • deployment history,
  • wait timers or change windows when needed.

Keep production registry credentials and cluster credentials out of pull request workflows.


Rollback matrix

A common operator release mistake is assuming every rollback is only a Deployment rollback.

| Change in failed release | Is Deployment rollback enough? | Safer rollback |
| --- | --- | --- |
| Controller-only logic bug | Usually yes | Roll back manager image digest |
| RBAC regression | No, if permissions changed | Restore RBAC and manager from prior tag |
| Added optional CRD field | Maybe | Confirm old controller ignores or tolerates field |
| Removed, renamed, or pruned CRD field | Usually no | Restore compatible CRD and controller tuple |
| Conversion webhook changed | Risky | Restore webhook service, config, CRD, and controller together |
| Storage version changed | No | Follow migration plan; do not assume simple rollback |
| Finalizer behavior changed | Usually no | Verify cleanup logic before rolling back blindly |
| Operand data changed | No | Repair data plane separately from operator rollback |

Treat the manager image, CRDs, webhooks, RBAC, Helm chart or Kustomize render, and OLM bundle as a tuple. Roll back one piece only when your compatibility matrix says it is safe.
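
A first guess at the rollback scope can come from the paths that changed between the previous and the failed release. The sketch below assumes the Kubebuilder-style `config/crd`, `config/webhook`, and `config/rbac` directories; the scope labels and the function name are hypothetical. It cannot see semantic changes such as finalizer behavior or operand data, so it narrows the question rather than answering it.

```bash
# Read changed paths on stdin (one per line) and print a first-guess rollback scope.
rollback_scope() {
  scope="manager-only"
  while IFS= read -r path; do
    case "$path" in
      config/crd/*|config/webhook/*)
        scope="full-tuple" ;;
      config/rbac/*)
        if [ "$scope" = "manager-only" ]; then scope="manager-and-rbac"; fi ;;
    esac
  done
  printf '%s\n' "$scope"
}

# Usage (v1.4.0 stands in for the previous release tag):
#   git diff --name-only v1.4.0 v1.5.0 | rollback_scope
```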


CRD and controller versioning

Single release train

For most teams, one semver release should include:

  • manager image,
  • CRD YAML,
  • RBAC and webhooks,
  • Helm chart or Kustomize overlays,
  • OLM bundle if used,
  • changelog entry for breaking API changes.

This makes v1.5.0 mean "the exact API and controller we tested together."

Breaking vs non-breaking CRD changes

Non-breaking additions are usually easier:

  • adding optional fields,
  • adding enum values when clients tolerate them,
  • adding status fields,
  • adding printer columns.

Riskier changes need a migration plan:

  • removing or renaming fields,
  • changing validation in a way that rejects existing objects,
  • changing storage versions,
  • changing conversion webhooks,
  • changing finalizer semantics.

For versioned APIs, follow the patterns in CRD version upgrades and conversion webhooks.
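
One concrete check for storage version changes: a version cannot be removed from a CRD while it is still listed in the CRD's `status.storedVersions`. The sketch below wraps that lookup; the function names and the `v1beta1` version are hypothetical.

```bash
# List the versions that may still exist in etcd for a CRD.
stored_versions() {
  kubectl get crd "$1" -o jsonpath='{.status.storedVersions[*]}'
}

# Return 0 only if the given version is absent from status.storedVersions,
# which is a precondition for removing that version from the CRD.
can_drop_version() {
  crd=$1 version=$2
  versions=$(stored_versions "$crd") || return 2
  for v in $versions; do
    if [ "$v" = "$version" ]; then
      echo "$version is still listed in storedVersions of $crd" >&2
      return 1
    fi
  done
  return 0
}

# Usage: can_drop_version databases.database.example.com v1beta1
```

A lookup failure returns a distinct status so that an unreachable cluster is never mistaken for a safe answer.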

Safe rollout order

When conversion or webhook behavior changes, prefer this order:

  1. Ensure the conversion webhook service can answer for old and new versions.
  2. Apply compatible CRD changes.
  3. Wait for CRDs to become established.
  4. Roll out the new manager.
  5. Verify existing CRs can still be listed, watched, and reconciled.
  6. Only then promote to production.

The key question is not "did the Deployment roll out?" It is "can the Kubernetes API still serve every existing CustomResource correctly?"
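
That question can be asked directly by listing the resource through every served version, using kubectl's `resource.version.group` form. The following is a sketch under the assumption that the Database CRD serves a hypothetical older `v1beta1` alongside `v1`; the function name is also hypothetical.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# List the resource through every served version; a broken conversion webhook
# typically surfaces here as a failed list. Arguments: plural, group, versions...
verify_served_versions() {
  plural=$1 group=$2
  shift 2
  for version in "$@"; do
    if ! $KUBECTL get "$plural.$version.$group" --all-namespaces -o name >/dev/null; then
      echo "cannot list $plural via $version" >&2
      return 1
    fi
  done
}

# Usage (v1beta1 is a hypothetical older version):
#   verify_served_versions databases database.example.com v1beta1 v1
```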


OLM bundle and catalog publishing

OLM adds another release lane. The manager image and CRDs still come first; the bundle packages them for OLM users.

A typical OLM release path:

  1. Build and push the manager image.
  2. Generate the bundle with the final image reference.
  3. Validate the bundle with a pinned operator-sdk version.
  4. Build and push the bundle image.
  5. Test install or upgrade from the bundle.
  6. Update a catalog or file-based catalog.
  7. Publish to internal catalog or submit to OperatorHub if applicable.

Common commands:

```bash
make bundle IMG=ghcr.io/example/database-operator@sha256:3d2f...
operator-sdk bundle validate ./bundle
make bundle-build BUNDLE_IMG=ghcr.io/example/database-operator-bundle:v1.5.0
make bundle-push BUNDLE_IMG=ghcr.io/example/database-operator-bundle:v1.5.0
operator-sdk run bundle ghcr.io/example/database-operator-bundle:v1.5.0
```

Keep bundle metadata in lockstep with the same image digest tested in staging. Bundle validation checks packaging; it does not replace runtime smoke tests.
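
A quick lockstep check is to confirm that the generated ClusterServiceVersion actually references the digest tested in staging. This sketch assumes the usual `bundle/manifests/*.clusterserviceversion.yaml` layout; the function name is hypothetical.

```bash
# Return 0 when the ClusterServiceVersion in a bundle directory references the digest.
# Assumes the layout <bundle>/manifests/*.clusterserviceversion.yaml.
bundle_pins_digest() {
  bundle_dir=$1 digest=$2
  grep -qF "$digest" "$bundle_dir"/manifests/*.clusterserviceversion.yaml
}

# Usage: bundle_pins_digest ./bundle "sha256:3d2f..."
```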

For a deeper packaging guide, see OLM bundles & OperatorHub.


Pre-flight checks before production

Before promoting to production, answer these questions:

| Check | Why it matters |
| --- | --- |
| Does the production change point to the same digest tested in staging? | Prevents promoting a different binary |
| Are CRDs established and compatible with existing CRs? | Avoids API-serving failures |
| Are conversion webhooks reachable? | Prevents list/watch failures during version conversion |
| Did RBAC change? | Avoids runtime permission failures |
| Did the sample CR reach the expected condition? | Proves reconciliation, not only rollout |
| Did invalid input fail as expected? | Proves validation and admission behavior |
| Did finalizer cleanup complete? | Avoids stuck deletes during incidents |
| Are metrics, alerts, and logs clean? | Catches silent reconcile failures |
| Is rollback tested or documented? | Reduces pressure during incidents |
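
The first check in the table can be automated by comparing the digests referenced in the staging and production environment files. This is a minimal sketch; the paths and function names are hypothetical, and it relies on `grep -o`, which common grep implementations provide.

```bash
# Print the distinct image digests referenced in a manifest or values file.
digests_in() {
  grep -Eo 'sha256:[0-9a-f]+' "$1" | sort -u
}

# Return 0 only when both files reference exactly the same, non-empty set of digests.
same_digests() {
  a=$(digests_in "$1") || return 1
  b=$(digests_in "$2") || return 1
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (paths are hypothetical):
#   same_digests envs/staging/values.yaml envs/production/values.yaml
```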

Checklist: gate, artifact, verification

| Gate | Artifact | Verify |
| --- | --- | --- |
| Tag pushed | Semver Git tag | Tag points to release commit |
| Image build | Digest-pinned manager image | Pull in staging, optional signature/SBOM checks |
| Manifest render | Helm chart, Kustomize output, or raw YAML | CRDs, RBAC, webhooks, and image digest match |
| Bundle validation | OLM bundle and bundle image | operator-sdk bundle validate, install or upgrade test |
| Staging apply | Running manager and CRDs | Rollout, health, readiness, metrics, and logs |
| Smoke test | Sample CustomResource | Status condition, events, cleanup, invalid-spec rejection |
| Production promotion | GitOps overlay or Helm values | Same digest as staging, approval complete |
| Post-deploy verification | Production operator | Sample or canary CR, dashboards, alerts |
| Rollback rehearsal | Previous release tuple | Restore manager, CRDs, RBAC, webhooks, and bundle as needed |

Frequently Asked Questions

1. Should CRDs and the controller image share one Git tag?

For most teams, yes. A single semver tag should identify the manager image digest, CRD YAML, Helm chart or Kustomize render, and optional OLM bundle tested together. Splitting versions across artifacts is how clusters end up with controller and API skew.

2. Can I roll back only the operator Deployment?

Only for controller-only bugs where CRDs, stored versions, webhooks, and operand data remain compatible. If the release changed CRDs, conversion webhooks, storage versions, or finalizer behavior, roll back the matched release tuple instead of only the Deployment.

3. Where does GitOps fit in an operator release pipeline?

CI builds immutable artifacts and records the image digest. GitOps promotes those artifacts by updating Helm values, Kustomize images, or environment overlays, then Argo CD or Flux applies the staged change to clusters with review, ordering, and drift detection.

4. Should I deploy CRDs separately from the operator manager?

It depends on ownership. Platform teams often manage CRDs separately so API changes are reviewed carefully. Smaller teams may ship CRDs and manager together. Either way, define ordering, compatibility, and rollback rules before production.

5. How should I test an operator release before production?

Deploy the exact image digest and manifests to staging, wait for CRDs and webhooks, roll out the manager, apply a minimal CustomResource, verify status conditions, check metrics and events, test invalid input, and confirm finalizer cleanup.

6. Where does OLM bundle publishing fit?

OLM publishing is an additional release lane after the image and manifests are known. Generate the bundle, validate it with a pinned operator-sdk version, build and push the bundle image, test install or upgrade, then update the catalog or OperatorHub submission.

Bottom line: release a Kubernetes Operator as a tested artifact set, not as a lone Deployment. Build the image once, promote the digest through Git, keep CRDs and webhooks compatible, prove the release with a real CustomResource in staging, and roll back the whole tuple when API shape or stored data is involved.
