Testing Kubernetes Operators with envtest, Fake Client, and kind

Validation note: The article code was validated end to end with Operator SDK v1.42.2, Go 1.24.4, kubectl v1.36.1, kind v0.31.0, and Docker 29.2.1.

This article completes the Go-based Kubernetes Operator tutorial. The foundation tutorial built the first controller. The controller-runtime tutorial added the important operator behaviors: multiple resources, status, finalizers, drift handling, watches, and webhooks.

This testing and release tutorial answers the question every serious reader asks next:

How can correctness be demonstrated for this operator, and how should releases be shipped safely?

The answer is not one test type. Kubernetes operators need layered verification because different bugs appear at different layers.

Layer             | What it proves                                         | What it cannot prove
Pure unit tests   | Builders, default helper functions, condition helpers  | Kubernetes API behavior
Fake client tests | Simple reconcile paths, create/update calls            | CRD validation, status subresource, webhooks, RBAC
envtest           | Real API server, CRDs, status, generation, webhooks    | Deployed manager image, RBAC, Pods, Services
kind smoke tests  | Real deployment, image, RBAC, CRDs, webhooks, Pods     | Fine-grained unit behavior

That layered answer is important. Many operator tutorials stop after a fake-client test. That is not enough for production code.

This is one of the most common gaps in Kubernetes operator content. Readers search for "operator sdk test reconciler", "kubebuilder envtest example", "controller-runtime fake client", "test validating webhook operator sdk", and "kind e2e test operator" because the testing boundary is confusing:

  • A fake client can store objects but is not a Kubernetes API server
  • envtest runs kube-apiserver and etcd but not kubelet or the scheduler
  • A kind cluster runs real workloads but is slower and harder to debug in unit-test style
  • Webhooks require certificates and admission registration, so calling a validator function directly is not the same as testing admission
  • RBAC errors only show up when the deployed manager uses its ServiceAccount

The official Kubebuilder envtest reference, controller-runtime fake client package, and kind quick start are the primary references. This article turns those pieces into a practical test strategy for the DemoApp operator built in the previous tutorials.

The following table is a pragmatic testing policy for most Go operators:

When                                         | Run
Every local edit                             | focused Go unit tests for builders and helpers
Every pull request                           | unit tests, fake-client tests, envtest controller tests
Every pull request that touches webhooks     | envtest webhook tests
Main branch or nightly                       | kind smoke test
Before a tagged release                      | kind install/upgrade test using old and new CR examples

This keeps fast feedback fast while still proving the things that only Kubernetes can prove. A unit test is not "bad"; it is just a narrow tool. The mistake is using a narrow tool as if it covered the whole operator lifecycle.

Go operator series (3 parts): Part 1 — Operator SDK foundation · Part 2 — controller-runtime · Part 3 — testing and shipping (this page) · Operator tutorial hub


Step 1 - Keep pure logic testable

The controller-runtime tutorial moved builders into internal/controller/resources.go:

  • buildConfigMap
  • buildService
  • buildDeployment
  • labelsFor
  • desiredReplicas
  • desiredPort
  • desiredMessage

That was not just organization. It makes the easiest tests fast and reliable.

Create internal/controller/resources_test.go:

go
package controller

import (
	"testing"

	"k8s.io/utils/ptr"

	demov1alpha1 "github.com/example/demoapp-operator/api/v1alpha1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func TestBuildDeploymentUsesDemoAppSpec(t *testing.T) {
	app := &demov1alpha1.DemoApp{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "hello",
			Namespace: "default",
		},
		Spec: demov1alpha1.DemoAppSpec{
			Image:    "nginx:1.27",
			Replicas: ptr.To[int32](3),
			Port:     8081,
			Message:  "from test",
		},
	}

	deploy := buildDeployment(app)

	if deploy.Name != "hello" {
		t.Fatalf("expected deployment name hello, got %s", deploy.Name)
	}
	if *deploy.Spec.Replicas != 3 {
		t.Fatalf("expected 3 replicas, got %d", *deploy.Spec.Replicas)
	}
	container := deploy.Spec.Template.Spec.Containers[0]
	if container.Image != "nginx:1.27" {
		t.Fatalf("expected image nginx:1.27, got %s", container.Image)
	}
	if container.Ports[0].ContainerPort != 8081 {
		t.Fatalf("expected port 8081, got %d", container.Ports[0].ContainerPort)
	}
}

func TestBuildConfigMapUsesMessageAndVersion(t *testing.T) {
	app := &demov1alpha1.DemoApp{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "hello",
			Namespace: "default",
		},
		Spec: demov1alpha1.DemoAppSpec{
			Message:       "custom message",
			ConfigVersion: "v2",
		},
	}

	cm := buildConfigMap(app)

	if cm.Name != "hello-config" {
		t.Fatalf("expected hello-config, got %s", cm.Name)
	}
	if cm.Data["message"] != "custom message" {
		t.Fatalf("unexpected message: %s", cm.Data["message"])
	}
	if cm.Data["configVersion"] != "v2" {
		t.Fatalf("unexpected configVersion: %s", cm.Data["configVersion"])
	}
}

Run:

bash
go test ./internal/controller -run TestBuild

Validated output:

bash
ok  	github.com/example/demoapp-operator/internal/controller	0.176s

These tests do not need Kubernetes. If they fail, the bug is in your desired-state construction, not in controller-runtime.

Pure tests like these should be boring and numerous. They are cheap, deterministic, and easy to debug. They are also the best reason to avoid burying resource construction inside one giant Reconcile function. If a reviewer asks, "What happens when replicas is nil?" or "Does the Service selector still match the Pod labels?", a builder test can answer immediately.

But these tests do not tell you whether the CRD schema is valid, whether the manager can start, or whether a Service actually routes traffic in a cluster. That is why the next layers exist.
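
The two reviewer questions above fit naturally into a table-driven check. The following is a minimal, self-contained sketch rather than the tutorial's actual code: demoSpec is a stand-in for demov1alpha1.DemoAppSpec, and the signatures and label keys of desiredReplicas and labelsFor are assumptions (the real helpers live in internal/controller/resources.go and are not shown in this article). selectorMatches and int32Ptr are hypothetical helpers added for illustration.

```go
package main

import "fmt"

// demoSpec is a stand-in for demov1alpha1.DemoAppSpec (assumption: only the
// field needed for this sketch is modeled).
type demoSpec struct {
	Replicas *int32
}

// desiredReplicas is a hypothetical version of the helper: a nil pointer
// means "use the default of 1".
func desiredReplicas(spec demoSpec) int32 {
	if spec.Replicas == nil {
		return 1
	}
	return *spec.Replicas
}

// labelsFor is a hypothetical version of the label helper; the label keys
// are assumptions chosen for illustration.
func labelsFor(name string) map[string]string {
	return map[string]string{
		"app.kubernetes.io/name":     "demoapp",
		"app.kubernetes.io/instance": name,
	}
}

// selectorMatches reports whether every selector key/value pair is present
// in the Pod labels, which is what an equality-based selector requires.
func selectorMatches(selector, podLabels map[string]string) bool {
	for key, want := range selector {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func int32Ptr(v int32) *int32 { return &v }

func main() {
	cases := []struct {
		name string
		spec demoSpec
		want int32
	}{
		{name: "nil replicas falls back to default", spec: demoSpec{}, want: 1},
		{name: "explicit replicas is kept", spec: demoSpec{Replicas: int32Ptr(3)}, want: 3},
	}
	for _, tc := range cases {
		got := desiredReplicas(tc.spec)
		fmt.Printf("%s: got=%d ok=%t\n", tc.name, got, got == tc.want)
	}

	// Pod labels may carry extra keys; the selector only needs its own keys to match.
	podLabels := labelsFor("hello")
	podLabels["pod-template-hash"] = "abc123"
	fmt.Println("selector matches pod labels:", selectorMatches(labelsFor("hello"), podLabels))
}
```

In a real project the same table would sit inside a func TestXxx(t *testing.T) in resources_test.go and call t.Errorf instead of printing.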


Step 2 - Add a fake-client reconcile test

The fake client is useful when you want to test a simple reconcile path without starting an API server.

Create internal/controller/demoapp_fake_test.go:

go
package controller

import (
	"context"
	"testing"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/utils/ptr"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"

	demov1alpha1 "github.com/example/demoapp-operator/api/v1alpha1"
)

func TestReconcileCreatesChildrenWithFakeClient(t *testing.T) {
	scheme := runtime.NewScheme()
	if err := demov1alpha1.AddToScheme(scheme); err != nil {
		t.Fatal(err)
	}
	if err := appsv1.AddToScheme(scheme); err != nil {
		t.Fatal(err)
	}
	if err := corev1.AddToScheme(scheme); err != nil {
		t.Fatal(err)
	}

	app := &demov1alpha1.DemoApp{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "hello",
			Namespace: "default",
		},
		Spec: demov1alpha1.DemoAppSpec{
			Image:    "nginx:1.27",
			Replicas: ptr.To[int32](2),
			Port:     80,
			Message:  "fake test",
		},
	}

	c := fake.NewClientBuilder().
		WithScheme(scheme).
		WithObjects(app).
		WithStatusSubresource(&demov1alpha1.DemoApp{}).
		Build()

	r := &DemoAppReconciler{
		Client: c,
		Scheme: scheme,
	}

	req := ctrl.Request{
		NamespacedName: client.ObjectKeyFromObject(app),
	}

	_, err := r.Reconcile(context.Background(), req)
	if err != nil {
		t.Fatal(err)
	}

	// The first pass adds the finalizer and returns. A real watch would enqueue
	// the object again after that update; the fake client test calls Reconcile
	// a second time explicitly.
	_, err = r.Reconcile(context.Background(), req)
	if err != nil {
		t.Fatal(err)
	}

	var deploy appsv1.Deployment
	if err := c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "hello"}, &deploy); err != nil {
		t.Fatal(err)
	}
	if *deploy.Spec.Replicas != 2 {
		t.Fatalf("expected 2 replicas, got %d", *deploy.Spec.Replicas)
	}

	var svc corev1.Service
	if err := c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "hello"}, &svc); err != nil {
		t.Fatal(err)
	}

	var cm corev1.ConfigMap
	if err := c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "hello-config"}, &cm); err != nil {
		t.Fatal(err)
	}
}

Run:

bash
go test ./internal/controller -run TestReconcileCreatesChildrenWithFakeClient

Validated output:

bash
ok  	github.com/example/demoapp-operator/internal/controller	0.176s

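A fake-client test like this can catch many ordinary controller mistakes, provided you add an assertion for each one (the example above only checks that the children exist under the expected names and that the replica count is right):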

  • missing scheme registration
  • wrong child object name
  • wrong namespace
  • wrong labels
  • missing owner reference
  • wrong replica count

But do not over-trust it.

Fake client does not prove:

  • CRD OpenAPI validation
  • admission webhooks
  • status subresource behavior exactly as the API server handles it
  • RBAC
  • manager startup
  • real watch behavior
  • Pods becoming ready

For those, use envtest and kind.

The fake client is best treated as a controller logic test, not a Kubernetes behavior test. It is useful for verifying that your reconciler calls the client with the objects you expect. It is weak whenever the real API server would add behavior: defaulting, validation, managed fields, resource versions, generation changes, status subresource boundaries, and admission.


Step 3 - Use envtest for API-server behavior

envtest starts a real kube-apiserver and etcd for your tests. It does not start kubelet, scheduler, or controller-manager, so it will not create real Pods. But it does prove API-server behavior.

Operator SDK projects usually scaffold a test suite under test/ or internal/controller/suite_test.go. The important pieces are:

go
package controller

import (
	"context"
	"path/filepath"
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/runtime"
	clientgoscheme "k8s.io/client-go/kubernetes/scheme"
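	ctrl "sigs.k8s.io/controller-runtime" // used once the manager is added in Step 5; leave it out until then to avoid an unused-import error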
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/envtest"

	demov1alpha1 "github.com/example/demoapp-operator/api/v1alpha1"
)

var (
	ctx       context.Context
	cancel    context.CancelFunc
	testEnv   *envtest.Environment
	k8sClient client.Client
	scheme    *runtime.Scheme
)

func TestControllers(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "Controller Suite")
}

var _ = BeforeSuite(func() {
	ctx, cancel = context.WithCancel(context.Background())

	scheme = runtime.NewScheme()
	Expect(clientgoscheme.AddToScheme(scheme)).To(Succeed())
	Expect(appsv1.AddToScheme(scheme)).To(Succeed())
	Expect(corev1.AddToScheme(scheme)).To(Succeed())
	Expect(demov1alpha1.AddToScheme(scheme)).To(Succeed())

	testEnv = &envtest.Environment{
		CRDDirectoryPaths: []string{
			filepath.Join("..", "..", "config", "crd", "bases"),
		},
		ErrorIfCRDPathMissing: true,
	}

	cfg, err := testEnv.Start()
	Expect(err).NotTo(HaveOccurred())
	Expect(cfg).NotTo(BeNil())

	k8sClient, err = client.New(cfg, client.Options{Scheme: scheme})
	Expect(err).NotTo(HaveOccurred())
})

var _ = AfterSuite(func() {
	cancel()
	Expect(testEnv.Stop()).To(Succeed())
})

Before running envtest, install the test binaries. Makefile targets vary by scaffold — use the target your project documents (make setup-envtest, make envtest, or the setup-envtest stanza from the Operator SDK go/v4 Makefile).

bash
make setup-envtest

Then run:

bash
KUBEBUILDER_ASSETS="$(./bin/setup-envtest use 1.33.0 -p path)" \
  go test ./internal/controller ./internal/webhook/v1alpha1

Validated output:

bash
ok  	github.com/example/demoapp-operator/internal/controller        43.995s
ok  	github.com/example/demoapp-operator/internal/webhook/v1alpha1  21.390s

If the scaffold wires envtest through the Makefile (default for Operator SDK go/v4), prefer:

bash
make test

so it downloads etcd / kube-apiserver and exports KUBEBUILDER_ASSETS for you.

If go test fails with missing binaries or KUBEBUILDER_ASSETS

  1. Run make setup-envtest or make envtest once (or invoke setup-envtest directly) to populate a local bin/ directory.

  2. Export the path the tool prints, for example:

    bash
    export KUBEBUILDER_ASSETS="$(./bin/setup-envtest use 1.33.5 -p path)"
    go test ./internal/controller

    Pick a Kubernetes version close to production (the version above is only an example). The location of the setup-envtest binary (./bin here) depends on your Makefile.

  3. Remember that setup-envtest ships in a separate Go module (sigs.k8s.io/controller-runtime/tools/setup-envtest). Pin it in CI by version or git SHA; it does not follow controller-runtime library tags the way go get sigs.k8s.io/controller-runtime@<version> does (upstream discussion).

  4. For slow suites, raise the test timeout on the packages you are validating, for example go test -timeout 30m ./internal/controller ./internal/webhook/v1alpha1.

  5. As a last-resort debugger, USE_EXISTING_CLUSTER=true (supported by envtest) can reuse a throwaway kind cluster instead of downloading binaries — useful when corporate proxies block object storage.


Step 4 - Test CRD validation and status with envtest

Create an envtest spec that proves the API server rejects invalid CRs and accepts valid status updates:

go
var _ = Describe("DemoApp API", func() {
	It("rejects replicas above the CRD maximum", func() {
		app := &demov1alpha1.DemoApp{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "too-many",
				Namespace: "default",
			},
			Spec: demov1alpha1.DemoAppSpec{
				Image:    "nginx:1.27",
				Replicas: ptr.To[int32](99),
			},
		}

		err := k8sClient.Create(ctx, app)
		Expect(err).To(HaveOccurred())
	})

	It("updates status through the status subresource", func() {
		app := &demov1alpha1.DemoApp{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "status-ok",
				Namespace: "default",
			},
			Spec: demov1alpha1.DemoAppSpec{
				Image: "nginx:1.27",
			},
		}
		Expect(k8sClient.Create(ctx, app)).To(Succeed())

		app.Status.ObservedGeneration = app.Generation
		app.Status.ServiceName = "status-ok"
		Expect(k8sClient.Status().Update(ctx, app)).To(Succeed())
	})
})

Typical imports for this spec:

go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

This is exactly where fake client is weak. A fake client can pretend status updates work. envtest proves the API server accepts your CRD and status subresource.

Use envtest whenever the test sentence contains the words "API server should." Examples:

  • the API server should reject invalid replicas
  • the API server should default missing fields
  • the API server should allow /status updates
  • the API server should call the validating webhook
  • the API server should increment generation when spec changes

Those are not pure Go questions. They are Kubernetes API questions.
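
Two of the bullets above, /status updates and generation bumps, can be pictured with a toy in-memory model. This is a sketch under stated assumptions, not Kubernetes code: toyServer, object, Update, and UpdateStatus are hypothetical names, and the model only mirrors the documented behavior of a resource with the status subresource enabled (writes to the main resource ignore status, writes to /status persist only status, and metadata.generation increases only when spec changes).

```go
package main

import "fmt"

// object is a drastically simplified custom resource (hypothetical type).
type object struct {
	Generation int64
	Spec       string
	Status     string
}

// toyServer stands in for the API server's stored copy of one object.
type toyServer struct {
	stored object
}

// Update models a write to the main resource endpoint: a spec change is
// persisted and bumps generation; any status in the request is dropped.
func (s *toyServer) Update(in object) object {
	if in.Spec != s.stored.Spec {
		s.stored.Spec = in.Spec
		s.stored.Generation++
	}
	return s.stored
}

// UpdateStatus models a write to the /status endpoint: only status is
// persisted; spec and generation stay untouched.
func (s *toyServer) UpdateStatus(in object) object {
	s.stored.Status = in.Status
	return s.stored
}

func main() {
	s := &toyServer{stored: object{Generation: 1, Spec: "replicas=2"}}

	// The status in this request is ignored; the spec change bumps generation.
	afterUpdate := s.Update(object{Spec: "replicas=3", Status: "ready"})
	fmt.Printf("after Update:       %+v\n", afterUpdate)

	// The spec in this request is ignored; only status is stored.
	afterStatus := s.UpdateStatus(object{Spec: "replicas=9", Status: "ready"})
	fmt.Printf("after UpdateStatus: %+v\n", afterStatus)
}
```

A fake client built with WithStatusSubresource approximates this split, but only envtest shows what the real API server does with your generated CRD.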


Step 5 - Test the controller with envtest

Merge the following into the same BeforeSuite you started in Step 3. Ginkgo allows only one BeforeSuite per suite, so extend that block instead of pasting a second var _ = BeforeSuite verbatim.

Start the manager inside the test:

go
var _ = BeforeSuite(func() {
	// previous envtest setup omitted

	mgr, err := ctrl.NewManager(cfg, ctrl.Options{
		Scheme: scheme,
	})
	Expect(err).NotTo(HaveOccurred())

	err = (&DemoAppReconciler{
		Client: mgr.GetClient(),
		Scheme: mgr.GetScheme(),
	}).SetupWithManager(mgr)
	Expect(err).NotTo(HaveOccurred())

	go func() {
		defer GinkgoRecover()
		Expect(mgr.Start(ctx)).To(Succeed())
	}()

	k8sClient = mgr.GetClient()
})

Now test that reconcile creates children:

go
var _ = Describe("DemoApp controller", func() {
	It("creates child resources", func() {
		app := &demov1alpha1.DemoApp{
			ObjectMeta: metav1.ObjectMeta{
				Name:      "envtest-app",
				Namespace: "default",
			},
			Spec: demov1alpha1.DemoAppSpec{
				Image:    "nginx:1.27",
				Replicas: ptr.To[int32](2),
				Port:     80,
				Message:  "envtest",
			},
		}
		Expect(k8sClient.Create(ctx, app)).To(Succeed())

		deploy := &appsv1.Deployment{}
		Eventually(func() error {
			return k8sClient.Get(ctx, client.ObjectKey{
				Namespace: "default",
				Name:      "envtest-app",
			}, deploy)
		}).Should(Succeed())

		Expect(*deploy.Spec.Replicas).To(Equal(int32(2)))
	})
})

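Use Eventually because reconciliation is asynchronous: the test creates a CR, the watch event enters a workqueue, the controller processes it, and only then does the child appear. Gomega's Eventually defaults to a one-second timeout, so on slow CI machines pass an explicit timeout and polling interval, for example Eventually(fn, 10*time.Second, 250*time.Millisecond), or call SetDefaultEventuallyTimeout in BeforeSuite.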


Step 6 - Test webhooks

Webhook tests require envtest configured with webhooks:

go
testEnv = &envtest.Environment{
	CRDDirectoryPaths: []string{
		filepath.Join("..", "..", "config", "crd", "bases"),
	},
	WebhookInstallOptions: envtest.WebhookInstallOptions{
		Paths: []string{
			filepath.Join("..", "..", "config", "webhook"),
		},
	},
	ErrorIfCRDPathMissing: true,
}

When creating the manager, set the webhook server host and port from envtest. Add webhook "sigs.k8s.io/controller-runtime/pkg/webhook" to your suite imports so webhook.NewServer resolves (package name may differ if you already import another webhook).

go
webhookInstallOptions := &testEnv.WebhookInstallOptions

mgr, err := ctrl.NewManager(cfg, ctrl.Options{
	Scheme: scheme,
	WebhookServer: webhook.NewServer(webhook.Options{
		Host:    webhookInstallOptions.LocalServingHost,
		Port:    webhookInstallOptions.LocalServingPort,
		CertDir: webhookInstallOptions.LocalServingCertDir,
	}),
})
Expect(err).NotTo(HaveOccurred())
Expect((&demov1alpha1.DemoApp{}).SetupWebhookWithManager(mgr)).To(Succeed())

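With the current Operator SDK go/v4 layout, webhook setup is generated under internal/webhook/v1alpha1. In that case, replace the final SetupWebhookWithManager line above with the generated package-level function (the package is imported here as webhookv1alpha1):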

go
Expect(webhookv1alpha1.SetupDemoAppWebhookWithManager(mgr)).To(Succeed())

Then test validation:

go
It("rejects reserved port 22 through the validating webhook", func() {
	app := &demov1alpha1.DemoApp{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "bad-port",
			Namespace: "default",
		},
		Spec: demov1alpha1.DemoAppSpec{
			Image: "nginx:1.27",
			Port:  22,
		},
	}

	err := k8sClient.Create(ctx, app)
	Expect(err).To(HaveOccurred())
})

This proves the admission path, not just your validator function.

For full webhook mechanics, see Mutating and Validating Admission Webhooks in Operators.


Step 7 - Run a kind smoke test

envtest does not run Pods. A kind smoke test proves the packaged operator works in a real cluster.

Create a fresh cluster:

bash
kind create cluster --name demoapp-e2e
kubectl config use-context kind-demoapp-e2e

Build and load the image:

bash
IMG=demoapp-operator:e2e
make docker-build IMG=$IMG
kind load docker-image $IMG --name demoapp-e2e

If Docker builds cannot resolve Go modules from inside the build container but host-side go test works, validate with host networking:

bash
docker build --network=host -t "$IMG" .
kind load docker-image "$IMG" --name demoapp-e2e

Install and deploy:

bash
make deploy IMG=$IMG
kubectl -n demoapp-operator-system rollout status deploy/demoapp-operator-controller-manager

If you enabled webhooks but are not using cert-manager in this local kind cluster, create a short-lived serving certificate for the generated webhook Service and patch the CA bundle before applying CRs:

bash
cat > webhook-openssl.cnf <<'EOF'
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no

[req_distinguished_name]
CN = demoapp-operator-webhook-service.demoapp-operator-system.svc

[v3_req]
subjectAltName = @alt_names

[alt_names]
DNS.1 = demoapp-operator-webhook-service.demoapp-operator-system.svc
DNS.2 = demoapp-operator-webhook-service.demoapp-operator-system.svc.cluster.local
EOF

openssl req -x509 -nodes -days 1 -newkey rsa:2048 \
  -keyout tls.key -out tls.crt -config webhook-openssl.cnf

kubectl -n demoapp-operator-system create secret tls webhook-server-cert \
  --cert=tls.crt --key=tls.key --dry-run=client -o yaml | kubectl apply -f -

# base64 -w0 is GNU coreutils syntax; if your base64 lacks -w, use: base64 < tls.crt | tr -d '\n'
CA_BUNDLE=$(base64 -w0 tls.crt)
kubectl patch mutatingwebhookconfiguration demoapp-operator-mutating-webhook-configuration --type=json \
  -p="[{\"op\":\"add\",\"path\":\"/webhooks/0/clientConfig/caBundle\",\"value\":\"$CA_BUNDLE\"}]"
kubectl patch validatingwebhookconfiguration demoapp-operator-validating-webhook-configuration --type=json \
  -p="[{\"op\":\"add\",\"path\":\"/webhooks/0/clientConfig/caBundle\",\"value\":\"$CA_BUNDLE\"}]"

Validated manager startup output:

bash
deployment "demoapp-operator-controller-manager" successfully rolled out
successfully acquired lease demoapp-operator-system/66a62f4c.example.com
Registering a mutating webhook  path="/mutate-demo-example-com-v1alpha1-demoapp"
Registering a validating webhook path="/validate-demo-example-com-v1alpha1-demoapp"

Apply a CR:

bash
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml

Verify the real workload:

bash
kubectl get demoapp hello -o yaml
kubectl get configmap hello-config
kubectl get service hello
kubectl get deployment hello
kubectl rollout status deployment/hello

If your kind node cannot pull the sample workload image (nginx:1.27) because the environment is offline or rate-limited, either load that image into kind or temporarily patch the sample CR to a local test image. The operator validation depends on the managed Deployment becoming ready; the image itself is not operator-specific.

Validated output after using a locally loaded test workload image:

text
$ kubectl get deployment hello
NAME    READY   UP-TO-DATE   AVAILABLE
hello   2/2     2            2

$ kubectl get demoapp hello -o jsonpath='{.status.readyReplicas}{"|"}{.status.conditions[?(@.type=="Available")].status}{"\n"}'
2|True

Prove live webhook validation:

bash
kubectl apply --validate=false -f - <<'EOF'
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: bad-port
spec:
  image: nginx:1.27
  port: 22
EOF

Validated output:

text
Error from server (Forbidden): admission webhook "vdemoapp-v1alpha1.kb.io" denied the request: spec.port: Forbidden: port 22 is reserved

Prove live webhook defaulting:

bash
kubectl apply --validate=false -f - <<'EOF'
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: webhook-defaulted
spec:
  image: nginx:1.27
EOF

kubectl get demoapp webhook-defaulted \
  -o jsonpath='{.spec.replicas}{"|"}{.spec.port}{"|"}{.spec.message}{"|"}{.spec.configVersion}{"\n"}'

Validated output:

text
1|8080|hello from DemoApp|v1

Prove drift correction:

bash
kubectl scale deployment hello --replicas=5
sleep 5
kubectl get deployment hello -o jsonpath='{.spec.replicas}{"\n"}'

The replica count should return to the CR value.

Validated output:

text
deployment.apps/hello scaled
2

Clean up:

bash
kubectl delete demoapp hello
kubectl get configmap hello-delete-audit
kind delete cluster --name demoapp-e2e

Validated finalizer output:

text
$ kubectl get configmap hello-delete-audit -o jsonpath='{.data.demoApp}{"|"}{.data.namespace}{"|"}{.data.deletedAt}{"\n"}'
hello|default|2026-06-05T12:07:36Z

Do not skip this layer before publishing tutorial code. It catches the problems local unit tests miss: stale generated YAML, missing RBAC, bad image names, webhook service issues, and manager startup failures.


Step 8 - Package the operator with Kustomize manifests

For an internal tutorial or platform team, the simplest release artifact is:

  • a manager image
  • CRD YAML
  • RBAC YAML
  • manager Deployment YAML
  • webhook YAML if enabled

Operator SDK already uses Kustomize under config/.

Set an image:

bash
IMG=registry.example.com/platform/demoapp-operator:v0.1.0

Build and push:

bash
make docker-build IMG=$IMG
docker push $IMG

Generate install YAML:

bash
make build-installer IMG=$IMG

Many Operator SDK projects generate:

text
dist/install.yaml

Install it:

bash
kubectl apply -f dist/install.yaml

If your Makefile does not include build-installer, use Kustomize directly:

bash
cd config/manager
kustomize edit set image controller=$IMG
cd ../..
mkdir -p dist
kustomize build config/default > dist/install.yaml
kubectl apply -f dist/install.yaml

For public OperatorHub-style distribution, add OLM bundle work later. Do not force OLM into the first Go tutorial path unless your readers specifically need it.


Step 9 - Safe upgrade workflow

Assume version v0.1.0 supports:

yaml
spec:
  image: nginx:1.27
  replicas: 2
  port: 80
  message: hello

Now version v0.2.0 adds an optional field:

go
// LogLevel controls application logging.
//
// +kubebuilder:default=info
// +kubebuilder:validation:Enum=debug;info;warn;error
// +optional
LogLevel string `json:"logLevel,omitempty"`

Safe upgrade order:

  1. Add the field as optional or defaulted.
  2. Regenerate CRD with make manifests.
  3. Update controller code to tolerate the field being empty.
  4. Apply the new CRD first.
  5. Roll out the new controller image.
  6. Verify old CRs still reconcile.
  7. Apply a new CR that uses the new field.

Commands:

bash
make generate
make manifests

kubectl apply -f config/crd/bases/demo.example.com_demoapps.yaml

IMG=registry.example.com/platform/demoapp-operator:v0.2.0
make docker-build IMG=$IMG
docker push $IMG
make deploy IMG=$IMG

kubectl -n demoapp-operator-system rollout status deploy/demoapp-operator-controller-manager
kubectl get demoapps -A

Avoid these breaking changes:

  • changing a field type, for example string to object
  • making an existing optional field required without a default
  • removing a field that existing CRs use
  • changing Deployment selector labels in a way Kubernetes rejects
  • changing ownership labels that your cleanup logic depends on
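
The first three items in the list above are mechanical enough to check in code before a release. The sketch below is a toy comparison over a hand-written field table, not a parser for real CRD OpenAPI schemas; field, schema, and breakingChanges are hypothetical names, and the field types assigned to the v0.1.0 spec are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// field is a simplified description of one spec field (hypothetical type).
type field struct {
	Type       string
	Required   bool
	HasDefault bool
}

// schema maps a field name to its description.
type schema map[string]field

// breakingChanges lists changes that could invalidate existing CRs:
// removed fields, type changes, and fields that are (or become) required
// without a default. Results are ordered by field name.
func breakingChanges(oldSchema, newSchema schema) []string {
	seen := map[string]bool{}
	var names []string
	for _, s := range []schema{oldSchema, newSchema} {
		for name := range s {
			if !seen[name] {
				seen[name] = true
				names = append(names, name)
			}
		}
	}
	sort.Strings(names)

	var problems []string
	for _, name := range names {
		oldField, inOld := oldSchema[name]
		newField, inNew := newSchema[name]
		switch {
		case inOld && !inNew:
			problems = append(problems, name+": removed")
		case !inOld && inNew:
			if newField.Required && !newField.HasDefault {
				problems = append(problems, name+": new required field without a default")
			}
		default: // present in both versions
			if oldField.Type != newField.Type {
				problems = append(problems, fmt.Sprintf("%s: type changed from %s to %s", name, oldField.Type, newField.Type))
			}
			if !oldField.Required && newField.Required && !newField.HasDefault {
				problems = append(problems, name+": became required without a default")
			}
		}
	}
	return problems
}

func main() {
	v010 := schema{
		"image":    {Type: "string", Required: true},
		"replicas": {Type: "integer"},
		"port":     {Type: "integer"},
		"message":  {Type: "string"},
	}
	// v0.2.0 as described above: one new optional, defaulted field.
	v020 := schema{
		"image":    {Type: "string", Required: true},
		"replicas": {Type: "integer"},
		"port":     {Type: "integer"},
		"message":  {Type: "string"},
		"logLevel": {Type: "string", HasDefault: true},
	}
	fmt.Println("problems in v0.2.0:", len(breakingChanges(v010, v020)))
}
```

A check like this only complements the real proof, which is applying old sample CRs against the new CRD in envtest or kind.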

If you need to move to a new API version, for example from v1alpha1 to v1, use a conversion webhook. That is covered in CRD Version Upgrades with Conversion Webhooks.


Step 10 - Troubleshooting lab

Here are the failures readers hit most often.

CRD not installed

Symptom:

text
error: resource mapping not found for name: "hello" kind: "DemoApp"

Fix:

bash
make install
kubectl get crd demoapps.demo.example.com

RBAC denied

Symptom in manager logs:

text
deployments.apps is forbidden: User "system:serviceaccount:..." cannot create resource "deployments"

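Fix: make sure the reconciler carries a +kubebuilder:rbac marker that grants the missing verb on deployments, then regenerate the role and redeploy: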

bash
grep -n "deployments" config/rbac/role.yaml
make manifests
make deploy IMG=$IMG

Check that the deployed ClusterRole contains the permission:

bash
kubectl get clusterrole demoapp-operator-manager-role -o yaml

Webhook certificate problem

Symptom:

text
failed calling webhook ... x509: certificate signed by unknown authority

Check:

bash
kubectl get validatingwebhookconfiguration
kubectl get mutatingwebhookconfiguration
kubectl -n demoapp-operator-system get service
kubectl -n demoapp-operator-system get pods

In production, use cert-manager or another certificate injection flow. For local tutorials, make sure the webhook manifests and cert setup match your Operator SDK scaffold.

Stuck finalizer

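Symptom: kubectl delete demoapp hello hangs and the object never goes away. Inspect the deletion timestamp and the remaining finalizers: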

bash
kubectl get demoapp hello -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'
kubectl get demoapp hello -o jsonpath='{.metadata.finalizers}{"\n"}'

If deletion timestamp is set and the finalizer remains, the delete path is failing.

Check logs:

bash
kubectl -n demoapp-operator-system logs deploy/demoapp-operator-controller-manager -c manager

Do not manually remove finalizers as the first move. Fix the controller if possible. Manual finalizer removal skips cleanup.

Reconcile hot loop

Symptoms:

  • logs repeat constantly
  • workqueue metrics rise
  • CPU increases
  • status updates happen every loop

Common causes:

  • writing status even when status did not change
  • mutating a field that an admission webhook changes back
  • fighting another controller over the same child field
  • using Requeue: true without a reason

Use Operator Metrics with Prometheus and Drift Detection Patterns to identify the source.
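
The first cause in the list, writing status when nothing changed, is usually fixed with a compare-before-write guard. The sketch below uses stand-in types so it runs without Kubernetes: demoStatus, statusWriter, and syncStatus are hypothetical names. In a real reconciler the comparison would typically be equality.Semantic.DeepEqual from k8s.io/apimachinery/pkg/api/equality and the write would be r.Status().Update.

```go
package main

import (
	"fmt"
	"reflect"
)

// demoStatus is a stand-in for the DemoApp status (assumption: only a few
// of the fields mentioned in this article are modeled).
type demoStatus struct {
	ObservedGeneration int64
	ReadyReplicas      int32
	ServiceName        string
}

// statusWriter stands in for the status client and counts API writes.
type statusWriter struct {
	writes int
	last   demoStatus
}

// Update records one status write.
func (w *statusWriter) Update(s demoStatus) {
	w.writes++
	w.last = s
}

// syncStatus writes desired only when it differs from current and reports
// whether a write happened. Skipping no-op writes avoids generating the
// update event that would trigger another reconcile.
func syncStatus(w *statusWriter, current, desired demoStatus) bool {
	if reflect.DeepEqual(current, desired) {
		return false
	}
	w.Update(desired)
	return true
}

func main() {
	w := &statusWriter{}
	current := demoStatus{ObservedGeneration: 2, ReadyReplicas: 2, ServiceName: "hello"}

	fmt.Println("wrote unchanged status:", syncStatus(w, current, current))

	desired := current
	desired.ReadyReplicas = 1
	fmt.Println("wrote changed status:", syncStatus(w, current, desired))
	fmt.Println("total writes:", w.writes)
}
```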

Owned resource changes do not trigger reconcile

Check SetupWithManager:

go
return ctrl.NewControllerManagedBy(mgr).
	For(&demov1alpha1.DemoApp{}).
	Owns(&appsv1.Deployment{}).
	Owns(&corev1.Service{}).
	Owns(&corev1.ConfigMap{}).
	Complete(r)

Also check the child resource has an owner reference:

bash
kubectl get deployment hello -o yaml | grep -A10 ownerReferences

No owner reference means Owns cannot map the child back to the parent.
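
The mapping that Owns performs can be pictured with a simplified model. This is an illustration only, not controller-runtime's implementation: ownerRef, request, and requestsForOwner are hypothetical names, and the model assumes the default behavior in which only a controller owner reference of the watched kind produces a reconcile request for the parent.

```go
package main

import "fmt"

// ownerRef is a stand-in for metav1.OwnerReference (only the fields this
// sketch needs).
type ownerRef struct {
	Kind       string
	Name       string
	Controller bool
}

// request is a stand-in for a reconcile request key.
type request struct {
	Namespace string
	Name      string
}

// requestsForOwner returns one request per controller owner reference of
// the given kind. A child without such a reference yields no requests,
// which is why its changes never reach the parent's reconciler.
func requestsForOwner(childNamespace string, refs []ownerRef, ownerKind string) []request {
	var out []request
	for _, ref := range refs {
		if ref.Controller && ref.Kind == ownerKind {
			out = append(out, request{Namespace: childNamespace, Name: ref.Name})
		}
	}
	return out
}

func main() {
	owned := []ownerRef{{Kind: "DemoApp", Name: "hello", Controller: true}}
	fmt.Println("owned child enqueues:", requestsForOwner("default", owned, "DemoApp"))

	// A Deployment created without an owner reference maps to nothing.
	fmt.Println("orphan child enqueues:", len(requestsForOwner("default", nil, "DemoApp")))
}
```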

envtest fails to download etcd / kube-apiserver

Symptoms include KUBEBUILDER_ASSETS unset errors or timeouts while envtest fetches tarballs.

Fix:

  • Prefer make test / make envtest from the scaffolded Makefile so versions stay aligned.
  • On air-gapped CI, vendor the tarball cache or mirror the storage bucket your setup-envtest release uses.
  • Fall back to USE_EXISTING_CLUSTER=true against a disposable kind cluster while you fix networking.

Final series checkpoint

Across the three parts, you built a Go operator that:

  • defines a real custom API
  • generates CRDs and RBAC
  • reconciles several child resources
  • uses status conditions
  • handles deletion with a finalizer
  • corrects drift
  • watches owned and referenced resources
  • validates and defaults CRs with webhooks
  • emits Events
  • has unit, fake-client, envtest, webhook, and kind testing paths
  • builds and deploys as a real manager image
  • supports safe backward-compatible upgrades

That is the end-to-end Go operator path most readers actually need.


Frequently Asked Questions

1. Is unit testing alone enough for a Kubernetes operator?

No. Unit tests are excellent for pure desired-state builders, but they cannot prove CRD validation, /status subresource semantics, admission webhooks, RBAC on the manager ServiceAccount, or real Pod scheduling. Layer unit tests, fake-client reconciler tests, envtest, and kind as described in this article.

2. When should I use the fake client instead of envtest?

Fake client is fastest for straight-line client calls (Get/Create/Update) where you do not need the real API server. envtest is required when the sentence under test includes "the API server should ..." (validation, defaulting, generation, managedFields, admission, status updates).

3. Do kind end-to-end tests replace envtest?

No. kind proves the packaged image, RBAC, webhooks against a real Service, kubelet, and scheduling. envtest stays faster for tight controller loops. Keep both: envtest on every PR, kind on main/release.

4. My envtest run fails with missing etcd / kube-apiserver / empty KUBEBUILDER_ASSETS — what now?

envtest shells out to real etcd and kube-apiserver binaries. Prefer make test on Operator SDK / Kubebuilder scaffolds so the Makefile downloads versions and exports KUBEBUILDER_ASSETS. If you run go test directly, download assets once (make envtest or setup-envtest use <k8s-version> -p path) and export the printed directory. The setup-envtest CLI lives in its own Go module (sigs.k8s.io/controller-runtime/tools/setup-envtest) — pin by tag/commit in CI; do not assume it tracks controller-runtime library tags. For debugging only, USE_EXISTING_CLUSTER=true can point tests at a disposable kind cluster instead of downloading binaries.

5. Should I package the operator with OLM on day one?

Not unless you need OperatorHub channels, CSV metadata, or cross-cluster lifecycle semantics. Many internal teams ship Kustomize + a versioned image first, then add operator-sdk bundle when distribution demands it.

6. What is the safest way to upgrade an operator in production?

Apply backward-compatible CRDs first, roll the controller only after confirming that old CRs remain valid, default new fields, avoid breaking type changes, and run old and new sample CRs through envtest and kind before tagging a release. Use conversion webhooks only when you add a new served API version.

7. Is fake client useless?

No. It is only the wrong tool when you expect it to substitute for envtest or kind. Use it for what it is: a lightweight fake of client operations.

8. Should every pull request run kind tests?

Not always. A common split is unit + envtest on every PR, kind smoke on main and release branches, and a fuller install/upgrade suite before shipping a tag.

9. Should I assert on generated CRD YAML verbatim?

Prefer behavioral proof: envtest and kind load the generated manifests. If markers are wrong, those layers fail before production.

10. What is the most common operator upgrade bug?

Shipping a CRD schema change that invalidates existing objects (new required fields, type flips, removed keys). Always test old CR documents against the new CRD before rolling the controller.

11. What signals should I monitor first on a reconciling controller?

Reconcile error rate, reconcile latency histogram, workqueue depth/retries, webhook admission failures, manager readiness, and leader election changes — then add domain metrics once the baseline is healthy.

What's next?

You now have a complete Go operator path from scaffold to tests. Explore the hub chapters above, or jump to OLM / capability levels when you are ready to publish bundles.

Deepak Prasad

R&D Engineer
