Validation note: The article code was validated end to end with Operator SDK v1.42.2, Go 1.24.4, kubectl v1.36.1, kind v0.31.0, and Docker 29.2.1.
This article completes the Go-based Kubernetes Operator tutorial. The foundation tutorial built the first controller. The controller-runtime tutorial added the important operator behaviors: multiple resources, status, finalizers, drift handling, watches, and webhooks.
This testing and release tutorial answers the question every serious reader asks next:
How can correctness be demonstrated for this operator, and how should releases be shipped safely?
The answer is not one test type. Kubernetes operators need layered verification because different bugs appear at different layers.
| Layer | What it proves | What it cannot prove |
|---|---|---|
| Pure unit tests | Builders, default helper functions, condition helpers | Kubernetes API behavior |
| Fake client tests | Simple reconcile paths, create/update calls | CRD validation, status subresource, webhooks, RBAC |
| envtest | Real API server, CRDs, status, generation, webhooks | Deployed manager image, RBAC, Pods, Services |
| kind smoke tests | Real deployment, image, RBAC, CRDs, webhooks, Pods | Fine-grained unit behavior |
That layered answer is important. Many operator tutorials stop after a fake-client test. That is not enough for production code.
This is one of the most common gaps in Kubernetes operator content. Readers search for "operator sdk test reconciler", "kubebuilder envtest example", "controller-runtime fake client", "test validating webhook operator sdk", and "kind e2e test operator" because the testing boundary is confusing:
- A fake client can store objects but is not a Kubernetes API server
- envtest runs kube-apiserver and etcd but not kubelet or the scheduler
- A kind cluster runs real workloads but is slower and harder to debug in unit-test style
- Webhooks require certificates and admission registration, so calling a validator function directly is not the same as testing admission
- RBAC errors only show up when the deployed manager uses its ServiceAccount
The official Kubebuilder envtest reference, controller-runtime fake client package, and kind quick start are the primary references. This article turns those pieces into a practical test strategy for the DemoApp operator built in the previous tutorials.
The following table is a pragmatic testing policy for most Go operators:
| When | Run |
|---|---|
| Every local edit | focused Go unit tests for builders and helpers |
| Every pull request | unit tests, fake-client tests, envtest controller tests |
| Every pull request that touches webhooks | envtest webhook tests |
| Main branch or nightly | kind smoke test |
| Before a tagged release | kind install/upgrade test using old and new CR examples |
This keeps fast feedback fast while still proving the things that only Kubernetes can prove. A unit test is not "bad"; it is just a narrow tool. The mistake is using a narrow tool as if it covered the whole operator lifecycle.
Step 1 - Keep pure logic testable
The controller-runtime tutorial moved builders into internal/controller/resources.go:
- buildConfigMap
- buildService
- buildDeployment
- labelsFor
- desiredReplicas
- desiredPort
- desiredMessage
That was not just organization. It makes the easiest tests fast and reliable.
Create internal/controller/resources_test.go:
package controller
import (
"testing"
"k8s.io/utils/ptr"
demov1alpha1 "github.com/example/demoapp-operator/api/v1alpha1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)
func TestBuildDeploymentUsesDemoAppSpec(t *testing.T) {
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "hello",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Image: "nginx:1.27",
Replicas: ptr.To[int32](3),
Port: 8081,
Message: "from test",
},
}
deploy := buildDeployment(app)
if deploy.Name != "hello" {
t.Fatalf("expected deployment name hello, got %s", deploy.Name)
}
if *deploy.Spec.Replicas != 3 {
t.Fatalf("expected 3 replicas, got %d", *deploy.Spec.Replicas)
}
container := deploy.Spec.Template.Spec.Containers[0]
if container.Image != "nginx:1.27" {
t.Fatalf("expected image nginx:1.27, got %s", container.Image)
}
if container.Ports[0].ContainerPort != 8081 {
t.Fatalf("expected port 8081, got %d", container.Ports[0].ContainerPort)
}
}
func TestBuildConfigMapUsesMessageAndVersion(t *testing.T) {
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "hello",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Message: "custom message",
ConfigVersion: "v2",
},
}
cm := buildConfigMap(app)
if cm.Name != "hello-config" {
t.Fatalf("expected hello-config, got %s", cm.Name)
}
if cm.Data["message"] != "custom message" {
t.Fatalf("unexpected message: %s", cm.Data["message"])
}
if cm.Data["configVersion"] != "v2" {
t.Fatalf("unexpected configVersion: %s", cm.Data["configVersion"])
}
}
Run:
go test ./internal/controller -run TestBuild
Validated output:
ok github.com/example/demoapp-operator/internal/controller 0.176s
These tests do not need Kubernetes. If they fail, the bug is in your desired-state construction, not in controller-runtime.
Pure tests like these should be boring and numerous. They are cheap, deterministic, and easy to debug. They are also the best reason to avoid burying resource construction inside one giant Reconcile function. If a reviewer asks, "What happens when replicas is nil?" or "Does the Service selector still match the Pod labels?", a builder test can answer immediately.
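The two reviewer questions above can be answered with checks as small as the following sketch. It is stdlib-only and uses hypothetical stand-ins: `spec`, `replicasOrDefault`, and `selectorMatches` are illustrative names, not the tutorial's actual helpers, and the fallback of 1 replica is assumed from the defaulting output shown later in this article.

```go
package main

import "fmt"

// spec is a hypothetical stand-in for DemoAppSpec with only the field
// this sketch needs.
type spec struct {
	Replicas *int32
}

// replicasOrDefault is a hypothetical stand-in for a helper such as
// desiredReplicas. The fallback of 1 is an assumption for illustration.
func replicasOrDefault(s spec) int32 {
	if s.Replicas == nil {
		return 1
	}
	return *s.Replicas
}

// selectorMatches reports whether every selector key/value pair is present
// in the Pod template labels. An empty selector trivially matches here.
func selectorMatches(selector, podLabels map[string]string) bool {
	for k, want := range selector {
		got, ok := podLabels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	three := int32(3)
	fmt.Println(replicasOrDefault(spec{}))                 // nil replicas fall back to the default
	fmt.Println(replicasOrDefault(spec{Replicas: &three})) // explicit value wins

	selector := map[string]string{"app": "hello"}
	podLabels := map[string]string{"app": "hello", "tier": "web"}
	fmt.Println(selectorMatches(selector, podLabels)) // selector is a subset of the Pod labels
}
```

In the real project the same assertions would run against the objects returned by the builders, for example by comparing the Service selector with the labels on the Deployment's Pod template.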
But these tests do not tell you whether the CRD schema is valid, whether the manager can start, or whether a Service actually routes traffic in a cluster. That is why the next layers exist.
Step 2 - Add a fake-client reconcile test
The fake client is useful when you want to test a simple reconcile path without starting an API server.
Create internal/controller/demoapp_fake_test.go:
package controller
import (
"context"
"testing"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"k8s.io/utils/ptr"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/client/fake"
demov1alpha1 "github.com/example/demoapp-operator/api/v1alpha1"
)
func TestReconcileCreatesChildrenWithFakeClient(t *testing.T) {
scheme := runtime.NewScheme()
if err := demov1alpha1.AddToScheme(scheme); err != nil {
t.Fatal(err)
}
if err := appsv1.AddToScheme(scheme); err != nil {
t.Fatal(err)
}
if err := corev1.AddToScheme(scheme); err != nil {
t.Fatal(err)
}
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "hello",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Image: "nginx:1.27",
Replicas: ptr.To[int32](2),
Port: 80,
Message: "fake test",
},
}
c := fake.NewClientBuilder().
WithScheme(scheme).
WithObjects(app).
WithStatusSubresource(&demov1alpha1.DemoApp{}).
Build()
r := &DemoAppReconciler{
Client: c,
Scheme: scheme,
}
req := ctrl.Request{
NamespacedName: client.ObjectKeyFromObject(app),
}
_, err := r.Reconcile(context.Background(), req)
if err != nil {
t.Fatal(err)
}
// The first pass adds the finalizer and returns. A real watch would enqueue
// the object again after that update; the fake client test calls Reconcile
// a second time explicitly.
_, err = r.Reconcile(context.Background(), req)
if err != nil {
t.Fatal(err)
}
var deploy appsv1.Deployment
if err := c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "hello"}, &deploy); err != nil {
t.Fatal(err)
}
if *deploy.Spec.Replicas != 2 {
t.Fatalf("expected 2 replicas, got %d", *deploy.Spec.Replicas)
}
var svc corev1.Service
if err := c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "hello"}, &svc); err != nil {
t.Fatal(err)
}
var cm corev1.ConfigMap
if err := c.Get(context.Background(), client.ObjectKey{Namespace: "default", Name: "hello-config"}, &cm); err != nil {
t.Fatal(err)
}
}
Run:
go test ./internal/controller -run TestReconcileCreatesChildrenWithFakeClient
Validated output:
ok github.com/example/demoapp-operator/internal/controller 0.176s
Fake-client tests in this style catch many ordinary controller mistakes, provided you assert on them:
- missing scheme registration
- wrong child object name
- wrong namespace
- wrong labels
- missing owner reference
- wrong replica count
But do not over-trust it.
Fake client does not prove:
- CRD OpenAPI validation
- admission webhooks
- status subresource behavior exactly as the API server handles it
- RBAC
- manager startup
- real watch behavior
- Pods becoming ready
For those, use envtest and kind.
The fake client is best treated as a controller logic test, not a Kubernetes behavior test. It is useful for verifying that your reconciler calls the client with the objects you expect. It is weak whenever the real API server would add behavior: defaulting, validation, managed fields, resource versions, generation changes, status subresource boundaries, and admission.
Step 3 - Use envtest for API-server behavior
envtest starts a real kube-apiserver and etcd for your tests. It does not start kubelet, scheduler, or controller-manager, so it will not create real Pods. But it does prove API-server behavior.
Operator SDK projects usually scaffold a test suite under test/ or internal/controller/suite_test.go. The important pieces are:
package controller
import (
"context"
"path/filepath"
"testing"
. "github.com/onsi/ginkgo/v2"
. "github.com/onsi/gomega"
appsv1 "k8s.io/api/apps/v1"
corev1 "k8s.io/api/core/v1"
"k8s.io/apimachinery/pkg/runtime"
clientgoscheme "k8s.io/client-go/kubernetes/scheme"
ctrl "sigs.k8s.io/controller-runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/envtest"
demov1alpha1 "github.com/example/demoapp-operator/api/v1alpha1"
)
var (
ctx context.Context
cancel context.CancelFunc
testEnv *envtest.Environment
k8sClient client.Client
scheme *runtime.Scheme
)
func TestControllers(t *testing.T) {
RegisterFailHandler(Fail)
RunSpecs(t, "Controller Suite")
}
var _ = BeforeSuite(func() {
ctx, cancel = context.WithCancel(context.Background())
scheme = runtime.NewScheme()
Expect(clientgoscheme.AddToScheme(scheme)).To(Succeed())
Expect(appsv1.AddToScheme(scheme)).To(Succeed())
Expect(corev1.AddToScheme(scheme)).To(Succeed())
Expect(demov1alpha1.AddToScheme(scheme)).To(Succeed())
testEnv = &envtest.Environment{
CRDDirectoryPaths: []string{
filepath.Join("..", "..", "config", "crd", "bases"),
},
ErrorIfCRDPathMissing: true,
}
cfg, err := testEnv.Start()
Expect(err).NotTo(HaveOccurred())
Expect(cfg).NotTo(BeNil())
k8sClient, err = client.New(cfg, client.Options{Scheme: scheme})
Expect(err).NotTo(HaveOccurred())
})
var _ = AfterSuite(func() {
cancel()
Expect(testEnv.Stop()).To(Succeed())
})
Before running envtest, install the test binaries. Makefile targets vary by scaffold, so use the target your project documents (make setup-envtest, make envtest, or the setup-envtest stanza from the Operator SDK go/v4 Makefile).
make setup-envtest
Then run:
KUBEBUILDER_ASSETS="$(./bin/setup-envtest use 1.33.0 -p path)" \
go test ./internal/controller ./internal/webhook/v1alpha1
Validated output:
ok github.com/example/demoapp-operator/internal/controller 43.995s
ok github.com/example/demoapp-operator/internal/webhook/v1alpha1 21.390s
If the scaffold wires envtest through the Makefile (default for Operator SDK go/v4), prefer:
make test
so it downloads etcd / kube-apiserver and exports KUBEBUILDER_ASSETS for you.
If go test fails with missing binaries or KUBEBUILDER_ASSETS
- Run make setup-envtest or make envtest once (or invoke setup-envtest directly) to populate a local bin/ directory.
- Export the path the tool prints, for example:
export KUBEBUILDER_ASSETS="$(./bin/setup-envtest use 1.33.5 -p path)"
go test ./internal/controller
Pick a Kubernetes version close to production; the exact flag spelling depends on your Makefile.
- Remember that setup-envtest ships in a separate Go module (sigs.k8s.io/controller-runtime/tools/setup-envtest). Pin it in CI by version or git SHA; it does not follow controller-runtime library tags the way go get sigs.k8s.io/controller-runtime@<version> does (upstream discussion).
- For slow suites, raise the test timeout on the packages you are validating, for example go test -timeout 30m ./internal/controller ./internal/webhook/v1alpha1.
- As a last-resort debugger, USE_EXISTING_CLUSTER=true (supported by envtest) can reuse a throwaway kind cluster instead of downloading binaries, which is useful when corporate proxies block object storage.
Step 4 - Test CRD validation and status with envtest
Create an envtest spec that proves the API server rejects invalid CRs and accepts valid status updates:
var _ = Describe("DemoApp API", func() {
It("rejects replicas above the CRD maximum", func() {
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "too-many",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Image: "nginx:1.27",
Replicas: ptr.To[int32](99),
},
}
err := k8sClient.Create(ctx, app)
Expect(err).To(HaveOccurred())
})
It("updates status through the status subresource", func() {
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "status-ok",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Image: "nginx:1.27",
},
}
Expect(k8sClient.Create(ctx, app)).To(Succeed())
app.Status.ObservedGeneration = app.Generation
app.Status.ServiceName = "status-ok"
Expect(k8sClient.Status().Update(ctx, app)).To(Succeed())
})
})
Typical imports for this spec:
import (
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/utils/ptr"
)
This is exactly where fake client is weak. A fake client can pretend status updates work. envtest proves the API server accepts your CRD and status subresource.
Use envtest whenever the test sentence contains the words "API server should." Examples:
- the API server should reject invalid replicas
- the API server should default missing fields
- the API server should allow /status updates
- the API server should call the validating webhook
- the API server should increment generation when spec changes
Those are not pure Go questions. They are Kubernetes API questions.
Step 5 - Test the controller with envtest
Merge the following into the same BeforeSuite you started in Step 3. Ginkgo allows only one BeforeSuite per suite, so extend that block instead of pasting a second var _ = BeforeSuite verbatim.
Start the manager inside the test:
var _ = BeforeSuite(func() {
// previous envtest setup omitted
mgr, err := ctrl.NewManager(cfg, ctrl.Options{
Scheme: scheme,
})
Expect(err).NotTo(HaveOccurred())
err = (&DemoAppReconciler{
Client: mgr.GetClient(),
Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr)
Expect(err).NotTo(HaveOccurred())
go func() {
defer GinkgoRecover()
Expect(mgr.Start(ctx)).To(Succeed())
}()
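// The manager client serves reads from the informer cache, so assertions
// on freshly written objects should be wrapped in Eventually.
k8sClient = mgr.GetClient()
})
Now test that reconcile creates children: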
var _ = Describe("DemoApp controller", func() {
It("creates child resources", func() {
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "envtest-app",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Image: "nginx:1.27",
Replicas: ptr.To[int32](2),
Port: 80,
Message: "envtest",
},
}
Expect(k8sClient.Create(ctx, app)).To(Succeed())
deploy := &appsv1.Deployment{}
Eventually(func() error {
return k8sClient.Get(ctx, client.ObjectKey{
Namespace: "default",
Name: "envtest-app",
}, deploy)
}, "10s", "250ms").Should(Succeed())
Expect(*deploy.Spec.Replicas).To(Equal(int32(2)))
})
})
Use Eventually because reconciliation is asynchronous. The test creates a CR, the watch event enters a workqueue, the controller processes it, and then the child appears.
Step 6 - Test webhooks
Webhook tests require envtest configured with webhooks:
testEnv = &envtest.Environment{
CRDDirectoryPaths: []string{
filepath.Join("..", "..", "config", "crd", "bases"),
},
WebhookInstallOptions: envtest.WebhookInstallOptions{
Paths: []string{
filepath.Join("..", "..", "config", "webhook"),
},
},
ErrorIfCRDPathMissing: true,
}
When creating the manager, set the webhook server host and port from envtest. Add webhook "sigs.k8s.io/controller-runtime/pkg/webhook" to your suite imports so webhook.NewServer resolves (package name may differ if you already import another webhook).
webhookInstallOptions := &testEnv.WebhookInstallOptions
mgr, err := ctrl.NewManager(cfg, ctrl.Options{
Scheme: scheme,
WebhookServer: webhook.NewServer(webhook.Options{
Host: webhookInstallOptions.LocalServingHost,
Port: webhookInstallOptions.LocalServingPort,
CertDir: webhookInstallOptions.LocalServingCertDir,
}),
})
Expect(err).NotTo(HaveOccurred())
Expect((&demov1alpha1.DemoApp{}).SetupWebhookWithManager(mgr)).To(Succeed())
For current Operator SDK go/v4, webhook setup is generated under internal/webhook/v1alpha1, so the manager call is instead:
Expect(webhookv1alpha1.SetupDemoAppWebhookWithManager(mgr)).To(Succeed())
Then test validation:
It("rejects reserved port 22 through the validating webhook", func() {
app := &demov1alpha1.DemoApp{
ObjectMeta: metav1.ObjectMeta{
Name: "bad-port",
Namespace: "default",
},
Spec: demov1alpha1.DemoAppSpec{
Image: "nginx:1.27",
Port: 22,
},
}
err := k8sClient.Create(ctx, app)
Expect(err).To(HaveOccurred())
})
This proves the admission path, not just your validator function.
For full webhook mechanics, see Mutating and Validating Admission Webhooks in Operators.
Step 7 - Run a kind smoke test
envtest does not run Pods. A kind smoke test proves the packaged operator works in a real cluster.
Create a fresh cluster:
kind create cluster --name demoapp-e2e
kubectl config use-context kind-demoapp-e2e
Build and load the image:
IMG=demoapp-operator:e2e
make docker-build IMG=$IMG
kind load docker-image $IMG --name demoapp-e2e
If Docker builds cannot resolve Go modules from inside the build container but host-side go test works, validate with host networking:
docker build --network=host -t "$IMG" .
kind load docker-image "$IMG" --name demoapp-e2e
Install and deploy:
make deploy IMG=$IMG
kubectl -n demoapp-operator-system rollout status deploy/demoapp-operator-controller-manager
If you enabled webhooks but are not using cert-manager in this local kind cluster, create a short-lived serving certificate for the generated webhook Service and patch the CA bundle before applying CRs:
cat > webhook-openssl.cnf <<'EOF'
[req]
distinguished_name = req_distinguished_name
x509_extensions = v3_req
prompt = no
[req_distinguished_name]
CN = demoapp-operator-webhook-service.demoapp-operator-system.svc
[v3_req]
subjectAltName = @alt_names
[alt_names]
DNS.1 = demoapp-operator-webhook-service.demoapp-operator-system.svc
DNS.2 = demoapp-operator-webhook-service.demoapp-operator-system.svc.cluster.local
EOF
openssl req -x509 -nodes -days 1 -newkey rsa:2048 \
-keyout tls.key -out tls.crt -config webhook-openssl.cnf
kubectl -n demoapp-operator-system create secret tls webhook-server-cert \
--cert=tls.crt --key=tls.key --dry-run=client -o yaml | kubectl apply -f -
CA_BUNDLE=$(base64 -w0 tls.crt)
kubectl patch mutatingwebhookconfiguration demoapp-operator-mutating-webhook-configuration --type=json \
-p="[{\"op\":\"add\",\"path\":\"/webhooks/0/clientConfig/caBundle\",\"value\":\"$CA_BUNDLE\"}]"
kubectl patch validatingwebhookconfiguration demoapp-operator-validating-webhook-configuration --type=json \
-p="[{\"op\":\"add\",\"path\":\"/webhooks/0/clientConfig/caBundle\",\"value\":\"$CA_BUNDLE\"}]"
Validated manager startup output:
deployment "demoapp-operator-controller-manager" successfully rolled out
successfully acquired lease demoapp-operator-system/66a62f4c.example.com
Registering a mutating webhook path="/mutate-demo-example-com-v1alpha1-demoapp"
Registering a validating webhook path="/validate-demo-example-com-v1alpha1-demoapp"
Apply a CR:
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
Verify the real workload:
kubectl get demoapp hello -o yaml
kubectl get configmap hello-config
kubectl get service hello
kubectl get deployment hello
kubectl rollout status deployment/hello
If your kind node cannot pull the sample workload image (nginx:1.27) because the environment is offline or rate-limited, either load that image into kind or temporarily patch the sample CR to a local test image. The operator validation depends on the managed Deployment becoming ready; the image itself is not operator-specific.
Validated output after using a locally loaded test workload image:
$ kubectl get deployment hello
NAME READY UP-TO-DATE AVAILABLE
hello 2/2 2 2
$ kubectl get demoapp hello -o jsonpath='{.status.readyReplicas}{"|"}{.status.conditions[?(@.type=="Available")].status}{"\n"}'
2|True
Prove live webhook validation:
kubectl apply --validate=false -f - <<'EOF'
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
name: bad-port
spec:
image: nginx:1.27
port: 22
EOF
Validated output:
Error from server (Forbidden): admission webhook "vdemoapp-v1alpha1.kb.io" denied the request: spec.port: Forbidden: port 22 is reserved
Prove live webhook defaulting:
kubectl apply --validate=false -f - <<'EOF'
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
name: webhook-defaulted
spec:
image: nginx:1.27
EOF
kubectl get demoapp webhook-defaulted \
-o jsonpath='{.spec.replicas}{"|"}{.spec.port}{"|"}{.spec.message}{"|"}{.spec.configVersion}{"\n"}'
Validated output:
1|8080|hello from DemoApp|v1
Prove drift correction:
kubectl scale deployment hello --replicas=5
sleep 5
kubectl get deployment hello -o jsonpath='{.spec.replicas}{"\n"}'
The replica count should return to the CR value.
Validated output:
deployment.apps/hello scaled
2
Clean up:
kubectl delete demoapp hello
kubectl get configmap hello-delete-audit
kind delete cluster --name demoapp-e2e
Validated finalizer output:
$ kubectl get configmap hello-delete-audit -o jsonpath='{.data.demoApp}{"|"}{.data.namespace}{"|"}{.data.deletedAt}{"\n"}'
hello|default|2026-06-05T12:07:36Z
Do not skip this layer before publishing tutorial code. It catches the problems local unit tests miss: stale generated YAML, missing RBAC, bad image names, webhook service issues, and manager startup failures.
Step 8 - Package the operator with Kustomize manifests
For an internal tutorial or platform team, the simplest release artifact is:
- a manager image
- CRD YAML
- RBAC YAML
- manager Deployment YAML
- webhook YAML if enabled
Operator SDK already uses Kustomize under config/.
Set an image:
IMG=registry.example.com/platform/demoapp-operator:v0.1.0
Build and push:
make docker-build IMG=$IMG
docker push $IMG
Generate install YAML:
make build-installer IMG=$IMG
Many Operator SDK projects generate:
dist/install.yaml
Install it:
kubectl apply -f dist/install.yaml
If your Makefile does not include build-installer, use Kustomize directly:
cd config/manager
kustomize edit set image controller=$IMG
cd ../..
mkdir -p dist
kustomize build config/default > dist/install.yaml
kubectl apply -f dist/install.yaml
For public OperatorHub-style distribution, add OLM bundle work later. Do not force OLM into the first Go tutorial path unless your readers specifically need it.
Step 9 - Safe upgrade workflow
Assume version v0.1.0 supports:
spec:
image: nginx:1.27
replicas: 2
port: 80
message: hello
Now version v0.2.0 adds an optional field:
// LogLevel controls application logging.
//
// +kubebuilder:default=info
// +kubebuilder:validation:Enum=debug;info;warn;error
// +optional
LogLevel string `json:"logLevel,omitempty"`
Safe upgrade order:
- Add the field as optional or defaulted.
- Regenerate the CRD with make manifests.
- Update controller code to tolerate the field being empty.
- Apply the new CRD first.
- Roll out the new controller image.
- Verify old CRs still reconcile.
- Apply a new CR that uses the new field.
Commands:
make generate
make manifests
kubectl apply -f config/crd/bases/demo.example.com_demoapps.yaml
IMG=registry.example.com/platform/demoapp-operator:v0.2.0
make docker-build IMG=$IMG
docker push $IMG
make deploy IMG=$IMG
kubectl -n demoapp-operator-system rollout status deploy/demoapp-operator-controller-manager
kubectl get demoapps -A
Avoid these breaking changes:
- changing a field type, for example string to object
- making an existing optional field required without a default
- removing a field that existing CRs use
- changing Deployment selector labels in a way Kubernetes rejects
- changing ownership labels that your cleanup logic depends on
If you need to move to a new API version, such as from v1alpha1 to v1, use a conversion webhook. That is covered in CRD Version Upgrades with Conversion Webhooks.
Step 10 - Troubleshooting lab
Here are the failures readers hit most often.
CRD not installed
Symptom:
error: resource mapping not found for name: "hello" kind: "DemoApp"
Fix:
make install
kubectl get crd demoapps.demo.example.com
RBAC denied
Symptom in manager logs:
deployments.apps is forbidden: User "system:serviceaccount:..." cannot create resource "deployments"
Fix:
grep -n "deployments" config/rbac/role.yaml
make manifests
make deploy IMG=$IMG
Check that the deployed ClusterRole contains the permission:
kubectl get clusterrole demoapp-operator-manager-role -o yaml
Webhook certificate problem
Symptom:
failed calling webhook ... x509: certificate signed by unknown authority
Check:
kubectl get validatingwebhookconfiguration
kubectl get mutatingwebhookconfiguration
kubectl -n demoapp-operator-system get service
kubectl -n demoapp-operator-system get pods
In production, use cert-manager or another certificate injection flow. For local tutorials, make sure the webhook manifests and cert setup match your Operator SDK scaffold.
Stuck finalizer
Symptom:
kubectl get demoapp hello -o jsonpath='{.metadata.deletionTimestamp}{"\n"}'
kubectl get demoapp hello -o jsonpath='{.metadata.finalizers}{"\n"}'
If deletion timestamp is set and the finalizer remains, the delete path is failing.
Check logs:
kubectl -n demoapp-operator-system logs deploy/demoapp-operator-controller-manager -c manager
Do not manually remove finalizers as the first move. Fix the controller if possible. Manual finalizer removal skips cleanup.
Reconcile hot loop
Symptoms:
- logs repeat constantly
- workqueue metrics rise
- CPU increases
- status updates happen every loop
Common causes:
- writing status even when status did not change
- mutating a field that an admission webhook changes back
- fighting another controller over the same child field
- using Requeue: true without a reason
Use Operator Metrics with Prometheus and Drift Detection Patterns to identify the source.
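The first cause in the list, writing status when nothing changed, is usually avoided by comparing before writing. The sketch below is a stdlib model under stated assumptions: `status` is a reduced stand-in for the DemoApp status, and `syncStatus` with its `write` callback is a hypothetical helper standing in for a status update call.

```go
package main

import (
	"fmt"
	"reflect"
)

// status is a reduced stand-in for the DemoApp status fields.
type status struct {
	ObservedGeneration int64
	ReadyReplicas      int32
	ServiceName        string
}

// syncStatus calls write only when desired differs from current. It reports
// whether a write happened, so callers can see that steady state is quiet.
func syncStatus(current *status, desired status, write func(status) error) (bool, error) {
	if reflect.DeepEqual(*current, desired) {
		return false, nil // nothing to do, no update event is generated
	}
	if err := write(desired); err != nil {
		return false, err
	}
	*current = desired
	return true, nil
}

func main() {
	writes := 0
	write := func(status) error { writes++; return nil }

	current := status{}
	desired := status{ObservedGeneration: 1, ReadyReplicas: 2, ServiceName: "hello"}

	// Three reconcile passes with the same desired status.
	for i := 0; i < 3; i++ {
		changed, err := syncStatus(&current, desired, write)
		fmt.Println(changed, err)
	}
	fmt.Println("writes:", writes) // prints "writes: 1"
}
```

In a real controller the comparison is often done with `equality.Semantic.DeepEqual` from k8s.io/apimachinery, which handles Kubernetes-specific types such as quantities and timestamps more carefully than `reflect.DeepEqual`.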
Owned resource changes do not trigger reconcile
Check SetupWithManager:
return ctrl.NewControllerManagedBy(mgr).
For(&demov1alpha1.DemoApp{}).
Owns(&appsv1.Deployment{}).
Owns(&corev1.Service{}).
Owns(&corev1.ConfigMap{}).
Complete(r)
Also check the child resource has an owner reference:
kubectl get deployment hello -o yaml | grep -A10 ownerReferences
No owner reference means Owns cannot map the child back to the parent.
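The mapping that depends on the owner reference can be modeled roughly as below. This is a simplified stdlib sketch, not controller-runtime's handler: `ownerRef` mirrors only a few fields of metav1.OwnerReference (where Controller is actually a *bool), `parentFor` is a hypothetical function, and it compares the full apiVersion string for brevity where the real handler matches on API group and kind.

```go
package main

import "fmt"

// ownerRef mirrors the fields of an owner reference that matter for mapping
// a child back to its parent. Controller is a plain bool here for brevity.
type ownerRef struct {
	APIVersion string
	Kind       string
	Name       string
	Controller bool
}

// parentFor returns the name of the controlling DemoApp owner, if any.
// With no matching reference there is nothing to enqueue.
func parentFor(refs []ownerRef) (string, bool) {
	for _, r := range refs {
		if r.Controller && r.Kind == "DemoApp" && r.APIVersion == "demo.example.com/v1alpha1" {
			return r.Name, true
		}
	}
	return "", false
}

func main() {
	owned := []ownerRef{{
		APIVersion: "demo.example.com/v1alpha1",
		Kind:       "DemoApp",
		Name:       "hello",
		Controller: true,
	}}
	fmt.Println(parentFor(owned)) // prints "hello true"

	// A Deployment created without an owner reference maps to no parent,
	// so changes to it never reach the DemoApp reconciler.
	fmt.Println(parentFor(nil))
}
```

The practical consequence is the same as in the text: if the grep shows no ownerReferences block, fix the builder or the SetControllerReference call before debugging watches.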
envtest fails to download etcd / kube-apiserver
Symptoms include KUBEBUILDER_ASSETS unset errors or timeouts while envtest fetches tarballs.
Fix:
- Prefer make test / make envtest from the scaffolded Makefile so versions stay aligned.
- On air-gapped CI, vendor the tarball cache or mirror the storage bucket your setup-envtest release uses.
- Fall back to USE_EXISTING_CLUSTER=true against a disposable kind cluster while you fix networking.
Final series checkpoint
Across the three parts, you built a Go operator that:
- defines a real custom API
- generates CRDs and RBAC
- reconciles several child resources
- uses status conditions
- handles deletion with a finalizer
- corrects drift
- watches owned and referenced resources
- validates and defaults CRs with webhooks
- emits Events
- has unit, fake-client, envtest, webhook, and kind testing paths
- builds and deploys as a real manager image
- supports safe backward-compatible upgrades
That is the end-to-end Go operator path most readers actually need.
From here, the specialized topics are separate:
- Server-Side Apply in Operators for shared field ownership
- Leader Election for HA managers
- Operator Metrics with Prometheus for alerts and dashboards
- Multi-Tenancy Patterns for namespace and tenant isolation
- CRD Version Upgrades for multi-version APIs
Frequently Asked Questions
1. Is unit testing alone enough for a Kubernetes operator?
No. Unit tests are excellent for pure desired-state builders, but they cannot prove CRD validation, /status subresource semantics, admission webhooks, RBAC on the manager ServiceAccount, or real Pod scheduling. Layer unit tests, fake-client reconciler tests, envtest, and kind as described in this article.
2. When should I use the fake client instead of envtest?
Fake client is fastest for straight-line client calls (Get/Create/Update) where you do not need the real API server. envtest is required when the sentence under test includes "the API server should ..." (validation, defaulting, generation, managedFields, admission, status updates).
3. Do kind end-to-end tests replace envtest?
No. kind proves the packaged image, RBAC, webhooks against a real Service, kubelet, and scheduling. envtest stays faster for tight controller loops. Keep both: envtest on every PR, kind on main/release.
4. My envtest run fails with missing etcd / kube-apiserver / empty KUBEBUILDER_ASSETS. What now?
envtest shells out to real etcd and kube-apiserver binaries. Prefer make test on Operator SDK / Kubebuilder scaffolds so the Makefile downloads versions and exports KUBEBUILDER_ASSETS. If you run go test directly, download assets once (make envtest or setup-envtest use <k8s-version> -p path) and export the printed directory. The setup-envtest CLI lives in its own Go module (sigs.k8s.io/controller-runtime/tools/setup-envtest); pin it by tag or commit in CI and do not assume it tracks controller-runtime library tags. For debugging only, USE_EXISTING_CLUSTER=true can point tests at a disposable kind cluster instead of downloading binaries.
5. Should I package the operator with OLM on day one?
Not unless you need OperatorHub channels, CSV metadata, or cross-cluster lifecycle semantics. Many internal teams ship Kustomize + a versioned image first, then add operator-sdk bundle when distribution demands it.
6. What is the safest way to upgrade an operator in production?
Apply backward-compatible CRDs first, roll out the controller only once you have confirmed that old CRs remain valid, default new fields, avoid breaking type changes, and run old+new sample CRs through envtest/kind before tagging a release. Use conversion webhooks only when you add a new served API version.
7. Is fake client useless?
No. It is only the wrong tool if you expect it to substitute for envtest or kind. Use it for what it is: a lightweight fake of client operations.
8. Should every pull request run kind tests?
Not always. A common split is unit + envtest on every PR, kind smoke on main and release branches, and a fuller install/upgrade suite before shipping a tag.
9. Should I assert on generated CRD YAML verbatim?
Prefer behavioral proof: envtest and kind load the generated manifests. If markers are wrong, those layers fail before production.
10. What is the most common operator upgrade bug?
Shipping a CRD schema change that invalidates existing objects (new required fields, type flips, removed keys). Always test old CR documents against the new CRD before rolling the controller.
11. What signals should I monitor first on a reconciling controller?
Reconcile error rate, reconcile latency histogram, workqueue depth/retries, webhook admission failures, manager readiness, and leader election changes; then add domain metrics once the baseline is healthy.
What's next?
You now have a complete Go operator path from scaffold to tests. Explore the hub chapters above, or jump to OLM / capability levels when you are ready to publish bundles.

