Helm-Based Operator Tutorial Part 1 - Build the Operator (Chart, CRD, watches.yaml)


The Helm-based operator is the operator pattern for teams who have a working Helm chart and do not want to write Go. The Operator SDK ships a generic reconciler that watches a CR, converts its .spec to Helm values, and runs helm install, upgrade, or uninstall in response to CR events. The chart does the work; you write zero lines of Go.

This is a two-part tutorial. Part 1 (this article) gets you from zero to a deployed operator with a four-template demo-app chart, a DemoApp CRD, and a fully understood watches.yaml. Part 2 picks up everything you do with the operator afterwards: lifecycle (upgrade/uninstall), drift, hooks, scope, and the hard ceiling.

Prerequisites: install Operator-SDK on Linux, Helm v3 or v4 CLI, Docker, kubectl, and a kind cluster. Familiarity with helm install, helm upgrade, and helm uninstall is assumed - the CLI commands used in this tutorial are identical on both Helm 3 and Helm 4.

A note on Helm 3 vs Helm 4. Helm 4 (current stable v4.2.0 as of mid-2026) is the version you should install for new work. The Operator SDK Helm plugin (helm.sdk.operatorframework.io/v1) used in this tutorial currently still bundles the Helm v3 Go SDK internally (helm-operator-plugins v0.9.x depends on helm.sh/helm/v3); Helm 4 SDK migration is on the roadmap. This does not affect anything you do in this article - the chart format, the CLI you use to test the chart standalone, and the resulting Helm release format (sh.helm.release.v1.<name>.v<rev> Secrets) are all unchanged between Helm 3 and Helm 4. The Helm hybrid operator is the path to take if you want to ship a Go operator that uses the Helm v4 SDK directly today.


What a Helm-based operator actually is (and what you don't write)

A Helm-based operator has four moving pieces:

  1. A Helm chart (your existing one or a new one).
  2. A CRD that defines the shape of the CR.
  3. A watches.yaml that maps CR Group/Version/Kind to a chart directory.
  4. The pre-built helm-operator reconciler binary shipped by Operator SDK.

The reconciler ships as a pre-built container image (gcr.io/kubebuilder/... style) that Operator SDK builds for you. When a CR is created or updated, the reconciler reads its .spec, calls helm install or helm upgrade with that spec passed as values, and writes status conditions. When the CR is deleted, it calls helm uninstall. You write zero Go code - the reconciler is generic.
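As a mental model only, the behaviour described above can be written as a small decision table. This is a simplified sketch under stated assumptions: the real reconciler calls the Helm Go SDK rather than the CLI, the function name helm_action_for and its yes/no arguments are hypothetical, and the "none" branch (nothing changed, nothing to do) is an assumption about the steady state.

```shell
# Simplified decision table for one reconcile pass (illustration, not SDK code).
# $1: is the CR being deleted?   $2: does a Helm release already exist?
# $3: did the values / rendered output change since the last release?
helm_action_for() (
  deleting=$1; release_exists=$2; changed=$3
  if [ "$deleting" = yes ]; then
    echo uninstall
  elif [ "$release_exists" = no ]; then
    echo install
  elif [ "$changed" = yes ]; then
    echo upgrade
  else
    echo none   # assumption: an unchanged CR needs no Helm action
  fi
)

echo "new CR:        $(helm_action_for no no no)"    # install
echo "edited CR:     $(helm_action_for no yes yes)"  # upgrade
echo "deleted CR:    $(helm_action_for yes yes no)"  # uninstall
```

The point of the sketch is that the action is derived from observed state (does a release exist, is the CR going away) rather than from which event fired.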

Here is what is generated vs what you provide, compared to a Go operator:

| Piece | Helm-based operator | Go operator (for comparison) |
|---|---|---|
| Resource templates | Helm chart you write (or already have) | Go code in controllers/ constructing each resource |
| CRD YAML | Generated by operator-sdk init from CLI flags (permissive) | Generated by make manifests from Go API types |
| Go API types | None - no types.go, no make generate | api/v1alpha1/*_types.go you author |
| Reconcile function | None - pre-built reconciler runs helm install/upgrade | Reconcile() you author |
| Configuration | watches.yaml (Kind to chart mapping) | SetupWithManager in code |
| Status conditions | Fixed: Initialized, Deployed, ReleaseFailed | Anything you want |
| Custom finalizer logic | Not possible without rebuilding the operator image | A few lines of Go |
| Reading external state | Not possible in reconcile | Standard Go HTTP/SDK calls |

The trade-off is clear: zero code for "install this chart on every CR," at the cost of zero control over anything else. The four rows marked None / Not possible above are the hard ceiling - Part 2 enumerates them with workarounds (Helm hooks), and Article 3 / Helm hybrid operator show how to break the ceiling by switching to a Go-driven hybrid.


What you'll build in this two-part series

A DemoApp CRD that drives a tiny four-template Helm chart called demo-app:

| Template | Purpose |
|---|---|
| templates/deployment.yaml | Nginx Deployment serving a single HTML file |
| templates/service.yaml | ClusterIP Service in front of the Deployment |
| templates/configmap.yaml | The HTML body (sourced from .Values.message) |
| templates/secret.yaml | A fake API key (sourced from .Values.apiKey) mounted as an env var |

The DemoApp CR maps to Helm values like this:

| CR field | Helm value | Resource it ends up in |
|---|---|---|
| spec.replicaCount | replicaCount | Deployment.spec.replicas |
| spec.image | image | Deployment.spec.template.spec.containers[0].image |
| spec.message | message | ConfigMap.data.index.html |
| spec.apiKey | apiKey | Secret.data.api-key (base64) |
| spec.service.type | service.type | Service.spec.type |

What ships in Part 1 vs Part 2:

| Part 1 (this article) | Part 2 (next) |
|---|---|
| Write the demo-app chart | Upgrade the CR, see helm upgrade |
| Scaffold the operator (operator-sdk init) | Delete the CR, see helm uninstall |
| Build, deploy, apply first CR (install only) | overrideValues and value precedence (full rules) |
| Tighten the generated CRD | Drift detection (edit / delete chart resources, watch reconcile) |
| Walk every field of watches.yaml | Helm hooks for pre/post install/upgrade/delete custom work |
| | Cluster-scoped vs namespace-scoped (WATCH_NAMESPACE + RBAC swap) |
| | Multi-tenancy with the selector field |
| | The hard ceiling (what you cannot do without Go) |

Prerequisites

This article assumes you have already completed the full lab setup in Install Operator-SDK on Linux - that guide walks through the installs (Go, kubectl, Docker, Helm 4 CLI, kind, operator-sdk binary) and brings up a plain kind cluster you can target.

If everything from that guide is in place, all of these should print without error:

bash
operator-sdk version
# operator-sdk version: "v1.42.2", commit: "...", kubernetes version: "1.33.1", ...
helm version --short
# v4.2.0+g0646808
kubectl version --client
# Client Version: v1.36.1
# Kustomize Version: v5.8.1
kind version
# kind v0.31.0 go1.25.5 linux/amd64
docker version --format '{{.Server.Version}}'
# 29.2.1

# kind cluster (created by `kind create cluster --name demo` in the install article)
kubectl get nodes
# NAME                  STATUS   ROLES           AGE   VERSION
# demo-control-plane    Ready    control-plane   ...   v1.35.0

# Helm plugin available in operator-sdk
operator-sdk init --plugins helm --help 2>&1 | head -n 1
# Initialize a new Helm-based operator project.

Version note: This article was verified end-to-end against operator-sdk v1.42.2, helm v4.2.0, kind v0.31.0 (which ships kindest/node:v1.35.0 by default), and Docker 29.2.1. Older operator-sdk releases scaffolded a different manager pod (two containers, with a kube-rbac-proxy sidecar); the note in Step 4 below gives the version boundary.

If operator-sdk init --plugins helm --help errors with no plugin could be resolved with key "helm", your operator-sdk build does not include the Helm plugin - re-run the install steps in the prereq article. If kubectl get nodes is empty, run kind create cluster --name demo from the prereq article's Step 1.

Image distribution: this article (and Part 2) builds a local operator image and needs to ship it into the kind cluster. We use ttl.sh (a free, public, ephemeral container registry that requires no signup) because it works from any cluster with zero setup. The pattern is: export IMG=ttl.sh/demoapp-$(uuidgen):24h, then make docker-build IMG="$IMG", then docker push "$IMG", then make deploy IMG="$IMG". The prereq article explains the choice in detail, including why we don't use kind load docker-image (brittle on Docker 24+) or a local registry:2 container (works but ~30 lines of setup). Anything pushed to ttl.sh is public; do not push proprietary code or secrets. Use a real registry (GHCR, ECR, GAR, ACR) for production.
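A minimal sketch of how the image reference could be built in one place. The helper name ttl_image is hypothetical, and the /proc fallback assumes a Linux host; the lowercasing step is there because image repository names must be lowercase, while some uuidgen implementations (macOS, for example) print uppercase.

```shell
# Build a ttl.sh image reference of the form ttl.sh/<name>-<uuid>:<ttl>.
# ttl_image is a hypothetical helper; the tag doubles as the time-to-live.
ttl_image() (
  name=$1
  ttl=${2:-24h}
  # Fall back to the kernel's UUID source when uuidgen is not installed (Linux only).
  if command -v uuidgen >/dev/null 2>&1; then
    id=$(uuidgen)
  else
    id=$(cat /proc/sys/kernel/random/uuid)
  fi
  # Repository names must be lowercase.
  id=$(printf '%s' "$id" | tr 'A-Z' 'a-z')
  printf 'ttl.sh/%s-%s:%s\n' "$name" "$id" "$ttl"
)

IMG=$(ttl_image demoapp 24h)
export IMG
echo "$IMG"
```

Exporting IMG once and passing "$IMG" to every later command keeps the build, the push, and the deploy pointed at the same reference.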


Part A - Build the operator end to end

Step 1 - Write the demo-app chart

Scaffold a minimal chart:

bash
mkdir -p ~/helm-operator && cd ~/helm-operator
helm create demo-app

helm create (Helm 4) produces about a dozen default templates including deployment.yaml, service.yaml, ingress.yaml, serviceaccount.yaml, hpa.yaml, httproute.yaml, plus _helpers.tpl, NOTES.txt, and a tests/ directory — all useless for our purpose. Wipe them and start clean:

bash
rm demo-app/templates/*.yaml
rm demo-app/templates/tests/*.yaml
rmdir demo-app/templates/tests
rm demo-app/templates/_helpers.tpl
rm demo-app/templates/NOTES.txt

The first rm demo-app/templates/*.yaml catches httproute.yaml, hpa.yaml, and friends in one shot. If you're on Helm 3 you'll see one or two fewer files — the cleanup is the same.

Write demo-app/values.yaml:

yaml
replicaCount: 1
image: nginx:1.27-alpine
message: "Hello from demo-app"
apiKey: "changeme"

service:
  type: ClusterIP
  port: 80

The chart uses the upstream nginx:1.27-alpine from Docker Hub directly — it's a stable, widely-mirrored public image and one pull per CR over the course of this tutorial doesn't come close to Docker Hub's anonymous rate limit (100 pulls / 6 h). The operator image you'll build in Step 4 is a different story (it's built locally and has to reach the cluster) — that one goes through ttl.sh as flagged in Prerequisites.

Write demo-app/templates/_helpers.tpl:

yaml
{{- define "demo-app.name" -}}
{{ .Release.Name }}
{{- end }}

{{- define "demo-app.labels" -}}
app.kubernetes.io/name: {{ include "demo-app.name" . }}
app.kubernetes.io/managed-by: demo-app-operator
{{- end }}

Write demo-app/templates/configmap.yaml:

yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "demo-app.name" . }}
  labels: {{- include "demo-app.labels" . | nindent 4 }}
data:
  index.html: |
    <html><body><h1>{{ .Values.message }}</h1></body></html>

Write demo-app/templates/secret.yaml:

yaml
apiVersion: v1
kind: Secret
metadata:
  name: {{ include "demo-app.name" . }}
  labels: {{- include "demo-app.labels" . | nindent 4 }}
type: Opaque
stringData:
  api-key: {{ .Values.apiKey | quote }}

Write demo-app/templates/deployment.yaml:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "demo-app.name" . }}
  labels: {{- include "demo-app.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "demo-app.name" . }}
  template:
    metadata:
      labels: {{- include "demo-app.labels" . | nindent 8 }}
    spec:
      containers:
        - name: web
          image: {{ .Values.image }}
          ports:
            - containerPort: 80
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ include "demo-app.name" . }}
                  key: api-key
          volumeMounts:
            - name: web-content
              mountPath: /usr/share/nginx/html
      volumes:
        - name: web-content
          configMap:
            name: {{ include "demo-app.name" . }}

Write demo-app/templates/service.yaml:

yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "demo-app.name" . }}
  labels: {{- include "demo-app.labels" . | nindent 4 }}
spec:
  type: {{ .Values.service.type }}
  selector:
    app.kubernetes.io/name: {{ include "demo-app.name" . }}
  ports:
    - port: {{ .Values.service.port }}
      targetPort: 80

Verify the chart renders and lints:

bash
helm lint demo-app
# ==> Linting demo-app
# [INFO] Chart.yaml: icon is recommended
# 1 chart(s) linted, 0 chart(s) failed

Smoke-test the chart standalone (we are not yet using the operator — the kubelet pulls nginx:1.27-alpine directly from Docker Hub):

bash
helm install hello demo-app --set message="standalone test"
kubectl get deploy,svc,cm,secret -l app.kubernetes.io/name=hello
# kubectl get -l is exact-match; the full label value is `hello`, not a substring
helm uninstall hello

If standalone install works, the chart is good. From here on, the operator drives Helm - you never run helm install directly again.

Step 2 - Scaffold with operator-sdk init (Helm plugin)

Create an empty operator project alongside the chart:

bash
mkdir -p ~/helm-operator/demo-app-operator
cd ~/helm-operator/demo-app-operator

operator-sdk init --plugins=helm.sdk.operatorframework.io/v1 --domain example.com \
  --group demo --version v1alpha1 --kind DemoApp --helm-chart=../demo-app

helm.sdk.operatorframework.io/v1 is the fully-qualified plugin key. The Operator SDK also accepts the short form --plugins=helm (bare alias resolving to the same plugin). Short forms like --plugins=helm/v1 are not recognized by the plugin resolver - the SDK returns no plugin could be resolved with key "helm/v1".

Three things just happened:

  1. A project skeleton (almost all YAML, no Go code) was scaffolded with a Dockerfile, Makefile, config/, and helm-charts/.
  2. Your demo-app chart was copied into helm-charts/demo-app/.
  3. A watches.yaml was generated mapping demo.example.com/v1alpha1/DemoApp to helm-charts/demo-app.

You will also see a level=warning message near the end of the output:

text
time="..." level=warning msg="The RBAC rules generated in config/rbac/role.yaml are based on the chart's default manifest. Some rules may be missing for resources that are only enabled with custom values, and some existing rules may be overly broad. Double check the rules generated in config/rbac/role.yaml to ensure they meet the operator's permission requirements."

This is an honest warning, not an error. The Operator SDK infers RBAC by rendering the chart with default values and granting create/update/patch/delete on every Kind that comes out. For our demo-app chart the default render is also the only render - we always emit a Deployment, Service, ConfigMap, and Secret regardless of CR values - so the generated role.yaml covers every chart resource.

The warning bites real charts that render conditionally:

yaml
# templates/ingress.yaml
{{- if .Values.ingress.enabled }}
apiVersion: networking.k8s.io/v1
kind: Ingress
...
{{- end }}

With ingress.enabled: false in defaults, operator-sdk init will not generate ingresses.networking.k8s.io permissions, and a CR that sets ingress.enabled: true later will fail the reconcile with ingresses.networking.k8s.io is forbidden. The fix in that case is to manually edit config/rbac/role.yaml to add every Kind the chart can possibly render across all values combinations. Part 2 of this tutorial covers RBAC tightening in the namespace-scoped section; the gap-filling pattern is identical for cluster-scoped operators.
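One way to find such gaps before they bite is to compare the Kinds a chart renders with default values against the Kinds it renders with optional features switched on. The sketch below is a rough helper, not part of Operator SDK: kinds_in_manifest is a hypothetical name, and it assumes the top-level apiVersion: and kind: keys start at column 0, which is how helm template prints them.

```shell
# List the distinct "apiVersion/kind" pairs found in a rendered manifest on stdin.
kinds_in_manifest() {
  awk '
    function emit() { if (kind != "") print api "/" kind; api = ""; kind = "" }
    /^---/         { emit() }        # document separator: flush the previous object
    /^apiVersion:/ { api = $2 }
    /^kind:/       { kind = $2 }
    END            { emit() }        # flush the last object
  ' | sort -u
}

# Self-contained demonstration on a two-object manifest.
kinds_in_manifest <<'EOF'
---
apiVersion: v1
kind: Service
---
apiVersion: networking.k8s.io/v1
kind: Ingress
EOF

# Against a real chart (needs helm; ingress.enabled is the example value from the text):
#   helm template x helm-charts/demo-app | kinds_in_manifest > default.txt
#   helm template x helm-charts/demo-app --set ingress.enabled=true | kinds_in_manifest > all.txt
#   comm -13 default.txt all.txt   # Kinds that appear only with non-default values
```

For the demo-app chart in this article the comparison should come back empty, since every template renders unconditionally; any line it does print for another chart is a Kind that likely needs a manual rule in config/rbac/role.yaml.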

Add the patch verb to the existing events rule

The chart-rendering inference catches every Kind the chart writes. The scaffold also adds two framework-only rules the inference cannot see: secrets:* (helm release storage) and events:create (so the operator can emit Kubernetes Events for things like ReleaseFailed or OverrideValuesInUse). Read your generated config/rbac/role.yaml and you'll find this block already there:

yaml
# We need to create events on CRs about things happening during reconciliation
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create

What's missing is the patch verb. The framework's EventRecorder aggregates repeated events (e.g., many OverrideValuesInUse warnings on the same CR) using a Patch call, and without patch the operator log fills with events ... is forbidden lines on every aggregation attempt (the reconcile itself still succeeds — only the aggregation fails). Add patch to that rule now, before the first make deploy:

yaml
- apiGroups:
  - ""
  resources:
  - events
  verbs:
  - create
  - patch    # add this line

Treat the scaffolded role.yaml as a starting point, not the final answer.
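Once the operator is deployed in Step 4, the effective permissions can be checked by impersonating its ServiceAccount with kubectl auth can-i. This is a sketch under assumptions: sa_can is a hypothetical helper, the namespace and ServiceAccount names are the ones the scaffold produces in this article, and your own kubeconfig user needs permission to impersonate (the default kind admin has it).

```shell
# Ask the API server whether the operator's ServiceAccount may perform a verb.
# $1: verb (for example patch)   $2: resource (for example events)
sa_can() {
  kubectl auth can-i "$1" "$2" \
    --as="system:serviceaccount:demo-app-operator-system:demo-app-operator-controller-manager" \
    --namespace default
}

# After make deploy:
#   sa_can create events   # covered by the scaffolded rule
#   sa_can patch events    # should answer yes only once patch has been added
```

Running the same check for any Kind you add to role.yaml by hand is a quick way to confirm the edit actually reached the cluster.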

Future-proof for Helm hook resources

Add this bundle to config/rbac/role.yaml now, in the same edit as the events rule above, so the single make deploy you run at the end of this article carries everything Part 2 will need:

yaml
- apiGroups:
  - ""
  resources:
  - serviceaccounts
  - namespaces
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete
- apiGroups:
  - rbac.authorization.k8s.io
  resources:
  - roles
  - rolebindings
  - clusterroles
  - clusterrolebindings
  verbs:
  - get
  - list
  - watch
  - create
  - update
  - patch
  - delete

These rules are not free — granting the operator clusterroles/clusterrolebindings write power is a privilege escalation surface (any chart it manages can mint cluster admin). For a production operator that never uses hooks you should leave them out and add only what each chart actually renders. For a tutorial operator where you will exercise hooks in the next article, adding them up-front avoids a second deploy cycle and keeps the focus on the helm-operator concepts rather than RBAC bookkeeping.

Step 3 - Project folder structure

text
demo-app-operator/
├── Dockerfile                          # pre-built helm-operator base image + watches.yaml + chart
├── Makefile                            # docker-build / deploy / undeploy targets
├── PROJECT                             # operator-sdk metadata
├── watches.yaml                        # CR Kind → chart mapping (the routing layer)
├── helm-charts/
│   └── demo-app/                       # your chart, copied here at scaffold time
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/...
└── config/
    ├── crd/
    │   └── bases/
    │       └── demo.example.com_demoapps.yaml   # generated CRD (permissive)
    ├── samples/
    │   ├── demo_v1alpha1_demoapp.yaml           # generated sample CR
    │   └── kustomization.yaml
    ├── default/                                  # kustomize base for deploying the operator
    ├── manager/                                  # the Deployment that runs the operator pod
    └── rbac/                                     # ClusterRole, ClusterRoleBinding, ServiceAccount

Read three files now - they are the entire "API" of your operator:

bash
cat watches.yaml
cat config/crd/bases/demo.example.com_demoapps.yaml | head -40
cat config/samples/demo_v1alpha1_demoapp.yaml

The watches.yaml:

yaml
# Use the 'create api' subcommand to add watches to this file.
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
# +kubebuilder:scaffold:watch

Four required fields, nothing more. Part C walks every optional field you can add to this.

The generated sample CR:

yaml
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: demoapp-sample
spec:
  # Default values copied from <project_dir>/helm-charts/demo-app/values.yaml
  replicaCount: 1
  image: nginx:1.27-alpine
  message: "Hello from demo-app"
  apiKey: "changeme"
  service:
    type: ClusterIP
    port: 80

Notice: the sample CR's spec is exactly your chart's values.yaml. This is the implicit mapping - whatever you put in spec becomes Helm values verbatim.

Step 4 - Build, deploy, verify

Pick a unique image URL on ttl.sh. The UUID avoids collisions with anyone else using ttl.sh, and 24h is the time-to-live before the image auto-expires (plenty of room for working through Part 1 + Part 2):

bash
export IMG=ttl.sh/demoapp-$(uuidgen):24h
echo "$IMG"
# ttl.sh/demoapp-3f8b9c12-4a5e-49b8-9d6a-87f2c1e0d3a4:24h

Build the operator image with that tag. The Dockerfile produces a single image containing the helm-operator reconciler binary plus your chart bundled at /opt/helm/helm-charts/demo-app:

bash
make docker-build IMG="$IMG"

Push it to ttl.sh - the cluster will pull from the same URL:

bash
docker push "$IMG"

Why ttl.sh instead of kind load docker-image or a local registry? Newer Docker (24+) with the containerd snapshotter breaks kind load with ctr: content digest <sha>: not found, and running a local registry:2 container requires ~30 lines of cluster setup (custom containerdConfigPatches + per-node hosts.toml). ttl.sh works with zero setup from any cluster — see the prereq article's ttl.sh section for the full reasoning. Do not push proprietary images to ttl.sh — it's public.

Deploy the operator (this also applies the CRD), then wait for the new operator pod to be ready before doing anything else:

bash
make deploy IMG="$IMG"

kubectl -n demo-app-operator-system rollout status deploy/demo-app-operator-controller-manager
# deployment "demo-app-operator-controller-manager" successfully rolled out

make deploy runs kustomize build config/default | kubectl apply -f -, which creates the namespace, the CRD, the ServiceAccount, the Roles and ClusterRoles with their bindings, the metrics Service, and the operator Deployment.

Always wait for rollout status after make deploy before applying CRs. Every rebuild of the operator image triggers a Deployment rollout. The new pod isn't Ready instantly — and if you apply (or re-apply) a CR while the OLD pod is still serving, it will reconcile against the OLD watches.yaml / chart / RBAC inside that pod, producing results that don't match this article. The same rule applies to every rebuild block in Part 2.
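The deploy-then-wait rule can be wrapped in one function so the wait is never forgotten. This is a minimal sketch: deploy_and_wait is a hypothetical wrapper around the two commands shown above, and the 120s default timeout is an arbitrary choice.

```shell
# Deploy, then block until the new operator pod is Ready (or fail after a timeout).
# $1: image reference   $2: optional rollout timeout (default 120s)
deploy_and_wait() {
  img=$1
  timeout=${2:-120s}
  make deploy IMG="$img" || return 1
  kubectl -n demo-app-operator-system rollout status \
    deploy/demo-app-operator-controller-manager --timeout="$timeout"
}

# Usage:
#   deploy_and_wait "$IMG" && kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
```

Chaining the CR apply with && means it only runs once the rollout has reported success.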

Sample output (your ttl.sh/demoapp-... URL will differ — the UUID is the one you generated):

text
cd config/manager && /root/helm-operator/demo-app-operator/bin/kustomize edit set image controller=ttl.sh/demoapp-3f8b9c12-4a5e-49b8-9d6a-87f2c1e0d3a4:24h
/root/helm-operator/demo-app-operator/bin/kustomize build config/default | kubectl apply -f -
namespace/demo-app-operator-system created
customresourcedefinition.apiextensions.k8s.io/demoapps.demo.example.com created
serviceaccount/demo-app-operator-controller-manager created
role.rbac.authorization.k8s.io/demo-app-operator-leader-election-role created
clusterrole.rbac.authorization.k8s.io/demo-app-operator-demoapp-admin-role created
clusterrole.rbac.authorization.k8s.io/demo-app-operator-demoapp-editor-role created
clusterrole.rbac.authorization.k8s.io/demo-app-operator-demoapp-viewer-role created
clusterrole.rbac.authorization.k8s.io/demo-app-operator-manager-role created
clusterrole.rbac.authorization.k8s.io/demo-app-operator-metrics-auth-role created
clusterrole.rbac.authorization.k8s.io/demo-app-operator-metrics-reader created
rolebinding.rbac.authorization.k8s.io/demo-app-operator-leader-election-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/demo-app-operator-manager-rolebinding created
clusterrolebinding.rbac.authorization.k8s.io/demo-app-operator-metrics-auth-rolebinding created
service/demo-app-operator-controller-manager-metrics-service created
deployment.apps/demo-app-operator-controller-manager created

Verify:

bash
kubectl get crd | grep demoapps
# demoapps.demo.example.com   2026-06-02T04:00:00Z

kubectl -n demo-app-operator-system get pods
# NAME                                                READY   STATUS    RESTARTS   AGE
# demo-app-operator-controller-manager-7f8b...        1/1     Running   0          25s

READY 1/1 is the modern shape: a single container running the pre-built helm-operator binary. Older operator-sdk releases (before v1.36 / kubebuilder v4.4) scaffolded a two-container pod (READY 2/2) with a kube-rbac-proxy sidecar handling metrics-endpoint auth; that has been replaced with in-process authentication using the Kubernetes TokenReview API, so newer scaffolds drop the sidecar entirely. Both shapes are correct — this article assumes the modern one. Tail the manager log to confirm it picked up watches.yaml:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager -c manager | head -20
# {"level":"info","ts":"2026-06-03T08:37:54Z","logger":"cmd","msg":"Version","Go Version":"go1.25.8","GOOS":"linux","GOARCH":"amd64","helm-operator":"v1.42.2","commit":"6001c29067051e1a04e829ea033988b904d1845e"}
# {"level":"info","ts":"2026-06-03T08:37:54Z","logger":"cmd","msg":"Watching all namespaces"}
# {"level":"info","ts":"2026-06-03T08:37:54Z","logger":"helm.controller","msg":"Watching resource","apiVersion":"demo.example.com/v1alpha1","kind":"DemoApp","reconcilePeriod":"1m0s"}
# {"level":"info","ts":"2026-06-03T08:37:54Z","logger":"controller-runtime.metrics","msg":"Starting metrics server"}
# {"level":"info","ts":"2026-06-03T08:37:54Z","msg":"starting server","name":"health probe","addr":"[::]:8081"}
# I0603 08:37:54.626521       1 leaderelection.go:257] attempting to acquire leader lease demo-app-operator-system/demo-app-operator...
# I0603 08:37:54.676569       1 leaderelection.go:271] successfully acquired lease demo-app-operator-system/demo-app-operator
# {"level":"info","ts":"2026-06-03T08:37:54Z","msg":"Starting EventSource","controller":"demoapp-controller","source":"kind source: *unstructured.Unstructured"}
# {"level":"info","ts":"2026-06-03T08:37:55Z","logger":"controller-runtime.metrics","msg":"Serving metrics server","bindAddress":":8443","secure":true}
# {"level":"info","ts":"2026-06-03T08:37:55Z","msg":"Starting Controller","controller":"demoapp-controller"}
# {"level":"info","ts":"2026-06-03T08:37:55Z","msg":"Starting workers","controller":"demoapp-controller","worker count":2}

"msg":"Watching all namespaces" confirms cluster scope (no WATCH_NAMESPACE env var → all namespaces). The line "msg":"Watching resource","kind":"DemoApp","reconcilePeriod":"1m0s" confirms watches.yaml was loaded for the DemoApp Kind. Part 2 covers how to flip cluster-scope to namespace-scope.

Step 5 - First install: apply the CR

bash
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
# demoapp.demo.example.com/demoapp-sample created

Watch the operator's reconcile fire:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager -c manager -f
# ... (startup lines from the previous step) ...
# {"level":"info","ts":"2026-06-03T08:41:47Z","msg":"Starting EventSource","controller":"demoapp-controller","source":"kind source: *unstructured.Unstructured"}
# {"level":"info","ts":"2026-06-03T08:41:47Z","logger":"helm.controller","msg":"Watching dependent resource","ownerApiVersion":"demo.example.com/v1alpha1","ownerKind":"DemoApp","apiVersion":"v1","kind":"Secret"}
# {"level":"info","ts":"2026-06-03T08:41:47Z","logger":"helm.controller","msg":"Watching dependent resource","ownerApiVersion":"demo.example.com/v1alpha1","ownerKind":"DemoApp","apiVersion":"v1","kind":"ConfigMap"}
# {"level":"info","ts":"2026-06-03T08:41:47Z","logger":"helm.controller","msg":"Watching dependent resource","ownerApiVersion":"demo.example.com/v1alpha1","ownerKind":"DemoApp","apiVersion":"v1","kind":"Service"}
# {"level":"info","ts":"2026-06-03T08:41:47Z","logger":"helm.controller","msg":"Watching dependent resource","ownerApiVersion":"demo.example.com/v1alpha1","ownerKind":"DemoApp","apiVersion":"apps/v1","kind":"Deployment"}
# {"level":"info","ts":"2026-06-03T08:41:47Z","logger":"helm.controller","msg":"Installed release","namespace":"default","name":"demoapp-sample","apiVersion":"demo.example.com/v1alpha1","kind":"DemoApp","release":"demoapp-sample"}
# {"level":"info","ts":"2026-06-03T08:41:50Z","logger":"helm.controller","msg":"Reconciled release","namespace":"default","name":"demoapp-sample","apiVersion":"demo.example.com/v1alpha1","kind":"DemoApp","release":"demoapp-sample"}

The four "Watching dependent resource" lines are the operator subscribing to events on every Kind the chart renders (Secret, ConfigMap, Service, Deployment) — that's the drift-detection mechanism covered in Part 2. "Installed release" is the actual helm install call returning success. Stop tailing with Ctrl-C.

The chart's resources should be present in default:

bash
kubectl get all,cm,secret -l app.kubernetes.io/name=demoapp-sample
# NAME                                  READY   STATUS    RESTARTS   AGE
# pod/demoapp-sample-799cd75ff5-lvb4w   1/1     Running   0          72s

# NAME                     TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE
# service/demoapp-sample   ClusterIP   10.96.196.76   <none>        80/TCP    75s

# NAME                             READY   UP-TO-DATE   AVAILABLE   AGE
# deployment.apps/demoapp-sample   1/1     1            1           75s

# NAME                                        DESIRED   CURRENT   READY   AGE
# replicaset.apps/demoapp-sample-799cd75ff5   1         1         1       74s

# NAME                       DATA   AGE
# configmap/demoapp-sample   1      75s

# NAME                    TYPE     DATA   AGE
# secret/demoapp-sample   Opaque   1      76s

Port-forward and curl the service to confirm the chart actually renders your message:

bash
kubectl port-forward svc/demoapp-sample 8080:80 &
sleep 2   # give the port-forward a moment to start listening
curl localhost:8080
# <html><body><h1>Hello from demo-app</h1></body></html>
kill %1

The Helm release lives as a Secret (Helm's default storage backend):

bash
kubectl get secret -l owner=helm
# NAME                                   TYPE                 DATA   AGE
# sh.helm.release.v1.demoapp-sample.v1   helm.sh/release.v1   1      3m22s
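To look inside that Secret, the payload has to be decoded twice: Helm stores the release as gzip-compressed JSON that it base64-encodes, and Kubernetes base64-encodes Secret data once more. A small sketch follows; decode_helm_release is a hypothetical helper and assumes GNU-style base64 -d and gzip on the PATH.

```shell
# Decode a Helm release payload (the .data.release field of the Secret) from stdin to JSON.
decode_helm_release() {
  base64 -d | base64 -d | gzip -dc
}

# Usage (needs the cluster from this article):
#   kubectl get secret sh.helm.release.v1.demoapp-sample.v1 \
#     -o jsonpath='{.data.release}' | decode_helm_release | head -c 300
```

The decoded JSON holds the chart, the values the operator passed in from the CR spec, and the rendered manifest for that revision.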

You now have a working pre-built Helm operator. Zero lines of Go. The full lifecycle demo (upgrade, uninstall, drift, hooks, etc.) is in Part 2. The rest of Part 1 makes the CRD safer and explains every knob in watches.yaml.


Part B - The CRD and CR (the generated API)

What operator-sdk wrote for you (the permissive default)

Read the generated CRD:

bash
cat config/crd/bases/demo.example.com_demoapps.yaml

You will see something like:

yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: demoapps.demo.example.com
spec:
  group: demo.example.com
  names:
    kind: DemoApp
    listKind: DemoAppList
    plural: demoapps
    singular: demoapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              x-kubernetes-preserve-unknown-fields: true
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}

Why x-kubernetes-preserve-unknown-fields: true is the default

operator-sdk init --plugins=helm.sdk.operatorframework.io/v1 cannot know the shape of your chart's values.yaml - charts can use arbitrary nested keys, conditionals, and templated values. Rather than guess, it accepts anything. That gives you a working operator on day one but no schema validation: a typo like replicaCounnt: 3 is accepted by the API server and passed to Helm as a value no template reads, so the chart silently falls back to its default replicaCount.
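While the schema is still permissive, a rough pre-flight check can catch misspelled top-level keys by comparing them with the keys in the chart's values.yaml. The sketch below is deliberately naive: unknown_value_keys is a hypothetical helper, it only understands unindented "key:" lines, and it ignores nested keys entirely.

```shell
# Report candidate keys (one per line on stdin) that are not top-level keys
# in the given values.yaml.
unknown_value_keys() (
  values_file=$1
  # Collect unindented "key:" names from the values file.
  known=$(sed -n 's/^\([A-Za-z0-9_-][A-Za-z0-9_-]*\):.*/\1/p' "$values_file")
  while IFS= read -r key; do
    printf '%s\n' "$known" | grep -qxF -- "$key" || echo "unknown key: $key"
  done
)

# Self-contained demonstration with a throwaway values file.
tmp=$(mktemp)
printf 'replicaCount: 1\nmessage: hi\nservice:\n  type: ClusterIP\n' > "$tmp"
printf 'replicaCounnt\nmessage\n' | unknown_value_keys "$tmp"   # unknown key: replicaCounnt
rm -f "$tmp"
```

This is a stopgap at best; a tightened CRD schema catches the same mistake at the API server and covers nested fields as well.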

For real use you should tighten the schema.

Tightening the CRD with OpenAPI v3 markers

You want to replace the permissive spec: block inside properties: with an explicit OpenAPI v3 schema. The surrounding fields - apiVersion, kind, metadata (which is just type: object with no further schema), and status (kept permissive on purpose, the operator owns it) - stay as-is.

Because indentation in this YAML is depth-sensitive and easy to get wrong with a surgical replace, the safest path is to overwrite the whole file. Open config/crd/bases/demo.example.com_demoapps.yaml and replace the entire contents with:

yaml
---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: demoapps.demo.example.com
spec:
  group: demo.example.com
  names:
    kind: DemoApp
    listKind: DemoAppList
    plural: demoapps
    singular: demoapp
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            apiVersion:
              type: string
            kind:
              type: string
            metadata:
              type: object
            spec:
              type: object
              required:
                - replicaCount
                - message
              properties:
                replicaCount:
                  type: integer
                  minimum: 1
                  maximum: 10
                  default: 1
                image:
                  type: string
                  pattern: '^[a-z0-9./:-]+$'
                  default: 'nginx:1.27-alpine'
                message:
                  type: string
                  minLength: 1
                  maxLength: 200
                apiKey:
                  type: string
                  minLength: 8
                service:
                  type: object
                  properties:
                    type:
                      type: string
                      enum: ["ClusterIP", "NodePort", "LoadBalancer"]
                      default: "ClusterIP"
                    port:
                      type: integer
                      minimum: 1
                      maximum: 65535
                      default: 80
            status:
              type: object
              x-kubernetes-preserve-unknown-fields: true
      subresources:
        status: {}

Worked example - what tightening buys you

Re-apply the CRD:

bash
kubectl apply -f config/crd/bases/demo.example.com_demoapps.yaml

Bad CRs are now rejected at admission and never reach the reconciler:

bash
cat <<EOF | kubectl apply -f -
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: bad
spec:
  replicaCount: 99           # exceeds maximum: 10
  message: ""                 # violates minLength: 1
  service:
    type: "Invalid"           # not in enum
EOF
# The DemoApp "bad" is invalid: 
# * spec.message: Invalid value: "": spec.message in body should be at least 1 chars long
# * spec.replicaCount: Invalid value: 99: spec.replicaCount in body should be less than or equal to 10
# * spec.service.type: Unsupported value: "Invalid": supported values: "ClusterIP", "NodePort", "LoadBalancer"

(The order is alphabetical by field path, not the order you wrote them in.)

This is the single highest-leverage edit in any Helm-based operator project. Without it, the only failure signal you get is a chart render error in the operator log - far too late and far too hidden.
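
For contrast with the rejected CR, here is a minimal sketch of the passing case. It assumes the tightened CRD is already applied to the cluster. The helper name validate_demoapp, the CR name good, and its values are illustrative and not part of the scaffold. A server-side dry run sends the object through schema validation and defaulting without persisting it.

```shell
# Hypothetical helper: run a CR through API-server admission without saving it.
validate_demoapp() {
  kubectl apply --dry-run=server -o yaml -f "$1"
}

# An illustrative CR that satisfies every constraint in the schema above.
good_cr="$(mktemp)"
cat > "$good_cr" <<'EOF'
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: good
spec:
  replicaCount: 3                   # within 1..10
  message: "Hello from a valid CR"  # 1..200 characters
  service:
    type: NodePort                  # one of the enum values
EOF

# Usage, against a cluster that has the CRD installed:
#   validate_demoapp "$good_cr"
```

The returned YAML should include the schema defaults for fields the CR omitted, such as spec.image. Nested defaults like service.port are only applied when the parent service object is present in the CR, because the schema gives service itself no default.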

What happens when a user submits invalid spec

Two layers reject invalid input:

  1. kube-apiserver uses the OpenAPI schema to reject at admission - this is what the example above showed.
  2. The chart's own template logic can still fail at render time for fields the schema cannot express (e.g., "spec.image must point to an image tag that exists in your registry"). Those failures appear in the operator log and set status.conditions[type=ReleaseFailed].status=True.

Treat the CRD schema as the first line of defense and chart template guards ({{ required "..." }}) as the second.
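
A sketch of what such a template guard can look like, using a throwaway chart rather than the tutorial's demo-app chart. The chart name guard-demo, the helper make_guard_demo, and the error message are all hypothetical. required is Helm's built-in template function; it aborts rendering with the given message when the value is empty.

```shell
# Write a throwaway chart whose only template guards .Values.message.
# The chart name and file layout here are illustrative.
make_guard_demo() {
  dir="$1"
  mkdir -p "$dir/templates"
  cat > "$dir/Chart.yaml" <<'EOF'
apiVersion: v2
name: guard-demo
version: 0.1.0
EOF
  cat > "$dir/templates/configmap.yaml" <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ .Release.Name }}
data:
  message: {{ required "message must be set in the CR spec" .Values.message | quote }}
EOF
}

# Usage (needs the helm CLI):
#   make_guard_demo /tmp/guard-demo
#   helm template demo /tmp/guard-demo                   # fails with the guard message
#   helm template demo /tmp/guard-demo --set message=hi  # renders the ConfigMap
```

Inside the operator, the same failure would surface in the operator log and the ReleaseFailed condition rather than on your terminal, which is why the schema remains the better place for anything it can express.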

Rebuild and redeploy after CRD edits

CRD changes do not require rebuilding the operator image - they live in config/crd/bases/ and are applied by make deploy. Re-apply the CRD alone whenever you change validation:

bash
kubectl apply -f config/crd/bases/demo.example.com_demoapps.yaml

You only rebuild the image when the chart or watches.yaml changes (covered in Part C).


Part C - The watches.yaml routing layer

The watches.yaml you saw in Step 3 had only the four required fields. The full schema includes four optional fields that let you tune cadence, control drift behaviour, inject values from the operator level, and filter which CRs each controller picks up.

Schema overview

| Field | Type | Required | Default | Purpose |
| --- | --- | --- | --- | --- |
| group | string | yes | - | CR API group (e.g. demo.example.com) |
| version | string | yes | - | CR API version (e.g. v1alpha1) |
| kind | string | yes | - | CR Kind (e.g. DemoApp) |
| chart | string (path) | yes | - | Local chart directory inside the operator image |
| reconcilePeriod | duration | no | 1m | Cadence of periodic resync per CR |
| watchDependentResources | bool | no | true | Watch chart-rendered resources, reconcile on their changes |
| overrideValues | map[string]any | no | {} | Operator-level value injection (highest precedence) |
| selector | LabelSelector | no | {} | Only CRs matching these labels are handled by this controller |

A complete example with every field present:

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  reconcilePeriod: 30s
  watchDependentResources: true
  overrideValues:
    image: "nginx:1.27-alpine"
    registryMirror: "$REGISTRY_MIRROR"
  selector:
    matchLabels:
      tier: production

group, version, kind, chart - the required four

These bind a CR Kind to a chart. They are exactly what operator-sdk init --plugins=helm.sdk.operatorframework.io/v1 filled in for you.

  • chart is a path inside the operator image (the Dockerfile copies your helm-charts/ into /opt/helm/helm-charts/). Remote chart URLs are not supported - rebuild the image when the chart changes.
  • Changing any of group, version, kind is an API break; users have to migrate their CRs.

reconcilePeriod - cadence

Every CR is reconciled on:

  1. Every event on the CR itself (create/update/delete).
  2. Every event on a chart-rendered resource (if watchDependentResources: true).
  3. Periodically, every reconcilePeriod.

The periodic resync exists as a safety net - it re-renders the chart and re-applies anything missing, even if no event fired. The default 1m is reasonable for tens of CRs; with hundreds you should bump it to 5m or 10m to keep CPU sane. The trade-off is detection latency for "silent" drift (the kind events don't catch).
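
To put rough numbers on that trade-off, the arithmetic can be sketched as below. The helper name resyncs_per_minute is hypothetical, and the figure counts only the periodic resync. Event-driven reconciles come on top, so treat it as a floor rather than a load prediction.

```shell
# Periodic reconciles per minute for a given CR count and reconcilePeriod.
# Integer arithmetic; counts only the periodic resync, not event-driven work.
resyncs_per_minute() {
  crs="$1"
  period_seconds="$2"
  echo $(( crs * 60 / period_seconds ))
}

resyncs_per_minute 30 60     # 30 CRs at the default 1m  -> prints 30
resyncs_per_minute 300 60    # 300 CRs at 1m             -> prints 300
resyncs_per_minute 300 300   # 300 CRs at 5m             -> prints 60
```

Each of those reconciles re-renders the chart, so moving 300 CRs from 1m to 5m cuts the baseline from 300 to 60 chart renders per minute.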

The drift demo and full tuning guidance are in Part 2.

watchDependentResources - drift on/off

When true (default), the operator subscribes to events for the resource types its chart renders. If somebody kubectl edits a chart-rendered ConfigMap, the operator wakes up and re-renders, reverting the change.

When false, the operator only reacts to events on the CR itself plus the periodic resync. Drift is only corrected on the resync cadence - useful for debugging or when you intentionally want a stable window for manual intervention.

yaml
watchDependentResources: false   # only react to CR events + periodic resync

overrideValues - operator-level value injection

Values you set here have higher precedence than the CR's .spec and higher precedence than the chart's values.yaml defaults. Use them for:

  • Pinning fields the user should not override (image registry, security contexts).
  • Injecting env-var-sourced values (per-environment defaults from the operator pod's environment).
  • Cluster-wide labels you want on every release.

Static example - force a specific image tag for all CRs:

yaml
overrideValues:
  image: "nginx:1.27-alpine"

Env-var substitution example — the operator pod sets REGISTRY_MIRROR via env, and the value flows into every CR's reconcile:

yaml
overrideValues:
  registryMirror: "$REGISTRY_MIRROR"
  imagePullPolicy: "$IMAGE_PULL_POLICY"

The supported substitution syntax is intentionally minimal:

| Form | Meaning |
| --- | --- |
| $VAR | Substitute env var VAR. If unset, resolves to an empty string (no error). |
| ${VAR} | Same as $VAR. |
| '{{ env "VAR" }}' | Go-template form via Sprig. Same empty-string-on-unset behaviour as $VAR. |
| '{{ default "x" (env "VAR") }}' | Go-template form with a fallback. The only supported way to get a default. |

Shell-style fallback is NOT supported. ${VAR:-default} (the bash idiom) is not recognised by the helm-operator. If you write it, the substitution silently fails and the value becomes an empty string — which then overrides the chart's own default. Always set the env var on the operator pod, or use the '{{ default ... (env "VAR") }}' Go-template form.
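
Because the failure is silent, it can help to catch the pattern before the image is built. The following is a minimal sketch of such a check. The function name lint_watches_fallback is hypothetical, and it is a plain text search, not a YAML parser.

```shell
# Hypothetical pre-build check: fail if watches.yaml contains a
# bash-style ${VAR:-default} fallback.
lint_watches_fallback() {
  if grep -nE '\$\{[A-Za-z_][A-Za-z0-9_]*:-' "$1"; then
    echo "unsupported \${VAR:-default} fallback in $1" >&2
    return 1
  fi
  return 0
}

# Usage, from the operator project root:
#   lint_watches_fallback watches.yaml || exit 1
```

On a hit, grep prints the offending line numbers and the function returns non-zero, so it can gate make docker-build in a CI step.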

To set the env vars, edit config/manager/manager.yaml:

yaml
        env:
          - name: REGISTRY_MIRROR
            value: "registry.internal/proxy"
          - name: IMAGE_PULL_POLICY
            value: "Always"
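
A variable that is referenced in watches.yaml but never declared on the pod resolves to an empty string, so a quick cross-check between the two files can save a debugging session. This is a rough sketch with a hypothetical helper name, check_override_env. It uses plain text matching, assumes unquoted name: VAR lines as in the snippet above, and does not look at the '{{ env "VAR" }}' template form.

```shell
# Hypothetical check: every $VAR / ${VAR} used in watches.yaml should be
# declared as an env entry in the manager manifest.
check_override_env() {
  watches="$1"
  manager="$2"
  missing=0
  for var in $(grep -oE '\$\{?[A-Za-z_][A-Za-z0-9_]*' "$watches" | tr -d '${' | sort -u); do
    if ! grep -qE "name:[[:space:]]*${var}[[:space:]]*\$" "$manager"; then
      echo "$var is used in $watches but not declared in $manager" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage, from the operator project root:
#   check_override_env watches.yaml config/manager/manager.yaml
```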

Part 2 covers the full precedence rules (overrideValues > CR .spec > chart values.yaml) and patterns like per-environment defaults and secret-handling.

selector - label-based CR filtering

With selector, the operator only reconciles CRs whose labels match. The use case is multi-tenancy with a single operator binary: run two copies of the operator with different selectors, each handling a different tenant's CRs.

yaml
selector:
  matchLabels:
    tier: production

A DemoApp CR labeled tier: production would be picked up; one labeled tier: staging would not. Without the label or with a different label, the CR is silently ignored by this controller. Full multi-tenant patterns (three options including selector) are in Part 2.
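
You can preview which CRs a selector would match with kubectl's own label selectors. The helper below is a sketch with a hypothetical name, selector_report. It handles a single key=value pair, which covers a one-label matchLabels like the example above.

```shell
# List DemoApp CRs that match, and that miss, a controller's label selector.
# Takes the selector as key=value, e.g. tier=production.
selector_report() {
  key="${1%%=*}"
  value="${1#*=}"
  echo "matched by selector $key=$value:"
  kubectl get demoapp --all-namespaces -l "$key=$value"
  echo "ignored (label missing or different):"
  kubectl get demoapp --all-namespaces -l "$key!=$value"
}

# Usage:
#   selector_report tier=production
```

The != form also returns CRs that do not carry the key at all, which matches the "silently ignored" set described above.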

⚠️ Known regression - selector is ignored in cluster-scoped mode. Since helm-operator v1.34 there is a confirmed bug where the selector field is silently dropped when WATCH_NAMESPACE is empty (i.e. the default cluster-scoped configuration). Every CR of the watched Kind gets reconciled regardless of labels. Two workarounds: (1) set WATCH_NAMESPACE to a specific namespace on the manager pod (this turns the operator into a namespace-scoped one; Part 2 walks through that flip), or (2) add the label helm.sdk.operatorframework.io/chart: <chart-name> to every CR you want reconciled (hacky but works cluster-wide). The Part C worked example below sets WATCH_NAMESPACE: "default" so that selector actually filters. If you remove that env var to get true cluster scope, the selector will not filter as documented above.

Multi-Kind operator - one operator, two charts

watches.yaml is a list. One operator can manage multiple Kinds, each mapped to its own chart:

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app

- group: demo.example.com
  version: v1alpha1
  kind: WorkerApp
  chart: helm-charts/worker-app
  reconcilePeriod: 5m
  watchDependentResources: true

You scaffold the second Kind with operator-sdk create api:

bash
operator-sdk create api \
  --group demo \
  --version v1alpha1 \
  --kind WorkerApp \
  --helm-chart=../worker-app

This adds a second entry to watches.yaml, copies the chart into helm-charts/worker-app/, and adds a CRD for WorkerApp. The same operator pod runs both controllers in-process.
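
After scaffolding, a quick way to confirm what the operator will watch is to list the Kinds in watches.yaml. This is a sketch with hypothetical helper names. It relies on plain-text parsing and assumes each entry carries one kind: line and that no overrideValues key is itself named kind.

```shell
# Print each Kind declared in a watches.yaml, one per line.
list_watched_kinds() {
  awk '$1 == "kind:" { print $2 } $1 == "-" && $2 == "kind:" { print $3 }' "$1"
}

# Print any Kind that appears in more than one entry.
duplicate_kinds() {
  list_watched_kinds "$1" | sort | uniq -d
}

# Usage, from the operator project root:
#   list_watched_kinds watches.yaml
#   duplicate_kinds watches.yaml
```

A Kind reported by duplicate_kinds is worth a second look: two entries for the same Kind only make sense when their selectors do not overlap.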

Worked example - exercise the optional fields end-to-end

The schema and per-field notes above describe what each optional field does. This section is the hands-on counterpart: edit watches.yaml, rebuild, redeploy, and observe reconcilePeriod, overrideValues, and selector in action with three concrete checks. (watchDependentResources gets its own dedicated drift demos in Part 2, so it stays at the default true here.)

1. Edit watches.yaml

Replace the four-line minimal watches.yaml with this:

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  reconcilePeriod: 10s
  overrideValues:
    apiKey: "$SHARED_DEMO_KEY"
  selector:
    matchLabels:
      tier: demo

What this changes:

  • reconcilePeriod: 10s — speeds up the periodic safety-net resync from the 1m0s default so it's easy to observe at startup.
  • overrideValues.apiKey: "$SHARED_DEMO_KEY" — overrides whatever apiKey the CR sets (or doesn't set) with the value of the SHARED_DEMO_KEY env var on the operator pod. (Remember: no ${VAR:-default} shell syntax — set the env var or the override resolves to empty.)
  • selector.matchLabels.tier: demo — only CRs labelled tier: demo will be reconciled by this controller.

2. Add the env vars to the operator pod

Open config/manager/manager.yaml, find the containers: block, and add this env: block under the manager container (SHARED_DEMO_KEY feeds the override; WATCH_NAMESPACE: "default" is required for the selector to actually filter — see the regression note in the selector section above):

yaml
        env:
          - name: SHARED_DEMO_KEY
            value: "operator-supplied-key-12345"
          - name: WATCH_NAMESPACE
            value: "default"

3. Rebuild, push, redeploy

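watches.yaml is copied into the operator image at make docker-build time, so changing it needs a new image. manager.yaml is not part of the image: it is a Kustomize manifest that make deploy applies, so the new env: block lands with the same redeploy. Bump the tag (or generate a fresh ttl.sh URL) and push: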

bash
export IMG=ttl.sh/demoapp-$(uuidgen | tr 'A-Z' 'a-z'):24h   # new URL for the v0.1.1 iteration; image names must be lowercase
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"

kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager
# deployment "demo-app-operator-controller-manager" successfully rolled out

A fresh UUID per build keeps the kubelet from caching a stale image under the same tag — saves an imagePullPolicy: Always patch. If you'd rather keep one URL across iterations, export IMG once and re-push to the same tag, but you'll want imagePullPolicy: Always on the operator Deployment.
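
To confirm the rollout actually picked up the new tag, you can read the image back from the Deployment. A small sketch follows, with a hypothetical helper name, deployed_image. The namespace, Deployment, and container names are the ones already used in this tutorial.

```shell
# Print the image the manager container of the operator Deployment runs.
deployed_image() {
  kubectl -n demo-app-operator-system get deploy \
    demo-app-operator-controller-manager \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="manager")].image}'
}

# Usage: compare against the tag you just pushed.
#   [ "$(deployed_image)" = "$IMG" ] && echo "rollout uses $IMG"
```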

Check 1 - reconcilePeriod is what you set

Search the operator startup log:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager | grep -i 'reconcilePeriod'
# {"level":"info","ts":"...","logger":"helm.controller","msg":"Watching resource",
#  "apiVersion":"demo.example.com/v1alpha1","kind":"DemoApp","reconcilePeriod":"10s"}

"reconcilePeriod":"10s" confirms the value flowed from watches.yaml into the runtime controller config. (Full drift / cadence demos are in Part 2 - this is just the "it took effect" check.)

Check 2 - selector filters CRs

Create two CRs in the same namespace - only picked carries the tier: demo label:

bash
kubectl apply -f - <<'EOF'
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: picked
  labels:
    tier: demo            # matches selector
spec:
  replicaCount: 1
  image: nginx:1.27-alpine
  message: "Hello from the picked CR"
---
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: ignored
  # no labels -> does NOT match selector
spec:
  replicaCount: 1
  image: nginx:1.27-alpine
  message: "Should not be reconciled"
EOF

Both CRs exist, but only picked should have chart resources behind it:

bash
kubectl get demoapp
# NAME      AGE
# ignored   8s
# picked    8s

kubectl get deploy,svc,cm,secret -l app.kubernetes.io/name=picked
# deployment.apps/picked   1/1   1   1   20s
# service/picked           ClusterIP   ...
# configmap/picked         1     20s
# secret/picked            Opaque   1   20s

kubectl get deploy,svc,cm,secret -l app.kubernetes.io/name=ignored
# No resources found in default namespace.

The ignored CR exists in etcd but the controller filtered it out before reconcile. The operator log confirms it:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager | grep -E 'picked|ignored' | head -3
# {"level":"info","ts":"...","msg":"Starting Reconcile","name":"picked","namespace":"default"}
# {"level":"info","ts":"...","msg":"Reconciled release","name":"picked","namespace":"default"}
# (no "ignored" lines — the predicate dropped the event before it reached Reconcile)

Check 3 - overrideValues wins over the CR

The picked CR did not set spec.apiKey (the chart's default changeme would normally apply). But overrideValues.apiKey in watches.yaml injected the env-var-sourced value at higher precedence. Confirm by base64-decoding the Secret:

bash
kubectl get secret picked -o jsonpath='{.data.api-key}' | base64 -d
# operator-supplied-key-12345

The Secret holds operator-supplied-key-12345, not changeme (the chart's values.yaml default): the operator pod's SHARED_DEMO_KEY env var was substituted for $SHARED_DEMO_KEY in watches.yaml and injected as an override. This is the override plus env-var-substitution path the section above described, end-to-end. You can also confirm what Helm received by running helm get values picked -n default - the USER-SUPPLIED VALUES block should show apiKey: operator-supplied-key-12345. Part 2 walks the full precedence rules (overrideValues > CR .spec > chart values.yaml) with more patterns.

Confirming the override visibly. The framework also emits one Warning Event of reason OverrideValuesInUse per overridden field, on the CR's Event stream — handy for explaining to CR authors why their value was ignored:

bash
kubectl get events --field-selector involvedObject.name=picked --sort-by=.lastTimestamp
# LAST SEEN   TYPE      REASON                OBJECT             MESSAGE
# 12s         Warning   OverrideValuesInUse   demoapp/picked     Chart value "apiKey" overridden to "operator-supplied-key-12345" by operator's watches.yaml

Cleanup

bash
kubectl delete demoapp picked ignored --ignore-not-found

Recommended before starting Part 2: revert watches.yaml to the minimal four required fields and remove both SHARED_DEMO_KEY and WATCH_NAMESPACE from manager.yaml's env: block, then rebuild + redeploy with a fresh IMG=ttl.sh/demoapp-$(uuidgen):24h and re-apply config/samples/demo_v1alpha1_demoapp.yaml. Part 2 assumes the operator is cluster-scoped with a vanilla watches.yaml and a single demoapp-sample CR running.

Pitfalls per field

| Pitfall | Why it hurts |
| --- | --- |
| Editing the chart but not running make docker-build + docker push | The new templates never reach the running operator |
| Setting reconcilePeriod: 5s with hundreds of CRs | Operator CPU pegs; the API server takes the punishment too |
| Putting a remote chart URL in chart: | The operator boots, cannot find the path on disk, and crashes |
| Using ${VAR:-default} in overrideValues | Shell-style fallback is not supported; the substitution silently becomes "" |
| Forgetting to declare env vars in manager.yaml for $VAR substitution | The variable resolves to an empty string, which then overrides the chart default |
| Leaving WATCH_NAMESPACE unset and using selector | Known regression: selector is ignored in cluster-scoped mode |
| Two watches.yaml entries with overlapping selector for the same Kind | Both controllers fight over the same CRs; status flaps |
| Switching watchDependentResources: false "to debug" and forgetting | Drift goes uncorrected until the next periodic resync |

What's next - Part 2

You now have a working Helm-based operator with a tight CRD and a fully understood watches.yaml. Part 2 - Lifecycle, drift, hooks, scope, and the hard ceiling picks up here and covers:

  • Lifecycle: upgrade the CR and see helm upgrade run, delete the CR and see helm uninstall cascade.
  • Values mapping: full precedence rules (overrideValues > CR .spec > chart values.yaml), env-var substitution patterns, secret-handling patterns.
  • Drift detection: edit a ConfigMap with kubectl edit, watch the operator revert it within seconds; same for deletion.
  • Helm hooks: pre/post install/upgrade/delete Jobs - the pre-built operator's only escape hatch for "do something custom around the chart."
  • Scope and multi-tenancy: flip from cluster-scoped to namespace-scoped (WATCH_NAMESPACE + RBAC swap), and three options for multi-tenant deployments including selector.
  • The hard ceiling: the features the pre-built operator cannot provide - custom finalizer logic against external systems, custom status fields, cross-CR coordination, reading external state in reconcile. Each entry links to the Helm hybrid operator where the ceiling is broken.

If you only need "install the chart on every CR," you may never need Part 2. If you need anything beyond that - even drift recovery on its own - read it.


Summary

The Helm-based operator gives you a Kubernetes operator with zero lines of Go: a Helm chart, a CRD, and a watches.yaml mapping the two. The pre-built reconciler ships from Operator SDK and handles install/upgrade/uninstall on every CR event. Part 1 walked you from an empty directory to a deployed operator with a tightened CRD and a fully understood watches.yaml - the four required fields plus the four optional knobs (reconcilePeriod, watchDependentResources, overrideValues, selector). The single most valuable edit you can make is tightening the CRD's schema; the permissive x-kubernetes-preserve-unknown-fields: true default is fine for day one and dangerous for day ten. Part 2 picks up where Part 1 ends: lifecycle, drift, Helm hooks, scope, and the hard ceiling beyond which only a Helm hybrid operator will do.

Deepak Prasad

R&D Engineer
