Part 1 ended with a deployed operator and a running DemoApp CR that rendered an nginx Deployment, a Service, a ConfigMap, and a Secret. Part 2 picks up there and walks through everything you do with the operator afterwards: the upgrade and uninstall lifecycle, the full value precedence rules, drift recovery, Helm hooks as the only escape hatch for custom work, scope and multi-tenancy patterns, and the hard ceiling of features the pre-built operator cannot provide.
Code-first throughout. If you have not built the Part 1 operator, start there - this article assumes the same demo-app chart, DemoApp CR, and demo-app-operator deployed in your kind cluster.
Recap from Part 1
You have:
- A demo-app chart with four templates: Deployment, Service, ConfigMap, Secret.
- An operator project built with operator-sdk init --plugins=helm.sdk.operatorframework.io/v1, image pushed to a ttl.sh URL (e.g. ttl.sh/demoapp-<uuid>:24h) and deployed via make deploy IMG="$IMG".
- A watches.yaml mapping demo.example.com/v1alpha1/DemoApp to helm-charts/demo-app.
- A tightened CRD with OpenAPI v3 validation.
- A DemoApp CR named demoapp-sample in the default namespace driving one chart release.
Quick sanity check before proceeding:
kubectl get demoapps -A
# NAMESPACE NAME AGE
# default demoapp-sample 10m
kubectl get all,cm,secret -l app.kubernetes.io/name=demoapp-sample
Part D - Mapping CR spec to Helm values (precedence)
The implicit mapping in action
Everything in your CR's .spec becomes Helm values verbatim. Watch the wire:
kubectl patch demoapp demoapp-sample --type=merge -p '{"spec":{"replicaCount":3,"message":"Updated via patch"}}'
# demoapp.demo.example.com/demoapp-sample patched
Tail the operator log:
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager -f
# "Upgraded release" "name": "demoapp-sample"
Effect on the cluster:
kubectl get deploy demoapp-sample -o jsonpath='{.spec.replicas}'
# 3
kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}'
# <html><body><h1>Updated via patch</h1></body></html>
There is no glue code. The operator JSON-marshals .spec, hands it to helm upgrade as values, and the chart renders new manifests.
overrideValues - operator-level value injection
overrideValues lives in watches.yaml and takes precedence over the CR's .spec. Edit watches.yaml:
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  overrideValues:
    image: "nginx:1.27-alpine"
    apiKey: '{{ default "changeme-defaulted" (env "OPERATOR_DEFAULT_API_KEY") }}'
The image line pins every CR's image to nginx 1.27, so users cannot change it from the CR. The apiKey line uses a Go template with Sprig's default function, so it falls back to the literal string changeme-defaulted when the env var is unset.
Declare the env var on the manager pod in config/manager/manager.yaml:
env:
  - name: OPERATOR_DEFAULT_API_KEY
    valueFrom:
      secretKeyRef:
        name: operator-defaults
        key: api-key
        optional: true
The scaffolded manager has no WATCH_NAMESPACE env var at all; cluster scope is implicit when the variable is unset. Add it explicitly only when you want to flip to namespace scope (Part H).
Rebuild and redeploy - watches.yaml is baked into the operator image at docker-build time, so you need a fresh build. Generate a fresh ttl.sh URL for this iteration (24 h is plenty for working through Part 2):
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"
kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager
# deployment "demo-app-operator-controller-manager" successfully rolled out
Now create a CR that tries to override image:
cat <<EOF | kubectl apply -f -
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: app-trying-override
spec:
  image: "nginx:1.99-bogus"   # ignored - overrideValues wins
  replicaCount: 2
  message: "I tried to set a bogus image"
  apiKey: "user-supplied"     # ignored - overrideValues wins (with default fallback)
EOF
Verify the Deployment uses the operator-pinned image, not the CR's:
kubectl get deploy app-trying-override -o jsonpath='{.spec.template.spec.containers[0].image}'
# nginx:1.27-alpine
The framework also emits a Warning Event per overridden field, which is useful for showing CR authors why their value was ignored:
kubectl get events --field-selector involvedObject.name=app-trying-override \
--sort-by=.lastTimestamp
# LAST SEEN TYPE REASON OBJECT MESSAGE
# 12s Warning OverrideValuesInUse demoapp/app-trying-override Chart value "image" overridden to "nginx:1.27-alpine" by operator's watches.yaml
# 12s Warning OverrideValuesInUse demoapp/app-trying-override Chart value "apiKey" overridden to "changeme-defaulted" by operator's watches.yaml
Precedence rules (full)
From highest to lowest:
| Tier | Source | Where defined |
|---|---|---|
| 1 | overrideValues (with env-var substitution) | watches.yaml in operator image |
| 2 | CR's .spec | The CR YAML applied by the user |
| 3 | Chart's values.yaml | helm-charts/demo-app/values.yaml |
| 4 | Template fallbacks ({{ default "x" .Values.y }}) | Inside chart templates |
The reconciler computes the effective values map by merging in that order and passes it to Helm.
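The merge order in the table can be modelled in a few lines of shell. This is only a sketch of the last-writer-wins idea, not helm-operator's actual code: it assumes flat key=value pairs (real values are nested maps), and effective_values is a hypothetical helper name. Tier 4 is not part of the merge at all, because template fallbacks apply inside the chart at render time.

```shell
# Sketch: last-writer-wins merge of three value tiers (hypothetical helper).
# Each argument is a newline-separated list of flat key=value pairs,
# passed from lowest to highest precedence:
#   $1 = chart values.yaml, $2 = CR .spec, $3 = overrideValues
effective_values() {
  printf '%s\n%s\n%s\n' "$1" "$2" "$3" |
    awk -F= 'NF >= 2 {
               key = $1
               sub(/^[^=]*=/, "")   # keep everything after the first "="
               val[key] = $0        # later tiers overwrite earlier ones
             }
             END { for (k in val) print k "=" val[k] }' |
    LC_ALL=C sort
}

# Example: the CR asks for a bogus image, the override pins it.
effective_values \
  "$(printf 'image=nginx:1.25\nreplicaCount=1')" \
  "$(printf 'image=nginx:1.99-bogus\nreplicaCount=2')" \
  'image=nginx:1.27-alpine'
# image=nginx:1.27-alpine
# replicaCount=2
```

The CR still wins for replicaCount because no higher tier mentions that key; only keys named in overrideValues are taken out of the user's hands.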
Worked patterns
Per-environment defaults - same operator image, different defaults per cluster, sourced from the operator pod's env. Give every required env var an explicit value: (or valueFrom:) entry on the manager pod, and never rely on shell-style fallback syntax such as ${VAR:-default}; use the template's default function instead:
overrideValues:
  image: '{{ env "IMAGE_PREFIX" }}/nginx:{{ default "latest" (env "IMAGE_TAG") }}'
  ingress:
    className: '{{ default "nginx" (env "INGRESS_CLASS") }}'
Pair with config/manager/manager.yaml:
env:
  - name: IMAGE_PREFIX
    value: "registry.prod.internal"
  - name: IMAGE_TAG
    value: "1.27-alpine"
  - name: INGRESS_CLASS
    value: "nginx-internal"
Cluster-wide labels - inject a common label every chart-rendered resource carries:
overrideValues:
  commonLabels:
    cluster: '{{ env "CLUSTER_NAME" }}'
    cost-center: '{{ default "unallocated" (env "COST_CENTER") }}'
The chart must consume these (e.g., merge into _helpers.tpl's label block). The operator just makes the values available.
Hot-fix without releasing the chart - emergency knob to force every release onto a patched image without editing every CR:
overrideValues:
  image: "registry.prod.internal/nginx:1.27.0-cve-fix-2"
Edit, rebuild, redeploy - every CR reconciles within reconcilePeriod and upgrades to the patched image. The CRs themselves are untouched.
Secret handling - the right way
Never put secret material in overrideValues literally. Two patterns work:
Pattern 1 - env var sourced from a Kubernetes Secret on the operator pod:
# watches.yaml
overrideValues:
  apiKey: "${SHARED_DEFAULT_API_KEY}"
# config/manager/manager.yaml
env:
  - name: SHARED_DEFAULT_API_KEY
    valueFrom:
      secretKeyRef:
        name: operator-secrets
        key: shared-api-key
Pattern 2 - keep the secret entirely outside the operator; chart references a pre-existing Secret by name:
Have the chart take a secretRef and use envFrom: secretRef.name=... in the Deployment. The user provisions the Secret out-of-band; the CR only carries the name.
spec:
  secretRefName: "my-prod-credentials"
Pattern 2 keeps the secret out of the operator entirely: it never enters the operator pod's environment or the Helm values.
Part E - Lifecycle: upgrade and uninstall
Helm 3 vs Helm 4 note. The Helm release Secret format (sh.helm.release.v1.<name>.v<rev>) shown in this section is unchanged between Helm 3 and Helm 4 - the same naming, same storage, same helm.sh/release.v1 type field. The one operationally visible difference if/when the Operator SDK ships with Helm 4 SDK: new Helm 4 installs default to Server-Side Apply (SSA), so updates show up as field-managed patches rather than client-side full replacements. Existing releases originally installed with Helm 3 keep using client-side apply on upgrade unless you explicitly migrate them. None of the demos below need to be re-run when Helm 4 SDK lands in operator-sdk - the CR-driven flow is identical.
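The naming scheme in the note is regular enough to pick apart with plain parameter expansion, which is handy when scripting around release Secrets. A minimal sketch; release_name and release_revision are hypothetical helper names, and the input is assumed to be a bare Secret name without a secret/ prefix.

```shell
# Sketch: split a Helm release Secret name of the form
#   sh.helm.release.v1.<release>.v<revision>
# using only POSIX parameter expansion.
release_name() {
  n="${1#sh.helm.release.v1.}"   # strip the fixed prefix
  printf '%s\n' "${n%.v*}"       # strip the trailing .v<revision>
}

release_revision() {
  printf '%s\n' "${1##*.v}"      # keep what follows the last ".v"
}

# release_name     sh.helm.release.v1.demoapp-sample.v2   -> demoapp-sample
# release_revision sh.helm.release.v1.demoapp-sample.v2   -> 2
```

Feeding it the names returned by the kubectl get secret -l owner=helm query used later in this section gives the current revision without needing the helm CLI.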
CR update triggers helm upgrade
Any change to .spec triggers a reconcile that calls helm upgrade. Demo:
kubectl patch demoapp demoapp-sample --type=merge \
-p '{"spec":{"replicaCount":2,"message":"Hello revision 2"}}'
# Helm tracks revisions in its release Secret - one Secret per kept revision
kubectl get secret -l owner=helm,name=demoapp-sample
# NAME TYPE DATA AGE
# sh.helm.release.v1.demoapp-sample.v2 helm.sh/release.v1 1 8s
# (you may also see vN secrets for older kept revisions, depending on max-release-history)
Each release Secret is named sh.helm.release.v1.<release>.v<revision>. The .vN suffix is the revision counter: it increments with every reconcile that calls helm upgrade, whether the upgrade succeeded or failed. The highest-numbered Secret is the current revision; older revisions are pruned by helm-operator's --max-release-history limit (low by default; bump it via the manager's args if you want more history for debugging). The exact number of Secrets you see at any moment depends on recent activity (a clean release after upgrades vs. one mid-rollback can leave a different count), so don't be surprised if it's 1, 3, or somewhere in between. If you've iterated on the CR a few times your suffix may already be .v4 or much higher (failed reconciles during a broken chart can push it into triple digits within an hour). That's expected behavior, not a bug.
The friendlier view of the same data:
helm history demoapp-sample -n default
# REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
# 1 Wed Jun 3 14:00:00 2026 superseded demo-app-0.1.0 1.16.0 Install complete
# 2 Wed Jun 3 14:12:34 2026 deployed demo-app-0.1.0 1.16.0 Upgrade complete
The Deployment's template hash changes when the rendered Pod spec changes, triggering a rolling update:
kubectl rollout status deploy/demoapp-sample
# deployment "demoapp-sample" successfully rolled out
Confirm the patched values actually landed in the rendered resources. The demo-app chart's ConfigMap exposes the message through the index.html key (it serves it as an nginx welcome page), so the jsonpath needs to escape the dot in the key name:
kubectl get deploy demoapp-sample -o jsonpath='{.spec.replicas}{"\n"}'
# 2
kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}{"\n"}'
# <html><body><h1>Hello revision 2</h1></body></html>
Immutable field gotchas
Some chart edits produce manifests Kubernetes refuses to update — not because Helm or the operator does anything wrong, but because the underlying Kubernetes API rejects the change. The most common culprits:
| Resource | Immutable field(s) |
|---|---|
| Service | spec.clusterIP (the assigned IP itself; spec.type is mutable in k8s 1.20+, so ClusterIP/NodePort/LoadBalancer transitions all work) |
| Deployment | spec.selector (the label selector picking up pods) |
| StatefulSet | spec.selector, spec.serviceName, spec.podManagementPolicy, most of spec.volumeClaimTemplates |
| Job | spec.template (the whole pod template once the Job exists) |
| PersistentVolumeClaim | spec.storageClassName, spec.accessModes, shrinking spec.resources.requests.storage |
The demo-app chart in this tutorial doesn't expose any CR field that maps to one of these — every value (replicaCount, image, message, apiKey, service.type, service.port) targets a mutable Kubernetes field, so any CR patch you make will succeed end-to-end. (Try kubectl patch demoapp demoapp-sample --type=merge -p '{"spec":{"service":{"type":"NodePort"}}}' — helm history will show a clean Upgrade complete.)
In a real-world chart you will eventually hit one. The failure pattern is predictable:
- The reconcile call to helm upgrade fails because the Kubernetes API rejects the update with a *.spec.<field>: Invalid value error.
- The failure surfaces in the operator log under the CR's name (grep <cr-name> in the manager logs).
- The framework sets status.conditions[type=ReleaseFailed].reason=UpgradeError and keeps the previous release running. Nothing in the cluster breaks: the old resources keep serving traffic; only the upgrade got rejected.
Once it happens, you have three recovery options, in increasing order of disruption:
# 1. Revert the offending CR field (zero downtime, easiest):
kubectl patch demoapp <name> --type=merge -p '{"spec":{"<field>":"<previous-value>"}}'
# 2. Delete just the immutable resource, let the next reconcile recreate it (brief gap):
kubectl delete <kind>/<name>
# next reconcile (within reconcilePeriod, or instantly if you trigger one) re-creates it from the chart
# 3. Delete the CR entirely and re-apply with the new spec (full release re-install):
kubectl delete demoapp <name>
kubectl apply -f <updated-cr.yaml>
The operator will never delete-and-recreate a resource to satisfy an incompatible upgrade on its own. That's a deliberate design choice (silent destruction of stateful resources would be far worse than a noisy failure). Plan immutable-field changes deliberately, or design your chart so the field that varies between releases isn't immutable (e.g., fix the StatefulSet's serviceName to {{ .Release.Name }}-headless rather than exposing it through the CR).
Worked example — chart-driven Deployment selector change
The cleanest way to see the failure pattern is to make a chart change that touches Deployment.spec.selector — a field Kubernetes refuses to mutate. This is the same situation any chart maintainer hits the day they decide to add a label convention (e.g., app.kubernetes.io/component: web) to existing selectors.
Step 1 — Overwrite the chart template. Overwrite the entire contents of helm-charts/demo-app/templates/deployment.yaml with the version below (do not edit it in place — delete the existing file's contents first, then paste this). Two intentional changes from Part 1: a component: web label is added to both spec.selector.matchLabels and spec.template.metadata.labels (a selector label must also appear on the pods it selects), and the pod template's labels: {{- include "demo-app.labels" . | nindent 8 }} one-liner is swapped for three hardcoded label lines so indentation can't go wrong:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "demo-app.name" . }}
  labels: {{- include "demo-app.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "demo-app.name" . }}
      app.kubernetes.io/component: web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "demo-app.name" . }}
        app.kubernetes.io/managed-by: demo-app-operator
        app.kubernetes.io/component: web
    spec:
      containers:
        - name: web
          image: {{ .Values.image }}
          ports:
            - containerPort: 80
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ include "demo-app.name" . }}
                  key: api-key
          volumeMounts:
            - name: web-content
              mountPath: /usr/share/nginx/html
      volumes:
        - name: web-content
          configMap:
            name: {{ include "demo-app.name" . }}
Sanity-check the render before rebuilding:
helm template demo-app helm-charts/demo-app --show-only templates/deployment.yaml \
| grep -E 'matchLabels:|labels:|component'
# labels:
# matchLabels:
# app.kubernetes.io/component: web
# labels:
# app.kubernetes.io/component: web
Step 2 - Rebuild and redeploy the operator. The chart ships inside the operator image, so the change reaches the cluster only after a rebuild:
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"
kubectl -n demo-app-operator-system rollout status deploy/demo-app-operator-controller-manager
Step 3 - Force a reconcile and observe the failure. Trigger a reconcile:
kubectl annotate demoapp demoapp-sample force-reconcile=$(date +%s) --overwrite
sleep 5
The operator log shows the failure:
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager --tail=80 | grep 'Release failed'
# {"level":"error","logger":"helm.controller","msg":"Release failed",
# "namespace":"default","name":"demoapp-sample","release":"demoapp-sample",
# "error":"upgrade failed; rollback required"}
The CR's ReleaseFailed condition carries the same wrapper:
kubectl get demoapp demoapp-sample -o jsonpath='{.status.conditions[?(@.type=="ReleaseFailed")]}' | jq
# {
# "lastTransitionTime": "2026-06-03T04:52:54Z",
# "message": "upgrade failed; rollback required",
# "reason": "UpgradeError",
# "status": "True",
# "type": "ReleaseFailed"
# }
ReleaseFailed with reason: UpgradeError is the only signal. helm-operator collapses the Kubernetes API's actual spec.selector: Invalid value: ... field is immutable error and doesn't surface it anywhere. Diagnose from the chart diff and your knowledge of Kubernetes immutability rules.
The live Pods, Service, ConfigMap, and Secret are unchanged — K8s rejected each patch before it touched anything. No outage.
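For scripting (a CI gate, say), the same condition can be turned into an exit status. A minimal sketch, assuming jq is installed (the article already pipes to it above); release_failure is a hypothetical helper that reads the CR as JSON on stdin.

```shell
# Prints "<reason>: <message>" and returns 0 when the CR reports
# ReleaseFailed=True; returns non-zero when no such condition is present.
release_failure() {
  jq -er '.status.conditions[]?
          | select(.type == "ReleaseFailed" and .status == "True")
          | "\(.reason): \(.message)"'
}

# Usage against the cluster from this article:
#   kubectl get demoapp demoapp-sample -o json | release_failure
```

The non-zero exit in the healthy case comes from jq's -e flag, which reports failure when the filter produces no output.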
Step 4 — Recover by deleting the Deployment. The selector is the immutable field; deleting the Deployment lets the next reconcile recreate it from scratch with the new selector:
kubectl delete deploy demoapp-sample
kubectl annotate demoapp demoapp-sample force-reconcile=$(date +%s) --overwrite
# Poll until the new Deployment appears with the new selector (10–20s typically)
for i in $(seq 1 20); do
sel=$(kubectl get deploy demoapp-sample -o jsonpath='{.spec.selector.matchLabels}' 2>/dev/null)
if [[ "$sel" == *component* ]]; then echo "recovered: $sel"; break; fi
sleep 2
done
# recovered: {"app.kubernetes.io/component":"web","app.kubernetes.io/name":"demoapp-sample"}
helm history demoapp-sample -n default | tail -1
# REVISION UPDATED STATUS CHART APP VERSION DESCRIPTION
# N ... deployed demo-app-0.1.0 1.16.0 Upgrade complete
If the first force-reconcile still fails (the failed-upgrade backoff can swallow it), repeat the kubectl annotate ... force-reconcile=... line once more. The second attempt almost always succeeds because the offending Deployment is gone and the operator now plans an "install" of that one resource rather than an "upgrade." Pods are absent for a few seconds while the new Deployment scales up, so plan immutable-field changes around a maintenance window in production.
Step 5 — Cleanup. Overwrite templates/deployment.yaml back to the Part 1 version (restore labels: {{- include "demo-app.labels" . | nindent 8 }} and remove both component: web lines). Rebuild + push + deploy + wait rollout. The first reconcile of the reverted chart fails the same way (selector change in reverse) — repeat the Step 4 recovery once more (kubectl delete deploy demoapp-sample + force-reconcile) to land back on the original selector. Subsequent sections of this article assume the chart is in its original Part 1 shape.
CR delete triggers helm uninstall
kubectl delete demoapp demoapp-sample
# demoapp.demo.example.com "demoapp-sample" deleted
kubectl get all,cm,secret -l app.kubernetes.io/name=demoapp-sample
# No resources found in default namespace.
The operator runs helm uninstall. Helm's cascade removes every resource in the release manifest. The Helm release Secret is also deleted:
kubectl get secret -l owner=helm
# No resources found in default namespace.
What helm uninstall does NOT clean
| Not cleaned | Why |
|---|---|
| PersistentVolumeClaims (if the chart didn't own them) | Helm only deletes resources in the rendered manifest |
| External resources (cloud DNS, IAM roles, S3 buckets) | Outside the cluster - Helm cannot see them |
| Namespaces created externally | The chart's release scope is resources, not the namespace |
| Stale data left behind in Secrets you reused | The chart never owned them |
This is the largest single reason teams outgrow the pre-built operator. Custom pre-uninstall logic (e.g., release an external IP, deregister from a service mesh) requires the Helm hybrid operator.
Status conditions populated by the framework
kubectl get demoapp demoapp-sample -o yaml | grep -A 30 '^status:'
The reconciler writes a fixed set of condition types:
| Condition type | When True |
|---|---|
| Initialized | Reconciler has read the CR and bound it to a chart |
| Deployed | helm install or helm upgrade succeeded; release is in deployed state |
| ReleaseFailed | Last Helm operation failed (with message and reason populated) |
| Irreconcilable | The operator could not sync or reconcile the release (reason ReconcileError) |
Plus status.deployedRelease.name (Helm release name) and status.deployedRelease.manifest (full rendered YAML).
You cannot add condition types of your own. This is one of the ceiling items in Part I.
Part F - Drift detection
Two mechanisms
| Mechanism | Trigger | Latency |
|---|---|---|
| watchDependentResources | Event on any chart-rendered resource | Sub-second nominally; see the timing reality check under Demo 1 |
| reconcilePeriod | Periodic timer per CR | Up to the configured period |
Both are on by default (true and 1m). Together they catch event-driven drift instantly and silent drift on the cadence interval.
Demo 1 - revert a manual ConfigMap edit
Make sure you have a fresh CR:
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
sleep 10
kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}'
# <html><body><h1>Hello from demo-app</h1></body></html>
Edit the ConfigMap manually:
kubectl patch cm demoapp-sample --type=merge \
  -p '{"data":{"index.html":"<html><body><h1>HACKED</h1></body></html>"}}'
Poll until the operator reverts it. In practice it is the periodic resync (default reconcilePeriod: 1m) that catches it; the dependent-resource event-driven path is best-effort and is not reliably sub-second in current helm-operator builds (see note below):
for i in $(seq 1 90); do
v=$(kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}')
if [[ "$v" != *HACKED* ]]; then echo "reverted after ~$((i*2))s"; break; fi
sleep 2
done
# reverted after ~30s
kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}'
# <html><body><h1>Hello from demo-app</h1></body></html>
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager -c manager --tail=5
# "Upgraded release" "name":"demoapp-sample"
# "Reconciled release" "name":"demoapp-sample"
The change is gone. watchDependentResources: true is supposed to fire on every ConfigMap event, but in our testing with helm-operator v1.42 the event-driven path was slower than advertised, and reverts often only happened on the periodic resync (reconcilePeriod). Either way, the chart re-renders and overwrites the manual edit.
Drift timing reality check. The original docs/article promise "sub-second" event-driven reverts. On a real fresh kind cluster with helm-operator v1.42 we measured 20-40 seconds for both ConfigMap and Service drift, lining up with the reconcilePeriod: 1m boundary, not the watch-event boundary. The fastest reliable way to demonstrate sub-reconcilePeriod correction is to drop reconcilePeriod to 10s in watches.yaml for the demo and rebuild the operator image. Both mechanisms are still in play, but timing assertions in client docs should be a ceiling, not a floor.
Demo 2 - recreate a deleted Service
kubectl delete svc demoapp-sample
# service "demoapp-sample" deleted
# Poll until it reappears (typically ~20s; up to reconcilePeriod)
for i in $(seq 1 60); do
if kubectl get svc demoapp-sample &>/dev/null; then
echo "recreated after ~$((i*2))s"; break
fi
sleep 2
done
# recreated after ~20s
kubectl get svc demoapp-sample
# NAME TYPE CLUSTER-IP PORT(S) AGE
# demoapp-sample ClusterIP 10.96.130.109 80/TCP 2s
The Service is back, with a new ClusterIP. Same mechanism: the deletion event (eventually) woke the operator, it re-applied the chart, and the Service got recreated. Tighten reconcilePeriod if you need lower recovery latency in production.
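Both drift demos use the same poll-until-true loop, so it can be factored into one helper. A sketch in POSIX sh; wait_until is a hypothetical name, and the reported time is the approximate number of seconds spent sleeping, not a precise measurement.

```shell
# wait_until TRIES INTERVAL_SECONDS COMMAND [ARGS...]
# Returns 0 as soon as COMMAND succeeds, 1 if it never does.
wait_until() {
  tries="$1"; interval="$2"; shift 2
  i=0
  while [ "$i" -lt "$tries" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "succeeded after ~$((i * interval))s"
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  echo "gave up after ~$((tries * interval))s" >&2
  return 1
}

# Usage with the Service demo:
#   kubectl delete svc demoapp-sample
#   wait_until 60 2 kubectl get svc demoapp-sample
```

Because the helper returns a real exit status, it also works as a gate in scripts, which the inline loops (they simply fall through on timeout) do not.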
What drift CANNOT be detected
| Out-of-band change | Why invisible |
|---|---|
| A NetworkPolicy added externally that blocks the chart's pods | NetworkPolicy is not in the rendered manifest; not watched |
| PVCs the chart did not template | Same - not in the rendered manifest |
| Changes to a Secret the chart envFrom's but does not own | Watched only if the chart renders it |
| External infra rot (DNS, IAM, cloud LB) | Outside the cluster - the operator has no view |
| RBAC changes affecting the chart's ServiceAccount | The chart owns the SA, but RBAC rules attached externally are not |
The operator only knows about resources Helm rendered. Anything else is invisible drift.
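A quick way to tell which side of that line a resource falls on is to look for it in the release manifest. The sketch below does that with awk; manifest_lists is a hypothetical helper, and it assumes the usual shape of Helm-rendered YAML (a top-level kind: and a two-space-indented name: under metadata:), so treat it as a heuristic rather than a YAML parser.

```shell
# manifest_lists KIND NAME < manifest.yaml
# Returns 0 if some document in the manifest has that kind and metadata.name.
manifest_lists() {
  awk -v kind="$1" -v name="$2" '
    function check() { if (k == kind && n == name) found = 1; k = ""; n = ""; in_meta = 0 }
    /^---/                { check(); next }      # document boundary
    /^metadata:/          { in_meta = 1; next }
    /^kind:/              { k = $2 }
    /^[^ ]/               { in_meta = 0 }        # any other top-level key ends metadata
    in_meta && /^  name:/ { n = $2 }
    END                   { check(); exit (found ? 0 : 1) }
  '
}

# Usage against a live release:
#   helm get manifest demoapp-sample -n default | manifest_lists Service demoapp-sample
```

If the check fails for a resource your application depends on, changes to that resource fall into the invisible-drift category listed in the table.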
Tuning reconcilePeriod at scale
| CR count | Recommended reconcilePeriod | Reason |
|---|---|---|
| < 10 | 30s (or default 1m) | Fast feedback during development |
| 10 - 100 | 1m (default) - 5m | Balanced |
| 100 - 1000 | 5m - 15m | API server protection becomes the dominant concern |
| > 1000 | 15m - 30m and shard by selector | Single operator becomes a bottleneck |
Setting reconcilePeriod: 5s with hundreds of CRs is a common foot-gun - the operator pegs CPU and the API server takes the punishment. The event-driven path (watchDependentResources) is free; the periodic resync is not.
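The arithmetic behind that warning is simple enough to keep in a script. A rough sketch; resyncs_per_minute is a hypothetical helper that counts only the periodic resync and ignores event-driven reconciles and retries.

```shell
# resyncs_per_minute CR_COUNT PERIOD_SECONDS
# Steady-state periodic reconciles per minute (integer arithmetic).
resyncs_per_minute() {
  crs="$1"; period_seconds="$2"
  echo $(( crs * 60 / period_seconds ))
}

# resyncs_per_minute 500 5     -> 6000  (the foot-gun case)
# resyncs_per_minute 500 300   -> 100   (5m period)
```

Moving 500 CRs from a 5s to a 5m period cuts the periodic load by a factor of 60, which is why the table scales the period with the CR count.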
Turning drift correction off (temporarily)
For debugging or planned maintenance you can disable event-driven drift correction:
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  watchDependentResources: false   # only react to CR events + periodic resync
Rebuild and redeploy. Drift is now only corrected on the reconcilePeriod cadence. Remember to flip it back - we have seen this disabled "for an investigation" and forgotten for months.
Part G - Helm hooks (the pre-built's escape hatch for custom work)
Helm hooks are how you do "extra work" around the chart without leaving the pre-built operator. They are normal Kubernetes resources (almost always Jobs) annotated to run at a specific Helm lifecycle phase.
What Helm hooks are
| Hook | Runs when |
|---|---|
| pre-install | After templates are rendered, before any chart resource is created on first install |
| post-install | After all chart resources have been created on first install |
| pre-upgrade | After templates are rendered, before any chart resource is updated on upgrade |
| post-upgrade | After all chart resources have been updated on upgrade |
| pre-delete | Before Helm deletes any of the release's resources on uninstall |
| post-delete | After Helm has deleted all of the release's resources |
The pre-built operator honours every hook automatically because it just calls helm install / upgrade / uninstall under the hood.
Worked example 1 - pre-install Job seeds a ConfigMap
The use case: before the app starts, write a fixed seed-data ConfigMap the app expects. Add a new template helm-charts/demo-app/templates/seed-hook.yaml:
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "0"
    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: alpine/k8s:1.31.0
          command:
            - sh
            - -c
            - |
              kubectl create configmap {{ include "demo-app.name" . }}-seed \
                --from-literal=greeting="seeded at $(date)" \
                --dry-run=client -o yaml | kubectl apply -f -
      serviceAccountName: {{ include "demo-app.name" . }}-seed
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-10"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-10"
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "patch", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-10"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{ include "demo-app.name" . }}-seed
subjects:
  - kind: ServiceAccount
    name: {{ include "demo-app.name" . }}-seed
Rebuild and redeploy with a fresh ttl.sh URL:
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"
kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager
Apply a fresh CR and verify the seed ran. Delete the previous release first. Hooks live in the rendered release manifest, so the pre-install hook only runs on a clean helm install, not on a helm upgrade of an already-deployed release:
kubectl delete demoapp demoapp-sample --ignore-not-found
# Wait until the namespace is fully cleaned (helm release Secret gone) before re-applying
for i in $(seq 1 30); do
if ! kubectl get secret -l owner=helm,name=demoapp-sample -n default 2>&1 | grep -q demoapp-sample; then
echo "release torn down"; break
fi
sleep 2
done
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
# Poll for the seed ConfigMap. First-time alpine/k8s:1.31.0 pull is ~300 MB,
# so plan for ~60–120s on a fresh cluster, ~5s on re-runs (image cached).
for i in $(seq 1 60); do
cm=$(kubectl get cm demoapp-sample-seed -o jsonpath='{.data.greeting}' 2>&1)
if [[ "$cm" == seeded* ]]; then echo "seeded after ~$((i*5))s: $cm"; break; fi
sleep 5
done
# seeded after ~5s: seeded at Wed Jun 3 14:10:52 UTC 2026
The seed ConfigMap exists because the pre-install Job ran and created it before the Deployment came up. The Job itself is already gone: the helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation annotation deletes the Job as soon as it succeeds (and again before the next install, to avoid stale ones). If you want to see the Job during the run, use kubectl get jobs -w in another terminal before applying the CR.
Why the first run is slow. The alpine/k8s:1.31.0 image is ~300 MB (it bundles kubectl, helm, etc.) and is not in the kind node image. On a fresh cluster the kubelet pulls it once at hook time, which adds ~1 minute. Subsequent installs (delete + apply on the same kind node) reuse the cached image and the hook completes in seconds.
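The weights in seed-hook.yaml are what make the ServiceAccount, Role, and RoleBinding (weight -10) exist before the Job (weight 0) runs: Helm runs hooks of the same phase in ascending weight order. A small sketch of that ordering, assuming a made-up "WEIGHT KIND/NAME" line format. It models only the weight and leaves ties in input order, whereas Helm applies further tie-breaking of its own.

```shell
# hook_order: read "WEIGHT KIND/NAME" lines on stdin and print them
# lowest weight first (stable sort, so equal weights keep input order).
hook_order() {
  sort -s -n -k1,1
}

printf '%s\n' \
  '0 Job/demoapp-sample-seed' \
  '-10 ServiceAccount/demoapp-sample-seed' \
  '-10 Role/demoapp-sample-seed' \
  '-10 RoleBinding/demoapp-sample-seed' | hook_order
# -10 ServiceAccount/demoapp-sample-seed
# -10 Role/demoapp-sample-seed
# -10 RoleBinding/demoapp-sample-seed
# 0 Job/demoapp-sample-seed
```

If the Job shared the RBAC objects' weight, nothing in this model would guarantee the ServiceAccount exists first, which is why the chart spreads the weights apart.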
Worked example 2 - pre-delete Job exports state before uninstall
The use case: before tearing down, dump the current ConfigMap to a last-known-state ConfigMap in another namespace so post-mortem analysis is possible. Add helm-charts/demo-app/templates/export-hook.yaml:
apiVersion: v1
kind: Namespace
metadata:
  name: demoapp-archive
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-20"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "0"
    helm.sh/hook-delete-policy: hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: {{ include "demo-app.name" . }}-export
      containers:
        - name: export
          image: alpine/k8s:1.31.0
          command:
            - sh
            - -c
            - |
              kubectl get cm {{ include "demo-app.name" . }} -o yaml > /tmp/state.yaml
              kubectl -n demoapp-archive create cm \
                {{ include "demo-app.name" . }}-$(date +%s) \
                --from-file=state.yaml=/tmp/state.yaml || true
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-10"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-10"
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-10"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "demo-app.name" . }}-export
subjects:
  - kind: ServiceAccount
    name: {{ include "demo-app.name" . }}-export
    # The ServiceAccount lives in the release namespace (default in this walkthrough)
    namespace: {{ .Release.Namespace }}
The operator image bakes the chart in at build time, so rebuild and redeploy before testing the new hook:
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"
kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager
Critical sequencing: the pre-delete hook templates must be present in the chart at install time so they get baked into the rendered release manifest. If a release was installed with an older chart (no pre-delete hook), uninstalling it will skip the hook even after you rebuild the operator. Always tear down the existing release first, then re-install with the new chart, then test the delete:
# Tear down any existing release so the next install bakes in the new hook
kubectl delete demoapp demoapp-sample --ignore-not-found
kubectl delete ns demoapp-archive --ignore-not-found
for i in $(seq 1 30); do
  if ! kubectl get secret -l owner=helm,name=demoapp-sample -n default 2>&1 | grep -q demoapp-sample; then
    break
  fi
  sleep 2
done
# Fresh install - registers BOTH hooks in the release manifest
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
for i in $(seq 1 30); do
  if helm list -n default 2>&1 | grep -q "demoapp-sample.*deployed"; then
    echo "release deployed after ~$((i*5))s"; break
  fi
  sleep 5
done
# Now delete the CR - pre-delete hook runs first, then Helm cascades the uninstall
kubectl delete demoapp demoapp-sample
# Poll for the archive namespace + ConfigMap
for i in $(seq 1 60); do
  if kubectl get ns demoapp-archive &>/dev/null && \
     [[ $(kubectl -n demoapp-archive get cm --no-headers 2>&1 | grep -c demoapp-sample) -ge 1 ]]; then
    echo "pre-delete export captured after ~$((i*3))s"; break
  fi
  sleep 3
done
kubectl -n demoapp-archive get cm
# NAME DATA AGE
# demoapp-sample-1780477875 1 30s
The archive survives after helm uninstall finishes. The numeric suffix is the $(date +%s) (Unix epoch) baked into the ConfigMap name inside the hook Job, so it's unique per uninstall.
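The polling loops above all share one shape: run a check, sleep, give up after a fixed number of attempts. A minimal sketch of a reusable helper follows, assuming a bash-compatible shell. The name wait_until is hypothetical and not part of kubectl or Helm.

```shell
# wait_until ATTEMPTS SLEEP_SECONDS COMMAND...
# Hypothetical helper: re-runs COMMAND (output discarded) until it exits 0
# or ATTEMPTS tries have been used. Always tries at least once.
# Returns 0 on success, 1 on timeout.
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while :; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Usage (same budget as the archive poll above: 60 tries, 3s apart):
#   wait_until 60 3 kubectl get ns demoapp-archive \
#     || echo "demoapp-archive never appeared" >&2
```

Checks that need a pipeline (such as the grep-based ones above) can be wrapped in a small function and passed to the helper by name.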
Gotcha we hit while testing this: if you don't tear down the previous release before applying the new hook-aware chart, you may run into a stale-finalizer condition (Failed to add CR uninstall finalizer) where the operator never actually calls helm uninstall on the subsequent delete, so the pre-delete hook is silently skipped. The teardown loop above prevents that.
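Before deleting the CR, it can be worth confirming that the pre-delete hook really is part of the stored release. The sketch below relies on helm get hooks, which prints the hook manifests recorded for a release. The helper name has_pre_delete_hook is hypothetical, and the release name and namespace are the ones used in this tutorial.

```shell
# has_pre_delete_hook: reads hook manifests on stdin and succeeds if any
# line is a helm.sh/hook annotation whose value includes pre-delete.
# The helm.sh/hook-delete-policy annotation is deliberately not matched.
has_pre_delete_hook() {
  grep -Eq 'helm\.sh/hook"?:.*pre-delete'
}

# Usage:
#   if helm get hooks demoapp-sample -n default | has_pre_delete_hook; then
#     echo "pre-delete hook registered; safe to test the delete"
#   else
#     echo "no pre-delete hook in this release; reinstall with the new chart" >&2
#   fi
```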
The limits of Helm hooks
| Limit | Impact |
|---|---|
| Hooks run as in-cluster Jobs | Cannot call external APIs without bundling clients in image |
| Helm does not retry a failed hook (only the Job's own backoffLimit applies) | A failed Job fails the entire install/upgrade/delete |
| Cannot block the API server (they run after Helm has decided to proceed) | No way to veto a delete based on external state |
| Hard to read complex cluster state (yes you can kubectl get, but it's clunky) | Not the right tool for "is the cluster ready for this?" |
| Hook YAML lives in the chart, not the operator | Same chart used in two operators gets the same hook logic |
For anything beyond "run a Job at this phase," you have outgrown the pre-built operator. The Helm hybrid operator replaces hooks with Go finalizers and pre/post-reconcile logic that can do anything.
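Within those limits, the shell script inside a hook Job can still retry individual commands on its own. A minimal sketch follows, assuming a bash-compatible shell in the hook image. The helper name retry_with_backoff is hypothetical, and the attempt count and delays are illustrative values.

```shell
# retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS COMMAND...
# Runs COMMAND until it succeeds, doubling the delay after each failure.
# Returns 0 on success, or COMMAND's last exit status once attempts run out.
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1 rc=0
  shift 2
  while :; do
    if "$@"; then
      return 0
    else
      rc=$?
    fi
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return "$rc"
    fi
    echo "attempt $attempt failed (exit $rc); retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
}

# Usage inside the hook's sh -c script, in place of "|| true", so that a
# transient API error is retried instead of silently dropping the archive:
#   retry_with_backoff 5 2 kubectl -n demoapp-archive create cm "$name" \
#     --from-file=state.yaml=/tmp/state.yaml
```

This only covers transient failures inside the Job. It does not change the fact that a hook Job that ultimately fails also fails the Helm operation that triggered it.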
Part H - Operator scope and multi-tenancy
Default: cluster-scoped, watches all namespaces
make deploy ships an operator pod with no WATCH_NAMESPACE env var at all — unset = watch all namespaces. RBAC is granted via ClusterRole. Verify:
kubectl -n demo-app-operator-system get deploy demo-app-operator-controller-manager \
  -o yaml | grep -A1 WATCH_NAMESPACE
# (no output - modern operator-sdk scaffolds with WATCH_NAMESPACE unset)
# Cross-check from the manager's startup log instead:
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager | grep -m1 'Watching all namespaces'
# {"level":"info","ts":"...","logger":"cmd","msg":"Watching all namespaces"}
kubectl -n demo-app-operator-system get clusterrole | grep demo-app-operator-manager-role
# demo-app-operator-manager-role
The operator can manage DemoApp CRs in any namespace.
Flipping to namespace-scoped
Two changes: the env var and the RBAC binding.
Change 1 - WATCH_NAMESPACE env var. Edit config/manager/manager.yaml:
env:
  - name: WATCH_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
This makes the operator only watch its own namespace (demo-app-operator-system by default - you would probably also rename it).
Change 2 - swap ClusterRole/ClusterRoleBinding for Role/RoleBinding. Edit config/rbac/role.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: Role # was: ClusterRole
metadata:
  name: manager-role
  namespace: system # add a namespace
rules:
# ... same rules ...
Edit config/rbac/role_binding.yaml:
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding # was: ClusterRoleBinding
metadata:
  name: manager-rolebinding
  namespace: system # add a namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role # was: ClusterRole
  name: manager-role
subjects:
  - kind: ServiceAccount
    name: controller-manager
    namespace: system
Note: the CRD itself stays cluster-scoped - CRDs are always cluster resources. What changes is that CR instances in other namespaces are no longer reconciled by this operator.
Redeploy (the changes are in config/, not the image, so no rebuild is needed and make deploy is enough; reuse whatever IMG you set earlier):
make undeploy IMG="$IMG"
make deploy IMG="$IMG"
Verify the operator log:
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager | head -5
# "Watching resource" "kind":"DemoApp" "namespace":"demo-app-operator-system"
namespace is now the operator's own namespace, not "".
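The log line covers the watch scope. The RBAC side can be checked separately with kubectl auth can-i and ServiceAccount impersonation. The sketch below assumes the ServiceAccount is named demo-app-operator-controller-manager (the scaffold's controller-manager plus the project name prefix) and that your own user is allowed to impersonate it. The helper name can_operator is hypothetical.

```shell
# can_operator SA_NAMESPACE SA_NAME VERB RESOURCE TARGET_NAMESPACE
# Succeeds if the ServiceAccount may perform VERB on RESOURCE in
# TARGET_NAMESPACE. kubectl auth can-i exits 0 for "yes" and non-zero for "no".
can_operator() {
  kubectl auth can-i "$3" "$4" -n "$5" \
    --as="system:serviceaccount:$1:$2" >/dev/null 2>&1
}

# Usage after the namespace-scoped redeploy:
#   sa_ns=demo-app-operator-system
#   sa=demo-app-operator-controller-manager
#   can_operator "$sa_ns" "$sa" list demoapps.demo.example.com "$sa_ns" \
#     && echo "own namespace: allowed"
#   can_operator "$sa_ns" "$sa" list demoapps.demo.example.com default \
#     || echo "default namespace: denied, as expected"
```

If the second check still succeeds, a leftover ClusterRoleBinding from the cluster-scoped deployment is the first thing to look for.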
Per-tenant operator pattern - three options
| Option | Pro | Con |
|---|---|---|
| 1. One cluster-scoped operator (the default) | Single deployment, single image to upgrade | Single failure domain; one bad CR can stall all |
| 2. N namespace-scoped operators, one per tenant | Hard isolation; tenant blast-radius bounded | N deployments to manage; image upgrades multiplied |
| 3. One cluster-scoped operator + selector per CR | Single deployment but per-tenant CR filtering | Selector overlap risk; not a real isolation layer |
Pick Option 1 for internal teams that trust each other, Option 2 for hard multi-tenancy, and Option 3 only when you need horizontal sharding for performance, not isolation.
selector-based multi-tenancy (Option 3 walkthrough)
watches.yaml with a selector:
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  selector:
    matchLabels:
      tier: production
Run a second operator with a different selector (different watches.yaml, different image tag, or use a Kustomize overlay to patch watches.yaml):
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  selector:
    matchLabels:
      tier: staging
CRs are labeled at creation:
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: prod-app
  labels:
    tier: production # picked up by prod operator only
spec:
  message: "Production app"
---
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: staging-app
  labels:
    tier: staging # picked up by staging operator only
spec:
  message: "Staging app"
The operators silently ignore CRs whose labels do not match. Verify:
# in the production operator's log:
kubectl logs -n demo-app-operator-prod deploy/demo-app-operator-controller-manager
# "Reconciling release" "name":"prod-app"
# (no mention of staging-app)
When NOT to flip to namespace-scoped
| Operator role | Why cluster-scope is the right choice |
|---|---|
| Ingress controller | Watches Ingress across all namespaces |
| Monitoring/observability stack | Aggregates metrics cluster-wide |
| cert-manager-style certificate issuer | Issues certs for resources in any namespace |
| Network policy controller | Enforces policy across the whole cluster |
If the operator's job is to do something cluster-wide, namespace-scoping it does not make sense.
Part I - The hard ceiling (what the pre-built CANNOT do)
These are the features no amount of YAML tweaking will give you. Each one is the reason teams outgrow the pre-built operator. The Helm hybrid operator article addresses every one of them with concrete Go code.
1. Custom finalizer logic against external systems
The pre-built operator's "finalizer" is helm uninstall followed by Helm's cascade. There is no place to insert "first, deregister this instance from our internal billing API and wait for confirmation, then uninstall." Pre-delete hooks are the closest approximation but cannot easily call external APIs with retries or block the deletion on external state.
→ Hybrid: a Go finalizer that does anything before the Helm uninstall fires.
2. Custom status fields
The reconciler writes status.conditions with exactly three types (Initialized, Deployed, ReleaseFailed) plus status.deployedRelease. There is no way to add status.lastBackupTimestamp, status.externalServiceState, or any other custom field without forking the operator binary itself.
→ Hybrid: a DemoAppStatus Go struct with any fields you want.
3. Custom install/upgrade decision logic
The pre-built reconciler always installs on CR create and upgrades on CR update. "Only upgrade between 02:00 and 04:00 UTC," "wait for the database to be ready before upgrading," "do a dry run first and require manual approval" - none of these are expressible in YAML.
→ Hybrid: a few lines of Go in Reconcile to gate the install/upgrade call.
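Until you move to the hybrid operator, the only place to put such a gate is outside the operator, in whatever automation edits the CR. A minimal sketch of the 02:00 to 04:00 UTC example as a client-side check follows. The helper name in_maintenance_window is hypothetical, and the operator itself still upgrades immediately on any CR change.

```shell
# in_maintenance_window [HOUR]
# Succeeds when HOUR (00-23, defaults to the current UTC hour) falls inside
# the 02:00-04:00 UTC window (02 and 03 pass, 04 does not).
in_maintenance_window() {
  local hour="${1:-$(date -u +%H)}"
  hour="${hour#0}"    # drop one leading zero so 08 and 09 compare as plain decimals
  hour="${hour:-0}"   # a lone "0" was stripped to empty: treat it as midnight
  [ "$hour" -ge 2 ] && [ "$hour" -lt 4 ]
}

# Usage in a CI job or cron script:
#   if in_maintenance_window; then
#     kubectl patch demoapp demoapp-sample --type=merge \
#       -p '{"spec":{"replicaCount":3}}'
#   else
#     echo "outside 02:00-04:00 UTC; not patching" >&2
#   fi
```

This controls only when your automation changes the CR. Anyone else who edits the CR outside the window still triggers an upgrade, which is why the real fix is a gate inside Reconcile.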
4. Reading external state during reconcile
The reconciler computes Helm values from .spec and overrideValues only. "Render with replicas equal to whatever our autoscaler service reports as the right number right now," "block install if our cloud quota check fails" - the pre-built operator has no place to make those calls.
→ Hybrid: standard Go HTTP/SDK calls inside Reconcile.
5. Cross-CR coordination
The pre-built reconciler treats each CR in isolation. "CR app-B waits until CR app-A reports Deployed: True before installing" requires reading another CR's status during reconcile - not supported.
→ Hybrid: list other CRs with the controller-runtime client and gate on their status.
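With the pre-built operator, the closest approximation is to order the kubectl apply calls from the outside. The sketch below uses kubectl wait, which can block on any entry in status.conditions, to hold back one CR until another reports Deployed. The helper name apply_after_deployed, the CR name app-a, and the manifest file app-b.yaml are hypothetical.

```shell
# apply_after_deployed NAMESPACE DEPENDENCY MANIFEST [TIMEOUT]
# Waits until the DemoApp named DEPENDENCY has condition Deployed=True,
# then applies MANIFEST. Returns 1 without applying if the wait times out.
apply_after_deployed() {
  local ns="$1" dep="$2" manifest="$3" timeout="${4:-120s}"
  if ! kubectl -n "$ns" wait "demoapp/$dep" \
        --for=condition=Deployed --timeout="$timeout"; then
    echo "$dep not Deployed within $timeout; skipping $manifest" >&2
    return 1
  fi
  kubectl -n "$ns" apply -f "$manifest"
}

# Usage:
#   apply_after_deployed default app-a app-b.yaml 300s
```

This orders creation only. Once both CRs exist, the operator reconciles them independently and will not re-check app-a before upgrading app-B.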
6. Conditional resource rendering based on cluster state
You can write conditional logic inside Helm templates based on .Values, but you cannot read live cluster state. "Render the Ingress only if the nginx-ingress controller is installed on this cluster" requires querying the API for the existence of resources - the chart cannot do this; the operator cannot do this.
→ Hybrid: detect cluster capabilities at startup and inject them as values.
7. Watching arbitrary non-chart resources
watchDependentResources watches only the resource types the chart renders. "Reconcile DemoApp when a NetworkPolicy in the same namespace changes" requires adding an external watch - not supported by the pre-built operator.
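→ Hybrid: controller.Watches(&source.Kind{Type: &netv1.NetworkPolicy{}}, ...) in SetupWithManager. That is the form used before controller-runtime v0.15; newer releases take the object directly, as in Watches(&netv1.NetworkPolicy{}, ...).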
Summary - what's free, what you tune, what you can't have
| Capability | Pre-built (Parts A-H) | Hard ceiling (Part I) - hybrid required |
|---|---|---|
| Install / upgrade / uninstall | Free (Parts A and E) | - |
| CRD generation | Generated, permissive by default (Part B) | - |
| CR .spec → Helm values | Implicit, full mapping (Part D) | - |
| Operator-level value injection | overrideValues with env-var substitution (D) | - |
| Drift detection | watchDependentResources + reconcilePeriod (F) | Watching non-chart resources (#7) |
| Custom pre-uninstall | Helm pre-delete hook (G) | Custom finalizer logic (#1) |
| Status conditions | Fixed three: Initialized/Deployed/ReleaseFailed | Custom status fields (#2) |
| Install/upgrade gating | Always-on | Custom decision logic (#3) |
| Reading external state in reconcile | - | Required for #4 |
| Cross-CR coordination | - | Required for #5 |
| Conditional render on cluster state | - | Required for #6 |
| Multi-tenancy | selector or namespace-scope swap (H) | - |
If your operator's job stays inside the left column, you may never need anything else. If any of the right column applies, the Helm hybrid operator is the next stop - same chart, your own Go reconciler, none of the ceiling.
Further reading
- Helm-based operator tutorial Part 1 - build, CRD, watches.yaml.
- Helm hybrid operator (Go + Helm SDK) - the same operator, in Go, ceiling-free.
- Helm operator vs Flux vs Argo CD - when each tool is the right pick.
- Drift detection patterns in operators - language-neutral drift theory.
- Operator multi-tenancy patterns - cluster vs namespace, in depth.
- Status subresource and Conditions - the standard conditions contract.
- Finalizers explained - what real finalizers look like.
- Helm hooks documentation - the upstream Helm reference for every hook annotation.
Summary
Part 2 of the Helm-based operator tutorial walked the full operating envelope of the pre-built helm.sdk.operatorframework.io/v1 plugin. The CR's .spec is passed to Helm verbatim, with overrideValues in watches.yaml taking final precedence and env-var substitution covering per-environment defaults. Lifecycle is helm install / upgrade / uninstall driven by CR events, with the framework writing three fixed status conditions. Drift detection is event-driven (watchDependentResources) plus periodic (reconcilePeriod); together they cover noisy and silent drift on chart-rendered resources. Helm hooks are the only mechanism for custom pre/post install/upgrade/delete work, useful for in-cluster Jobs but unable to do real external-system work. Scope flips from cluster-wide to namespace via WATCH_NAMESPACE plus a ClusterRole-to-Role swap; multi-tenancy patterns range from one operator with selector to one operator per tenant. The hard ceiling - custom finalizers, custom status, custom decision logic, external state, cross-CR coordination, conditional rendering, non-chart watches - is the moment to graduate to the Helm hybrid operator.

