Helm-Based Operator Tutorial Part 2 - Lifecycle, Drift, Hooks, Scope, and the Hard Ceiling

Part 1 ended with a deployed operator and a running DemoApp CR that rendered an nginx Deployment, a Service, a ConfigMap, and a Secret. Part 2 picks up there and walks through everything you do with the operator afterwards: the upgrade and uninstall lifecycle, the full value precedence rules, drift recovery, Helm hooks as the only escape hatch for custom work, scope and multi-tenancy patterns, and the hard ceiling of features the pre-built operator cannot provide.

Code-first throughout. If you have not built the Part 1 operator, start there - this article assumes the same demo-app chart, DemoApp CR, and demo-app-operator deployed in your kind cluster.


Recap from Part 1

You have:

  • A demo-app chart with four templates: Deployment, Service, ConfigMap, Secret.
  • An operator project built with operator-sdk init --plugins=helm.sdk.operatorframework.io/v1, image pushed to a ttl.sh URL (e.g. ttl.sh/demoapp-<uuid>:24h) and deployed via make deploy IMG="$IMG".
  • A watches.yaml mapping demo.example.com/v1alpha1/DemoApp to helm-charts/demo-app.
  • A tightened CRD with OpenAPI v3 validation.
  • A DemoApp CR named demoapp-sample in the default namespace driving one chart release.

Quick sanity check before proceeding:

bash
kubectl get demoapps -A
# NAMESPACE   NAME              AGE
# default     demoapp-sample    10m

kubectl get all,cm,secret -l app.kubernetes.io/name=demoapp-sample

Part D - Mapping CR spec to Helm values (precedence)

The implicit mapping in action

Everything in your CR's .spec becomes Helm values verbatim. Watch the wire:

bash
kubectl patch demoapp demoapp-sample --type=merge -p '{"spec":{"replicaCount":3,"message":"Updated via patch"}}'
# demoapp.demo.example.com/demoapp-sample patched

Tail the operator log:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager -f
# "Upgraded release"  "name": "demoapp-sample"

Effect on the cluster:

bash
kubectl get deploy demoapp-sample -o jsonpath='{.spec.replicas}'
# 3

kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}'
# <html><body><h1>Updated via patch</h1></body></html>

There is no glue code. The operator JSON-marshals .spec, hands it to helm upgrade as values, and the chart renders new manifests.

overrideValues - operator-level value injection

overrideValues lives in watches.yaml and takes precedence over the CR's .spec. Edit watches.yaml:

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  overrideValues:
    image: "nginx:1.27-alpine"
    apiKey: '{{ default "changeme-defaulted" (env "OPERATOR_DEFAULT_API_KEY") }}'

The image line pins every CR's image to nginx 1.27 — users cannot downgrade. The apiKey line uses a Sprig Go-template with default so it falls back to the literal string changeme-defaulted when the env var is unset.

Declare the env var on the manager pod in config/manager/manager.yaml:

yaml
        env:
          - name: OPERATOR_DEFAULT_API_KEY
            valueFrom:
              secretKeyRef:
                name: operator-defaults
                key: api-key
                optional: true

The scaffolded manager has no WATCH_NAMESPACE env var at all — cluster scope is implicit when the variable is unset. Add it explicitly only when you want to flip to namespace scope (Part H).

Rebuild and redeploy - watches.yaml is baked into the operator image at docker-build time, so you need a fresh build. Generate a fresh ttl.sh URL for this iteration (24 h is plenty for working through Part 2):

bash
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"

kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager
# deployment "demo-app-operator-controller-manager" successfully rolled out

Now create a CR that tries to override image:

bash
cat <<EOF | kubectl apply -f -
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: app-trying-override
spec:
  image: "nginx:1.99-bogus"        # ignored - overrideValues wins
  replicaCount: 2
  message: "I tried to set a bogus image"
  apiKey: "user-supplied"           # ignored - overrideValues wins (with default fallback)
EOF

Verify the Deployment uses the operator-pinned image, not the CR's:

bash
kubectl get deploy app-trying-override -o jsonpath='{.spec.template.spec.containers[0].image}'
# nginx:1.27-alpine

The framework also emits a Warning Event per overridden field — useful for showing CR authors why their value was ignored:

bash
kubectl get events --field-selector involvedObject.name=app-trying-override \
  --sort-by=.lastTimestamp
# LAST SEEN   TYPE      REASON                OBJECT                            MESSAGE
# 12s         Warning   OverrideValuesInUse   demoapp/app-trying-override       Chart value "image" overridden to "nginx:1.27-alpine" by operator's watches.yaml
# 12s         Warning   OverrideValuesInUse   demoapp/app-trying-override       Chart value "apiKey" overridden to "changeme-defaulted" by operator's watches.yaml

Precedence rules (full)

From highest to lowest:

| Tier | Source | Where defined |
|------|--------|---------------|
| 1 | overrideValues (with env-var substitution) | watches.yaml in operator image |
| 2 | CR's .spec | The CR YAML applied by the user |
| 3 | Chart's values.yaml | helm-charts/demo-app/values.yaml |
| 4 | Template fallbacks ({{ default "x" .Values.y }}) | Inside chart templates |

The reconciler computes the effective values map by merging in that order and passes it to Helm.
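
To make the ordering concrete, here is a minimal shell sketch that resolves one scalar key across the four tiers. It is a simplification, not the operator's implementation: the function name effective_value is hypothetical, the sample values are illustrative, an empty string stands in for "this tier does not set the key", and real Helm merges nested maps key by key rather than picking one winner per value.

```shell
# effective_value OVERRIDE CR_SPEC CHART_DEFAULT TEMPLATE_FALLBACK
# Hypothetical helper: prints the first non-empty argument, i.e. the value
# from the highest-precedence tier that sets the key.
# Assumption: an empty string means "this tier does not set the key".
effective_value() {
  for candidate in "$@"; do
    if [ -n "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1   # no tier supplied a value
}

# Tier 1 (overrideValues) beats the CR's .spec, as in the app-trying-override demo.
# The third argument is an illustrative chart default, not the real values.yaml.
effective_value "nginx:1.27-alpine" "nginx:1.99-bogus" "nginx:latest" ""

# With no override for the key, the CR's .spec beats the chart default.
effective_value "" "3" "1" ""
```

The first call prints nginx:1.27-alpine and the second prints 3, matching tiers 1 and 2 of the list above.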

Worked patterns

Per-environment defaults - same operator image, different defaults per cluster, sourced from the operator pod's env. Set every required env var with an explicit value: (or a valueFrom:) on the manager pod, and express fallbacks with the template default function as shown here - never rely on shell fallback syntax such as ${VAR:-default}:

yaml
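overrideValues:
  image: '{{ env "IMAGE_PREFIX" }}/nginx:{{ default "latest" (env "IMAGE_TAG") }}'
  # overrideValues is a flat string map; address nested chart values with a dotted path, as with helm --set
  ingress.className: '{{ default "nginx" (env "INGRESS_CLASS") }}'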

Pair with config/manager/manager.yaml:

yaml
        env:
          - name: IMAGE_PREFIX
            value: "registry.prod.internal"
          - name: IMAGE_TAG
            value: "1.27-alpine"
          - name: INGRESS_CLASS
            value: "nginx-internal"

Cluster-wide labels — inject a common label every chart-rendered resource carries:

yaml
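overrideValues:
  # overrideValues is a flat string map; dotted paths address keys under commonLabels, as with helm --set
  commonLabels.cluster: '{{ env "CLUSTER_NAME" }}'
  commonLabels.cost-center: '{{ default "unallocated" (env "COST_CENTER") }}'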

The chart must consume these (e.g., merge into _helpers.tpl's label block). The operator just makes the values available.

Hot-fix without releasing the chart - emergency knob to force every release onto a patched image without editing every CR:

yaml
overrideValues:
  image: "registry.prod.internal/nginx:1.27.0-cve-fix-2"

Edit, rebuild, redeploy - every CR reconciles within reconcilePeriod and upgrades to the patched image. The CRs themselves are untouched.

Secret handling - the right way

Never put secret material in overrideValues literally. Two patterns work:

Pattern 1 - env var sourced from a Kubernetes Secret on the operator pod:

yaml
# watches.yaml
overrideValues:
  apiKey: "${SHARED_DEFAULT_API_KEY}"

# config/manager/manager.yaml
        env:
          - name: SHARED_DEFAULT_API_KEY
            valueFrom:
              secretKeyRef:
                name: operator-secrets
                key: shared-api-key

Pattern 2 - keep the secret entirely outside the operator; chart references a pre-existing Secret by name:

Have the chart take a secretRefName value and reference it with envFrom: secretRef.name=... in the Deployment. The user provisions the Secret out-of-band; the CR only carries the name.

yaml
spec:
  secretRefName: "my-prod-credentials"

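Pattern 2 keeps the secret material away from the operator entirely: it never appears in the operator's environment or in overrideValues, so it cannot end up in an OverrideValuesInUse event message like the ones shown earlier.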


Part E - Lifecycle: upgrade and uninstall

Helm 3 vs Helm 4 note. The Helm release Secret format (sh.helm.release.v1.<name>.v<rev>) shown in this section is unchanged between Helm 3 and Helm 4 - the same naming, same storage, same helm.sh/release.v1 type field. The one operationally visible difference if/when the Operator SDK ships with Helm 4 SDK: new Helm 4 installs default to Server-Side Apply (SSA), so updates show up as field-managed patches rather than client-side full replacements. Existing releases originally installed with Helm 3 keep using client-side apply on upgrade unless you explicitly migrate them. None of the demos below need to be re-run when Helm 4 SDK lands in operator-sdk - the CR-driven flow is identical.

CR update triggers helm upgrade

Any change to .spec triggers a reconcile that calls helm upgrade. Demo:

bash
kubectl patch demoapp demoapp-sample --type=merge \
  -p '{"spec":{"replicaCount":2,"message":"Hello revision 2"}}'

# Helm tracks revisions in its release Secret - one Secret per kept revision
kubectl get secret -l owner=helm,name=demoapp-sample
# NAME                                      TYPE                 DATA   AGE
# sh.helm.release.v1.demoapp-sample.v2      helm.sh/release.v1   1      8s
# (you may also see vN secrets for older kept revisions, depending on max-release-history)

Each release Secret is named sh.helm.release.v1.<release>.v<revision>. The .vN suffix is the revision counter: it increments with every reconcile that calls helm upgrade, whether the upgrade succeeded or failed. The highest-numbered Secret is the current revision; older revisions are pruned by helm-operator's --max-release-history limit (low by default; bump it via the manager's args if you want more history for debugging).

The exact number of Secrets you see at any moment depends on recent activity (a clean release after upgrades vs. one mid-rollback can leave a different count), so don't be surprised if it's 1, 3, or somewhere in between. If you've iterated on the CR a few times your suffix may already be .v4 or much higher (failed reconciles during a broken chart can push it into triple digits within an hour). That's expected behavior, not a bug.

The friendlier view of the same data:

bash
helm history demoapp-sample -n default
# REVISION  UPDATED                     STATUS      CHART            APP VERSION  DESCRIPTION
# 1         Wed Jun  3 14:00:00 2026    superseded  demo-app-0.1.0   1.16.0       Install complete
# 2         Wed Jun  3 14:12:34 2026    deployed    demo-app-0.1.0   1.16.0       Upgrade complete
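
Because the revision is encoded in the Secret name, it can also be read without the helm CLI. The sketch below is a small hypothetical helper (release_revision is not part of helm or kubectl) that extracts the revision number from a release Secret name using plain shell parameter expansion. It assumes the sh.helm.release.v1.<release>.v<revision> naming described above.

```shell
# release_revision SECRET_NAME
# Hypothetical helper: prints the revision number encoded in a Helm release
# Secret name such as sh.helm.release.v1.demoapp-sample.v2.
release_revision() {
  name=${1#secret/}   # tolerate the "secret/" prefix printed by `kubectl get -o name`
  case $name in
    sh.helm.release.v1.*.v*) ;;
    *) echo "not a Helm release Secret name: $1" >&2; return 1 ;;
  esac
  rev=${name##*.v}    # drop everything up to and including the last ".v"
  case $rev in
    ''|*[!0-9]*) echo "no numeric revision in: $1" >&2; return 1 ;;
  esac
  printf '%s\n' "$rev"
}

release_revision "sh.helm.release.v1.demoapp-sample.v2"
release_revision "secret/sh.helm.release.v1.demoapp-sample.v104"
```

The two calls print 2 and 104. Against a live cluster, one way to wire it up is to feed each line of kubectl get secret -l owner=helm,name=demoapp-sample -o name through the function and keep the largest number with sort -n | tail -1.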

The Deployment's template hash changes when the rendered Pod spec changes, triggering a rolling update:

bash
kubectl rollout status deploy/demoapp-sample
# deployment "demoapp-sample" successfully rolled out

Confirm the patched values actually landed in the rendered resources. The demo-app chart's ConfigMap exposes the message through the index.html key (it serves it as an nginx welcome page), so the jsonpath needs to escape the dot in the key name:

bash
kubectl get deploy demoapp-sample -o jsonpath='{.spec.replicas}{"\n"}'
# 2
kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}{"\n"}'
# <html><body><h1>Hello revision 2</h1></body></html>

Immutable field gotchas

Some chart edits produce manifests Kubernetes refuses to update — not because Helm or the operator does anything wrong, but because the underlying Kubernetes API rejects the change. The most common culprits:

| Resource | Immutable field(s) |
|----------|--------------------|
| Service | spec.clusterIP (the assigned IP itself; spec.type is mutable in k8s 1.20+, so ClusterIP/NodePort/LoadBalancer transitions all work) |
| Deployment | spec.selector (the label selector picking up pods) |
| StatefulSet | spec.selector, spec.serviceName, spec.podManagementPolicy, most of spec.volumeClaimTemplates |
| Job | spec.template (the whole pod template once the Job exists) |
| PersistentVolumeClaim | spec.storageClassName, spec.accessModes, shrinking spec.resources.requests.storage |

The demo-app chart in this tutorial doesn't expose any CR field that maps to one of these - every value (replicaCount, image, message, apiKey, service.type, service.port) targets a mutable Kubernetes field, so any CR patch you make will succeed end-to-end. (Try kubectl patch demoapp demoapp-sample --type=merge -p '{"spec":{"service":{"type":"NodePort"}}}'; helm history will show a clean Upgrade complete.)

In a real-world chart you will eventually hit one. The failure pattern is predictable:

  1. The reconcile call to helm upgrade fails because the Kubernetes API rejects the change with a *.spec.<field>: Invalid value error.
  2. The failure surfaces in the operator log under the CR's name (grep <cr-name> in the manager logs), though the logged message can be a generic wrapper rather than the API error itself, as the worked example below shows.
  3. The framework sets status.conditions[type=ReleaseFailed].reason=UpgradeError and keeps the previous release running. Nothing in the cluster breaks - the old resources keep serving traffic; only the upgrade got rejected.

Once it happens, you have three recovery options, in increasing order of disruption:

bash
# 1. Revert the offending CR field (zero downtime, easiest):
kubectl patch demoapp <name> --type=merge -p '{"spec":{"<field>":"<previous-value>"}}'

# 2. Delete just the immutable resource, let the next reconcile recreate it (brief gap):
kubectl delete <kind>/<name>
# next reconcile (within reconcilePeriod, or instantly if you trigger one) re-creates it from the chart

# 3. Delete the CR entirely and re-apply with the new spec (full release re-install):
kubectl delete demoapp <name>
kubectl apply -f <updated-cr.yaml>

The operator will never delete-and-recreate a resource to satisfy an incompatible upgrade on its own — that's a deliberate design choice (silent destruction of stateful resources would be far worse than a noisy failure). Plan immutable-field changes deliberately, or design your chart so the field that varies between releases isn't immutable (e.g., parameterize the StatefulSet's serviceName to {{ .Release.Name }}-headless rather than exposing it through the CR).

Worked example — chart-driven Deployment selector change

The cleanest way to see the failure pattern is to make a chart change that touches Deployment.spec.selector — a field Kubernetes refuses to mutate. This is the same situation any chart maintainer hits the day they decide to add a label convention (e.g., app.kubernetes.io/component: web) to existing selectors.

Step 1 — Overwrite the chart template. Overwrite the entire contents of helm-charts/demo-app/templates/deployment.yaml with the version below (do not edit it in place — delete the existing file's contents first, then paste this). Two intentional changes from Part 1: a component: web label is added to both spec.selector.matchLabels and spec.template.metadata.labels (a selector label must also appear on the pods it selects), and the pod template's labels: {{- include "demo-app.labels" . | nindent 8 }} one-liner is swapped for three hardcoded label lines so indentation can't go wrong:

yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "demo-app.name" . }}
  labels: {{- include "demo-app.labels" . | nindent 4 }}
spec:
  replicas: {{ .Values.replicaCount }}
  selector:
    matchLabels:
      app.kubernetes.io/name: {{ include "demo-app.name" . }}
      app.kubernetes.io/component: web
  template:
    metadata:
      labels:
        app.kubernetes.io/name: {{ include "demo-app.name" . }}
        app.kubernetes.io/managed-by: demo-app-operator
        app.kubernetes.io/component: web
    spec:
      containers:
        - name: web
          image: {{ .Values.image }}
          ports:
            - containerPort: 80
          env:
            - name: API_KEY
              valueFrom:
                secretKeyRef:
                  name: {{ include "demo-app.name" . }}
                  key: api-key
          volumeMounts:
            - name: web-content
              mountPath: /usr/share/nginx/html
      volumes:
        - name: web-content
          configMap:
            name: {{ include "demo-app.name" . }}

Sanity-check the render before rebuilding:

bash
helm template demo-app helm-charts/demo-app --show-only templates/deployment.yaml \
  | grep -E 'matchLabels:|labels:|component'
#   labels:
#     matchLabels:
#       app.kubernetes.io/component: web
#       labels:
#         app.kubernetes.io/component: web

Step 2 — Rebuild and redeploy the operator. The chart ships inside the operator image, so the change reaches the cluster only after a rebuild:

bash
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"

kubectl -n demo-app-operator-system rollout status deploy/demo-app-operator-controller-manager

Step 3 — Force a reconcile and observe the failure. Trigger a reconcile:

bash
kubectl annotate demoapp demoapp-sample force-reconcile=$(date +%s) --overwrite
sleep 5

The operator log shows the failure:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager --tail=80 | grep 'Release failed'
# {"level":"error","logger":"helm.controller","msg":"Release failed",
#  "namespace":"default","name":"demoapp-sample","release":"demoapp-sample",
#  "error":"upgrade failed; rollback required"}

The CR's ReleaseFailed condition carries the same wrapper:

bash
kubectl get demoapp demoapp-sample -o jsonpath='{.status.conditions[?(@.type=="ReleaseFailed")]}' | jq
# {
#   "lastTransitionTime": "2026-06-03T04:52:54Z",
#   "message": "upgrade failed; rollback required",
#   "reason": "UpgradeError",
#   "status": "True",
#   "type": "ReleaseFailed"
# }

ReleaseFailed with reason: UpgradeError is the only signal — helm-operator collapses the K8s API's actual spec.selector: Invalid value: ... field is immutable error and doesn't surface it anywhere. Diagnose from the chart diff and your knowledge of K8s immutability rules.

The live Pods, Service, ConfigMap, and Secret are unchanged — K8s rejected each patch before it touched anything. No outage.

Step 4 — Recover by deleting the Deployment. The selector is the immutable field; deleting the Deployment lets the next reconcile recreate it from scratch with the new selector:

bash
kubectl delete deploy demoapp-sample
kubectl annotate demoapp demoapp-sample force-reconcile=$(date +%s) --overwrite

# Poll until the new Deployment appears with the new selector (10–20s typically)
for i in $(seq 1 20); do
  sel=$(kubectl get deploy demoapp-sample -o jsonpath='{.spec.selector.matchLabels}' 2>/dev/null)
  if [[ "$sel" == *component* ]]; then echo "recovered: $sel"; break; fi
  sleep 2
done
# recovered: {"app.kubernetes.io/component":"web","app.kubernetes.io/name":"demoapp-sample"}

helm history demoapp-sample -n default | tail -1
# tail -1 prints only the newest row (columns: REVISION, UPDATED, STATUS, CHART, APP VERSION, DESCRIPTION)
# N         ...      deployed  demo-app-0.1.0   1.16.0       Upgrade complete

If the first force-reconcile still fails (the failed-upgrade backoff can swallow it), repeat the kubectl annotate ... force-reconcile=... line once more — the second attempt almost always succeeds because the offending Deployment is gone and the operator now plans an "install" of that one resource rather than an "upgrade." Pods are absent for a few seconds while the new Deployment scales up — plan immutable-field changes around a maintenance window in production.

Step 5 — Cleanup. Overwrite templates/deployment.yaml back to the Part 1 version (restore labels: {{- include "demo-app.labels" . | nindent 8 }} and remove both component: web lines). Rebuild + push + deploy + wait rollout. The first reconcile of the reverted chart fails the same way (selector change in reverse) — repeat the Step 4 recovery once more (kubectl delete deploy demoapp-sample + force-reconcile) to land back on the original selector. Subsequent sections of this article assume the chart is in its original Part 1 shape.

CR delete triggers helm uninstall

bash
kubectl delete demoapp demoapp-sample
# demoapp.demo.example.com "demoapp-sample" deleted

kubectl get all,cm,secret -l app.kubernetes.io/name=demoapp-sample
# No resources found in default namespace.

The operator runs helm uninstall. Helm's cascade removes every resource in the release manifest. The Helm release Secret is also deleted:

bash
kubectl get secret -l owner=helm
# No resources found in default namespace.

What helm uninstall does NOT clean

| Not cleaned | Why |
|-------------|-----|
| PersistentVolumeClaims (if the chart didn't own them) | Helm only deletes resources in the rendered manifest |
| External resources (cloud DNS, IAM roles, S3 buckets) | Outside the cluster - Helm cannot see them |
| Namespaces created externally | The chart's release scope is resources, not the namespace |
| Stale data left behind in Secrets you reused | The chart never owned them |

This is the largest single reason teams outgrow the pre-built operator. Custom pre-uninstall logic (e.g., release an external IP, deregister from a service mesh) that goes beyond what a pre-delete hook Job (Part G) can do requires the Helm hybrid operator.

Status conditions populated by the framework

bash
kubectl get demoapp demoapp-sample -o yaml | grep -A 30 '^status:'

The reconciler writes the following condition types:

| Condition type | When True |
|----------------|-----------|
| Initialized | Reconciler has read the CR and bound it to a chart |
| Deployed | helm install or helm upgrade succeeded; release is in deployed state |
| ReleaseFailed | Last Helm operation failed (with message and reason populated) |

Plus status.deployedRelease.name (Helm release name) and status.deployedRelease.manifest (full rendered YAML).

You cannot add condition types of your own. This is one of the ceiling items in Part I.


Part F - Drift detection

Two mechanisms

| Mechanism | Trigger | Latency |
|-----------|---------|---------|
| watchDependentResources | Event on any chart-rendered resource | Sub-second |
| reconcilePeriod | Periodic timer per CR | Up to the configured period |

Both are on by default (true and 1m). Together they are meant to catch event-driven drift almost immediately and silent drift on the periodic cadence; Demo 1 below shows what the latency looks like in practice.

Demo 1 - revert a manual ConfigMap edit

Make sure you have a fresh CR:

bash
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
sleep 10
kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}'
# <html><body><h1>Hello from demo-app</h1></body></html>

Edit the ConfigMap manually:

bash
kubectl patch cm demoapp-sample --type=merge \
  -p '{"data":{"index.html":"<html><body><h1>HACKED</h1></body></html>"}}'

Poll until the operator reverts it — in practice this is the periodic resync (default reconcilePeriod: 1m) that catches it; the dependent-resource event-driven path is best-effort and is not reliably sub-second in current helm-operator builds (see note below):

bash
for i in $(seq 1 90); do
  v=$(kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}')
  if [[ "$v" != *HACKED* ]]; then echo "reverted after ~$((i*2))s"; break; fi
  sleep 2
done
# reverted after ~30s

kubectl get cm demoapp-sample -o jsonpath='{.data.index\.html}'
# <html><body><h1>Hello from demo-app</h1></body></html>

kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager -c manager --tail=5
# "Upgraded release"  "name":"demoapp-sample"
# "Reconciled release" "name":"demoapp-sample"

The change is gone. watchDependentResources: true is supposed to fire on every ConfigMap event — in our testing with helm-operator v1.42 the event-driven path was slower than advertised, and reverts often only happened on the periodic resync (reconcilePeriod). Either way, the chart re-renders and overwrites the manual edit.

Drift timing reality check. The original docs/article promise "sub-second" event-driven reverts. On a real fresh kind cluster with helm-operator v1.42 we measured 20–40 seconds for both ConfigMap and Service drift, lining up with the reconcilePeriod: 1m boundary, not the watch-event boundary. The fastest reliable way to demonstrate sub-reconcilePeriod correction is to drop reconcilePeriod to 10s in watches.yaml for the demo and rebuild the operator image — both mechanisms are still in play, but timing assertions in client docs should be a ceiling, not a floor.

Demo 2 - recreate a deleted Service

bash
kubectl delete svc demoapp-sample
# service "demoapp-sample" deleted

# Poll until it reappears (typically ~20s; up to reconcilePeriod)
for i in $(seq 1 60); do
  if kubectl get svc demoapp-sample &>/dev/null; then
    echo "recreated after ~$((i*2))s"; break
  fi
  sleep 2
done
# recreated after ~20s

kubectl get svc demoapp-sample
# NAME             TYPE        CLUSTER-IP      PORT(S)   AGE
# demoapp-sample   ClusterIP   10.96.130.109   80/TCP    2s

The Service is back, with a new ClusterIP. Same mechanism — the deletion event (eventually) woke the operator, it re-applied the chart, the Service got recreated. Tighten reconcilePeriod if you need lower recovery latency in production.
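
The polling loops in both demos share one shape: retry a check a bounded number of times, sleep between tries, and report when it passes. A minimal sketch of that pattern as a reusable function follows. wait_until is a hypothetical name, and the helper counts attempts rather than measuring wall-clock time.

```shell
# wait_until ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it exits 0 or ATTEMPTS is exhausted.
wait_until() {
  attempts=$1
  interval=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      echo "condition met after $i attempt(s)"
      return 0
    fi
    # Sleep only if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "gave up after $attempts attempt(s)" >&2
  return 1
}

# Self-check that needs no cluster: `true` succeeds on the first attempt.
wait_until 3 0 true
```

For Demo 2 the call would be wait_until 60 2 kubectl get svc demoapp-sample. The ConfigMap check in Demo 1 needs a negated string match, so wrap that check in a small function before passing it in.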

What drift CANNOT be detected

| Out-of-band change | Why invisible |
|--------------------|---------------|
| A NetworkPolicy added externally that blocks the chart's pods | NetworkPolicy is not in the rendered manifest; not watched |
| PVCs the chart did not template | Same - not in the rendered manifest |
| Changes to a Secret the chart envFrom's but does not own | Watched only if the chart renders it |
| External infra rot (DNS, IAM, cloud LB) | Outside the cluster - the operator has no view |
| RBAC changes affecting the chart's ServiceAccount | The chart owns the SA, but RBAC rules attached externally are not |

The operator only knows about resources Helm rendered. Anything else is invisible drift.

Tuning reconcilePeriod at scale

| CR count | Recommended reconcilePeriod | Reason |
|----------|-----------------------------|--------|
| < 10 | 30s (or default 1m) | Fast feedback during development |
| 10 - 100 | 1m (default) - 5m | Balanced |
| 100 - 1000 | 5m - 15m | API server protection becomes the dominant concern |
| > 1000 | 15m - 30m and shard by selector | Single operator becomes a bottleneck |

Setting reconcilePeriod: 5s with hundreds of CRs is a common foot-gun - the operator pegs CPU and the API server takes the punishment. The event-driven path (watchDependentResources) is free; the periodic resync is not.
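
A back-of-the-envelope calculation shows why a short period hurts at scale. Assuming every CR is reconciled exactly once per reconcilePeriod, and ignoring event-driven reconciles and retries, the periodic load is roughly the CR count times 60 divided by the period in seconds. The function name below is hypothetical, and the arithmetic is integer-only, so rates under one per minute round down to 0.

```shell
# reconciles_per_minute CR_COUNT PERIOD_SECONDS
# Hypothetical helper: estimates periodic reconciles per minute under the
# simplifying assumption of exactly one reconcile per CR per period.
reconciles_per_minute() {
  count=$1
  period=$2
  if [ "$period" -le 0 ]; then
    echo "period must be a positive number of seconds" >&2
    return 1
  fi
  echo $(( count * 60 / period ))
}

reconciles_per_minute 500 5      # reconcilePeriod: 5s -> prints 6000
reconciles_per_minute 500 300    # reconcilePeriod: 5m -> prints 100
```

In this simplified model, moving 500 CRs from 5s to 5m cuts periodic reconciles from 6000 to 100 per minute.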

Turning drift correction off (temporarily)

For debugging or planned maintenance you can disable event-driven drift correction:

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  watchDependentResources: false   # only react to CR events + periodic resync

Rebuild and redeploy. Drift is now only corrected on the reconcilePeriod cadence. Remember to flip it back - we have seen this disabled "for an investigation" and forgotten for months.


Part G - Helm hooks (the pre-built's escape hatch for custom work)

Helm hooks are how you do "extra work" around the chart without leaving the pre-built operator. They are normal Kubernetes resources (almost always Jobs) annotated to run at a specific Helm lifecycle phase.

What Helm hooks are

| Hook | Runs when |
|------|-----------|
| pre-install | After templates are rendered, before any chart resource is created on first install |
| post-install | After all chart resources have been created on first install |
| pre-upgrade | Before any chart resource is updated on upgrade |
| post-upgrade | After all chart resources have been updated on upgrade |
| pre-delete | Before Helm cascades the uninstall |
| post-delete | After Helm has cascaded the uninstall |

The pre-built operator honours every hook automatically because it just calls helm install / upgrade / uninstall under the hood.

Worked example 1 - pre-install Job seeds a ConfigMap

The use case: before the app starts, write a fixed seed-data ConfigMap the app expects. Add a new template helm-charts/demo-app/templates/seed-hook.yaml:

yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "0"
    helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: seed
          image: alpine/k8s:1.31.0
          command:
            - sh
            - -c
            - |
              kubectl create configmap {{ include "demo-app.name" . }}-seed \
                --from-literal=greeting="seeded at $(date)" \
                --dry-run=client -o yaml | kubectl apply -f -
      serviceAccountName: {{ include "demo-app.name" . }}-seed
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-10"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-10"
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["create", "patch", "get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: {{ include "demo-app.name" . }}-seed
  annotations:
    helm.sh/hook: pre-install
    helm.sh/hook-weight: "-10"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: {{ include "demo-app.name" . }}-seed
subjects:
  - kind: ServiceAccount
    name: {{ include "demo-app.name" . }}-seed

Rebuild and redeploy with a fresh ttl.sh URL:

bash
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"

kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager

Apply a fresh CR and verify the seed ran. Delete the previous release first - hooks live in the rendered release manifest, so the pre-install hook only runs on a clean helm install, not on a helm upgrade of an already-deployed release:

bash
kubectl delete demoapp demoapp-sample --ignore-not-found
# Wait until the namespace is fully cleaned (helm release Secret gone) before re-applying
for i in $(seq 1 30); do
  if ! kubectl get secret -l owner=helm,name=demoapp-sample -n default 2>&1 | grep -q demoapp-sample; then
    echo "release torn down"; break
  fi
  sleep 2
done

kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml

# Poll for the seed ConfigMap. First-time alpine/k8s:1.31.0 pull is ~300 MB,
# so plan for ~60–120s on a fresh cluster, ~5s on re-runs (image cached).
for i in $(seq 1 60); do
  cm=$(kubectl get cm demoapp-sample-seed -o jsonpath='{.data.greeting}' 2>&1)
  if [[ "$cm" == seeded* ]]; then echo "seeded after ~$((i*5))s: $cm"; break; fi
  sleep 5
done
# seeded after ~5s: seeded at Wed Jun  3 14:10:52 UTC 2026

The seed ConfigMap exists because the pre-install Job ran and created it before the Deployment came up. The Job itself is already gone — the helm.sh/hook-delete-policy: hook-succeeded,before-hook-creation annotation deletes the Job as soon as it succeeds (and again before the next install, to avoid stale ones). If you want to see the Job during the run, use kubectl get jobs -w in another terminal before applying the CR.

Why the first run is slow. The alpine/k8s:1.31.0 image is ~300 MB (it bundles kubectl, helm, etc.) and is not in the kind node image. On a fresh cluster the kubelet pulls it once at hook time — that adds ~1 minute. Subsequent installs (delete + apply on the same kind node) reuse the cached image and the hook completes in seconds.

Worked example 2 - pre-delete Job exports state before uninstall

The use case: before tearing down, dump the current ConfigMap to a last-known-state ConfigMap in another namespace so post-mortem analysis is possible. Add helm-charts/demo-app/templates/export-hook.yaml:

yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demoapp-archive
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-20"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "0"
    helm.sh/hook-delete-policy: hook-succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      serviceAccountName: {{ include "demo-app.name" . }}-export
      containers:
        - name: export
          image: alpine/k8s:1.31.0
          command:
            - sh
            - -c
            - |
              kubectl get cm {{ include "demo-app.name" . }} -o yaml > /tmp/state.yaml
              kubectl -n demoapp-archive create cm \
                {{ include "demo-app.name" . }}-$(date +%s) \
                --from-file=state.yaml=/tmp/state.yaml || true
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-10"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-10"
rules:
  - apiGroups: [""]
    resources: ["configmaps"]
    verbs: ["get", "create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: {{ include "demo-app.name" . }}-export
  annotations:
    helm.sh/hook: pre-delete
    helm.sh/hook-weight: "-10"
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: {{ include "demo-app.name" . }}-export
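subjects:
  - kind: ServiceAccount
    name: {{ include "demo-app.name" . }}-export
    # Use the release namespace rather than hard-coding "default", so the
    # binding still resolves when the CR lives in another namespace.
    namespace: {{ .Release.Namespace }}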
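
The helm.sh/hook-weight values in this manifest decide the execution order: for a given hook event, Helm runs hooks in ascending weight, so the Namespace (-20) is created before the ServiceAccount and RBAC objects (-10), and the Job (0) runs last. Here is a minimal Go sketch of that ordering. The hook type and orderHooks function are hypothetical illustrations, not Helm's code, and the tie-break by name is a simplifying assumption.

```go
package main

import (
	"fmt"
	"sort"
)

// hook is a hypothetical stand-in for one rendered hook resource.
type hook struct {
	Kind   string
	Name   string
	Weight int
}

// orderHooks returns a copy sorted by ascending weight.
// Assumption: ties are broken by name; input order is kept otherwise.
func orderHooks(hooks []hook) []hook {
	out := append([]hook(nil), hooks...)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].Weight != out[j].Weight {
			return out[i].Weight < out[j].Weight
		}
		return out[i].Name < out[j].Name
	})
	return out
}

func main() {
	hooks := []hook{
		{"Job", "demoapp-sample-export", 0},
		{"ClusterRoleBinding", "demoapp-sample-export", -10},
		{"Namespace", "demoapp-archive", -20},
		{"ServiceAccount", "demoapp-sample-export", -10},
		{"ClusterRole", "demoapp-sample-export", -10},
	}
	for _, h := range orderHooks(hooks) {
		fmt.Printf("%4d  %s/%s\n", h.Weight, h.Kind, h.Name)
	}
}
```

If you add another hook that the Job depends on, give it a weight below 0 so it is in place before the export runs.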

The operator image bakes the chart in at build time, so rebuild and redeploy before testing the new hook:

bash
export IMG=ttl.sh/demoapp-$(uuidgen):24h
make docker-build IMG="$IMG"
docker push "$IMG"
make deploy IMG="$IMG"

kubectl -n demo-app-operator-system rollout status \
  deploy/demo-app-operator-controller-manager

Critical sequencing: the pre-delete hook templates must be present in the chart at install time so they get baked into the rendered release manifest. If a release was installed with an older chart (no pre-delete hook), uninstalling it will skip the hook even after you rebuild the operator. Always tear down the existing release first, then re-install with the new chart, then test the delete:

bash
# Tear down any existing release so the next install bakes in the new hook
kubectl delete demoapp demoapp-sample --ignore-not-found
kubectl delete ns demoapp-archive --ignore-not-found
for i in $(seq 1 30); do
  if ! kubectl get secret -l owner=helm,name=demoapp-sample -n default 2>&1 | grep -q demoapp-sample; then
    break
  fi
  sleep 2
done

# Fresh install — registers BOTH hooks in the release manifest
kubectl apply -f config/samples/demo_v1alpha1_demoapp.yaml
for i in $(seq 1 30); do
  if helm list -n default 2>&1 | grep -q "demoapp-sample.*deployed"; then
    echo "release deployed after ~$((i*5))s"; break
  fi
  sleep 5
done

# Now delete the CR — pre-delete hook runs first, then Helm cascades the uninstall
kubectl delete demoapp demoapp-sample

# Poll for the archive namespace + ConfigMap
for i in $(seq 1 60); do
  if kubectl get ns demoapp-archive &>/dev/null && \
     [[ $(kubectl -n demoapp-archive get cm --no-headers 2>&1 | grep -c demoapp-sample) -ge 1 ]]; then
    echo "pre-delete export captured after ~$((i*3))s"; break
  fi
  sleep 3
done

kubectl -n demoapp-archive get cm
# NAME                                  DATA   AGE
# demoapp-sample-1780477875             1      30s

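The archive survives helm uninstall because hook-created resources are not tracked as part of the release. The numeric suffix is the $(date +%s) Unix epoch that the hook Job appends to the ConfigMap name, so each uninstall produces a uniquely named archive. One caveat: the Namespace hook carries no explicit delete policy, and Helm's documented default for that case is before-hook-creation, so a later uninstall may delete and recreate demoapp-archive. Copy archives elsewhere if you need them to accumulate.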

Gotcha we hit while testing this: if you don't tear down the previous release before applying the new hook-aware chart, you may run into a stale-finalizer condition (Failed to add CR uninstall finalizer) in which the operator never calls helm uninstall on the subsequent delete, so the pre-delete hook is silently skipped. The teardown loop above prevents that.

The limits of Helm hooks

| Limit | Impact |
| --- | --- |
| Hooks run as in-cluster Jobs | Cannot call external APIs without bundling clients in image |
| No retry-with-backoff semantics | A failed Job fails the entire install/upgrade/delete |
| Cannot block the API server (they run after Helm has decided to proceed) | No way to veto a delete based on external state |
| Hard to read complex cluster state (yes you can kubectl get, but it's clunky) | Not the right tool for "is the cluster ready for this?" |
| Hook YAML lives in the chart, not the operator | Same chart used in two operators gets the same hook logic |

For anything beyond "run a Job at this phase," you have outgrown the pre-built operator. The Helm hybrid operator replaces hooks with Go finalizers and pre/post-reconcile logic that can do anything.


Part H - Operator scope and multi-tenancy

Default: cluster-scoped, watches all namespaces

make deploy ships an operator pod with no WATCH_NAMESPACE env var at all — unset = watch all namespaces. RBAC is granted via ClusterRole. Verify:

bash
kubectl -n demo-app-operator-system get deploy demo-app-operator-controller-manager \
  -o yaml | grep -A1 WATCH_NAMESPACE
# (no output — modern operator-sdk scaffolds with WATCH_NAMESPACE unset)

# Cross-check from the manager's startup log instead:
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager | grep -m1 'Watching all namespaces'
# {"level":"info","ts":"...","logger":"cmd","msg":"Watching all namespaces"}

kubectl get clusterrole | grep demo-app-operator-manager-role
# demo-app-operator-manager-role

The operator can manage DemoApp CRs in any namespace.

Flipping to namespace-scoped

Two changes: the env var and the RBAC binding.

Change 1 - WATCH_NAMESPACE env var. Edit config/manager/manager.yaml:

yaml
        env:
          - name: WATCH_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace

This makes the operator only watch its own namespace (demo-app-operator-system by default - you would probably also rename it).

Change 2 - swap ClusterRole/ClusterRoleBinding for Role/RoleBinding. Edit config/rbac/role.yaml:

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role          # was: ClusterRole
metadata:
  name: manager-role
  namespace: system   # add a namespace
rules:
  # ... same rules ...

Edit config/rbac/role_binding.yaml:

yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding   # was: ClusterRoleBinding
metadata:
  name: manager-rolebinding
  namespace: system   # add a namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role          # was: ClusterRole
  name: manager-role
subjects:
  - kind: ServiceAccount
    name: controller-manager
    namespace: system

Note: the CRD itself stays cluster-scoped - CRDs are always cluster resources. What changes is that CR instances in other namespaces are no longer reconciled by this operator.

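Redeploy (the changes are in config/, not in the image, so no rebuild is needed; reuse whatever IMG you set earlier). Be aware that make undeploy also removes the CRD, and with it any existing DemoApp CRs, so re-apply your sample CR afterwards: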

bash
make undeploy IMG="$IMG"
make deploy   IMG="$IMG"

Verify the operator log:

bash
kubectl -n demo-app-operator-system logs deploy/demo-app-operator-controller-manager \
  -c manager | head -5
# "Watching resource"  "kind":"DemoApp"  "namespace":"demo-app-operator-system"

namespace is now the operator's own namespace, not "".
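
The rule described above (unset or empty means all namespaces, a value means just that namespace) can be sketched in a few lines of Go. This illustrates the decision only and is not the operator's actual source; watchScope is a hypothetical name, and the lookup parameter mirrors os.LookupEnv so the function can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// watchScope describes what an operator would watch, given an
// environment lookup with the same shape as os.LookupEnv.
func watchScope(lookup func(string) (string, bool)) string {
	ns, ok := lookup("WATCH_NAMESPACE")
	if !ok || ns == "" {
		// Unset (or empty) is treated as cluster-wide.
		return "all namespaces"
	}
	return "namespace " + ns
}

func main() {
	fmt.Println("watching:", watchScope(os.LookupEnv))
}
```

With the fieldRef from Change 1 in place, the lookup returns the pod's own namespace, which is why the log line switches from an empty namespace to demo-app-operator-system.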

Per-tenant operator pattern - three options

| Option | Pro | Con |
| --- | --- | --- |
| 1. One cluster-scoped operator (the default) | Single deployment, single image to upgrade | Single failure domain; one bad CR can stall all |
| 2. N namespace-scoped operators, one per tenant | Hard isolation; tenant blast-radius bounded | N deployments to manage; image upgrades multiplied |
| 3. One cluster-scoped operator + selector per CR | Single deployment but per-tenant CR filtering | Selector overlap risk; not a real isolation layer |

Pick Option 1 for internal teams that trust each other, Option 2 for hard multi-tenancy, and Option 3 only when you need horizontal sharding for performance rather than isolation.

Selector-based multi-tenancy (Option 3 walkthrough)

watches.yaml with a selector:

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  selector:
    matchLabels:
      tier: production

Run a second operator with a different selector (different watches.yaml, different image tag, or use a Kustomize overlay to patch watches.yaml):

yaml
- group: demo.example.com
  version: v1alpha1
  kind: DemoApp
  chart: helm-charts/demo-app
  selector:
    matchLabels:
      tier: staging

CRs are labeled at creation:

yaml
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: prod-app
  labels:
    tier: production              # picked up by prod operator only
spec:
  message: "Production app"
---
apiVersion: demo.example.com/v1alpha1
kind: DemoApp
metadata:
  name: staging-app
  labels:
    tier: staging                 # picked up by staging operator only
spec:
  message: "Staging app"

The operators silently ignore CRs whose labels do not match. Verify:

bash
# in the production operator's log:
kubectl logs -n demo-app-operator-prod deploy/demo-app-operator-controller-manager
# "Reconciling release"  "name":"prod-app"
# (no mention of staging-app)
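
The matching rule behind matchLabels is simple: a CR is selected only if every key/value pair in the selector appears in the CR's labels, and extra labels on the CR are ignored. The Go sketch below shows that rule in isolation. It is not the operator's code, and matches is a hypothetical helper.

```go
package main

import "fmt"

// matches reports whether labels satisfy every key/value pair in selector.
// An empty selector matches everything.
func matches(selector, labels map[string]string) bool {
	for k, want := range selector {
		if got, ok := labels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	prodSelector := map[string]string{"tier": "production"}
	crs := map[string]map[string]string{
		"prod-app":    {"tier": "production"},
		"staging-app": {"tier": "staging"},
		"unlabeled":   {},
	}
	for _, name := range []string{"prod-app", "staging-app", "unlabeled"} {
		fmt.Printf("%-12s reconciled by prod operator: %v\n",
			name, matches(prodSelector, crs[name]))
	}
}
```

Note the third case: a CR with no tier label matches neither selector, so no operator reconciles it. That is worth guarding against with CRD validation or an admission policy.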

When NOT to flip to namespace-scoped

| Operator role | Why cluster-scope is the right choice |
| --- | --- |
| Ingress controller | Watches Ingress across all namespaces |
| Monitoring/observability stack | Aggregates metrics cluster-wide |
| cert-manager-style certificate issuer | Issues certs for resources in any namespace |
| Network policy controller | Enforces policy across the whole cluster |

If the operator's job is to do something cluster-wide, namespace-scoping it does not make sense.


Part I - The hard ceiling (what the pre-built CANNOT do)

These are the features no amount of YAML tweaking will give you. Each one is the reason teams outgrow the pre-built operator. The Helm hybrid operator article addresses every one of them with concrete Go code.

1. Custom finalizer logic against external systems

The pre-built operator's "finalizer" is helm uninstall followed by Helm's cascade. There is no place to insert "first, deregister this instance from our internal billing API and wait for confirmation, then uninstall." Pre-delete hooks are the closest approximation but cannot easily call external APIs with retries or block the deletion on external state.

→ Hybrid: a Go finalizer that does anything before the Helm uninstall fires.

2. Custom status fields

The reconciler writes status.conditions with exactly three types (Initialized, Deployed, ReleaseFailed) plus status.deployedRelease. There is no way to add status.lastBackupTimestamp, status.externalServiceState, or any other custom field without forking the operator binary itself.

→ Hybrid: a DemoAppStatus Go struct with any fields you want.

3. Custom install/upgrade decision logic

The pre-built reconciler always installs on CR create and upgrades on CR update. "Only upgrade between 02:00 and 04:00 UTC," "wait for the database to be ready before upgrading," "do a dry run first and require manual approval" - none of these are expressible in YAML.

→ Hybrid: a few lines of Go in Reconcile to gate the install/upgrade call.

4. Reading external state during reconcile

The reconciler computes Helm values from .spec and overrideValues only. "Render with replicas equal to whatever our autoscaler service reports as the right number right now," "block install if our cloud quota check fails" - the pre-built operator has no place to make those calls.

→ Hybrid: standard Go HTTP/SDK calls inside Reconcile.

5. Cross-CR coordination

The pre-built reconciler treats each CR in isolation. "CR app-B waits until CR app-A reports Deployed: True before installing" requires reading another CR's status during reconcile - not supported.

→ Hybrid: list other CRs with the controller-runtime client and gate on their status.
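
The gating check itself is tiny once the other CR's status is in hand. The sketch below uses a plain struct in place of the real condition type, so it runs without any Kubernetes dependencies. The condition type and isDeployed function are hypothetical stand-ins, and fetching app-A's status with a client is left out.

```go
package main

import "fmt"

// condition mirrors the type/status pair found in a CR's status.conditions.
type condition struct {
	Type   string
	Status string
}

// isDeployed reports whether the conditions include Deployed=True.
func isDeployed(conds []condition) bool {
	for _, c := range conds {
		if c.Type == "Deployed" {
			return c.Status == "True"
		}
	}
	return false // no Deployed condition yet
}

func main() {
	appA := []condition{{"Initialized", "True"}, {"Deployed", "False"}}
	if !isDeployed(appA) {
		fmt.Println("app-B: waiting for app-A, requeue")
	}
	appA[1].Status = "True"
	if isDeployed(appA) {
		fmt.Println("app-B: app-A is Deployed, proceeding with install")
	}
}
```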

6. Conditional resource rendering based on cluster state

You can write conditional logic inside Helm templates based on .Values, and .Capabilities.APIVersions.Has can report whether an API group/version is registered, but neither tells you whether a specific controller is actually running. "Render the Ingress only if the nginx-ingress controller is installed on this cluster" requires inspecting live cluster objects and feeding the result into the render; the pre-built operator has no place to do that.

→ Hybrid: detect cluster capabilities at startup and inject them as values.

7. Watching arbitrary non-chart resources

watchDependentResources watches only the resource types the chart renders. "Reconcile DemoApp when a NetworkPolicy in the same namespace changes" requires adding an external watch - not supported by the pre-built operator.

→ Hybrid: controller.Watches(&source.Kind{Type: &netv1.NetworkPolicy{}}, ...) in SetupWithManager.


Summary - what's free, what you tune, what you can't have

| Capability | Pre-built (Parts A-H) | Hard ceiling (Part I) - hybrid required |
| --- | --- | --- |
| Install / upgrade / uninstall | Free (Parts A and E) | - |
| CRD generation | Generated, permissive by default (Part B) | - |
| CR .spec → Helm values | Implicit, full mapping (Part D) | - |
| Operator-level value injection | overrideValues with env-var substitution (D) | - |
| Drift detection | watchDependentResources + reconcilePeriod (F) | Watching non-chart resources (#7) |
| Custom pre-uninstall | Helm pre-delete hook (G) | Custom finalizer logic (#1) |
| Status conditions | Fixed three: Initialized/Deployed/ReleaseFailed | Custom status fields (#2) |
| Install/upgrade gating | Always-on | Custom decision logic (#3) |
| Reading external state in reconcile | - | Required for #4 |
| Cross-CR coordination | - | Required for #5 |
| Conditional render on cluster state | - | Required for #6 |
| Multi-tenancy | selector or namespace-scope swap (H) | - |

If your operator's job stays inside the left column, you may never need anything else. If any of the right column applies, the Helm hybrid operator is the next stop - same chart, your own Go reconciler, none of the ceiling.


Summary

Part 2 of the Helm-based operator tutorial walked the full operating envelope of the pre-built helm.sdk.operatorframework.io/v1 plugin. The CR's .spec is passed to Helm verbatim, with overrideValues in watches.yaml taking final precedence and env-var substitution covering per-environment defaults. Lifecycle is helm install / upgrade / uninstall driven by CR events, with the framework writing three fixed status conditions. Drift detection is event-driven (watchDependentResources) plus periodic (reconcilePeriod); together they cover noisy and silent drift on chart-rendered resources. Helm hooks are the only mechanism for custom pre/post install/upgrade/delete work, useful for in-cluster Jobs but unable to do real external-system work. Scope flips from cluster-wide to namespace via WATCH_NAMESPACE plus a ClusterRole-to-Role swap; multi-tenancy patterns range from one operator with selector to one operator per tenant. The hard ceiling - custom finalizers, custom status, custom decision logic, external state, cross-CR coordination, conditional rendering, non-chart watches - is the moment to graduate to the Helm hybrid operator.

Deepak Prasad

R&D Engineer
