What is a Kubernetes Operator?

Last reviewed: by
What is a Kubernetes Operator?

If you have ever heard the phrase "Kubernetes Operator" and thought "is that a person, a robot, or some kind of Kubernetes plugin?" — you are in the right place. The short answer: it is a small program that runs inside your cluster and takes care of an application the same way a senior on-call engineer would, except it never sleeps, never forgets the runbook, and never paged you at 3 a.m.

In plain English:

A Kubernetes Operator is a small piece of software, running inside the cluster, that knows how to install, run, and look after a specific application (like Postgres or Prometheus) — automatically.

That's the whole idea. The rest of this article unpacks why the pattern exists, how it works under the hood, and when you actually need one. By the end you will be able to read any Operator's documentation without getting lost in jargon like "reconcile loop", "CRD", or "controller-runtime".


A simple analogy: think of it as a smart thermostat

Before we touch a single line of YAML, picture the thermostat in your house.

You tell the thermostat: "keep this room at 72 °F." That is the desired state. The thermostat then checks the actual temperature every minute. If the room is too warm, it turns on the AC. If it is too cold, it turns on the heater. It never asks you again — it just keeps nudging reality toward what you asked for.

A Kubernetes Operator does the exact same thing, but for an application in a cluster. You tell it "I want a 3-replica Postgres cluster with nightly backups." It then checks the cluster every second or so. If only 2 replicas are running, it creates the missing one. If a backup didn't fire last night, it triggers one. You never have to micromanage the steps.

Side-by-side analogy showing how a Kubernetes Operator works like a smart thermostat: both compare a desired state against the actual state and continuously take small corrective actions to close the gap.

A Kubernetes Operator is essentially a thermostat for your application — desired state in, continuous corrections out.

That picture — desired vs actual, plus a loop that closes the gap — is the single most important mental model in this entire tutorial. Everything else is detail.


What is a Kubernetes Operator? (the 60-second definition)

Now that the analogy is in place, here is the precise definition you will see in books, conference talks, and the official Operator Framework documentation:

A Kubernetes Operator is a piece of software that:

  1. Adds a new object type to Kubernetes through something called a Custom Resource Definition (CRD) — for example, a new kind: PostgresCluster that didn't exist before. Don't worry about CRDs for now; we'll cover them just below. Think of them as "new kinds of LEGO bricks you teach Kubernetes about."
  2. Watches the Kubernetes API server for those new objects.
  3. Reconciles — every time someone creates, edits, or deletes one, the Operator inspects the cluster and takes the smallest action needed to make reality match what the YAML asked for.

The pay-off is huge: stateful, multi-step applications suddenly become declarative. You write a 20-line YAML that says "I want a 3-node Postgres cluster with nightly backups and HA failover," and the Operator performs the hundreds of API calls required to make that real — and to keep it real, forever, without your intervention.


Life with vs life without an Operator

The fastest way to feel why this pattern matters is to walk through one example with and without an Operator. Let's use a familiar one: running Postgres with 1 primary, 2 replicas, nightly backups, and TLS.

Without an Operator, you (or your team) need to:

  • Write StatefulSets, Services, ConfigMaps, Secrets, and PersistentVolumeClaims by hand.
  • Bootstrap the primary, configure streaming replication, and join replicas in the correct order.
  • Schedule and verify nightly pg_basebackup jobs.
  • Manually run a failover script when the primary crashes.
  • Rotate TLS certificates every 90 days.
  • Test that none of the above broke during an upgrade.

If you miss any step, the database loses data or won't recover from a node restart. This is the world many teams lived in before 2017.

With an Operator, you write one small piece of YAML — usually a single custom resource — and the Operator does the rest. The runbook lives inside the cluster as code, not in a Confluence page that nobody updates.

Side-by-side illustration comparing running Postgres without an Operator (stressed engineer, manual commands, sticky notes, alerts) with running it through an Operator (relaxed engineer, one small YAML, healthy primary plus two replicas with backups and TLS).

Without an Operator: you are the runbook. With an Operator: the runbook runs itself.

This is why teams running Kafka, Postgres, Redis, or Elasticsearch in production almost always reach for an Operator. The cost of not having one shows up at 3 a.m.


Why are they called "operators"?

The pattern is named after the human cluster operator — the SRE who used to SSH into a node, restart a database replica, edit a config, rotate a certificate, and re-balance shards. Everything that human did from a runbook got translated into code.

The Operator pattern was introduced by CoreOS in 2016 and the slogan from that announcement still summarises it perfectly:

"Operators are a way of packaging, deploying, and managing a Kubernetes application."

Red Hat acquired CoreOS in 2018 and the Operator Framework — which includes Operator SDK, the Operator Lifecycle Manager (OLM), and OperatorHub.io — is now stewarded by the CNCF Operator Framework community. So when someone says "there's an Operator for that," they mean: somebody already wrote the runbook, packaged it as a Pod, and you can install it with one command.


How does a Kubernetes Operator actually work?

Let's slow this down and walk through the lifecycle, step by step, using our Postgres example.

  1. You write a Custom Resource (CR) in YAML. Something tiny, like:

    yaml
    apiVersion: postgresql.cnpg.io/v1
    kind: Cluster
    metadata:
      name: my-db
    spec:
      instances: 3
      storage:
        size: 10Gi

    That's only 8 lines. No StatefulSets, no Services, no replication scripts — just what you want.

  2. You run kubectl apply -f. The Kubernetes API server stores the object because a CRD has already told it that kind: Cluster is a valid type.

  3. The Operator (which is itself a Pod running in your cluster) sees the change. It maintains a long-lived watch on the API server.

  4. The reconcile loop fires. The Operator's Reconcile() function reads the current cluster state ("how many Postgres pods exist? are they healthy? is there a backup CronJob?") and compares it against the spec ("the user wants 3 instances and 10 Gi of storage.").

  5. The Operator creates whatever is missing. Behind the scenes it spins up StatefulSets, Services, Secrets, PVCs, backup CronJobs, and any other plain Kubernetes resource needed.

  6. It keeps watching. When a pod crashes, when the user edits the CR, when the storage class changes, when a backup fails — the loop runs again.

That last point is the magic. An Operator is not a one-shot install script; it is a process that never stops watching. This is what people mean by "continuous reconciliation" and it is the single biggest difference between an Operator and tools like Helm or kubectl apply.

The typical Kubernetes resources an Operator manages on your behalf are:

  • Deployments and StatefulSets
  • Services (so other apps can reach the database)
  • PersistentVolumeClaims (for storage)
  • Secrets and ConfigMaps (for credentials and config)
  • Ingress resources or LoadBalancer Services (for external access)
  • CronJobs (for backups and other scheduled tasks)
  • ServiceMonitors or other monitoring CRs

Unlike Helm — which renders a YAML bundle once at install time and then walks away — an Operator stays inside the cluster and keeps watching. If someone runs kubectl delete pod on your database primary, the Operator restores it. If someone edits a ConfigMap by hand, the Operator overwrites it with the value from the CR. Drift is corrected automatically.

Architecture diagram showing the Kubernetes Operator workflow: a user writes a Custom Resource YAML, the API server stores it via a CRD, the Operator watches the API server, runs its reconcile loop, and creates or updates the underlying StatefulSets, Services, Secrets, and PVCs in the cluster.

The full picture: the Operator reads the desired state from the CR and continuously shapes the underlying Kubernetes resources to match it.


But first — what is a "Custom Resource"?

We have used the words Custom Resource and CRD a few times. Let's lock them in, because half the confusion around Operators evaporates the moment this clicks.

Kubernetes ships with built-in object types: Pod, Deployment, Service, ConfigMap, Secret, Job, and so on. These are resources. You can list them, edit them, delete them, and kubectl describe them.

A Custom Resource Definition (CRD) is a small YAML file you apply to the cluster that says: "hey API server, please also accept a new kind of object called PostgresCluster, with these fields, and this schema validation." The moment that CRD is installed, kubectl get postgresclusters becomes a real command — just like kubectl get pods — and you can kubectl apply -f a custom resource against it.

A Custom Resource (CR) is just an instance of that new type. The same way a Pod is an instance of the built-in Pod type, your my-db object is an instance of the PostgresCluster type.

Diagram of the Kubernetes API server showing built-in object types (Pod, Deployment, Service, ConfigMap, Secret) alongside custom object types (PostgresCluster, KafkaTopic, Certificate, PrometheusRule) that have been added through Custom Resource Definitions plugged in from the side.

Custom Resources sit right next to the built-in ones. CRDs are how you teach the Kubernetes API server about a new "kind".

So an Operator is really two halves working together:

  • The CRD is the vocabulary — it lets users describe what they want using a new noun like PostgresCluster.
  • The controller is the verb — it actually does what the vocabulary describes.

Strip away either half and the magic disappears. A CRD with no controller just gives you a fancy object type the API server stores but nothing acts on. A controller with no CRD has nothing application-specific to watch and is just a generic controller.

For a deeper dive, our next article in the series — Custom Resource Definitions Explained — walks through writing one from scratch.


Operator vs Controller vs CRD — the difference

These three terms are constantly confused, even in conference talks. Here they are in one sentence each:

Term What it is Built into Kubernetes?
CRD (Custom Resource Definition) A schema that teaches the API server about a new object type. Pure YAML, no code. Yes — apiextensions.k8s.io/v1
Controller Any control loop that watches the API server and reconciles state. The Deployment, ReplicaSet, and Job controllers are built-in examples. Many are — dozens ship with kube-controller-manager
Operator A controller that uses a CRD to manage an application (Postgres, Prometheus, Kafka, …) and encodes its domain expertise. No — you install one per application

Every Operator is a controller. Every Operator uses a CRD. But only the combination — CRD + controller + domain knowledge of a specific application — earns the name Operator.

A deeper comparison lives in Operator vs Controller vs CRD — the difference explained.


When do you actually need a Kubernetes Operator?

Operators are powerful but not free — they add a moving piece to your cluster and require their own monitoring. Use one when at least two of the following apply:

  • The application is stateful (databases, message brokers, caches).
  • The lifecycle has multiple ordered steps ("provision storage → bootstrap primary → join replicas → run smoke test → mark ready").
  • Day-2 operations matter (failover, backup, upgrade, certificate rotation, slow rolling restart). "Day 2" is just jargon for "everything that happens after the install button is clicked."
  • You need continuous drift correction, not a one-shot install.
  • The application has business rules that pure YAML can't express ("never upgrade a primary while a replica is lagging").

Use Helm or kustomize instead when the workload is stateless, installs once, and is happy with a kubectl rollout restart when its config changes. The two are not mutually exclusive — many teams use Helm to install the Operator and then let the Operator manage the application.

Tool Good at Not good at
Helm / kustomize One-shot install of templated YAML Continuous reconciliation, multi-step workflows
Custom controller Cluster-wide policies, light glue Application-specific lifecycles
Operator Full day-1 + day-2 automation of an application Trivial stateless deployments (overkill)
GitOps (Argo / Flux) Drift detection on declarative manifests Imperative steps like "promote replica X"

A useful rule of thumb: if your install runbook is longer than 3 commands, or if you would page a human to upgrade or fail over the workload, an Operator probably belongs on the shortlist.


Real-world Kubernetes Operator examples

You have almost certainly run an Operator already — most managed Kubernetes distributions ship with several pre-installed:

  • Prometheus Operator — manages Prometheus, Alertmanager, and the ServiceMonitor / PodMonitor / PrometheusRule CRDs. Probably the single most-installed Operator in the CNCF ecosystem.
  • cert-manager — issues and renews TLS certificates from Let's Encrypt, Venafi, Vault, and internal CAs via Issuer and Certificate CRDs.
  • CloudNativePG — production-grade Postgres Operator: primary election, streaming replication, point-in-time recovery, and rolling upgrades.
  • Strimzi — Apache Kafka Operator with Kafka, KafkaTopic, and KafkaUser CRDs.
  • Argo CD — GitOps controller with the Application and ApplicationSet CRDs.
  • etcd-operator — the original Operator from CoreOS that gave the pattern its name.

All of these expose two surfaces: a CRD you write YAML against, and a controller Pod doing the work. If you want to browse hundreds more, head to OperatorHub.io — every entry there follows the same pattern.


The five Operator capability levels

If you start reading Operator documentation you will keep seeing the phrase "Level 4 Operator" or "Capability Level III". This is a maturity model defined by the Operator Framework and it tells you, at a glance, how much the Operator can do.

Level Name What it can do
I Basic Install Provision the application from a CR. Nothing more.
II Seamless Upgrades In-place upgrades between minor versions.
III Full Lifecycle Backup, restore, scale, fail-over.
IV Deep Insights Metrics, alerting, logging, workload analysis.
V Auto Pilot Auto-scaling, auto-healing, auto-tuning based on telemetry.

Most production Operators live at Level III. Reaching Level V usually requires months of soak time and rich telemetry. As a user, you mainly care about this number when comparing two Operators that manage the same application — Level III with active maintainers beats Level V that hasn't shipped a release in a year.


Kubernetes Operator glossary (quick reference)

If you only remember three terms, remember these:

  • Custom Resource (CR)the YAML you write. An instance of a custom type, e.g. a PostgresCluster you kubectl apply.
  • Custom Resource Definition (CRD)the schema that registers the new type with Kubernetes. Without this, kubectl doesn't know what a PostgresCluster is.
  • Reconcilethe function the Operator runs on every change. It reads the cluster, compares it to the CR, and takes the smallest action that closes the gap.

Once those click, the rest of the vocabulary slots in:

  • Controller — the control loop that watches and reconciles.
  • Manager — the controller-runtime object that hosts one or more controllers and handles caching, leader election, and metrics.
  • Watch — a long-lived API connection that streams changes to a resource type into the Operator's local cache.
  • Finalizer — a string on metadata.finalizers that blocks deletion until the Operator runs cleanup logic (so a kubectl delete postgrescluster doesn't leave orphaned PVCs behind).

Every term in that second list gets its own chapter later in this course — the reconcile loop deep dive, CRDs explained, and finalizers in Kubernetes are the natural next reads.


Frequently Asked Questions

1. What is the purpose of a Kubernetes Operator?

A Kubernetes Operator automates the operational tasks (install, configure, upgrade, backup, failover, scale) of a stateful or complex application so that a human cluster operator does not have to perform them manually. It packages domain expertise as code that runs inside the cluster.

2. What is the difference between a Kubernetes Operator and a Controller?

Every Operator is a controller, but not every controller is an Operator. A controller is any control loop that watches the API server and acts on resources. An Operator is specifically a controller that watches a Custom Resource (defined by a CRD) representing an application, and encodes that application's domain knowledge - so it knows how to install Postgres, fail over Redis, or rotate certificates safely.

3. Is Helm a Kubernetes Operator?

No. Helm is a templating and packaging tool that renders YAML and pushes it to the API server in a one-shot install. An Operator runs continuously inside the cluster, watches for drift, and re-applies the correct state. The two are complementary - operator-sdk even lets you wrap a Helm chart as an Operator.

4. What is the difference between an Operator and a Deployment?

A Deployment is a built-in Kubernetes object that manages stateless replicas of a Pod. A Kubernetes Operator is a custom controller that may itself create Deployments, StatefulSets, Services, and other resources - and then keep them aligned with a higher-level Custom Resource describing the whole application.

5. Do I need to write a Kubernetes Operator?

Not always. Use an existing Operator from OperatorHub.io when one exists (Prometheus, cert-manager, Postgres, Kafka all have mature ones). Write your own only when the application has multi-step lifecycle logic, stateful failover, or domain-specific upgrades that Helm and kustomize cannot express declaratively.

6. Who created the Kubernetes Operator pattern?

The Operator pattern was introduced by CoreOS in 2016, articulated by Brandon Phillips. Red Hat acquired CoreOS in 2018 and donated the Operator Framework to the CNCF Operator Framework community, which now maintains Operator SDK, Operator Lifecycle Manager (OLM), and OperatorHub.io.

What's next?

You now have the mental model. The next step depends on where you are in the journey:

Deepak Prasad

R&D Engineer

Founder of GoLinuxCloud with over a decade of expertise in Linux, Python, Go, Laravel, DevOps, Kubernetes, Git, Shell scripting, OpenShift, AWS, Networking, and Security. With extensive experience, he excels across development, DevOps, …

  • Red Hat Certified System Administrator in Red Hat OpenStack
  • Certified Kubernetes Application Developer (CKAD)
  • Red Hat Certified Specialist in Ansible Automation
  • Go (programming language)
  • Python (programming language)
  • DevOps
  • Computer Security