If you have ever opened a Kubebuilder or Operator-SDK project, you have
stared at something like mgr.GetClient(), ctrl.NewControllerManagedBy(mgr),
or builder.WithEventFilter(...) and wondered what is actually happening
under the hood. The answer is one library: controller-runtime — the Go
SDK that almost every modern Kubernetes Operator builds on.
This guide walks the full architecture top-down, names every component, and
traces a single API server event from the apiserver all the way to your
Reconcile() function so you know exactly which knob to turn when something
misbehaves.
If you have read the earlier Foundations chapters — especially the reconcile loop explained — this article tells you what the machinery around the reconcile loop looks like.
The controller-runtime architecture: a Manager owns the shared Cache and one SharedInformer per GVK; events flow through Source -> EventHandler -> Predicate into a per-controller Workqueue, which feeds the Reconciler.
A quick analogy: think of it as a post office
Before we name a single Go type, picture a small-town post office. Every piece on the controller-runtime architecture diagram has a real-world twin:
| controller-runtime piece | Post-office equivalent |
|---|---|
| Cache | The mail-sorting room. A local copy of every letter that came in today, indexed by address. |
| Informer (Reflector) | The truck driver who keeps driving from the regional hub and dropping off mail in the sorting room. |
| Source / EventHandler / Predicate | The supervisor who watches the sorting room and decides which letters need action right now and which can wait. |
| Workqueue | The outbound trolley with slots. Letters going to the same address consolidate into one slot. |
| Reconciler (your Reconcile()) | The postal carrier who picks one slot at a time and walks the route to deliver it. |
| Manager | The postmaster — opens up in the morning, makes sure every staff member is at their station, and locks up at night. |
If you can hold that picture in your head, every name in this article will click as you meet it. The Manager turns everything on, the Cache holds the local copy, the Informer keeps it fresh, the Predicate filters noise, the Workqueue de-duplicates, and your Reconciler actually does the work.
The 60-second answer
controller-runtime is a
Go library, maintained under the kubernetes-sigs GitHub organisation, that
gives every Operator the same six building blocks:
- Manager - owns the lifecycle of everything else (cache, controllers, webhooks, leader election, metrics, health probes).
- Cache - a read-only, eventually-consistent in-memory replica of every object you watch, populated by SharedInformers.
- Client - a sigs.k8s.io/controller-runtime/pkg/client.Client that reads from the cache and writes directly to the API server.
- Controller - one per primary resource type; owns a workqueue and runs Reconcile().
- Builder DSL - the fluent For(...).Owns(...).Watches(...).Complete(r) API that wires sources, event handlers, and predicates without boilerplate.
- Source / EventHandler / Predicate - the three-piece pipeline that turns watch deltas into reconcile keys on the workqueue.
If you have read the reconcile loop explained, you already know what Reconcile() does. This article explains the machinery that makes Reconcile() happen in the first place.
Why controller-runtime exists (vs raw client-go)
You can write a Kubernetes controller using raw client-go -
the sample-controller
repo proves it. The catch is that a production-quality controller needs:
- A SharedInformerFactory so multiple controllers do not each spin up their own watch.
- A workqueue with rate limiting and exponential backoff.
- Leader election so you can run multiple replicas safely.
- A metrics endpoint, a health probe endpoint, and a graceful shutdown path.
- Owner-reference lookups so a Pod change can requeue its owning custom resource.
- Webhook serving plumbing.
Wiring those by hand against client-go is roughly 500 lines of boilerplate per controller. controller-runtime collapses all of it into one Manager and a Builder DSL, with defaults that match how Kubernetes itself runs its own controllers. Both Kubebuilder and the Operator SDK generate code that uses controller-runtime; the two frameworks are mostly scaffolding plus a few extras (RBAC markers, OLM bundles) sitting on top of the same library.
The Manager - lifecycle owner of everything
The Manager is the first object every Operator constructs and the last one to shut down. It owns:
| Resource | Why the Manager owns it |
|---|---|
| Cache | Shared between every controller; must be started before any controller runs. |
| Client | Reads through the Cache, writes through a direct REST client - both share the Manager's restConfig. |
| Controllers | Each Builder.Complete(r) registers a controller with the Manager. |
| Webhook server | Mutating and validating webhooks are served from one HTTPS server that the Manager starts and stops. |
| Leader-election lease | The Manager acquires the lease before starting controllers; if it loses the lease, mgr.Start returns an error so the process can exit and be restarted. |
| Metrics endpoint | A single /metrics HTTP listener serving Prometheus scrapes. |
| Health endpoints | /healthz and /readyz map to manager-level checks. |
Constructing a Manager looks the same in every Operator:
mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
	Scheme:                 scheme,
	LeaderElection:         true,
	LeaderElectionID:       "my-operator-leader",
	HealthProbeBindAddress: ":8081",
	Metrics:                metricsserver.Options{BindAddress: ":8080"},
})
if err != nil {
	setupLog.Error(err, "unable to start manager")
	os.Exit(1)
}
if err = (&MyReconciler{
	Client: mgr.GetClient(),
	Scheme: mgr.GetScheme(),
}).SetupWithManager(mgr); err != nil {
	setupLog.Error(err, "unable to create controller")
	os.Exit(1)
}
if err := mgr.Start(ctrl.SetupSignalHandler()); err != nil {
	setupLog.Error(err, "problem running manager")
	os.Exit(1)
}
Two things to remember:
- mgr.GetClient() is not safe to use until mgr.Start(ctx) runs. Before the cache is started the client has no informers to read from. If you need to do a one-shot read at startup, use a direct API client (client.New(...)) or call mgr.GetAPIReader().
- One Manager process = one cache. Two Managers in the same binary would mean two informer trees - usually a bug.
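The start-up ordering behind the first point (cache first, controllers only once the cache is usable, everything stopped when the context is cancelled) can be sketched with the standard library alone. This is a simplified model under stated assumptions, not controller-runtime's implementation: `component`, `startAll`, and `demo` are hypothetical names invented for the illustration.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// component is a hypothetical stand-in for anything a Manager starts
// (cache, controller). It is not a controller-runtime type.
type component struct {
	name  string
	ready chan struct{} // closed once the component is usable
}

func newComponent(name string) *component {
	return &component{name: name, ready: make(chan struct{})}
}

// run records the start, signals readiness, then blocks until ctx ends.
func (c *component) run(ctx context.Context, record func(string)) {
	record("start " + c.name)
	close(c.ready)
	<-ctx.Done()
	record("stop " + c.name)
}

// startAll starts the cache, waits until it is ready, and only then starts
// the controllers. It returns the recorded events after everything stopped.
func startAll(ctx context.Context, cache *component, controllers []*component) []string {
	var (
		mu     sync.Mutex
		events []string
		wg     sync.WaitGroup
	)
	record := func(s string) {
		mu.Lock()
		defer mu.Unlock()
		events = append(events, s)
	}
	launch := func(c *component) {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.run(ctx, record)
		}()
		<-c.ready // block until this component is usable
	}
	launch(cache) // controllers must not read before the cache is ready
	for _, c := range controllers {
		launch(c)
	}
	wg.Wait() // returns once ctx is cancelled and every component has stopped
	return events
}

// demo wires one cache and two controllers, then simulates a shutdown signal.
func demo() []string {
	ctx, cancel := context.WithCancel(context.Background())
	cache := newComponent("cache")
	ctrls := []*component{newComponent("controller-a"), newComponent("controller-b")}
	go func() {
		<-ctrls[len(ctrls)-1].ready // everything is up
		cancel()
	}()
	return startAll(ctx, cache, ctrls)
}

func main() {
	for _, e := range demo() {
		fmt.Println(e)
	}
}
```

The three "start" lines always appear in the same order; the "stop" lines may interleave, because all components observe the cancelled context concurrently.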
The Cache - read once, share everywhere
Every read your controller does (r.Get(ctx, key, &obj), r.List(ctx, &list))
goes through the controller-runtime cache by default. The cache is a thin
wrapper over client-go's SharedInformer
machinery:
      ┌───────────────────────────┐
      │      Kubernetes API       │
      │ (watch /apis/...?watch=1) │
      └─────────────▲─────────────┘
                    │ HTTP/2 long-lived stream
                    │
        ┌───────────┴────────────┐
        │     SharedInformer     │
        │  ┌──────────────────┐  │
        │  │ Reflector        │  │ ← does the actual List+Watch
        │  │ DeltaFIFO        │  │ ← buffered deltas
        │  │ Indexer/Store    │  │ ← thread-safe local cache
        │  └──────────────────┘  │
        └───────────▲────────────┘
                    │ reads return cached objects
┌───────────────────┴────────────────────┐
│             client.Client              │
│     (mgr.GetClient() returns this)     │
└────────────────────────────────────────┘
The important properties of the cache:
- Eventually consistent. A kubectl apply may not be visible to your controller for tens of milliseconds. Always treat r.Get(...) as "may be slightly stale" and write your Reconcile() to be idempotent.
- One informer per GVK, shared by every controller. Add a fourth controller that watches Pods and you are not opening a fourth watch against the apiserver.
- Reads are in-memory. The cache exposes an Indexer keyed by namespace/name, so Get is an O(1) map lookup and List is a scan (or index lookup) over the local store - neither round-trips to the apiserver or etcd.
- Writes go straight to the API server, not through the cache. After a write your code may briefly read a stale value through the cache until the watch propagates.
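To make the last property concrete, here is a minimal stdlib-only model of a cached read racing a direct write. It is a sketch, not the controller-runtime cache: `apiServer`, `localCache`, and `syncFromWatch` are hypothetical names, and a plain string stands in for a full Kubernetes object.

```go
package main

import "fmt"

// key mirrors the namespace/name key the Indexer uses.
type key struct{ namespace, name string }

// apiServer is a hypothetical stand-in for the authoritative store.
type apiServer struct{ objects map[key]string }

// localCache is a hypothetical stand-in for the informer-backed cache.
type localCache struct{ objects map[key]string }

// Get is a single map lookup; it never contacts the apiServer.
func (c *localCache) Get(k key) (string, bool) {
	v, ok := c.objects[k]
	return v, ok
}

// List scans the in-memory map and returns the objects in one namespace.
func (c *localCache) List(namespace string) []string {
	var out []string
	for k, v := range c.objects {
		if k.namespace == namespace {
			out = append(out, v)
		}
	}
	return out
}

// syncFromWatch models the watch event arriving and refreshing one entry.
func (c *localCache) syncFromWatch(api *apiServer, k key) {
	c.objects[k] = api.objects[k]
}

func main() {
	k := key{"default", "sample"}
	api := &apiServer{objects: map[key]string{k: "replicas=1"}}
	cache := &localCache{objects: map[key]string{k: "replicas=1"}}

	api.objects[k] = "replicas=3" // the write goes straight to the API server

	stale, _ := cache.Get(k)
	fmt.Println("before watch event:", stale) // still the old value

	cache.syncFromWatch(api, k)
	fresh, _ := cache.Get(k)
	fmt.Println("after watch event: ", fresh)
}
```

The window between the two prints is the staleness your Reconcile() has to tolerate, which is why idempotency matters more than read freshness.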
You can bypass the cache and get a fresh read with the Manager's APIReader:
err := mgr.GetAPIReader().Get(ctx, key, &cm)
Use that sparingly - typically only at startup or when you genuinely cannot tolerate cache lag, because it shifts load from your in-memory store to the apiserver.
Tuning the cache
cache.Options (passed via ctrl.Options{Cache: cache.Options{...}}) gives
you three knobs that matter in production:
- DefaultNamespaces - if your Operator is namespaced, restricting the cache to specific namespaces dramatically reduces memory.
- ByObject - per-GVK filters: namespace allow-list, label-selector, field-selector. Useful to make a cluster-scoped Operator only watch the pods it cares about.
- SyncPeriod - the periodic resync that re-delivers every cached object as a synthetic "Update" event. Defaults to 10 hours; lower it only if you genuinely cannot trust your watches.
SharedInformerFactory and the watch pipeline
Under the cache, the actual machinery is client-go's shared informers, the
same kind a SharedInformerFactory hands out. For each watched GVK,
controller-runtime creates one SharedInformer that contains three pieces
from client-go's tools/cache package:
- Reflector - opens an HTTP/2 long-lived watch against the apiserver, feeding every observed ADDED/MODIFIED/DELETED WatchEvent into the DeltaFIFO. BOOKMARK events only advance the resourceVersion the watch will resume from.
- DeltaFIFO - a thread-safe queue of deltas keyed by namespace/name. Deltas for the same key accumulate in order and are handed to the informer together.
- Indexer / Store - the thread-safe in-memory map that the cache exposes to your client.
When the informer pops a delta from the DeltaFIFO, it does two things, and repeats the second on a timer:
- Updates the Indexer (so subsequent r.Get(...) returns the new object).
- Calls every registered ResourceEventHandler (the controller-runtime EventHandler is one of these).
- Every SyncPeriod, re-delivers each object already in the Indexer as an Update even if nothing changed, so work that was dropped or never retried gets another chance within at most one resync interval.
This is why level-triggered control loops are robust: if the watch
connection drops, the Reflector resumes from the last resourceVersion it
saw (or re-lists when that version is too old), and the next resync hands
every controller a fresh Update for every cached object, so reconciliation
converges.
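The "update the store, then notify, then periodically replay" behaviour can be modelled in a few lines. This is a single-goroutine sketch under my own simplifications, not client-go's code: `informer`, `delta`, `process`, and `resync` are hypothetical names, and the store holds strings instead of objects.

```go
package main

import (
	"fmt"
	"sort"
)

type eventType string

const (
	added   eventType = "Add"
	updated eventType = "Update"
	deleted eventType = "Delete"
)

// delta is a simplified stand-in for one entry popped from the DeltaFIFO.
type delta struct {
	kind eventType
	key  string // namespace/name
	obj  string // stands in for the full object
}

// informer is a hypothetical, single-goroutine model of a SharedInformer.
type informer struct {
	store    map[string]string
	handlers []func(eventType, string)
}

// process applies one delta: the store is updated first, so a handler that
// reads the store sees the new state, and then every handler is notified.
func (i *informer) process(d delta) {
	if d.kind == deleted {
		delete(i.store, d.key)
	} else {
		i.store[d.key] = d.obj
	}
	for _, h := range i.handlers {
		h(d.kind, d.key)
	}
}

// resync replays every object already in the store as an Update. It does
// not contact the API server; it only re-notifies the handlers.
func (i *informer) resync() {
	keys := make([]string, 0, len(i.store))
	for k := range i.store {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the example
	for _, k := range keys {
		for _, h := range i.handlers {
			h(updated, k)
		}
	}
}

func main() {
	inf := &informer{store: map[string]string{}}
	inf.handlers = append(inf.handlers, func(t eventType, k string) {
		fmt.Printf("%s %s\n", t, k)
	})
	inf.process(delta{added, "default/a", "v1"})
	inf.process(delta{added, "default/b", "v1"})
	inf.process(delta{deleted, "default/a", ""})
	inf.resync() // only default/b is left in the store to replay
}
```

Note what the model makes visible: a resync can only replay what the local store already holds, so it is a safety net for dropped work, not a substitute for a healthy watch.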
The Workqueue and rate limiter
Between the Reflector's event delivery and your Reconcile() function sits a
workqueue - the same k8s.io/client-go/util/workqueue
implementation Kubernetes core controllers use. It does three jobs:
| Job | What it means |
|---|---|
| Deduplicate | Two updates to the same namespace/name collapse into one work item. |
| Rate-limit | Errored items are re-added with exponential backoff (default 5 ms -> 1000 s per item, plus a global 10 qps / 100 burst limiter). |
| Order | FIFO; per-key serialization is guaranteed (no two goroutines reconcile the same key in parallel). |
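The de-duplication and per-key serialization rows are easier to trust once you see the bookkeeping. Below is a minimal sketch of those rules as I understand them from client-go's queue (a "dirty" set plus a "processing" set); `dedupQueue` is a hypothetical, non-concurrent model, whereas the real queue is goroutine-safe and blocks in Get.

```go
package main

import "fmt"

// dedupQueue is a hypothetical, single-goroutine model of the workqueue's
// de-duplication rules. It keeps only the bookkeeping, not the locking.
type dedupQueue struct {
	order      []string        // FIFO of keys waiting for a worker
	dirty      map[string]bool // keys that need (re)processing
	processing map[string]bool // keys a worker currently holds
}

func newDedupQueue() *dedupQueue {
	return &dedupQueue{dirty: map[string]bool{}, processing: map[string]bool{}}
}

// Add enqueues key unless it is already waiting. If a worker holds the key,
// it is only marked dirty and is re-queued when that worker calls Done.
func (q *dedupQueue) Add(key string) {
	if q.dirty[key] {
		return
	}
	q.dirty[key] = true
	if !q.processing[key] {
		q.order = append(q.order, key)
	}
}

// Get hands the oldest waiting key to a worker.
func (q *dedupQueue) Get() (string, bool) {
	if len(q.order) == 0 {
		return "", false
	}
	key := q.order[0]
	q.order = q.order[1:]
	q.processing[key] = true
	delete(q.dirty, key)
	return key, true
}

// Done releases the key; if it was re-added meanwhile, it is queued again.
func (q *dedupQueue) Done(key string) {
	delete(q.processing, key)
	if q.dirty[key] {
		q.order = append(q.order, key)
	}
}

// Len reports how many keys are waiting for a worker.
func (q *dedupQueue) Len() int { return len(q.order) }

func main() {
	q := newDedupQueue()
	q.Add("default/sample")
	q.Add("default/sample") // collapses into the item above
	q.Add("default/other")
	fmt.Println("waiting:", q.Len())

	key, _ := q.Get()
	q.Add(key) // an event arrives while the key is being reconciled
	fmt.Println("waiting during reconcile:", q.Len())
	q.Done(key)
	fmt.Println("waiting after Done:", q.Len())
}
```

Because a key held by a worker is never handed out a second time until Done is called, two goroutines cannot reconcile the same object at once, regardless of how many events arrive in the meantime.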
You almost never interact with the workqueue directly - controller-runtime
hides it behind the Reconciler interface. The interaction points you do
control:
- Result{Requeue: true} puts the current key back through the rate limiter.
- Result{RequeueAfter: 30*time.Second} schedules a re-reconcile bypassing the rate limiter.
- MaxConcurrentReconciles (in controller.Options) sets how many different keys can reconcile in parallel. Default 1; raise it to 4-8 for I/O-bound reconcilers, leave it at 1 if your Reconcile() touches shared state.
For the full Reconcile contract - including the three Result return paths and the anti-patterns that cause hot loops - see the reconcile loop explained.
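To get a feel for what "5 ms -> 1000 s" means for a key that keeps failing, the per-item delay can be computed directly. This is a sketch that assumes the usual doubling formula (base * 2^failures, capped at the maximum); `itemBackoff` is a hypothetical helper, not a client-go function, and it ignores the separate global 10 qps limiter.

```go
package main

import (
	"fmt"
	"time"
)

// itemBackoff returns the delay applied after the given number of earlier
// consecutive failures: base * 2^failures, capped at maxDelay.
func itemBackoff(base, maxDelay time.Duration, failures int) time.Duration {
	delay := base
	if delay > maxDelay {
		return maxDelay
	}
	for i := 0; i < failures; i++ {
		delay *= 2
		if delay >= maxDelay {
			return maxDelay
		}
	}
	return delay
}

func main() {
	base, maxDelay := 5*time.Millisecond, 1000*time.Second
	for _, n := range []int{0, 1, 2, 10, 17, 18} {
		fmt.Printf("after %2d earlier failures: wait %v\n", n, itemBackoff(base, maxDelay, n))
	}
}
```

Under this formula the cap is reached after 18 consecutive failures, so an object that fails persistently ends up being retried roughly every 17 minutes. Returning RequeueAfter instead sidesteps this curve entirely, which is why it suits "check again in 30 seconds" polling.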
The Builder DSL - For, Owns, Watches
This is the part you see in every controller's SetupWithManager:
func (r *MyReconciler) SetupWithManager(mgr ctrl.Manager) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&appsv1alpha1.MyKind{}). // primary
		Owns(&appsv1.Deployment{}).  // child
		Owns(&corev1.Service{}).     // child
		Watches(
			&corev1.ConfigMap{}, // unrelated input
			handler.EnqueueRequestsFromMapFunc(r.mapCMtoMyKind),
		).
		// WithEventFilter applies to every source above, not only For.
		WithEventFilter(predicate.GenerationChangedPredicate{}).
		WithOptions(controller.Options{MaxConcurrentReconciles: 4}).
		Complete(r)
}
The three verbs map directly to three workqueue-enqueue strategies:
| Verb | What gets enqueued | Typical use |
|---|---|---|
| For(&v1.MyKind{}) | A Request{namespace/name} of the primary object itself. | Always exactly one For per controller. |
| Owns(&corev1.Pod{}) | The owning MyKind's key, derived from the child's metadata.ownerReferences. | Reconcile the parent whenever its child changes. Requires you to call controllerutil.SetControllerReference(...) on creation. |
| Watches(obj, handler) | Anything you compute from the event - one, many, or zero keys. | ConfigMaps, Secrets, external Operators' resources, cluster-scoped triggers. |
The cleanest mental model: For is "this is my type", Owns is "I created
that, wake me when it changes", Watches is "wake me when that changes, and
I will figure out which of my objects to reconcile".
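That mental model boils down to three small functions from "object that changed" to "keys to reconcile". The sketch below models them with the standard library only; `object`, `request`, `forObject`, `forOwner`, and the `users` lookup table are hypothetical stand-ins, not controller-runtime types, and a single owner field replaces the real ownerReferences list.

```go
package main

import "fmt"

// object is a hypothetical, stripped-down stand-in for a Kubernetes object.
type object struct {
	kind, namespace, name string
	ownerKind, ownerName  string // controller owner reference, empty if none
}

// request mirrors the namespace/name reconcile key.
type request struct{ namespace, name string }

// mapFunc turns one changed object into zero or more reconcile keys.
type mapFunc func(object) []request

// forObject models For: the changed object itself is the key.
func forObject(o object) []request {
	return []request{{o.namespace, o.name}}
}

// forOwner models Owns: the key is the controlling owner. Objects owned by
// another kind, or by nobody, produce no request at all.
func forOwner(ownerKind string) mapFunc {
	return func(o object) []request {
		if o.ownerKind != ownerKind || o.ownerName == "" {
			return nil
		}
		return []request{{o.namespace, o.ownerName}}
	}
}

func main() {
	primary := object{kind: "MyKind", namespace: "default", name: "sample"}
	child := object{kind: "Deployment", namespace: "default", name: "sample-web",
		ownerKind: "MyKind", ownerName: "sample"}
	config := object{kind: "ConfigMap", namespace: "default", name: "shared"}

	// Hypothetical lookup table: which MyKind objects use which ConfigMap.
	users := map[string][]string{"shared": {"sample", "other"}}

	// fromConfigMap models Watches: custom logic picks the keys.
	fromConfigMap := mapFunc(func(o object) []request {
		var reqs []request
		for _, name := range users[o.name] {
			reqs = append(reqs, request{o.namespace, name})
		}
		return reqs
	})

	fmt.Println("For:    ", forObject(primary))
	fmt.Println("Owns:   ", forOwner("MyKind")(child))
	fmt.Println("Watches:", fromConfigMap(config))
}
```

In a real Operator the lookup inside the Watches mapper is usually a cached List with a field index rather than a hard-coded table; the table here only keeps the example self-contained.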
Source / EventHandler / Predicate - the filtering pipeline
For every For, Owns, and Watches call, the Builder constructs three things:
- Source - the producer of watch events (almost always source.Kind(mgr.GetCache(), &SomeKind{}), which reads from the shared cache).
- EventHandler - the mapper that turns one event into zero or more Request{namespace/name} items pushed to the workqueue.
  - handler.EnqueueRequestForObject - the default for For.
  - handler.EnqueueRequestForOwner - the default for Owns.
  - handler.EnqueueRequestsFromMapFunc(fn) - the workhorse for Watches.
- Predicate - a filter applied before the handler runs. Return false and the event is dropped.
The two built-in predicates you will reach for constantly:
- predicate.GenerationChangedPredicate{} - only fires when metadata.generation changes, i.e. when .spec (or metadata annotations in some cases) was actually modified. This is the single biggest hot-loop cure for Operators that update .status every reconcile.
- predicate.ResourceVersionChangedPredicate{} - fires on every server-side update including .status and metadata.managedFields. Use only when you genuinely need every change.
You can also write your own:
labelPred := predicate.Funcs{
	UpdateFunc: func(e event.UpdateEvent) bool {
		return e.ObjectNew.GetLabels()["tier"] == "production"
	},
	CreateFunc: func(e event.CreateEvent) bool { return true },
	DeleteFunc: func(e event.DeleteEvent) bool { return true },
}
For the catalog of built-in predicates and the gotchas (missed delete events,
predicate ordering, the predicate.And / predicate.Or combinators), see the
companion article Watches, events, and predicates in Kubernetes operators.
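If you want to convince yourself that a generation check really breaks the status-write hot loop, a small model is enough. The sketch below is stdlib-only and uses hypothetical names (`meta`, `updatePredicate`, `generationChanged`, `hasLabel`, `and`); it mirrors the idea behind the built-in predicates and combinators rather than their actual code, and it covers update events only.

```go
package main

import "fmt"

// meta is a hypothetical stand-in for the metadata a predicate inspects.
type meta struct {
	generation      int64
	resourceVersion string
	labels          map[string]string
}

// updatePredicate decides whether an update event should reach the handler.
type updatePredicate func(oldObj, newObj meta) bool

// generationChanged lets an update through only if metadata.generation
// differs between the old and the new object.
func generationChanged(oldObj, newObj meta) bool {
	return oldObj.generation != newObj.generation
}

// hasLabel models a custom filter on the new object's labels.
func hasLabel(k, v string) updatePredicate {
	return func(_, newObj meta) bool { return newObj.labels[k] == v }
}

// and passes an event only if every predicate passes it.
func and(ps ...updatePredicate) updatePredicate {
	return func(o, n meta) bool {
		for _, p := range ps {
			if !p(o, n) {
				return false
			}
		}
		return true
	}
}

func main() {
	prod := map[string]string{"tier": "production"}
	before := meta{generation: 1, resourceVersion: "100", labels: prod}
	statusWrite := meta{generation: 1, resourceVersion: "101", labels: prod}
	specEdit := meta{generation: 2, resourceVersion: "102", labels: prod}

	filter := and(generationChanged, hasLabel("tier", "production"))
	fmt.Println("status-only update enqueued:", filter(before, statusWrite)) // false
	fmt.Println("spec edit enqueued:", filter(before, specEdit))             // true
}
```

The status write changes resourceVersion but not generation, so it never reaches the workqueue; that is the whole mechanism behind the hot-loop cure.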
Tracing one event end-to-end
Putting the pieces together, here is what happens when a user runs
kubectl edit mykind sample:
1. kubectl PATCH -> kube-apiserver
2. apiserver writes etcd, returns success
3. apiserver pushes a WatchEvent on every active watch for MyKind
4. controller-runtime Reflector receives the event over its HTTP/2 stream
5. Informer updates the Indexer (the cache) and emits to its handlers
6. source.Kind(cache, &MyKind{}) receives the event
7. Predicate (GenerationChangedPredicate) decides whether to keep it
8. EnqueueRequestForObject pushes Request{ns/name} to the workqueue
9. Workqueue deduplicates, applies rate limiter, hands the key to a worker
10. worker calls reconciler.Reconcile(ctx, Request{ns/name})
11. Reconcile() does r.Get(ctx, ...) - reads from the *same* cache
12. Reconcile() decides on actions, calls r.Update / r.Status().Update / r.Create
13. Writes go directly to the apiserver (not through the cache)
14. Those writes generate fresh WatchEvents -> step 4 again
When something goes wrong in production, the failure usually maps to one of these steps. A few common diagnoses:
- "My reconciler runs constantly" -> step 7 is missing a GenerationChangedPredicate; status writes are triggering watches.
- "My reconciler misses events" -> step 7 has a predicate returning false on Delete events; check the four UpdateFunc / CreateFunc / DeleteFunc / GenericFunc branches explicitly.
- "Reconciles are slow under load" -> step 9 has MaxConcurrentReconciles=1 on an I/O-bound workload; raise it.
- "Cache memory is huge" -> step 5 is indexing every Pod in the cluster; add a Cache.ByObject label selector.
When to drop down to client-go
Inside an Operator, almost never. The exceptions:
- One-shot reads before the cache is started. Use client.New(mgr.GetConfig(), client.Options{}) or mgr.GetAPIReader().
- Custom watches that need full DeltaFIFO control - e.g. for a custom scheduler that re-orders events. controller-runtime hides DeltaFIFO from you.
- CLI / kubectl plugins. No Manager, no cache - direct typed clients are the right tool.
- A second cache scoped to different credentials, e.g. a cross-cluster Operator that needs one Manager-managed cache plus N independent ones. See multicluster-runtime for prior art.
For everything else inside a normal Operator, mgr.GetClient() and the
Builder DSL give you the same primitives with safer defaults and a tenth of
the boilerplate.
Frequently Asked Questions
1. What is controller-runtime in Kubernetes?
controller-runtime is the Go library maintained under the kubernetes-sigs organisation that provides the high-level building blocks every modern Operator uses - Manager, Cache, Client, Builder, Predicate, Source, EventHandler. Kubebuilder and the Operator-SDK both generate code that calls into controller-runtime; you almost never use raw client-go directly any more.
2. What is the difference between controller-runtime and client-go?
client-go is the low-level Kubernetes client - REST verbs, typed and dynamic clients, informers, workqueues. controller-runtime is a higher-level library built on top of client-go that bundles a Manager, a shared Cache, a sigs.k8s.io/controller-runtime/pkg/client.Client, and a Builder DSL so you can write a controller in 30 lines instead of 300.
3. What does the Manager do in controller-runtime?
The Manager owns the lifecycle of every long-running component - the shared cache, the controllers, the webhooks, the leader-election lease, the metrics server, the health probes. Calling mgr.Start(ctx) starts them in the right order and stops them cleanly when the context is cancelled.
4. What is the Cache in controller-runtime?
The Cache is a read-only, eventually-consistent local replica of the objects your controller watches. It is populated by SharedInformers - one per GVK - that watch the API server. Reads through mgr.GetClient() hit the Cache by default, which is what makes a controller fast enough to run thousands of Reconcile() calls per second without flattening the API server.
5. What is the workqueue in controller-runtime?
Each controller has a private workqueue, a rate-limited deduplicating queue keyed by namespace/name. Watch events arrive at the queue through EventHandlers, the queue collapses duplicates, applies exponential backoff on errors, and feeds one item at a time (by default) to Reconcile().
6. What is the difference between For, Owns, and Watches in the Builder?
For(&v1.MyKind{}) registers the primary resource - the one whose name and namespace become the reconcile key. Owns(&corev1.Pod{}) registers a child resource whose events should requeue the owner via owner-reference lookups. Watches(...) is the general escape hatch - it lets you map events from any resource (even unrelated ones, like a ConfigMap) into reconcile requests through a custom handler.MapFunc.
7. When should I drop down to raw client-go instead of controller-runtime?
Almost never inside a normal Operator. The legitimate cases are: writing a kubectl plugin or one-shot CLI, building your own custom scheduler, doing very low-level reflector or watch experimentation, or needing a separate cache or informer that the Manager does not manage. For everything else, controller-runtime gives you the same primitives with better defaults.
8. Does each controller in a Manager have its own informer?
No. The Manager keeps a single shared cache, and within that cache a single SharedInformer per GVK serves every controller that watches that GVK. Adding a second controller that watches Pods does not double the watch load on the API server - both controllers are fed by the same informer and read from the same local store.
What's next?
You now know how every controller-runtime piece fits together. Natural next steps in this course:
- Watches, events, and predicates in Kubernetes operators - the deep dive on Source / EventHandler / Predicate, the three event types, and the pitfalls that cause missed reconciles.
- The Kubernetes reconcile loop explained - the level-triggered control loop, the three Result return paths, and the anti-patterns that cause hot loops.
- Custom Resource Definitions explained - the schema half of the Operator pattern that the cache and informers are watching.
- Future deep dives: Operator leader election, operator metrics with Prometheus, and operator graceful shutdown all build on the Manager primitives covered here.
- Ready to scaffold one? See install Operator-SDK on Linux, then walk through your first Operator project layout to see Manager, Builder, and Reconciler wired up in generated code.

