When should I return an error from Reconcile instead of RequeueAfter?

Return a non-nil error when the failure is unexpected or infrastructure-level (apiserver timeout, RBAC denied, leader loss) so the workqueue applies exponential backoff. Return RequeueAfter with a nil error when the system is healthy but not ready yet (waiting for an external dependency, polling a condition) so you keep control of the retry interval instead of the global backoff curve.

Does RequeueAfter bypass the rate limiter?

Yes. A nil error with RequeueAfter clears the key's failure backoff and schedules the next reconcile after that duration, without consulting the rate limiter. Choose sane durations and filter noisy events with predicates so the timer does not turn into a tight hot loop.

What happens if the object was deleted while Reconcile was running?

client.Get typically returns NotFound. Treat that as success with no requeue by returning client.IgnoreNotFound(err) or an explicit apierrors.IsNotFound branch, because there is nothing left to converge for that key.

How do I stop retrying a permanently invalid Custom Resource?

Surface the failure in status or events, then return Result{}, nil. Alternatively, on controller-runtime versions that support it, return the error wrapped in reconcile.TerminalError so the framework logs the cause but does not retry the key on the failure path.

controller-runtime Requeue, RequeueAfter & Error Handling (2026)

Every Kubernetes Operator built with Kubebuilder or Operator SDK eventually comes down to one function: Reconcile. The behavior users feel in production—fast recovery, slow storms, or “stuck for an hour”—is often decided by how you combine ctrl.Result, Requeue, RequeueAfter, and the error return path.

This guide is the focused companion to the full pipeline story in the Kubernetes reconcile loop explained and the machinery underneath it in controller-runtime architecture. Read those for informers, caches, and workqueues; read this page when you are choosing what to return from Reconcile and debugging retry behavior.

The Reconcile contract in one minute

In controller-runtime, reconciliation implements:

go

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error)

The framework hands you a key (req.NamespacedName). It does not hand you a guaranteed-up-to-date object; you Get, observe, mutate, update status, then return.

`ctrl.Result` vs `error`: two different signals

ctrl.Result answers: “Assuming my logic ran as intended, when should this key be processed again?”
error answers: “Something went wrong; please apply the failure retry policy.”

Those channels are independent in the type system but interact at runtime: a non-nil error implies a requeue with exponential backoff regardless of what you put in Result, unless the error is wrapped as a terminal error (covered in the terminal errors section).

What the workqueue does with each combination

Under the hood, controller-runtime uses client-go’s rate limiting workqueue (see controller-runtime architecture). In practice:

| Return | Typical effect |
| --- | --- |
| `Result{}, nil` | Success. The per-key backoff counter resets, and the key stays idle until a watch event, a resync, or a previously scheduled `RequeueAfter` timer fires. |
| `Result{Requeue: true}, nil` | Requeue through the rate limiter: close to immediate the first time, but the per-key delay grows if you keep returning it, because the backoff counter is not reset. |
| `Result{RequeueAfter: d}, nil` | Schedule the same key after `d`; resets and bypasses the failure backoff timer (still not a license to spin hot). |
| `Result{}, err` | Failure path: log the error, bump the per-key failure count, and re-add the key with exponential delay. |

Exact tuning (QPS, bucket rate limiter defaults) can shift between versions; treat the table as the mental model operators rely on.
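
The table can also be read as a small decision function. The sketch below is a simplified, standard-library-only model of that mental model, not controller-runtime's actual source: `Result` is a local stand-in for `ctrl.Result`, and `queueAction` and `errTransient` are hypothetical names. Terminal errors are left out to keep it short.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Result is a local stand-in for ctrl.Result so the sketch compiles with the
// standard library alone.
type Result struct {
	Requeue      bool
	RequeueAfter time.Duration
}

// errTransient is a hypothetical failure used only for the demo below.
var errTransient = errors.New("apiserver timeout")

// queueAction models which branch handles the key once Reconcile returns.
// The order matters: a non-nil error is checked first, so any Result that
// accompanies it is ignored.
func queueAction(res Result, err error) string {
	switch {
	case err != nil:
		return "retry with per-key backoff"
	case res.RequeueAfter > 0:
		return "reset backoff, re-add after delay"
	case res.Requeue:
		return "re-add through rate limiter"
	default:
		return "reset backoff, wait for events"
	}
}

func main() {
	fmt.Println(queueAction(Result{}, nil))
	fmt.Println(queueAction(Result{Requeue: true}, nil))
	fmt.Println(queueAction(Result{RequeueAfter: 30 * time.Second}, nil))
	fmt.Println(queueAction(Result{RequeueAfter: 30 * time.Second}, errTransient))
}
```

The last call is the case worth remembering in review: a `RequeueAfter` returned together with an error has no effect in this model, which matches the "failure path wins" rule described later.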

Decision guide: which return should I use?

Summary table (scenario → return → what happens next)

| Scenario | Recommended return | Why |
| --- | --- | --- |
| Object converged; nothing to do until the world changes | `ctrl.Result{}, nil` | Lets watches and resync drive the next wake-up. |
| Healthy “not ready yet” (external API, certificate pending) | `ctrl.Result{RequeueAfter: d}, nil` | You pick `d`; you avoid poisoning backoff with benign “errors”. |
| Need another pass quickly after a benign local decision | `ctrl.Result{Requeue: true}, nil` | Rate-limited requeue; use sparingly, since repeated use grows the per-key delay. |
| Unexpected failure (network, RBAC, bug you cannot classify) | `ctrl.Result{}, err` | Lets the workqueue backoff protect the apiserver. |
| User error that will never succeed without a spec edit | `ctrl.Result{}, nil` after status update (or `reconcile.TerminalError`) | Stops infinite retry noise; see terminal errors. |

When not to use `return err` for “try again later”

If the cluster is fine but time must pass, prefer:

go

return ctrl.Result{RequeueAfter: 30 * time.Second}, nil

instead of:

go

return ctrl.Result{}, fmt.Errorf("still waiting")

Errors should mean “this reconcile attempt failed,” not “come back in thirty seconds.” Misusing error inflates backoff, hides intent in logs, and makes SLO dashboards look like your Operator is failing when it is actually waiting.

Steady state: finish without forcing another reconcile

`return ctrl.Result{}, nil`

This is the happy path once .spec (and any derived child resources you own) match your intent and you have written status if needed. Returning nil error tells the framework: this attempt succeeded.

Operators should reach this state often. If you never return Result{}, nil, you may be over-requeueing or leaking work.

When the next run happens anyway (watches, resync)

Even after Result{}, nil, Reconcile will run again when:

A watch delivers an event for the primary object or something you Owns / Watches.
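The shared informer resyncs: the cache periodically replays the objects it already holds as update events. This is not a relist against the apiserver; the interval depends on the manager's cache configuration, and predicates can filter the replayed events.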

That is why reconciliation must be idempotent: the same key can arrive many times without you asking. See watches, events, and predicates to avoid reconcile storms from status-only churn.

Success with an explicit follow-up

`Requeue: true` (immediate requeue and rate limiting)

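Requeue: true means “enqueue again as soon as the rate limiter allows,” not “run again in the same goroutine.” The first requeue is close to immediate, but this path does not reset the key's backoff counter, so returning it repeatedly pushes the delay up the same curve that errors use. It is safer than a raw loop but can still amplify load if combined with noisy watches. Recent controller-runtime releases also mark the Requeue field as deprecated in favor of RequeueAfter, so check the godoc for your version before adding new uses.

Use it when you deliberately need another pass soon but not on a fixed clock, for example right after a write where you expect a child object to appear quickly and you want to re-read before declaring a phase complete.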

`RequeueAfter` (scheduled requeue and typical use cases)

RequeueAfter is the idiomatic timer for polling and backoff you control:

Waiting on an external SaaS API quota window.
Rechecking a TLS certificate that will not flip to Ready until the CA responds.
Throttling noisy checks while still faster than global error backoff.

Because it bypasses failure backoff, pair it with sane durations and with predicates so unrelated object updates do not drown the queue.

Choosing a delay (jitter, ceilings, avoiding thundering herds)

Production tips:

Add small jitter when many objects might line up on the same delay.
Cap extremely large RequeueAfter values with domain sense (a daily check does not need sub-second precision).
If every replica of your controller hits the same external system on the same cadence, stagger via hashing the object UID into a few-second spread.
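
The UID-hashing tip can be sketched with the standard library alone. This is one possible approach, not an API that controller-runtime provides: `spreadDelay` is a hypothetical helper, and the UIDs in `main` are made up. It gives each object a stable offset rather than a fresh random jitter on every pass.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"time"
)

// spreadDelay returns base plus a stable offset in [0, spread) derived from
// the object's UID, so objects created together do not all wake together.
func spreadDelay(base, spread time.Duration, uid string) time.Duration {
	if spread <= 0 {
		return base
	}
	h := fnv.New32a()
	h.Write([]byte(uid)) // hash.Hash documents that Write never returns an error
	return base + time.Duration(h.Sum32())%spread
}

func main() {
	// Hypothetical UIDs; in a reconciler this would be string(obj.GetUID()).
	for _, uid := range []string{"uid-a", "uid-b", "uid-c"} {
		fmt.Printf("%s -> %v\n", uid, spreadDelay(30*time.Second, 5*time.Second, uid))
	}
}
```

In a reconciler the result would feed `RequeueAfter`. Because the offset is deterministic per object, the same object keeps the same cadence across controller restarts, which makes its timing easier to reason about when debugging.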

Failure path: `return ctrl.Result{}, err`

Exponential backoff and logging

When you return a non-nil error, controller-runtime logs the error and requeues with exponential backoff (per-key). That protects the apiserver when etcd is slow, RBAC is wrong, or a webhook is temporarily down.

This path is ideal for transient infrastructure problems: timeouts, 429 Too Many Requests, unexpected 500 from aggregated APIs, and similar.
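
To get a feel for how quickly the per-key delay grows, the sketch below models a doubling backoff. The 5 ms base and 1000 s cap are assumptions taken from client-go's commonly used default controller rate limiter; your controller may be configured with different values, and the real limiter also combines this with an overall token bucket that the model ignores. `failureDelay` is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// Assumed defaults; a controller can be configured with a different limiter.
const (
	baseDelay = 5 * time.Millisecond
	maxDelay  = 1000 * time.Second
)

// failureDelay models the per-key delay applied when the key has already
// failed priorFailures times in a row: baseDelay * 2^priorFailures, capped.
func failureDelay(priorFailures int) time.Duration {
	d := baseDelay
	for i := 0; i < priorFailures; i++ {
		d *= 2
		if d >= maxDelay {
			return maxDelay
		}
	}
	return d
}

func main() {
	for _, n := range []int{0, 5, 10, 15, 18} {
		fmt.Printf("after %2d prior failures: %v\n", n, failureDelay(n))
	}
}
```

Under these assumptions the delay is still only about five seconds after ten consecutive failures, but it reaches the cap of more than sixteen minutes after roughly eighteen. That is the curve a "still waiting" error rides if it is returned on every pass.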

Transient failures that belong on the error path

Good error candidates:

Network blips talking to the Kubernetes API.
Errors you truly cannot classify without another full attempt.
Temporary admission webhook outages affecting creates/updates.

Bad error candidates:

“Replicas still not ready” when that is normal rollout progress—use RequeueAfter and/or rely on Owns on Pods.
Validation failures the user must fix in the spec—treat as terminal; see below.

Picking one signal when `Result` and `error` both feel tempting

If you return both a non-nil error and a RequeueAfter, the error wins: the key is retried under failure backoff and the Result is ignored. Do not rely on combining them; choose either a controlled timer (RequeueAfter, nil) or an error for backoff.

Terminal errors: stop infinite retry on bad input

Status, events, and returning success with no requeue

When .spec is invalid in a way no amount of retry will fix (bad image name format, contradictory fields), the operator should:

Write a condition or phase on the status subresource.
Emit a Kubernetes event if your pattern includes user-visible hints.
Return ctrl.Result{}, nil.

That pattern stops the workqueue from hammering the same broken object forever and matches how mature projects communicate “user must change the CR.”

`reconcile.TerminalError` (version note)

On supported controller-runtime versions, reconcile.TerminalError(err) wraps an error so the manager logs it once and does not apply the usual failure requeue behavior—useful when you still want rich error context in logs but not infinite backoff loops.

Check your go.mod version and the upstream sigs.k8s.io/controller-runtime/pkg/reconcile documentation before relying on this API; behavior evolved across minor releases.

Validation failures vs reconciliation failures

Validation failure (spec does not satisfy CRD OpenAPI, or your own semantic checks): user action required → terminal pattern.
Reconciliation failure (cannot reach apiserver): infrastructure → return err.
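
One way to keep the two categories apart in code is a dedicated error type for spec problems, checked with `errors.As`. The sketch below uses only the standard library; `InvalidSpecError`, `validateImage`, `isTerminal`, and `errTimeout` are hypothetical names, and the image check is deliberately simplistic.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// InvalidSpecError is a hypothetical error type for problems that only a
// spec edit can fix.
type InvalidSpecError struct {
	Field  string
	Reason string
}

func (e *InvalidSpecError) Error() string {
	return "invalid " + e.Field + ": " + e.Reason
}

// errTimeout stands in for an infrastructure failure.
var errTimeout = errors.New("apiserver timeout")

// validateImage is a tiny semantic check used for illustration only.
func validateImage(image string) error {
	if image == "" || strings.ContainsAny(image, " \t") {
		return fmt.Errorf("validating spec: %w",
			&InvalidSpecError{Field: "spec.image", Reason: "empty or contains whitespace"})
	}
	return nil
}

// isTerminal reports whether retrying is pointless until the user edits the
// spec. It sees through wrapping because errors.As walks the error chain.
func isTerminal(err error) bool {
	var invalid *InvalidSpecError
	return errors.As(err, &invalid)
}

func main() {
	for _, err := range []error{validateImage("nginx latest"), errTimeout} {
		if isTerminal(err) {
			fmt.Println("terminal, report in status and stop retrying:", err)
		} else {
			fmt.Println("transient, return the error for backoff:", err)
		}
	}
}
```

In a reconciler, the terminal branch would write the condition and return `ctrl.Result{}, nil`, while the other branch would return the error unchanged.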

Admission webhooks blur the line; see mutating and validating webhooks for how denied requests surface.

Special API errors every reconciler should handle

`NotFound` and `client.IgnoreNotFound`

At the top of Reconcile, after Get:

go

if err := r.Get(ctx, req.NamespacedName, &obj); err != nil {
    return ctrl.Result{}, client.IgnoreNotFound(err)
}

IgnoreNotFound maps NotFound to nil, which becomes Result{}, nil—the object is gone, so the desired state for that key is vacuously satisfied from the controller’s point of view. This pairs naturally with finalizers when you still have cleanup work before the object disappears; once the object is truly gone, NotFound is success.

Optimistic concurrency and `Conflict`

Update can fail with Conflict when resourceVersion changed between read and write—common in busy clusters or when multiple writers touch the same object.

Patterns:

Return a short RequeueAfter with a nil error for benign races, so the next pass re-reads the object before writing.
Switch hot paths to patch or Server-Side Apply to reduce contention when field ownership is clear.
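
The mechanics behind a Conflict are easy to see in a toy model. The sketch below is an in-memory stand-in, not the Kubernetes client: `server`, `object`, `scaleTo`, and `errConflict` are hypothetical, and the `otherWriter` hook simulates a second writer landing between the read and the write.

```go
package main

import (
	"errors"
	"fmt"
)

var errConflict = errors.New("conflict: object was modified")

// object is an in-memory stand-in for a stored resource.
type object struct {
	resourceVersion int
	replicas        int
}

// server is a toy apiserver holding a single object.
type server struct {
	stored object
}

func (s *server) get() object { return s.stored }

// update accepts the write only if the caller read the latest version.
func (s *server) update(o object) error {
	if o.resourceVersion != s.stored.resourceVersion {
		return errConflict
	}
	o.resourceVersion++
	s.stored = o
	return nil
}

// scaleTo is one reconcile pass: read, modify, write. The otherWriter hook
// simulates a concurrent writer sneaking in between our read and our write.
func scaleTo(s *server, replicas int, otherWriter func()) error {
	o := s.get()
	if otherWriter != nil {
		otherWriter()
	}
	o.replicas = replicas
	return s.update(o)
}

func main() {
	s := &server{}
	bump := func() { s.stored.resourceVersion++ }

	err := scaleTo(s, 3, bump) // stale read, write rejected
	fmt.Println("first pass:", err)

	err = scaleTo(s, 3, nil) // next pass re-reads and succeeds
	fmt.Println("second pass:", err, "replicas:", s.get().replicas)
}
```

The second pass succeeds only because it starts from a fresh read, which is why a short requeue (a whole new pass) is usually enough for a benign race.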

After conflict: requeue vs patch strategy (pointers to deeper guides)

If conflicts are frequent, your reconcile may be doing read-modify-write on large objects. Consider:

Narrower patches (only fields you own).
SSA with a dedicated field manager (see SSA in operators).
Reviewing whether status updates should use Status().Patch / SSA subresource flows described in status and conditions.

Idempotency checklist (same key, many runs)

Side effects and ordering

Reconcile must tolerate N runs for the same resourceVersion or equivalent logical state. External side effects (S3 buckets, DNS records, billing calls) should be guarded by:

deterministic names,
compare-and-create patterns,
or finalizer-gated cleanup (see finalizers explained).
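
The first two guards can be sketched together. This is a minimal illustration under stated assumptions: `externalName` and the `buckets` type are hypothetical, the map stands in for a real external system, and the UID is made up.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// externalName derives a stable name from the owner's UID so every reconcile
// pass targets the same external resource.
func externalName(prefix, ownerUID string) string {
	sum := sha256.Sum256([]byte(ownerUID))
	return prefix + "-" + hex.EncodeToString(sum[:])[:12]
}

// buckets is a toy stand-in for an external system such as an object store.
type buckets map[string]bool

// ensure creates the bucket only if it is missing and reports whether it did.
func (b buckets) ensure(name string) (created bool) {
	if b[name] {
		return false
	}
	b[name] = true
	return true
}

func main() {
	store := buckets{}
	name := externalName("backup", "hypothetical-uid-1234")
	for pass := 1; pass <= 3; pass++ {
		fmt.Printf("pass %d: name=%s created=%v\n", pass, name, store.ensure(name))
	}
}
```

Only the first pass creates anything; later passes find the same name and do nothing. A random suffix generated inside Reconcile would instead create a new resource on every run.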

Write, re-read, and cache assumptions

After a write, the in-memory object can be stale. Either re-fetch, use the returned object from Create/Patch, or rely on the next event—but do not assume fields you did not set stayed untouched.

Status writes and reconcile noise (link to predicates)

Writing status on every pass can create watch noise. Combine careful diffing (only patch when status changed) with predicates such as generation-based filters so .status updates do not fan out into full spec reconciles unless you intend that.
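
A common source of status noise is a condition whose timestamp is refreshed on every pass. The sketch below models a "set only if changed" helper with local types; `Condition` and `setCondition` are simplified stand-ins rather than the real Kubernetes condition API, and the comparison rule (type, status, reason) is an assumption made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// Condition is a trimmed, local stand-in for a status condition.
type Condition struct {
	Type               string
	Status             string
	Reason             string
	LastTransitionTime time.Time
}

// setCondition inserts or updates next and reports whether anything changed.
// A condition with the same Type, Status, and Reason is left untouched,
// including its timestamp, so the caller can skip the status write.
func setCondition(conds []Condition, next Condition) ([]Condition, bool) {
	for i, cur := range conds {
		if cur.Type != next.Type {
			continue
		}
		if cur.Status == next.Status && cur.Reason == next.Reason {
			return conds, false
		}
		conds[i] = next
		return conds, true
	}
	return append(conds, next), true
}

func main() {
	var conds []Condition
	var changed bool
	for pass := 1; pass <= 3; pass++ {
		next := Condition{
			Type:               "Ready",
			Status:             "False",
			Reason:             "WaitingForChild",
			LastTransitionTime: time.Now(),
		}
		conds, changed = setCondition(conds, next)
		fmt.Printf("pass %d: write status = %v\n", pass, changed)
	}
}
```

Only the first pass reports a change, so only the first pass would issue a status patch and generate a watch event.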

Anti-patterns and production surprises

Using errors for normal “waiting” states

This is the most common beginner bug: it turns “waiting for rollout” into “exponential backoff climbing to minutes.” Prefer RequeueAfter or child-resource watches.

Tight `RequeueAfter` loops

Sub-second RequeueAfter everywhere can still overload the apiserver if cardinality is high. Combine with shared work, batching, or higher-level coordination when reconciling thousands of objects.
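
A quick back-of-the-envelope calculation makes the cardinality problem concrete. The helpers below are hypothetical and assume every object requeues itself on the same interval; they count reconciles only, and each reconcile may issue several API calls.

```go
package main

import (
	"fmt"
	"time"
)

// reconcilesPerSecond estimates the steady-state reconcile rate when every
// object requeues itself on the same interval. A non-positive interval
// returns 0 to avoid dividing by zero.
func reconcilesPerSecond(objects int, interval time.Duration) float64 {
	if interval <= 0 {
		return 0
	}
	return float64(objects) / interval.Seconds()
}

// minInterval returns the shortest interval that keeps the rate within the
// given budget of reconciles per second.
func minInterval(objects int, budgetPerSecond float64) time.Duration {
	if budgetPerSecond <= 0 {
		return 0
	}
	return time.Duration(float64(objects) / budgetPerSecond * float64(time.Second))
}

func main() {
	fmt.Printf("%.0f reconciles/s\n", reconcilesPerSecond(5000, 500*time.Millisecond))
	fmt.Printf("interval for a 50/s budget: %v\n", minInterval(5000, 50))
}
```

With 5,000 objects on a 500 ms timer the model gives 10,000 reconciles per second; holding the same fleet to 50 per second needs an interval of 100 seconds.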

Masking bugs with infinite retry

If logic is wrong, backoff only slows the failure—it does not fix it. Investigate reconciliation logs, object events, and diff of managed children.

Logging volume and cardinality on hot paths

Every return err may log a stack or message each backoff cycle. For known benign states, downgrade to debug logs inside a branch that returns RequeueAfter instead.

Putting it together: a small skeleton `Reconcile`

Ordered phases (get → validate → mutate → status → return)

go

func (r *MyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
    logger := log.FromContext(ctx)

    var cr myv1alpha1.MyKind
    if err := r.Get(ctx, req.NamespacedName, &cr); err != nil {
        return ctrl.Result{}, client.IgnoreNotFound(err)
    }

    // Terminal validation example
    if err := validateSpec(&cr); err != nil {
        // update status / event, then:
        return ctrl.Result{}, nil
    }

    if err := r.ensureChildren(ctx, &cr); err != nil {
        if apierrors.IsConflict(err) { // k8s.io/apimachinery/pkg/api/errors
            return ctrl.Result{RequeueAfter: time.Second}, nil
        }
        return ctrl.Result{}, err // transient
    }

    if !r.ready(&cr) {
        return ctrl.Result{RequeueAfter: 10 * time.Second}, nil
    }

    if err := r.patchStatusReady(ctx, &cr); err != nil {
        return ctrl.Result{}, err
    }

    logger.Info("steady state", "name", cr.Name)
    return ctrl.Result{}, nil
}

Commented returns for each branch

IgnoreNotFound: deleted object → success, no requeue.
Validation branch: user must fix spec → success, no requeue (terminal pattern).
Conflict: short RequeueAfter, not necessarily an error signal.
Generic err: infrastructure / unknown → backoff via return err.
ready false: controlled polling via RequeueAfter.
Final nil: converged until watches fire again.

Adapt naming to your APIs; the structure is what matters in review.

Frequently Asked Questions

Does returning Result{Requeue: true}, nil replace watches?
No. Watches should still be your primary driver. Requeue: true is for edge cases where you know another pass is needed before external events will arrive.

Should multi-resource operators always use RequeueAfter?
Not by default. Prefer Owns / Watches for child objects (see multi-resource reconciliation) so the queue wakes on real changes instead of timers.

How does this relate to drift detection?
Drift checks often add periodic verification; align timers with drift detection patterns instead of hammering error returns.

