Human-in-the-Loop Overwhelm

What it is

Many production agents are designed to escalate sensitive actions to a human for approval: a refund over $X, a database delete, an outbound email to a new domain. Human-in-the-loop is one of the strongest defenses available, because it puts a thinking person between the agent and the side effect. But that defense degrades quickly when the queue gets long. Attackers can deliberately flood the queue with low-stakes approvals (or manipulate an agent into doing so) until the reviewer is spending a few seconds on each item with no real scrutiny. The malicious request is buried in the noise; by the time it surfaces, the reviewer's habit is to click approve.

Why it matters

Decision fatigue is well-studied: judges have been reported to grant fewer paroles late in a session, doctors miss diagnoses on long shifts, and security analysts wave through alerts after the hundredth false positive. AI agents change the volume curve dramatically: a single agent can generate more approval requests in an hour than a human team historically saw in a week. The mitigation is not simply "have more humans". It is queue prioritization (ordering by risk rather than by volume or arrival time), per-reviewer rate limits, mandatory cooling-off on long sessions, and randomized "canary" requests that test whether reviewers are still reading.
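As a rough illustration of risk-ordered prioritization, the sketch below keeps pending approvals in a heap keyed on a risk score, so a high-stakes request is reviewed first no matter how many low-stakes requests arrived before it. The class name, the action names, and the risk weights are all hypothetical; a real deployment would derive scores from its own policy.

```python
import heapq
import itertools

# Hypothetical risk weights per action type (assumption, not a standard).
RISK_WEIGHTS = {"db_delete": 0.9, "refund": 0.6, "email_new_domain": 0.5}
DEFAULT_RISK = 0.1  # anything unlisted is treated as low-stakes


class RiskWeightedQueue:
    """Approval queue that surfaces the highest-risk request first."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # tie-breaker: earlier arrival wins

    def submit(self, request_id, action):
        risk = RISK_WEIGHTS.get(action, DEFAULT_RISK)
        # heapq is a min-heap, so negate risk to pop the riskiest item first.
        heapq.heappush(self._heap, (-risk, next(self._arrival), request_id))

    def next_for_review(self):
        """Return the id of the riskiest pending request, or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


if __name__ == "__main__":
    queue = RiskWeightedQueue()
    for i in range(50):  # a flood of low-stakes requests
        queue.submit(f"note-{i}", "note_update")
    queue.submit("drop-prod", "db_delete")  # the request that matters
    print(queue.next_for_review())
```

Ordering by risk alone means low-stakes items can wait indefinitely during a flood. That is usually the intended trade-off here, but it suggests pairing the queue with batching or auto-expiry for the low-risk tail.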

Key components

Volume flooding — burying high-stakes requests in low-stakes ones
Decision fatigue — degraded judgment after sustained review
Habituation — reviewers learning to click-approve as default
Insider variant — staff intentionally flooding to push something through
Mitigation — risk-weighted prioritization, reviewer rate limits, canary requests, escalation breakers
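The reviewer-side mitigations listed above can be sketched as a small per-reviewer guard: a sliding-window rate limit, a session-length cap that forces a cooling-off, and a breaker that trips when a canary request is approved. This is a minimal sketch under stated assumptions: the class, its thresholds, and the rule that a canary should always be rejected are illustrative choices, and timestamps are passed in explicitly (in seconds) to keep the example deterministic.

```python
from collections import deque


class ReviewerGuard:
    """Hypothetical per-reviewer limits; all thresholds are illustrative."""

    def __init__(self, max_per_window=20, window_s=600.0, max_session_s=5400.0):
        self.max_per_window = max_per_window  # decisions allowed per window
        self.window_s = window_s              # sliding window length
        self.max_session_s = max_session_s    # session cap before cooling-off
        self._decisions = deque()             # timestamps of recent decisions
        self._session_start = None
        self.canaries_seen = 0
        self.canaries_missed = 0

    def may_review(self, now):
        """True if this reviewer may be handed another request at time `now`."""
        if self._session_start is None:
            self._session_start = now
        # Forget decisions that have aged out of the sliding window.
        while self._decisions and now - self._decisions[0] >= self.window_s:
            self._decisions.popleft()
        if now - self._session_start >= self.max_session_s:
            return False  # session too long: mandatory cooling-off
        return len(self._decisions) < self.max_per_window

    def record(self, now, approved, is_canary=False):
        """Log a decision; a canary is assumed to be a request that must be rejected."""
        self._decisions.append(now)
        if is_canary:
            self.canaries_seen += 1
            if approved:
                self.canaries_missed += 1

    def breaker_tripped(self):
        """Escalation breaker: any approved canary pauses routing to this reviewer."""
        return self.canaries_missed > 0

    def start_new_session(self, now):
        """Call after the reviewer has taken their break."""
        self._session_start = now
        self._decisions.clear()
```

A dispatcher would call `may_review` before handing out each item and check `breaker_tripped` after each canary. What happens once the breaker trips (re-reviewing recent approvals, escalating to a second reviewer) is a policy decision this sketch leaves open.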

Related terms

Human-in-the-Loop (HITL)

Agent Governance

Guardrails
