When Agents Should Stop: Designing Safety Boundaries That Work

An agent in my homelab posted “HEARTBEAT_OK” to the ops channel 47 times over one weekend. Every message was technically correct. The scheduled jobs were healthy, the agent verified them, and it reported in exactly like it was told to. By Monday morning I had muted the channel, which meant the one message that mattered (a failed backup verification) scrolled past unread sometime around 3 AM.

That incident wasn’t an alignment problem or a runaway loop. It was a stopping problem. The agent had no concept of “nothing to say,” so it said something every time it woke up. Most agent safety writing focuses on preventing harmful actions. In practice, the boundary I’ve had to engineer most carefully is more mundane: teaching agents when to do nothing and exit quietly.

If you run scheduled agents, autonomous loops, or anything where an LLM makes decisions on a timer, this is for you. The patterns below come from running multi-agent pipelines on my own infrastructure, and from the specific ways they’ve failed. I covered the theory in Three-Layer Safety for Autonomous Agents; this post is the operational follow-up, the part where theory meets a crash-looping gateway at 2 AM.

Stopping is a feature, not a failure state

The core mistake I made early on: treating an agent that stops as an agent that failed. My first orchestration scripts retried everything. Agent exits without completing the task? Retry. Agent says it’s blocked? Rephrase the prompt and retry. The result was agents that burned tokens grinding against problems they’d already correctly identified as unsolvable from inside the loop.

What fixed it was giving agents a vocabulary for stopping. Mine boils down to three boundary types, all enforced outside the model:

Budget boundaries cap what an agent can spend: iterations, tokens, wall-clock time. These are the easy ones, and most frameworks give you something here. The mistake is setting them as emergency brakes (high enough that they never trigger) instead of as scoping decisions. If a task should take 3 iterations, cap it at 5, not 50. A cap that triggers at 50 means you’ve already wasted 45 iterations of spend before learning anything. I also set budgets per stage rather than per pipeline: a global 30-minute cap on a five-stage pipeline tells you nothing about which stage ran away.

Progress boundaries detect when the agent is still spending but no longer changing anything. This is the infinite-loop killer, and it’s the one almost nobody implements. An agent can stay under every budget cap while making zero progress: rewriting the same file back and forth, re-running the same failing test with cosmetic tweaks. You detect this by hashing the observable state between iterations and stopping when the hash stops changing.

Reporting boundaries define when the agent is allowed to speak. This is the HEARTBEAT_OK lesson: an agent that reports success on every run trains humans to ignore it. Silence on success, noise on failure. The inversion matters more than it looks.

The configs

Progress detection is the highest-value boundary, so start there. The wrapper below runs an agent task in a loop and kills it when two consecutive iterations produce identical state:

#!/usr/bin/env bash
# agent-loop.sh: run an agent task with hard stop conditions
MAX_ITERATIONS=5
previous_state=""

for i in $(seq 1 "$MAX_ITERATIONS"); do
  run_agent_iteration "$TASK_FILE"   # your agent invocation here

  # Hash everything the agent can change: working tree + state dir
  current_state=$(
    { git diff; git status --porcelain; cat state/*.json 2>/dev/null; } \
    | sha256sum | cut -d' ' -f1
  )

  if [[ "$current_state" == "$previous_state" ]]; then
    echo "iteration $i produced no state change, stopping" >&2
    exit 2   # stopped at boundary, not failed
  fi
  previous_state="$current_state"

  task_complete && exit 0
done

echo "hit iteration cap ($MAX_ITERATIONS) without completing" >&2
exit 2

Exit code 2 is doing real work there. I use a three-value contract for every agent wrapper:

0 = done: task complete, verified
1 = failed: something broke, a human needs to look
2 = stopped: hit a boundary with partial progress, safe to resume

The distinction between 1 and 2 is the whole point. A failure pages someone. A boundary stop writes a state file and waits for the next scheduled run, which picks up where the last one left off. Collapsing those into one exit code gives you either alert fatigue or silent data loss, depending on which direction you collapse them.

Notice that all of this lives in the wrapper, not in the prompt. You can (and should) tell the agent about its budget in the prompt, because a model that knows it has two iterations left plans differently. But the prompt is advice. The wrapper is the boundary.

Reporting boundaries live in the scheduler config. Here’s the shape I use for scheduled agent jobs after the heartbeat incident:

{
  "name": "nightly-health-check",
  "schedule": "0 6 * * *",
  "task": "Verify backup jobs completed and volumes are healthy.",
  "notify": {
    "on_success": "silent",
    "on_failure": "channel:#ops",
    "on_boundary_stop": "channel:#ops-low"
  },
  "deadman": "https://hc.example.com/ping/nightly-health"
}

Two things to notice. Success is silent: the channel only gets a message when something needs a human. And the deadman URL replaces the heartbeat message entirely: instead of the agent telling humans “I’m alive,” it pings a dead-man’s-switch endpoint (Healthchecks.io, or any self-hosted equivalent) that alerts only when the ping stops arriving. Machines are good at noticing absence. Humans are terrible at it. Route the liveness signal to the machine and the failure signal to the human.

Gotcha 1: silence can hide breakage

About a month after I made my agents quiet on success, a memory MCP server’s tools started failing silently. Calls returned empty results instead of errors. The agents treated “no results” as “nothing to report” and exited cleanly, status 0, for eleven days. From the outside everything looked healthy: exit codes were green and the dead-man pings kept arriving, because the agent itself was running fine. Only the tools inside it were broken.

The lesson: “silence on success” requires verifying success, not just the absence of an exception. My health-check agents now end every run with an assertion phase that demands positive evidence:

# Don't trust "no errors". Demand proof of work.
results=$(query_memory_store "test-canary-record")
if [[ -z "$results" ]]; then
  echo "canary record missing: memory store is lying to us" >&2
  exit 1
fi

Plant a canary record you know exists, and fail loudly if the tooling can’t find it. A tool that fails silently turns every downstream stop condition into a lie, because the agent is deciding “nothing to do” based on data it never received.

Gotcha 2: validate config before the gateway eats it

Stop conditions usually live in config files, which means they inherit every config-deployment failure mode. I learned this when I added a plausible-looking concurrency cap to an agent gateway’s config. The key didn’t exist in the schema. Older versions ignored unknown keys; the version I was running had switched to strict validation and rejected the whole file. The gateway crash-looped on restart, taking every scheduled agent down with it, including the ones whose job was to report that things were down.

Strict validation is the right behavior (a typo’d max_iteratons silently ignored is a budget cap that doesn’t exist), but it means you treat agent config like any other production config: validate before reload, never after.

# Never restart a gateway on unvalidated config
agentctl validate --config /etc/agent/gateway.json || {
  echo "config invalid, refusing to restart" >&2
  exit 1
}
systemctl restart agent-gateway

If your agent platform ships a doctor or validate subcommand, wire it into the deploy path and make the restart conditional on it passing. If it doesn’t ship one, a JSON Schema check in CI is twenty minutes of work and saves you a crash-looped orchestrator. Same idea as validating Kubernetes manifests before merge, just pointed at your agent stack.

Gotcha 3: a stopped agent must leave a note

Early versions of my boundary stops just exited. The next scheduled run started from scratch, re-derived the same context, hit the same boundary, and exited again. Functionally an infinite loop, just with a 24-hour period and a cron job in the middle.

Now every boundary stop writes a handoff file before exiting:

{
  "stopped_at": "2026-06-08T03:12:44Z",
  "reason": "no_progress",
  "iterations_used": 4,
  "progress_summary": "Identified failing PVC, replica rebuild blocked on node disk pressure",
  "blocking_on": "needs human: node disk cleanup or replica eviction",
  "resume_hint": "check node disk usage before retrying"
}

The next run reads the handoff first. If blocking_on names a human action and nothing in the environment has changed, it exits immediately at near-zero cost instead of re-deriving the same dead end. When the blocker clears, it resumes from the summary instead of from nothing. This one file turned boundary stops from an expensive pause into an actual checkpoint mechanism.

What I considered and rejected

Letting the model decide when to stop. Tempting, because the model often knows it’s stuck. But a stop condition that lives inside the thing being bounded isn’t a boundary, it’s a suggestion. Models are also systematically optimistic that one more iteration will help. I let agents request an early stop (which short-circuits the loop), but enforcement stays in the wrapper, outside the model’s reach.

Confidence thresholds. Some frameworks stop when the model’s self-reported confidence drops below a cutoff. I tried it; self-reported confidence was noise, uncorrelated with whether the next iteration helped. The state-hash check costs one sha256sum and doesn’t depend on the model grading its own homework.

Watchdog agents. A second agent that monitors the first and decides whether to kill it. This works, and for high-stakes pipelines I still use a reviewer stage (the pattern shows up in Multi-Agent AI Systems: Architecture Patterns That Actually Work). But as a stop mechanism it’s expensive and introduces a new question: who stops the watchdog? Deterministic boundaries in the wrapper give you 90% of the value at roughly zero marginal cost.

Where this lands

Stopping is the cheapest safety mechanism you have, and it’s the one most agent deployments skip because it doesn’t feel like a feature. Nobody demos an agent exiting cleanly. But the boundaries above have prevented more incidents on my cluster than any prompt-engineering guardrail I’ve written: budget caps treated as scoping decisions instead of emergency brakes, state-hash progress detection, the 0/1/2 exit contract, silent success paired with loud failure and a machine-checked dead-man switch, and handoff files so a stop is a checkpoint instead of a discard.

Reach for this the moment any agent runs without a human watching: scheduled jobs, overnight batch pipelines, CI agents. Building agent systems that run unattended against real infrastructure is part of what I help teams with at GuatuLabs, and stop-condition design is reliably the piece nobody thought about before the first incident. If your ops channel has a recurring message in it right now that everyone has learned to scroll past, that’s not a reporting feature. That’s a stop condition nobody designed.

Stopping is a feature, not a failure state

The configs

Gotcha 1: silence can hide breakage

Gotcha 2: validate config before the gateway eats it

Gotcha 3: a stopped agent must leave a note

What I considered and rejected

Where this lands

Related Posts

Three-Layer Safety for Autonomous Agents: Stopping the Infinite Loop

Self-Improving AI Infrastructure: How Your Homelab Wiki Updates Itself

Your Vector DB Snapshots Are Landing on the Same Disk That Will Fail

Comments