6 min read · The PayGraph Team

When AI agents should ask a human before spending

Human-in-the-loop AI payments need clear triggers, not vibes. Here are the three triggers that matter, the UX patterns that work, and the dollar thresholds that hold up in production.

Agents that spend money will eventually propose something you don't want them to do. The question isn't whether to involve a human — it's when, how fast, and on what signal.

What is human-in-the-loop for AI payments?

Human-in-the-loop AI payments means a person approves a transaction before the money moves, triggered by a rule the agent cannot bypass. The agent proposes, the policy evaluates, and if the proposal crosses a threshold, the agent pauses until a human clicks approve or reject in Slack, email, or a dashboard.

It is not the same as review-after-the-fact. Post-hoc review catches patterns; pre-flight approval catches the specific transaction before it clears. Both matter. Only one prevents the charge.
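To make the pre-flight distinction concrete, here is a minimal Python sketch of the propose, evaluate, pause flow. `Proposal`, `preflight_pay`, and the callback parameters are hypothetical names chosen for illustration, not PayGraph's API, and the human decision is stubbed as a function that returns True or False. What it shows is the ordering: on a rejection the function returns before `charge` is ever called.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Proposal:
    """A payment the agent wants to make (hypothetical shape)."""
    amount_usd: float
    vendor: str
    category: str
    reason: str


def preflight_pay(
    proposal: Proposal,
    needs_human: Callable[[Proposal], bool],
    ask_human: Callable[[Proposal], bool],
    charge: Callable[[Proposal], str],
) -> str:
    """Evaluate policy before money moves; charge only if allowed."""
    # Policy says a human must look, and the human said no:
    # return before `charge` runs, so no payment is ever attempted.
    if needs_human(proposal) and not ask_human(proposal):
        return "rejected"
    # Either the policy passed it outright or a human approved it.
    return charge(proposal)


# --- Usage with stand-ins for the policy, the reviewer, and the payment rail ---
ledger: List[Proposal] = []


def fake_charge(p: Proposal) -> str:
    ledger.append(p)  # record what would have been paid
    return "charged"


def over_threshold(p: Proposal) -> bool:
    return p.amount_usd > 250  # example threshold


def reviewer_rejects(p: Proposal) -> bool:
    return False  # simulates the reviewer clicking "reject"


big = Proposal(900.0, "acme-ads", "ads", "Q3 campaign push")
small = Proposal(40.0, "acme-ads", "ads", "daily top-up")

print(preflight_pay(big, over_threshold, reviewer_rejects, fake_charge))    # rejected
print(preflight_pay(small, over_threshold, reviewer_rejects, fake_charge))  # charged
print(len(ledger))                                                          # 1
```

A post-hoc review process would instead look at `ledger` after the fact, by which point the $900 charge would already be in it.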

For a deeper look at where approvals fit inside the broader stack, see the PayGraph homepage and the related post on policy-controlled spending on the PayGraph blog.

What are the three triggers for human approval?

There are three signals worth pausing an agent for. Everything else should run on policy alone.

1. Amount threshold. The simplest trigger and the one you ship first. Any single transaction above $X, or cumulative daily spend above $Y, routes to a human. Most teams set the per-transaction threshold well below the maximum the agent is theoretically allowed, so routine charges flow through but outliers stop.

2. Category risk. Some spend categories need human eyes regardless of amount. The common ones are wire transfers, payments to new vendors or first-time recipients, and anything marked tax, legal, or payroll. A $50 wire to an unknown account is more dangerous than a $5,000 charge to a vendor you pay every week.

3. Anomaly detection. Transactions that deviate from the agent's normal pattern. Five charges in a minute when the agent usually makes one per hour. A charge at 3am when the agent historically only runs business hours. A vendor the agent has never paid before. None of these are policy violations on their own; all of them deserve a second look.
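A rough sketch of how the three triggers might be evaluated together, in plain Python with no PayGraph dependency. The thresholds, category names, business-hours window, and the `Txn` / `approval_reasons` names are illustrative assumptions. The anomaly checks here are fixed rules that mirror the examples above; a production system would more likely compare against a learned baseline for each agent.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# Illustrative values only; tune these per agent.
PER_TXN_THRESHOLD_USD = 250.0
DAILY_THRESHOLD_USD = 5_000.0
RISKY_CATEGORIES = {"wire_transfer", "tax", "legal", "payroll"}
MAX_PER_MINUTE = 2
BUSINESS_HOURS = (9, 19)  # start hour inclusive, end hour exclusive


@dataclass
class Txn:
    amount_usd: float
    vendor: str
    category: str
    at: datetime


def approval_reasons(txn: Txn, history: List[Txn]) -> List[str]:
    """Return the triggers that fire. An empty list means policy alone decides."""
    reasons: List[str] = []

    # Trigger 1: amount threshold, per transaction and cumulative per day.
    if txn.amount_usd > PER_TXN_THRESHOLD_USD:
        reasons.append("amount")
    spent_today = sum(t.amount_usd for t in history if t.at.date() == txn.at.date())
    if spent_today + txn.amount_usd > DAILY_THRESHOLD_USD:
        reasons.append("daily_total")

    # Trigger 2: category risk, regardless of amount.
    if txn.category in RISKY_CATEGORIES:
        reasons.append("category")

    # Trigger 3: deviations from this agent's own history.
    if txn.vendor not in {t.vendor for t in history}:
        reasons.append("first_time_vendor")
    window_start = txn.at - timedelta(minutes=1)
    recent = [t for t in history if window_start < t.at <= txn.at]
    if len(recent) + 1 > MAX_PER_MINUTE:  # +1 counts the proposed charge
        reasons.append("rate")
    start, end = BUSINESS_HOURS
    if not start <= txn.at.hour < end:
        reasons.append("off_hours")

    return reasons


history = [Txn(30.0, "cloudco", "saas", datetime(2024, 5, 6, 10, 0))]
wire = Txn(50.0, "unknown-llc", "wire_transfer", datetime(2024, 5, 6, 3, 0))
routine = Txn(30.0, "cloudco", "saas", datetime(2024, 5, 6, 11, 0))

print(approval_reasons(wire, history))     # ['category', 'first_time_vendor', 'off_hours']
print(approval_reasons(routine, history))  # []
```

The $50 wire trips three triggers without ever touching the amount threshold, which is the point of having more than one signal.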

What dollar thresholds actually work in production?

There is no universal number, but deployments tend to cluster around predictable ranges based on agent authority. Here is what holds up:

| Agent type | Auto-approve (per transaction) | Requires human approval | Hard cap (reject above) |
|---|---|---|---|
| Ads buyer | Under $250 | $250–$2,500 | $2,500 |
| Cloud / SaaS top-up | Under $100 | $100–$1,000 | $1,000 |
| Procurement assistant | Under $500 | $500–$10,000 | $10,000 |
| Travel booker | Under $750 | $750–$5,000 | $5,000 |
| Invoice payer | None ($0 ceiling) | All amounts | Policy-defined |
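One way to encode the bands from the table above, assuming the hard cap is the point above which a payment is rejected outright. `Route`, `BANDS`, and `route` are hypothetical names. The invoice payer row is left out because its cap is policy-defined; in this scheme it would be an entry whose auto-approve ceiling is 0, so every amount goes to a human.

```python
from enum import Enum
from typing import Dict, Tuple


class Route(Enum):
    AUTO_APPROVE = "auto_approve"
    HUMAN = "human"
    REJECT = "reject"


# agent type -> (auto-approve below this, hard cap), in USD, from the table.
BANDS: Dict[str, Tuple[float, float]] = {
    "ads_buyer": (250, 2_500),
    "cloud_saas_topup": (100, 1_000),
    "procurement_assistant": (500, 10_000),
    "travel_booker": (750, 5_000),
}


def route(agent_type: str, amount_usd: float) -> Route:
    auto_below, hard_cap = BANDS[agent_type]
    if amount_usd < auto_below:
        return Route.AUTO_APPROVE  # routine charge, policy alone decides
    if amount_usd <= hard_cap:
        return Route.HUMAN  # inside the approval band
    return Route.REJECT  # above the cap: nobody approves this from Slack


print(route("ads_buyer", 120).value)    # auto_approve
print(route("ads_buyer", 1_800).value)  # human
print(route("ads_buyer", 4_000).value)  # reject
```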

Two rules of thumb from teams running these thresholds in production:

  • The auto-approve ceiling should be roughly 5–10% of the agent's weekly budget. Higher and one bad decision wipes the week. Lower and you drown approvers in notifications.
  • The human-approval band should cover the 80th to 99th percentile of the agent's normal transaction amounts. Below the 80th percentile, automate. Above the 99th, it's a hard reject: even a human shouldn't rubber-stamp it from a Slack message.
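A small sketch of both rules of thumb, assuming you have the agent's weekly budget and a list of its historical transaction amounts. It uses the standard library's `statistics.quantiles` with `method="inclusive"`, which treats the history as the full distribution rather than a sample. The function names and the 5% default are illustrative choices.

```python
import statistics
from typing import List, Tuple


def auto_approve_ceiling(weekly_budget_usd: float, fraction: float = 0.05) -> float:
    """Auto-approve ceiling as 5-10% of the agent's weekly budget."""
    if not 0.05 <= fraction <= 0.10:
        raise ValueError("rule of thumb is 5-10% of the weekly budget")
    return weekly_budget_usd * fraction


def human_band(amounts: List[float]) -> Tuple[float, float]:
    """80th and 99th percentile of historical transaction amounts."""
    # n=100 returns 99 cut points: index 79 is the 80th percentile, 98 the 99th.
    cuts = statistics.quantiles(amounts, n=100, method="inclusive")
    return cuts[79], cuts[98]


print(auto_approve_ceiling(5_000))                    # 250.0
print(human_band([float(x) for x in range(1, 101)]))  # (80.2, 99.01)
```

A $5,000 weekly budget at 5% gives a $250 ceiling, the same figure as the ads buyer row in the table.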

What does a good approval UX look like?

Slow approvals kill agents. If a purchase order approval takes four hours, the agent is effectively offline for four hours. The UX target is a 30-second round trip for 90% of approvals.

What makes that possible:

  • Send the decision, not the data. The approval message should show amount, vendor, category, the agent's stated reason, and two buttons. Not a JSON blob. Not a link to a dashboard.
  • One-click approve or reject from mobile. Slack interactive messages and signed webhook links both work. Email with unique approve/reject URLs works for low-volume cases.
  • Deadlines. Every approval request has a TTL. If nobody responds in 15 minutes, the agent gets an automatic rejection and moves on or escalates. Silence is not approval.
  • Context, not history. Show the last 3 transactions from this agent for this vendor. Do not dump the full audit log into the message.
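A minimal sketch of the message and deadline pieces, assuming reviewer clicks arrive on an in-process `queue.Queue`; in practice a Slack or webhook handler (not shown) would feed it. `ApprovalRequest` and `await_decision` are hypothetical names. The message carries only amount, vendor, category, reason, and at most three recent charges, and `await_decision` treats a missed deadline as a rejection.

```python
import queue
from dataclasses import dataclass, field
from typing import List


@dataclass
class ApprovalRequest:
    amount_usd: float
    vendor: str
    category: str
    reason: str
    recent_usd: List[float] = field(default_factory=list)  # past charges, same agent + vendor

    def message(self) -> str:
        """Decision-first text: the facts a reviewer needs, then two buttons."""
        lines = [
            f"${self.amount_usd:,.2f} to {self.vendor} ({self.category})",
            f"Agent's reason: {self.reason}",
        ]
        if self.recent_usd:
            # Context, not history: at most the last three charges.
            last3 = ", ".join(f"${a:,.2f}" for a in self.recent_usd[-3:])
            lines.append(f"Recent charges to this vendor: {last3}")
        lines.append("[Approve]  [Reject]")
        return "\n".join(lines)


def await_decision(answers: "queue.Queue[str]", ttl_seconds: float) -> str:
    """Wait for a reviewer's click; if the TTL passes first, reject."""
    try:
        answer = answers.get(timeout=ttl_seconds)
    except queue.Empty:
        return "rejected (timeout)"  # silence is not approval
    return "approved" if answer == "approve" else "rejected"


req = ApprovalRequest(1_200.0, "acme-ads", "ads", "Scale the best-performing campaign",
                      [300.0, 450.0, 500.0, 520.0])
print(req.message())

answers = queue.Queue()
print(await_decision(answers, ttl_seconds=0.1))  # rejected (timeout)
answers.put("approve")
print(await_decision(answers, ttl_seconds=0.1))  # approved
```

The 0.1-second TTL only keeps the example quick; the 15-minute deadline from the text would be `ttl_seconds=900`.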

Here is the minimum policy that implements all three triggers with PayGraph:

```python
from paygraph import PolicyEngine, Policy, Anomaly

policy = Policy(
    # Trigger 1: amount threshold
    require_approval_above_usd=250,
    daily_cap_usd=5000,

    # Trigger 2: category risk
    always_require_approval_categories=[
        "wire_transfer",
        "new_vendor",
        "legal",
        "payroll",
    ],

    # Trigger 3: anomaly detection
    anomaly_rules=[
        Anomaly.rate_limit(max_per_minute=2),
        Anomaly.first_time_vendor(),
        Anomaly.off_hours(business_hours="09:00-19:00"),
    ],

    approval_webhook="https://hooks.slack.com/services/...",
    approval_timeout_seconds=900,
)

engine = PolicyEngine(policy)

@engine.guarded_tool
def make_payment(amount_usd: float, vendor: str, category: str):
    # existing Stripe Issuing / x402 / internal API call
    ...
```

approval_timeout_seconds=900 sets the 15-minute deadline. The anomaly_rules catch cases the amount threshold misses. The category list covers the blast-radius cases where the dollar amount is the wrong lens.

How do you avoid approval fatigue?

The fastest way to destroy a human-in-the-loop system is to send too many approvals. Reviewers stop reading, start rubber-stamping, and the control is worse than no control because everyone thinks it works.

Four patterns that keep approval volume sustainable:

  1. Tier your approvers. Charges under $1,000 go to the team lead. Charges above that go to the budget owner, and anything above the weekly cap goes to finance. No single person sees every approval.
  2. Batch low-urgency approvals. Non-time-sensitive charges can queue into a once-daily digest. A human reviews 20 proposed ad spends in one sitting instead of 20 Slack pings.
  3. Auto-approve repeats. If the agent has successfully paid the same vendor the same amount in the same category three times in the last 30 days, the fourth payment can skip approval, as long as it stays within the policy ceiling.
  4. Track approval rate. If 98% of approval requests are approved, your threshold is too low: raise it. If 30% are rejected, your agent needs better policy constraints upstream, not more human review.
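Three of the four patterns are simple enough to sketch directly; batching (pattern 2) is mostly a scheduling concern and is left out. The function names, the `Paid` record, and the exact-amount match for repeats are assumptions for illustration. The $1,000 tier boundary, the 30-day window, and the 98% / 30% rates come from the list above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Paid:
    vendor: str
    amount_usd: float
    category: str
    at: datetime


def approver_for(amount_usd: float, weekly_cap_usd: float) -> str:
    """Pattern 1: tier approvers by size so nobody sees every request."""
    if amount_usd > weekly_cap_usd:
        return "finance"
    if amount_usd >= 1_000:
        return "budget_owner"
    return "team_lead"


def is_trusted_repeat(vendor: str, amount_usd: float, category: str, now: datetime,
                      history: List[Paid], ceiling_usd: float) -> bool:
    """Pattern 3: skip approval after 3 identical successful payments in 30 days."""
    if amount_usd > ceiling_usd:  # repeats never bypass the policy ceiling
        return False
    since = now - timedelta(days=30)
    matches = [p for p in history
               if p.vendor == vendor and p.amount_usd == amount_usd
               and p.category == category and p.at >= since]
    return len(matches) >= 3


def threshold_advice(approved: int, rejected: int) -> str:
    """Pattern 4: read the approval rate (integer math avoids float edge cases)."""
    total = approved + rejected
    if total == 0:
        return "no data yet"
    if approved * 100 >= 98 * total:
        return "threshold too low: raise it"
    if rejected * 100 >= 30 * total:
        return "tighten upstream policy"
    return "ok"


now = datetime(2024, 6, 1)
history = [Paid("cloudco", 99.0, "saas", now - timedelta(days=d)) for d in (3, 10, 20)]
print(approver_for(400, 5_000))                                                   # team_lead
print(is_trusted_repeat("cloudco", 99.0, "saas", now, history, ceiling_usd=250))  # True
print(threshold_advice(approved=196, rejected=4))                                 # threshold too low: raise it
```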

The goal is not to send every transaction for approval. It's to send the ones where a human reviewer would actually catch something.

Where to start

Pick your three triggers, set the auto-approve ceiling at 5–10% of the agent's weekly budget, and give approvers a 30-second UX. That's the whole recipe.