
How to detect lost Stripe webhooks

Most teams do not discover lost Stripe webhooks from monitoring. They discover them from angry customers.

  • "I paid but my account is still locked"
  • "I canceled last week and was billed again"
  • "Refund completed in Stripe, but your app still shows active"

By the time support sees this, the real issue has already happened: an event was generated in Stripe, but your system did not process it correctly.

This guide gives you a fast, repeatable way to detect lost webhooks before they become revenue leakage.

What "lost webhook" actually means

A webhook can be "lost" in several ways:

  1. Stripe generated the event, but your endpoint never received it.
  2. Your endpoint received it, but returned a 4xx/5xx and retries never recovered.
  3. Your endpoint returned 200, but your handler failed internally after acknowledgment.
  4. The event was processed, but state mutation failed (for example, DB timeout).

From a business perspective, all four lead to the same outcome: Stripe and your product state diverge.

Step 1: define critical events and expected state transitions

Start with a short table of events that directly affect money and access.

| Stripe event | Expected app action |
| --- | --- |
| checkout.session.completed | provision account / grant plan |
| invoice.paid | extend subscription period |
| invoice.payment_failed | mark account at risk / trigger dunning |
| customer.subscription.deleted | revoke or downgrade access |
| charge.refunded | apply refund state and entitlement rules |

If your team cannot state this mapping clearly, detection will always be noisy.
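One way to keep this mapping honest is to store it in code next to the handlers. The sketch below is a minimal illustration: the event names come from the table above, while the action names and the `expected_action` helper are hypothetical placeholders.

```python
from typing import Dict, Optional

# Event types are real Stripe event names from the table above.
# Action names are hypothetical; replace them with your own handlers.
CRITICAL_EVENTS: Dict[str, str] = {
    "checkout.session.completed": "provision_account",
    "invoice.paid": "extend_subscription_period",
    "invoice.payment_failed": "mark_account_at_risk",
    "customer.subscription.deleted": "revoke_or_downgrade_access",
    "charge.refunded": "apply_refund_state",
}


def expected_action(event_type: str) -> Optional[str]:
    """Return the expected app action, or None for non-critical events."""
    return CRITICAL_EVENTS.get(event_type)
```

Anything that returns `None` here can be left out of reconciliation and alerting, which keeps detection focused on money and access.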

Step 2: compare Stripe truth vs app truth on a schedule

Run a scheduled reconciliation job (at least daily, ideally every 15 minutes) that asks:

  • Did Stripe emit a critical event?
  • Do we have a matching internal mutation?
  • If yes, how long did it take?
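The three questions above can be sketched as a pure function. This assumes you have already fetched recent events from Stripe (for example through its list-events API, which only covers a recent window) as dicts carrying the `id`, `type`, and `created` fields of the Stripe Event object. The `mutations` lookup (event id to the Unix time your app applied the change) is a hypothetical stand-in for your own database.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass
class ReconResult:
    event_id: str
    event_type: str
    matched: bool                   # do we have a matching internal mutation?
    latency_seconds: Optional[int]  # None when no mutation was found


def reconcile(
    stripe_events: Iterable[dict],
    mutations: Dict[str, int],
    critical_types: Set[str],
) -> List[ReconResult]:
    """Compare critical Stripe events against locally applied mutations."""
    results = []
    for event in stripe_events:
        if event["type"] not in critical_types:
            continue  # only reconcile events that affect money or access
        applied_at = mutations.get(event["id"])
        latency = None if applied_at is None else applied_at - event["created"]
        results.append(
            ReconResult(
                event_id=event["id"],
                event_type=event["type"],
                matched=applied_at is not None,
                latency_seconds=latency,
            )
        )
    return results
```

Unmatched results are candidate lost webhooks; matched results with high latency point to slow or retried deliveries.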

At minimum, persist these fields for each webhook attempt:

  • event id (evt_...)
  • event type
  • response status
  • delivery timestamp
  • processing result
  • trace id / correlation id

Without this data, every incident becomes guesswork.
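As a rough sketch, the fields listed above fit in a single table. SQLite is used here only to keep the example self-contained; the table name, column names, and helper functions are assumptions, not a required schema.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per delivery attempt, not per event.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webhook_attempts (
    id                INTEGER PRIMARY KEY AUTOINCREMENT,
    event_id          TEXT    NOT NULL,  -- evt_...
    event_type        TEXT    NOT NULL,
    response_status   INTEGER NOT NULL,  -- HTTP status returned to Stripe
    delivered_at      INTEGER NOT NULL,  -- Unix seconds
    processing_result TEXT    NOT NULL,  -- e.g. 'ok' or an error summary
    trace_id          TEXT
);
CREATE INDEX IF NOT EXISTS idx_attempts_event ON webhook_attempts (event_id);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def record_attempt(
    conn: sqlite3.Connection,
    event_id: str,
    event_type: str,
    response_status: int,
    delivered_at: int,
    processing_result: str,
    trace_id: Optional[str] = None,
) -> None:
    """Insert one attempt row; the context manager commits on success."""
    with conn:
        conn.execute(
            "INSERT INTO webhook_attempts (event_id, event_type, "
            "response_status, delivered_at, processing_result, trace_id) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (event_id, event_type, response_status, delivered_at,
             processing_result, trace_id),
        )
```

Storing one row per attempt, rather than overwriting a single row per event, preserves the retry history you need during an incident.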

Step 3: alert on divergence, not just on HTTP failures

HTTP 500 alerts are useful but insufficient.

Add divergence alerts such as:

  • "Stripe cancellation found, but local subscription still active after 10 min"
  • "Stripe payment succeeded, but no entitlement granted"
  • "Refund event processed in Stripe, but invoice state unchanged"

This catches silent failure modes where your endpoint still returns 200.
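A divergence check compares state rather than HTTP status codes. The sketch below covers only the cancellation example and assumes you can read the Stripe subscription's `status` and `canceled_at` fields plus a local status string. The function name, the local status value `"active"`, and the 10-minute grace window are illustrative assumptions.

```python
from typing import Optional

GRACE_SECONDS = 600  # the 10-minute window from the example alert; tune to taste


def cancellation_divergence(
    stripe_status: str,
    canceled_at: Optional[int],
    local_status: str,
    now: int,
    grace_seconds: int = GRACE_SECONDS,
) -> Optional[str]:
    """Return an alert message if Stripe says canceled but the app disagrees."""
    if stripe_status != "canceled" or canceled_at is None:
        return None  # nothing to check: Stripe does not consider it canceled
    if local_status != "active":
        return None  # local state already reflects the cancellation
    age = now - canceled_at
    if age <= grace_seconds:
        return None  # still inside the grace window; the webhook may be in flight
    return (
        "Stripe cancellation found, but local subscription "
        f"still active after {age // 60} min"
    )
```

The grace window matters: without it, every in-flight webhook would briefly look like a divergence and the alert would become noise.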

Step 4: measure the dollar impact

Engineering alerts get ignored. Revenue impact gets prioritized.

For each unresolved divergence, estimate impact in cents and aggregate:

  • total at risk
  • recovered amount
  • unresolved critical count
  • top divergence types by value
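These four aggregates can be computed from a flat list of divergence records. The `Divergence` shape and the `summarize` helper below are hypothetical; the point is to keep amounts as integer cents and to count only unresolved items as at risk.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Divergence:
    kind: str          # e.g. "missed_payment" (illustrative label)
    amount_cents: int  # estimated impact, in integer cents
    critical: bool
    resolved: bool


def summarize(divergences: Iterable[Divergence], top_n: int = 3) -> dict:
    """Aggregate divergences into the four numbers worth reporting."""
    items = list(divergences)  # materialize so we can iterate several times
    unresolved = [d for d in items if not d.resolved]
    by_kind: Counter = Counter()
    for d in unresolved:
        by_kind[d.kind] += d.amount_cents
    return {
        "total_at_risk_cents": sum(d.amount_cents for d in unresolved),
        "recovered_cents": sum(d.amount_cents for d in items if d.resolved),
        "unresolved_critical_count": sum(1 for d in unresolved if d.critical),
        "top_types_by_value": by_kind.most_common(top_n),
    }
```

Integer cents avoid floating-point rounding when many small amounts are summed; convert to dollars only when rendering the report.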

Once leadership sees "we have $12,400 at risk from missed payment webhooks," these bugs move from backlog to roadmap.

Step 5: close the loop with replay

Detection without recovery creates more toil.

For every divergence, you should be able to:

  1. inspect original payload and delivery attempts,
  2. replay the exact event to your endpoint,
  3. track whether replay resolved the issue.

Replay is how you convert observability into recovered revenue.
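A minimal replay sketch, under some assumptions: it re-dispatches the stored payload to your handler in-process instead of re-sending it over HTTP. Re-posting the original body with its original `Stripe-Signature` header will usually fail verification, because Stripe's signature check includes a timestamp tolerance. The `StoredEvent` type and `replay` function are hypothetical, and the handler is assumed to be idempotent.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StoredEvent:
    event_id: str
    raw_payload: str  # exact request body as originally received
    replays: List[str] = field(default_factory=list)  # outcome of each replay


def replay(stored: StoredEvent, handler: Callable[[dict], None]) -> bool:
    """Re-run the handler on the stored payload and record the outcome."""
    event = json.loads(stored.raw_payload)
    try:
        handler(event)  # assumed idempotent: safe to run more than once
    except Exception as exc:
        stored.replays.append(f"failed: {exc}")
        return False
    stored.replays.append("ok")
    return True
```

Recording every replay outcome on the stored event is what lets you answer the third question: whether the replay actually resolved the divergence.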

A practical baseline you can ship this week

If you need a concrete first milestone, ship this:

  • persist raw webhook payload + headers before processing,
  • store attempt status and latency,
  • run a scheduled reconciliation for critical Stripe events,
  • alert when Stripe state and app state diverge,
  • support one-click replay.

That baseline is enough to prevent most "we had no idea this was broken" incidents.
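The first two items of the baseline come down to ordering: write the raw request down before touching business logic. The sketch below uses an in-memory list as a stand-in for a durable store and omits signature verification for brevity; `ingest`, the record fields, and the injected `clock` are all illustrative assumptions.

```python
import json
import time
from typing import Callable, Dict, List


def ingest(
    raw_body: bytes,
    headers: Dict[str, str],
    store: List[dict],
    handler: Callable[[dict], None],
    clock: Callable[[], float] = time.time,
) -> int:
    """Persist first, then process. Returns the HTTP status to send back."""
    started = clock()
    record = {
        "raw_body": raw_body,      # untouched bytes, needed for later replay
        "headers": dict(headers),
        "status": "received",
        "latency_ms": None,
    }
    store.append(record)  # a real system would do a durable write here
    try:
        handler(json.loads(raw_body))
        record["status"] = "processed"
        http_status = 200
    except Exception as exc:
        record["status"] = f"failed: {exc}"
        http_status = 500  # a non-2xx response lets Stripe retry the delivery
    record["latency_ms"] = int((clock() - started) * 1000)
    return http_status
```

Because the record exists before the handler runs, a crash inside the handler leaves evidence behind instead of a silent gap.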

Final checklist

  • [ ] Critical event mapping documented
  • [ ] Durable webhook ingestion enabled
  • [ ] Reconciliation job running on schedule
  • [ ] Divergence alerts configured
  • [ ] Replay workflow tested end to end
  • [ ] Revenue impact reported weekly

If you want this without building custom infra first, Revenue Recovery Autopilot gives you scanner + monitor + recovery workflows on top of your Stripe webhook flow, so your team can detect and resolve revenue divergences fast.

Start here: https://katsuralabs.com

Revenue Recovery Autopilot will detect broken webhooks that cost you money. Join the early access waitlist.
