Replay Safely After Code Changes: Why Naive Webhook Replay Causes More Problems Than It Solves
You fix a handler bug that was silently dropping Stripe subscription events. Now you replay the 40 failed events. But between the failure and now, 22 of those customers' subscriptions changed: manual support fixes, cancellations and re-subscriptions, portal upgrades. Naive replay writes stale state over current state. Some customers end up charged the wrong amount.
You ship a fix at 2:15 PM on Thursday. The fix corrects a handler bug that was silently swallowing customer.subscription.updated events from Stripe. The bug has been in production for 11 days. During those 11 days, 40 subscription update events were received, returned 200, and never wrote the correct state to your database.
You know the events failed because a customer reported it, and you traced the issue to a missing break statement in a switch case that caused certain plan changes to fall through to the default branch, which logged and returned without writing.
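In TypeScript terms, one plausible shape of that fall-through bug is sketched below. The plan-change names, the `writes` sink, and the return values are hypothetical, for illustration only:

```typescript
// Hypothetical reconstruction of the silent-drop bug described above.
// Change names and the writes array are illustrative, not from the real handler.
function handleSubscriptionUpdate(change: string, writes: string[]): string {
  switch (change) {
    case "upgrade":
      writes.push(`apply ${change}`);
      return "written";
    case "downgrade":
    case "crossgrade":
      // BUG: the write and break for these cases were lost in a refactor,
      // so execution falls through to default, which logs and returns.
    default:
      // (real handler logged here, then returned 200 to the provider)
      return "logged_only"; // silently dropped: the caller still sees success
  }
}
```

Because the handler returns normally on every path, the provider sees a 200 and never retries, which is exactly why the failure stays invisible for 11 days.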
The fix is deployed. New events process correctly. The 40 historical events are sitting in your request log. You need to replay them.
This is the moment where most teams reach for the simplest possible solution: iterate over the 40 events, re-send each one to the handler, and let the fixed code process them. This is also the moment where most teams create a second incident while trying to recover from the first.
Why the code change makes replay dangerous
The naive mental model for replay is: "The event failed before. The code is fixed now. Send it again."
This model treats replay as a deterministic retry — same input, better code, correct output. In practice, replay after a code change is a fundamentally different operation than the original processing. Four things changed between the original failure and the replay attempt:
1. The handler code changed. The bug fix corrected the switch statement. But the fix may have also changed how the handler maps Stripe plan IDs to your internal tier names. Or it may have added validation that did not exist before. Or it may have changed the database query from an UPDATE to an UPSERT. The replayed event will be processed by code that has never seen this specific payload before — the original code saw it and failed silently; the new code has never seen it at all.
2. Application state changed. During the 11 days between the original failure and the replay, your application's state evolved. Customers who were affected by the failed events did not sit idle — they contacted support, cancelled, re-subscribed, upgraded through a different flow, or received manual fixes from your operations team. The state that existed when the event originally arrived no longer exists.
3. Time-dependent logic may behave differently. If your handler includes time-based logic — trial periods, grace windows, proration calculations, SLA timers — replaying an event 11 days late produces different results than processing it on time. A subscription change that should have been prorated for 20 remaining days is now prorated for 9 days. A trial that should have started 11 days ago now starts today.
4. Side effects may re-trigger. If your handler sends a confirmation email, creates a charge, enqueues a downstream job, or fires a notification, replaying the event triggers those side effects again. The customer who already received a manual fix from support now gets an automated email saying their plan was upgraded — 11 days late, after they already cancelled.
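Point 3 is concrete arithmetic. A minimal sketch of date-relative proration, using the same day counts as the example above (the function name and constants are assumptions):

```typescript
// Proration depends on when the event is *processed*, not when it occurred.
const MS_PER_DAY = 86_400_000;

function remainingProrationDays(periodEndMs: number, processedAtMs: number): number {
  return Math.max(0, Math.floor((periodEndMs - processedAtMs) / MS_PER_DAY));
}

const originalArrival = Date.UTC(2026, 2, 25);        // when the event should have run
const periodEnd = originalArrival + 20 * MS_PER_DAY;  // 20 days left in the billing period
const replayedAt = originalArrival + 11 * MS_PER_DAY; // replayed 11 days late

// Processed on time: 20 prorated days. Replayed: only 9.
```

The stored payload is identical in both runs; only the processing timestamp differs, and that alone changes the billing outcome.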
The Stripe subscription scenario in detail
The 40 failed events are customer.subscription.updated events. Here is what happened to the affected customers during the 11-day gap:
- 18 customers never noticed. Their plan change did not take effect, but they did not use the affected features during those 11 days.
- 12 customers contacted support. Your operations team manually updated their subscription records in the database.
- 7 customers cancelled and re-subscribed to a different plan.
- 3 customers upgraded through a different path (the Stripe checkout portal, which bypasses the webhook flow).
Now you replay all 40 events through the fixed handler.
The 18 unaware customers: Replay succeeds. The handler writes the correct subscription state. These customers are fine.
The 12 manually-fixed customers: Replay conflicts. The handler tries to update the subscription record, but the manual fix already set the correct state. If your handler uses an UPDATE with a WHERE clause that matches the old state, the UPDATE matches zero rows and silently does nothing. If your handler uses an UPSERT or a simple UPDATE by user ID, it overwrites the manual fix. If the manual fix set a different plan than what the original event specified (because the customer changed plans in the interim), the replay now writes stale data over current data.
The 7 cancel-and-resubscribe customers: Replay is dangerous. The original event says "upgrade to Pro." The customer cancelled Pro and resubscribed to Team. Replaying the event sets their plan back to Pro. Their Team features stop working. They see a plan they cancelled displayed in their dashboard. If the plan change triggers a billing adjustment, they may be charged the wrong amount.
The 3 portal-upgrade customers: Replay creates duplicates. The customer upgraded through Stripe's checkout portal, which generated its own webhook event that was processed correctly by the fixed handler. Replaying the original event processes the same logical operation a second time. If your handler is idempotent by Stripe event ID, the duplicate is caught. If your handler is idempotent by operation type (upgrade to Pro), it may conflict with the portal-initiated upgrade, which had a different event ID.
Out of 40 events, naive replay produces correct results for 18, conflicts for 12, dangerous results for 7, and potentially duplicate results for 3. A 45% success rate is not recovery — it is a new incident.
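The conflict mechanics for the manually-fixed group come down to which write shape the handler uses. A sketch with an in-memory stand-in for the subscriptions table (type and function names are hypothetical):

```typescript
type Sub = { userId: string; plan: string };

// UPDATE ... SET plan = $new WHERE user_id = $id AND plan = $expectedOld:
// matches zero rows if the state already moved on, and silently does nothing.
function conditionalUpdate(
  db: Map<string, Sub>,
  userId: string,
  expectedOldPlan: string,
  newPlan: string,
): number {
  const row = db.get(userId);
  if (!row || row.plan !== expectedOldPlan) return 0; // zero rows updated
  row.plan = newPlan;
  return 1;
}

// UPSERT keyed by user ID: blindly overwrites whatever is there.
function upsert(db: Map<string, Sub>, userId: string, newPlan: string): void {
  db.set(userId, { userId, plan: newPlan });
}
```

Replaying an 11-day-old "upgrade to Pro" event against a record support already fixed to Team either no-ops (conditional update) or clobbers current data (upsert). Neither outcome is what the operator intended.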
The idempotency trap
Teams that have invested in idempotency feel protected from replay failures. "Our handler is idempotent — replaying events is safe by definition."
This is true only if the idempotency key is correct and the handler version is the same.
Most webhook handlers use the provider's event ID as the idempotency key. The handler checks: "Have I already processed event evt_1234abc?" If yes, skip. If no, process.
This works for deduplication of the same event delivered twice by the provider. It does not work for replay after a handler version change, because the original processing of evt_1234abc may have been recorded as "processed" even though it failed silently.
If your handler recorded evt_1234abc as processed before the silent failure, replay will skip it — the idempotency check says "already handled." The event is skipped. The customer remains in the wrong state.
If your handler did not record evt_1234abc as processed (because the failure happened before the idempotency write), replay will process it. But the handler code changed. The new handler may produce a different outcome than the original handler would have — not because it is buggy, but because the mapping, validation, or business logic evolved.
Idempotency protects against duplicate delivery. It does not protect against replay with a different handler version. These are different problems that require different solutions.
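The trap in miniature (the record-keeping order and names are illustrative):

```typescript
// Event-ID idempotency: fine for duplicate delivery, blind to silent failures.
const processed = new Set<string>();

function handle(eventId: string, apply: () => boolean): string {
  if (processed.has(eventId)) return "skipped"; // dedupe by provider event ID
  const wrote = apply();  // the buggy handler "succeeds" without writing
  processed.add(eventId); // marked processed either way
  return wrote ? "applied" : "marked_processed_but_not_applied";
}
```

If the original run marked evt_1234abc processed despite writing nothing, a later replay hits the `skipped` branch and the customer stays in the wrong state.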
What controlled replay looks like
Safe replay after a code change requires five capabilities that naive replay does not have.
Filtering. Replay should not be "replay all 40 events." It should be "replay the 18 events where the affected customer has not had any subsequent state changes." Filtering by customer state, event type, time window, and processing status reduces the blast radius from 40 events to the subset that can be safely replayed.
Replay request:
Source: failed events between 2026-03-22 and 2026-04-03
Filter: customer.subscription.updated AND processing_status != 'applied_confirmed'
Exclude: customers with subscription changes after original event timestamp
Result: 18 of 40 events eligible for replay
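A request like the one above reduces to a predicate over the failed-event log. The field names in this sketch are assumptions, not a real schema:

```typescript
type FailedEvent = {
  id: string;
  type: string;
  customerId: string;
  originalTimestamp: number; // epoch ms when the event first arrived
  processingStatus: string;
};

function eligibleForReplay(
  events: FailedEvent[],
  lastSubscriptionChange: Map<string, number>, // customerId -> latest change, epoch ms
): FailedEvent[] {
  return events.filter(e =>
    e.type === "customer.subscription.updated" &&
    e.processingStatus !== "applied_confirmed" &&
    // exclude customers whose subscription changed after the original event
    (lastSubscriptionChange.get(e.customerId) ?? 0) <= e.originalTimestamp,
  );
}
```

The second and third clauses are what shrink the blast radius: confirmed outcomes and customers with newer state never enter the replay batch at all.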
Dry-run preview. Before executing the replay, preview what would happen. The dry-run evaluates each eligible event against current application state and flags conflicts:
Dry-run results:
18 events eligible
15 events: no conflicts detected (customer state unchanged since failure)
2 events: WARNING — customer subscription modified by support ticket
1 event: WARNING — customer cancelled and resubscribed since failure
Risk assessment: 15 LOW, 2 MEDIUM, 1 HIGH
The operator reviews the dry-run output and decides which events to proceed with. The 15 low-risk events are approved. The 3 flagged events are reviewed individually — maybe two of them are safe after manual inspection, maybe one needs a custom fix.
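The dry-run's risk buckets can be as simple as a classification pass over current customer state. The thresholds and field names here are hypothetical:

```typescript
type Risk = "LOW" | "MEDIUM" | "HIGH";
type CustomerState = { supportModified: boolean; resubscribed: boolean };

function assessRisk(s: CustomerState): Risk {
  if (s.resubscribed) return "HIGH";      // plan identity changed since the failure
  if (s.supportModified) return "MEDIUM"; // a manual fix may conflict with the replay
  return "LOW";                           // state unchanged since the original event
}

function dryRun(batch: CustomerState[]): Record<Risk, number> {
  const counts: Record<Risk, number> = { LOW: 0, MEDIUM: 0, HIGH: 0 };
  for (const s of batch) counts[assessRisk(s)]++;
  return counts;
}
```

The output is the approval artifact: the operator sees the bucket counts before a single event is re-delivered.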
Receipt-aware skip logic. If your system uses outcome receipts — explicit confirmation that an event's side effect was committed — replay checks for existing receipts before re-delivering. An event with an applied_confirmed receipt is skipped automatically. This prevents the most common replay failure: re-processing events that actually succeeded but were not visible to monitoring.
The key insight of the receipt model is that a receipt is proof of outcome, not proof of delivery. Health checks prove the system received the event. Receipts prove the system committed the result.
Batch risk assessment. When replaying multiple events, the system evaluates the batch as a whole, not just individual events:
Batch assessment:
15 events, 8 distinct customers
Estimated processing time: 4.2 seconds
State conflicts detected: 0
Side effects: 15 confirmation emails will be sent
Billing impact: 0 charge adjustments
Recommendation: PROCEED with email suppression flag
The side effects assessment matters. If your handler sends confirmation emails, replaying 15 events sends 15 emails — 11 days late. The batch assessment surfaces this so the operator can decide whether to suppress email side effects during replay.
Audit trail. Every replayed event is tagged with metadata that makes the replay inspectable after the fact:
{
  "replay_job_id": "rj_abc123",
  "operator": "jane@company.com",
  "reason": "Handler bug fix — silent failure on subscription updates",
  "approved_at": "2026-04-03T14:22:00Z",
  "original_event_id": "evt_stripe_xyz",
  "original_failure_timestamp": "2026-03-25T08:41:12Z",
  "replay_timestamp": "2026-04-03T14:23:01Z",
  "handler_version": "v2.4.1",
  "dry_run_risk": "LOW"
}
When the finance team asks "what happened to those 40 events?" six months later, the answer is documented: 15 were replayed by Jane on April 3rd after a dry-run assessment, 22 were resolved manually, and 3 were skipped because the customers cancelled. Every action is traceable. This is what retries do not provide — retries are automated and uncontrolled; replay is deliberate and documented.
The post-deploy blind spot
The scenario above is a handler bug fix. But the same risks apply to any code change that modifies webhook processing logic.
Schema migrations. You add a column to the subscriptions table and update the handler to write to it. Events that were logged before the migration do not include the new column's value. Replaying them through the updated handler either leaves the new column NULL (if the handler has a default) or fails (if the column is NOT NULL without a default).
Dependency updates. You update the Stripe SDK version. The new SDK parses event payloads differently — maybe it deserializes amounts as integers instead of floats, or it changes how nested objects are structured. Events logged with the old SDK's serialization format may not parse correctly with the new SDK.
Feature flags. You enable a feature flag that changes handler behavior. Events that were processed when the flag was off are now replayed with the flag on. The handler takes a different code path. The outcome is different from what the original processing would have produced.
Environment changes. You migrate from one database to another, or change the connection string from the primary to a pooled endpoint. Events replayed after the migration are processed against the new database. If the migration included schema changes, the replay may fail.
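The feature-flag case can be made concrete in a few lines. The flag name and the rounding change are invented for illustration:

```typescript
// Same stored payload, different handler context: the flag flips the code path.
function creditForPayload(amountCents: number, flags: { newRounding: boolean }): number {
  return flags.newRounding
    ? Math.round(amountCents * 0.175)  // path taken on replay, flag now on
    : Math.floor(amountCents * 0.175); // path the original processing would have taken
}
```

The stored webhook payload is byte-for-byte identical in both runs; the divergence comes entirely from the processing context.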
Each of these is a case where the assumption "same input, same output" is false. The input is the same (the stored webhook payload). The processing context is different. Controlled replay accounts for this by evaluating current context before execution.
What HookTunnel provides for replay safety
HookTunnel's replay system is designed around the assumption that replay always happens after something changed — otherwise, why would you need to replay?
Filter before replay. Select events by time window, event type, processing status, provider, and custom criteria. Exclude events that have subsequent state changes.
Dry-run before execution. Preview the replay batch with risk assessment. See which events have conflicts, which would trigger side effects, and which are safe.
Receipt-aware skip. Events with confirmed outcome receipts are automatically excluded from replay. No double-processing of events that already succeeded.
Stop-on-receipt during replay. If a receipt arrives for an event while a replay batch is in progress — because the original processing completed asynchronously after a delay — the replay stops for that event. The remaining events continue.
Operator approval. Replay requires explicit approval. The dry-run output is the approval artifact. The operator reviews, decides, and commits. No automated replay without human oversight for high-risk batches.
Full lineage. Every replayed event links back to the replay job, the operator who approved it, the reason for replay, the risk assessment, and the handler version that processed it. The audit trail is complete and permanent.
Replay is not a button that says "try again." It is a workflow that says "here is what happened, here is what would happen if we replay, here is the risk, do you want to proceed?" The difference between these two is the difference between creating a second incident and recovering from the first.
Stop guessing. Start proving.
Generate a webhook URL in one click. No signup required.
Get started free →