Is It Your Code or Their Outage? Know Before You Spend an Hour Debugging the Wrong Thing
Your Stripe webhook handler is failing. The error is a 500 from your own endpoint. The payload looks valid. The signature verifies. But the handler is throwing an exception you have never seen before.
You check the Stripe status page. It says "All systems operational."
You spend the next 45 minutes reading your handler code. You add logging. You redeploy. You test against a sample payload in your local environment. The handler works fine locally. You widen your search — maybe it is the deployment environment. You check your environment variables. Everything looks correct.
At 11:47pm, Stripe updates their status page: "Degraded webhook delivery for us-east-1 endpoints. Investigating."
The problem was not your code. The problem was that Stripe was delivering malformed events to endpoints in your region. Your handler threw an exception because the events were malformed, not because your handler was broken. You spent two hours debugging code that was not the problem.
The Stripe status page said "All systems operational" for most of that window because Stripe's automated monitors had not yet detected the degradation. The customers affected were in a specific region, with a specific event type, and the failure rate was below the threshold that triggers automatic status updates.
But here is what was true the entire time: multiple other developers — other HookTunnel users, other businesses using Stripe webhooks — were seeing the exact same failure. The same error codes. The same event types. The same region. The same narrow time window. The pattern was there. Nobody could see it because they were all looking at their own logs in isolation.
Cross-customer pattern detection makes the pattern visible.
How it works
HookTunnel collects aggregate failure data across its user base. When an event fails — your handler returns a non-2xx status, or the delivery attempt errors — HookTunnel records the failure with:
- The provider (Stripe, Twilio, GitHub, etc.)
- The event type (payment_intent.payment_failed, etc.)
- The HTTP status code or error code
- The timestamp
This data is collected for every user. The individual event data stays in your account, private to you, enforced by row-level security at the database level. What crosses the user boundary is only aggregate data: counts, error codes, timestamps, and provider identifiers. No payload contents. No customer data. No account identifiers.
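The recorded fields can be sketched as a small data structure. This is illustrative only — the field names and types are assumptions, not HookTunnel's actual schema — but it shows that everything recorded is metadata, never payload contents:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class FailureRecord:
    # Hypothetical shape of the aggregate-safe failure record.
    # Note what is absent: no payload, no customer data, no endpoint URL.
    provider: str        # e.g. "stripe", "twilio", "github"
    event_type: str      # e.g. "payment_intent.payment_failed"
    error_code: str      # HTTP status ("500") or provider-specific error code
    timestamp: datetime  # when the delivery attempt failed

record = FailureRecord(
    provider="stripe",
    event_type="payment_intent.payment_failed",
    error_code="500",
    timestamp=datetime.now(timezone.utc),
)
```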
When 3 or more distinct users — the WIDESPREAD_THRESHOLD — hit the same failure pattern within a 24-hour window, HookTunnel flags it as a potential widespread issue.
What "same failure pattern" means
Two failure events match a pattern when they share:
- The same provider (both are Stripe failures)
- The same error code or HTTP status (both are 422 Unprocessable Entity, or both are timeout errors)
- A narrow time window (within 4 hours of each other)
Optional: same event type (both are payment_intent.payment_failed specifically). Matching on event type narrows the detection scope but increases precision — it distinguishes "Stripe is having a broad outage" from "Stripe's payment webhook specifically is behaving incorrectly."
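The matching rule above can be expressed as a predicate over two failure records. This is a hypothetical sketch (the `Failure` type and `same_pattern` helper are illustrative names, not HookTunnel's API), using the 4-hour window from the text:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass(frozen=True)
class Failure:
    provider: str
    error_code: str
    event_type: str
    timestamp: datetime

MATCH_WINDOW = timedelta(hours=4)  # the "narrow time window" from the text

def same_pattern(a: Failure, b: Failure, *, match_event_type: bool = False) -> bool:
    """Two failures match when provider and error code agree and the
    timestamps fall within 4 hours of each other. Event-type matching
    is optional: it narrows detection but increases precision."""
    if a.provider != b.provider:
        return False
    if a.error_code != b.error_code:
        return False
    if match_event_type and a.event_type != b.event_type:
        return False
    return abs(a.timestamp - b.timestamp) <= MATCH_WINDOW
```

With `match_event_type=False`, two different Stripe event types failing with the same status code still match — the broad-outage case; with `match_event_type=True`, only the specific event type matches.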
The widespread issue badge
When a failure in your account matches a widespread pattern, a "Widespread Issue" badge appears in two places:
- On the failing event in your event list, so you see it inline while debugging
- In your investigation panel, if you have an active investigation attached to events with this pattern
The badge includes a link to the provider's status page. Stripe, Twilio, GitHub, and the other major webhook providers all have public status pages. That link is pre-populated — you do not have to go find it.
The 2am scenario
It is 2:14am. Your pager goes off. Payment webhook failures, high rate, started 18 minutes ago.
You open HookTunnel. The first thing you see on the failing events is a "Widespread Issue" badge: 6 other distinct HookTunnel users are experiencing the same Stripe payment webhook failures within the same 4-hour window.
You click the Stripe status page link. The page says "Monitoring" for webhook delivery. There was an update 9 minutes ago.
You are not the cause. You do not need to wake up anyone on your team. You do not need to deploy a fix. You need to:
- Verify that no customer action is required right now
- Set up a replay for when Stripe resolves the issue
- Go back to sleep
That is what cross-customer pattern detection does in its most important use case: it answers the question "is this my fault?" before you spend two hours assuming it is.
The emotional arc matters. There is a specific kind of 2am anxiety that comes from staring at failing events and not knowing whether you broke something. That anxiety produces bad decisions — rushed deploys, undocumented changes, late-night escalations that turn out to be unnecessary. Knowing that the pattern is widespread does not fix the outage, but it completely changes how you respond. You go from reactive and anxious to informed and calm.
Privacy by design
Cross-customer pattern detection is built on an explicit privacy contract.
What crosses the user boundary:
- Provider name (Stripe, Twilio, GitHub)
- Error codes (HTTP status codes, provider-specific error codes)
- Event type identifiers (payment_intent.payment_failed)
- Timestamps (to establish temporal correlation)
- Aggregate counts (how many distinct users are affected)
What never crosses the user boundary:
- Payload contents
- Customer names, emails, IDs
- Your account identifier
- Your hook IDs or hook names
- Your endpoint URLs
- Any PII from any webhook payload
The database query that powers pattern detection is an aggregate query — it counts distinct users by (provider, error_code, event_type, time_bucket). The result is a number and a pattern descriptor. The individual rows that feed the aggregate are never exposed to anyone outside the account that generated them.
Row-level security at the database level enforces this boundary. Even if there were a bug in the application layer, the RLS policy on the underlying tables prevents cross-user data access. This is not a soft privacy promise — it is a hard database constraint.
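The aggregate can be sketched in-memory. This is a simplified illustration, not the production query — the real detection is a database aggregate behind RLS, and the hourly bucket granularity here is an assumption — but it shows that only a count and a pattern descriptor come out, never per-user rows:

```python
from collections import defaultdict
from datetime import datetime

WIDESPREAD_THRESHOLD = 3  # distinct users, per the text

def detect_widespread(failures):
    """Group failures by (provider, error_code, event_type, time_bucket)
    and return only the patterns seen by 3+ distinct users.
    `failures` is an iterable of (user_id, provider, error_code,
    event_type, timestamp) tuples. The per-user rows never leave
    this function -- the result is pattern -> distinct-user count."""
    users_by_pattern = defaultdict(set)
    for user_id, provider, error_code, event_type, ts in failures:
        # Assumed bucketing: truncate timestamps to the hour.
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        users_by_pattern[(provider, error_code, event_type, bucket)].add(user_id)
    return {
        pattern: len(users)
        for pattern, users in users_by_pattern.items()
        if len(users) >= WIDESPREAD_THRESHOLD
    }
```

Three distinct users hitting the same (provider, error code, event type, bucket) key crosses the threshold; a failure unique to one account never appears in the output.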
Why this architecture matters for enterprise customers
Enterprise customers evaluating HookTunnel for production use ask the privacy question early: "Our webhook payloads contain PII — customer payment data, health information, personal identifiers. Can you guarantee that this data does not leak to other users?"
The answer is: yes, and the guarantee comes from two independent layers. The application layer never reads payload contents for pattern matching — it only reads provider, error codes, and timestamps, which are metadata, not PII. The database layer enforces RLS policies that make cross-user payload access structurally impossible.
Pattern detection is designed to be useful precisely because it does not need payload contents. The signal that matters — "is this a provider-wide failure?" — comes entirely from error codes and timing. Payloads are irrelevant to the detection.
When widespread detection fires and when it does not
It fires when:
- Your failure matches a pattern already seen by 2 other distinct users in the last 24 hours (your failure is the third, crossing the threshold)
- The pattern persists — if you are the 7th affected user, the badge is already there when you land on the failing event
It does not fire when:
- Your failure is unique to your account — a misconfiguration, a broken handler, a bad environment variable. These are not widespread, and no badge appears. This is the correct behavior: if the pattern detection fires, it is meaningful. If it does not fire, that is signal too — it points toward your code or your configuration.
The threshold of 3 distinct users is intentionally conservative. A single other user seeing a similar failure could be coincidence. Two other users is suspicious. Three or more suggests a systemic pattern — either a provider issue or a widespread misconfiguration scenario that affects a class of users.
Comparison with alternatives
| Capability | HookTunnel | Stripe Status Page | Datadog | ngrok |
|---|---|---|---|---|
| Detects provider issues before status page updates | Yes (when 3+ users affected) | No (status page lags) | No | No |
| Links from failing event to provider status | Yes | N/A | No | No |
| Privacy-safe (no PII crosses user boundary) | Yes | N/A | N/A | N/A |
| Works for Stripe, Twilio, GitHub, custom providers | Yes (all providers) | Stripe only | No | No |
| Appears inline in debugging context | Yes | No | No | No |
The Stripe status page tells you when Stripe has detected and acknowledged an incident. Cross-customer pattern detection tells you when enough users are affected to suggest a systemic problem — which often precedes the official status update by 10-30 minutes. In an on-call scenario, 10-30 minutes of knowing versus not knowing is the difference between a measured response and an anxious scramble.
Frequently asked questions
Does cross-customer detection require any configuration?
No. It runs automatically for all providers on all plans. There is nothing to turn on or configure.
How many HookTunnel users are needed for this to be useful?
The threshold is 3 distinct users, so detection requires at least 3 users of the same provider to hit a matching failure. For major providers — Stripe, Twilio, GitHub — the HookTunnel user base is large enough that widespread detection fires reliably when there is a genuine provider incident. For less common providers, the signal is less reliable, though it is still useful when it fires.
Can I see the aggregate data — how many users are affected?
Yes. The widespread issue badge shows the aggregate count: "6 other users affected." You see the count, the provider, and the pattern descriptor. You do not see who those users are or any details about their accounts.
What if my failure is legitimately unique — my handler is broken, not the provider?
No badge appears. The absence of a widespread badge is meaningful signal: it suggests the problem is specific to your account. This narrows your debugging: you are looking at your handler, your configuration, or your credentials — not at a provider incident you cannot control.
Does this work for custom internal webhooks?
Pattern detection aggregates by provider. If you have internal webhooks that you have labeled with a custom provider name, detection only fires if 3 other HookTunnel users have the same custom provider label and hit the same error codes. In practice, custom internal providers are unique to your account, so widespread detection does not apply. The feature is designed for shared providers — Stripe, Twilio, GitHub, and similar services used across many organizations.
How long is the detection window?
24 hours. A failure pattern must involve 3+ distinct users within a 24-hour window to be flagged as widespread. Older data does not contribute to the threshold.
Know whether it is your bug or their outage
Cross-customer pattern detection runs automatically. No configuration needed.
Get started free →