Reliability

The Dependency Map Lives in Your Team's Heads. That Is a Problem.

Ask any engineer on your team: which webhooks depend on which targets? They will give you an answer. The answer will be approximately correct. It will be missing the edge cases. It will be wrong about at least one thing that matters.

This is not a criticism of your team. It is a description of how distributed systems work. Dependencies accumulate without ceremony. A new microservice starts receiving Stripe events. A second target gets added to a hook for redundancy, then becomes the primary when someone updates a URL field and forgets to tell anyone. A new engineer sets up a webhook on Friday and the Monday standup doesn't mention it.

The dependency map is real. It just isn't written down. And the first time you discover that a critical payment target is actually a single point of failure for eight different hooks is almost always during an incident, when you're trying to understand why eight different things broke at the same time.

HookTunnel's blast radius analysis reads the dependency map from the traffic itself. You don't write it. You don't maintain it. It infers it, visualizes it, and tells you which nodes are most critical — while you still have time to do something about it.


How Dependency Inference Works

Dependency graphs for webhooks are not obvious. Webhooks don't have explicit dependency declarations. A Stripe payment_intent.succeeded event going to payments.internal.company.com and a Stripe customer.subscription.updated event going to subscriptions.internal.company.com might be entirely independent — or they might both be triggered by the same upstream customer action, making their temporal correlation meaningful.

HookTunnel uses three inference strategies in combination.

Temporal Inference

When two hooks consistently receive events within a short time window — say, within 2 seconds of each other — that correlation is recorded. If this pattern repeats across thousands of events, it is almost certainly not coincidence. It suggests a common upstream trigger, which in turn suggests that both targets will be affected simultaneously when that upstream system has a problem.

Temporal inference finds the implicit dependencies that nobody bothered to document because they were obvious at the time.
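The temporal strategy can be sketched as a sliding-window pair counter. This is an illustrative reconstruction, not HookTunnel's actual implementation; the function name, event shape, and the 2-second window are assumptions taken from the description above.

```python
from collections import Counter

WINDOW_SECONDS = 2.0  # illustrative co-occurrence window from the text above

def temporal_co_occurrences(events):
    """Count how often each pair of hooks fires within the window.

    events: list of (hook_id, unix_timestamp) pairs, sorted by timestamp.
    Returns a Counter mapping frozenset({hook_a, hook_b}) -> count.
    """
    pairs = Counter()
    start = 0  # left edge of the sliding window
    for i, (hook_i, t_i) in enumerate(events):
        # advance the window so it only holds events within WINDOW_SECONDS
        while t_i - events[start][1] > WINDOW_SECONDS:
            start += 1
        for k in range(start, i):
            hook_j = events[k][0]
            if hook_j != hook_i:
                pairs[frozenset((hook_i, hook_j))] += 1
    return pairs
```

A pair that accumulates thousands of co-occurrences becomes a strong edge signal; a handful of hits stays low-confidence.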

Payload Inference

Events often share payload fields: the same customer ID, the same order ID, the same session token. When hook A and hook B consistently carry the same field values, they are processing data about the same business entities. That relationship is structurally meaningful for blast radius analysis — if the service that creates those entities goes down, both hooks will stop receiving events.

Payload inference surfaces semantic connections between hooks that happen to point at different targets.
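One plausible way to measure that shared-entity relationship is a Jaccard overlap on tracked identifier fields. The field names and scoring here are assumptions for illustration, not HookTunnel's real field-tracking logic.

```python
def shared_entity_score(events_a, events_b, fields=("customer_id", "order_id")):
    """Jaccard overlap of tracked field values across two hooks' payloads.

    events_a / events_b: lists of payload dicts for each hook.
    The tracked field names are illustrative; a real system would track
    whichever identifiers recur in the traffic.
    """
    def values(events):
        return {(f, e[f]) for e in events for f in fields if f in e}

    seen_a, seen_b = values(events_a), values(events_b)
    if not seen_a or not seen_b:
        return 0.0
    return len(seen_a & seen_b) / len(seen_a | seen_b)
```

A score near 1.0 means the two hooks are processing nearly the same set of business entities; near 0.0, they are semantically unrelated.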

Sequential Inference

Some webhook workflows are explicitly sequential. An order.created event triggers a fulfillment hook. The fulfillment hook's target creates a fulfillment.started event. The fulfillment.started event triggers a notification hook. The three hooks form a chain.

Sequential inference detects these chains by looking at event-ordering patterns over time, and makes them visible in the graph. Breaking any link in the chain affects everything downstream.
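A minimal sketch of chain detection: call hook B a likely successor of hook A when most A events are followed by a B event within a short lag. The thresholds and function name are illustrative assumptions, not HookTunnel's tuning.

```python
import bisect

def likely_successor(events_a, events_b, max_lag=5.0, min_ratio=0.8):
    """Heuristic: hook B likely follows hook A in a chain if at least
    min_ratio of A's events are followed by a B event within max_lag seconds.

    events_a / events_b: sorted lists of unix timestamps for each hook.
    """
    if not events_a:
        return False
    followed = 0
    for t in events_a:
        i = bisect.bisect_right(events_b, t)  # first B event strictly after t
        if i < len(events_b) and events_b[i] - t <= max_lag:
            followed += 1
    return followed / len(events_a) >= min_ratio
```

Chaining these pairwise checks (A precedes B, B precedes C) recovers the order.created → fulfillment.started → notification chain described above.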

The resulting graph has a maximum of 50,000 edges and is rebuilt continuously from live traffic with a 5-minute cache. The graph you see is never more than 5 minutes stale.
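The 5-minute materialization pattern is a plain TTL cache around a rebuild function. A minimal sketch, with an injectable clock for testing; names here are illustrative, not HookTunnel internals:

```python
import time

CACHE_TTL_SECONDS = 300  # the 5-minute materialization window

class CachedGraph:
    """Serve a materialized graph, rebuilding at most once per TTL."""

    def __init__(self, build_fn, clock=time.monotonic):
        self._build_fn = build_fn  # rebuilds the graph from live traffic
        self._clock = clock
        self._graph = None
        self._built_at = float("-inf")  # forces a build on first access

    def get(self):
        now = self._clock()
        if now - self._built_at >= CACHE_TTL_SECONDS:
            self._graph = self._build_fn()
            self._built_at = now
        return self._graph
```

Edge weights can keep updating on every event in the background; readers only ever see a snapshot at most one TTL old.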


What the Graph Tells You

The Overview Tab

A force-directed graph of your webhook topology. Nodes are targets. Edges represent inferred dependencies. Node size corresponds to the number of hooks pointing at that target. Color indicates health state, using the same circuit breaker state information — green for closed, orange for half-open, red for open.

You can immediately see the shape of your dependency graph. A well-designed system looks like a relatively flat graph with many small nodes. A system that has grown organically looks like a few large central nodes with many edges radiating outward. Those large central nodes are your risk.

The SPOFs Tab

Single Points of Failure are targets that, if unavailable, would affect three or more hooks simultaneously. They are listed in descending order of betweenness centrality — a graph theory measure of how often a node appears on the shortest path between other nodes.

High betweenness centrality means a target is a traffic bottleneck. It is the node that, if removed, breaks the most connections in the graph.
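Betweenness centrality can be computed by brute force on a small graph: for every ordered pair of nodes, find all shortest paths and credit each intermediate node with the fraction of paths it appears on. This sketch is for intuition only; production engines use faster algorithms such as Brandes'.

```python
from collections import deque
from itertools import permutations

def all_shortest_paths(graph, s, t):
    """All shortest s->t paths in an unweighted graph (adjacency dict)."""
    paths, best = [], None
    queue = deque([[s]])
    while queue:
        path = queue.popleft()
        if best is not None and len(path) > best:
            break  # BFS order: every remaining path is at least this long
        node = path[-1]
        if node == t:
            best = len(path)
            paths.append(path)
            continue
        for nxt in graph.get(node, ()):
            if nxt not in path:
                queue.append(path + [nxt])
    return paths

def betweenness(graph):
    """For each node v: sum over (s, t) pairs of the fraction of shortest
    s->t paths passing through v. O(n^2) pairs; fine for small graphs."""
    score = {v: 0.0 for v in graph}
    for s, t in permutations(graph, 2):
        paths = all_shortest_paths(graph, s, t)
        if not paths:
            continue
        for v in graph:
            if v not in (s, t):
                score[v] += sum(v in p for p in paths) / len(paths)
    return score
```

In a star-shaped topology the hub scores highest and every leaf scores zero, which is exactly the "large central node" risk pattern described above.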

Each SPOF entry shows:

  • How many hooks depend on it
  • Which hooks specifically
  • Its current circuit breaker state
  • Whether it has a shadow target configured (and therefore a fallback available)

The presence of a SPOF is not automatically a problem. Some targets are supposed to be central. The analysis tells you which ones are critical so you can make an informed decision: add shadow delivery to distribute the risk, decompose the service, or document the dependency explicitly so incident responders know what to look for.

The Dependencies Tab

A structured list of every inferred dependency relationship, sortable by confidence score, hook count, and last-seen timestamp. Confidence is derived from the frequency and consistency of the inference signal. A pair of hooks that have co-occurred in temporal proximity 10,000 times in the last 30 days has high confidence. A pair that co-occurred 12 times in the last 48 hours has low confidence.

Low-confidence dependencies are not hidden — they appear in the list with a confidence score. They may represent real relationships that are just new or rare. They may be coincidences. You can flag them or dismiss them.
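One plausible way to turn frequency and recency into a score is a rate-based curve that saturates toward 1. This formula is an assumption chosen to match the examples above (10,000 co-occurrences over 30 days scoring high, 12 over 2 days scoring low); it is not HookTunnel's actual formula.

```python
import math

def confidence(co_occurrences, observation_days):
    """Illustrative confidence heuristic, returning a score in [0, 1).

    Log-like saturation on the per-day co-occurrence rate: frequent,
    sustained signals approach 1; sparse or new signals stay low.
    """
    if co_occurrences <= 0 or observation_days <= 0:
        return 0.0
    rate = co_occurrences / observation_days  # co-occurrences per day
    return 1.0 - math.exp(-rate / 50.0)       # 50 is an arbitrary scale constant
```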


The On-Call Engineer Scenario

It is 2:17am. PagerDuty fires. Eight monitors are alerting simultaneously. You are the on-call engineer. You open the dashboard.

Without blast radius analysis, you are doing triage from scratch. What are these eight things? Are they related? Which one is the root cause and which are cascades? You open eight browser tabs. You look at eight different services. You try to form a mental model of which ones share infrastructure.

That process takes 15 to 30 minutes on a calm day. At 2am after being woken up, with cortisol spiking and eight Slack messages arriving simultaneously, it takes longer and produces worse results.

With blast radius analysis, you open the Blast Radius page. The graph shows a single large red node — payments-core.internal.company.com — with eight hooks depending on it. That node has a circuit breaker in the open state. Everything downstream from it is failing because the root is down.

You now know the following in under 30 seconds:

  • The root cause is the payments-core service
  • All eight failures are cascades from that single failure
  • No other investigation is needed for the other seven
  • Your action is to restore payments-core, not to debug eight separate incidents

You page the payments team. You go back to sleep. The cascade resolves when payments-core recovers and the circuit breakers close.


The Pre-Incident Review Scenario

Your team is decomposing a monolith. The monolith currently serves as a webhook target for 23 different hooks. You are planning to split it into four domain services over the next quarter.

Before blast radius analysis, this kind of review required a meeting where engineers tried to reconstruct the dependency map from memory and documentation that was last updated eight months ago.

With blast radius analysis, you run the review against the actual traffic graph. You discover that 19 of the 23 hooks can be cleanly attributed to one of the four planned domain services. Four hooks are ambiguous — they carry payload fields that span multiple domains, which means they cannot be cleanly routed after the decomposition without webhook payload transformation logic.

You also discover that the monolith has a betweenness centrality score in the 99th percentile — it is the most critical node in your webhook graph by a large margin. This is expected, but it confirms the decomposition priority: until the monolith is split, every webhook incident is a potential multi-hook incident.

You have this information before the decomposition begins, not after a production incident reveals it.


Technical Details

The inference engine processes events from request_logs continuously. Edge weights are updated on every event. The graph is materialized in memory every 5 minutes and served from cache. Five API endpoints back the dashboard:

  • GET /api/v1/blast-radius/graph — full adjacency list with weights and node metadata
  • GET /api/v1/blast-radius/spofs — SPOF list with centrality scores
  • GET /api/v1/blast-radius/dependencies — dependency list with confidence scores
  • GET /api/v1/blast-radius/impact/:targetId — projected impact if a specific target fails
  • GET /api/v1/blast-radius/summary — aggregate statistics for the dashboard header

The graph maximum of 50,000 edges is per-tenant. Tenants with very high hook counts and complex traffic patterns will see the most critical edges retained when the limit is approached — centrality-weighted pruning, not arbitrary truncation.
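Centrality-weighted pruning can be sketched as scoring each edge by its weight times the centrality of its endpoints and keeping the top entries. The scoring function is an illustrative assumption, not HookTunnel's exact pruning rule.

```python
MAX_EDGES = 50_000  # the per-tenant edge cap

def prune_edges(edges, centrality, max_edges=MAX_EDGES):
    """Keep the most critical edges when the cap is exceeded.

    edges: list of (node_a, node_b, weight) tuples.
    centrality: dict mapping node -> centrality score.
    Edges touching high-centrality nodes are retained first.
    """
    def criticality(edge):
        a, b, weight = edge
        return weight * (centrality.get(a, 0.0) + centrality.get(b, 0.0))

    if len(edges) <= max_edges:
        return list(edges)
    return sorted(edges, key=criticality, reverse=True)[:max_edges]
```

The effect is that edges on or near your SPOFs survive the cap, while low-weight edges between peripheral nodes are dropped first.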


Comparison

| Capability | HookTunnel | ngrok | Webhook.site | Hookdeck | Svix |
|---|---|---|---|---|---|
| Webhook dependency graph | Yes | No | No | No | No |
| Automatic inference from traffic | Yes | No | No | No | No |
| SPOF identification with centrality scoring | Yes | No | No | No | No |
| Graph visualization dashboard | Yes | No | No | No | No |
| Impact projection for specific targets | Yes | No | No | No | No |
| Integration with circuit breaker state | Yes | No | No | No | No |


FAQ

How long does it take to build a useful graph?

The graph starts accumulating data immediately. After a few hundred events across your hooks, temporal and payload inference produce useful signal. After a week of normal traffic, the graph reflects your real dependency structure with high confidence. The graph is most useful for production systems with established traffic patterns.

What if my hooks are intentionally independent?

Independent hooks appear as disconnected nodes in the graph. This is useful information — it confirms your intended design. A perfectly flat graph with no edges means the inference found no dependencies between your hooks, either because there are none or because the coupling is too loose or infrequent to produce a detectable signal.

Does the graph expose data from other tenants?

No. The graph is scoped entirely to your tenant. Payload inference operates on your event payloads only. No cross-tenant data is used in any inference computation.

Can I manually add or remove dependency edges?

Manual edge management is on the roadmap. Currently, the graph is entirely inferred. You can flag low-confidence edges as dismissed, which removes them from the display but does not affect the underlying inference. This is useful for known coincidental correlations.

How does this interact with shadow delivery?

The Dependencies tab shows whether each hook has a shadow target configured. SPOFs without shadow targets are highlighted — they are the highest-priority candidates for shadow delivery, because a single target failure produces maximum blast radius with no fallback.

See your webhook dependency graph before an incident forces you to

Blast radius analysis infers your dependency topology from real traffic — no manual mapping, no stale diagrams.

Get started free →