Comparisons·8 min read·2025-08-08·Colm Byrne, Technical Product Manager

How SQS FIFO Enables Exactly-Once Webhook Processing

Exactly-once delivery is the holy grail of distributed systems. Most systems lie about having it. SQS FIFO is honest about what it actually provides. That honesty is rare and valuable.

"Exactly-once delivery" is the most abused phrase in distributed systems marketing. It sounds like a hard guarantee. In practice, most systems that claim it mean one of three things: "we try hard not to send duplicates," "we deduplicate within a short window with some caveats," or "exactly-once in a narrow technical sense that requires specific conditions your production environment may not meet."

SQS FIFO is precise. It gives you exactly-once processing within a 5-minute deduplication window, per deduplication ID. That is a bounded, explicit guarantee with documented conditions. It is not "we try hard." It is a mechanism with defined inputs and defined outputs. That precision is rare, and it is worth understanding exactly what it provides — and what lies outside the boundary.


The Distributed Systems Problem

Here is why exactly-once is hard. A webhook provider sends an event. Your server receives it and begins processing. Partway through — after the database write, before the 200 response — your process crashes. The provider sees only a timeout, so it retries, and you process the same event twice. Your database now holds inconsistent state, a duplicate order, or a second charge attempt. For a complete treatment of how this plays out with Stripe specifically, see Stripe duplicate webhook events.

The standard fix is idempotency in your handler: check if you have already processed this event ID, skip if yes. That covers the application layer. But what covers the race condition where two concurrent invocations both pass the idempotency check before either commits the dedup record? A database unique constraint on the event ID. Now you have two layers of defense.
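The storage side of those two layers can be sketched as a single migration. This is a hypothetical Postgres DDL (the table and column names are assumptions chosen to match the worker example later in this article):

```typescript
// Hypothetical Postgres DDL for the dedup record. The PRIMARY KEY doubles as
// the unique constraint: if two concurrent invocations both pass the
// application-level check, the second INSERT fails with a unique violation
// instead of silently double-processing the event.
export const CREATE_PROCESSED_EVENTS = `
  CREATE TABLE IF NOT EXISTS processed_events (
    event_id     TEXT PRIMARY KEY,
    processed_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
  );
`;
```

The application-level SELECT is the fast path; the constraint is the guarantee.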

This is the correct architecture, and most engineers eventually build it. SQS FIFO wraps a version of this deduplication into the queue itself, before the event reaches your application code. You do not build the dedup mechanism; you supply a dedup ID, and SQS guarantees that a second message with the same ID, sent within the 5-minute window, never enters the queue.


What SQS FIFO Actually Provides

Two properties worth separating clearly. The AWS SQS documentation covers both in detail, but the practical implications for webhook workloads deserve examination.

Exactly-once processing within a 5-minute deduplication window. When you send a message with a MessageDeduplicationId, any subsequent message with the same ID sent within 5 minutes of the original is silently discarded. Not rejected with an error. Silently dropped. From the consumer's perspective, only one message was ever in the queue.

Content-based deduplication is the alternative: you enable it on the queue, and SQS computes a SHA-256 hash of the message body to use as the deduplication ID automatically. Useful when the message body itself is the canonical form of the event and you do not have a natural ID to supply.

Strict ordering per message group. Messages with the same MessageGroupId are delivered in the order they were sent, and only one message in a group is processed at a time. Different message groups are processed in parallel. This is the right primitive for webhook workloads where order matters within a customer (subscription state machine transitions) but not across customers.

import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const client = new SQSClient({ region: "us-east-1" });

async function enqueueWebhookEvent(event: WebhookEvent): Promise<void> {
  await client.send(new SendMessageCommand({
    QueueUrl: process.env.FIFO_QUEUE_URL!,
    MessageBody: JSON.stringify(event),

    // Order events per customer — subscription state transitions stay sequential
    MessageGroupId: event.customerId,

    // Deduplicate by Stripe's event ID, stable across provider retries
    // (Note: SQS dedup window is 5 minutes; see below for what this does not cover)
    MessageDeduplicationId: event.stripeEventId,
  }));
}

The throughput ceiling for FIFO queues is 300 messages per second per API action, or 3,000 per second with maximum batching; high-throughput mode raises the ceiling further. For most webhook workloads this is not a constraint. If you hit it, you are probably in a category where you need a different architecture anyway.


SQS FIFO vs RabbitMQ: What Each Actually Wins

RabbitMQ is a serious message broker. It has been in production since 2007. The core model — producers publish to exchanges, exchanges route to queues, consumers ack — is elegant and flexible.

RabbitMQ's strengths:

At-least-once delivery with explicit consumer acknowledgment. A message stays in the queue until the consumer sends an ack. If the consumer crashes, the message is requeued. You control the ack behavior — you can nack with or without requeue, implement selective processing, route to dead letter exchanges on repeated failure.

Full exchange routing: direct, fanout, topic, headers. A single message can be routed to multiple queues based on routing keys. This is flexible and powerful for complex messaging topologies.

You run it. That is a strength for teams that want broker-level control — custom plugins, specific clustering configurations, strict latency requirements. And it is the primary operational cost.

SQS FIFO's strengths:

Managed infrastructure at AWS scale. No cluster to operate. No memory pressure to monitor. No queue depth alarms to tune. AWS's published 99.9% SLA covers the queue availability, and the retry infrastructure is external to your application code.

The exactly-once deduplication primitive. RabbitMQ gives you at-least-once with acks. If you need deduplication, you build it in your consumer. With SQS FIFO, the deduplication is in the queue — the second message with the same ID does not reach your consumer.

The honest split: RabbitMQ wins when you need broker control — custom routing logic, specific exchange topologies, on-premises deployment, the flexibility of the AMQP protocol. SQS FIFO wins when you need managed exactly-once semantics at AWS scale without building or operating the dedup infrastructure. For teams fully committed to AWS who are not hitting the FIFO throughput ceiling, SQS FIFO is the lower-friction path.

Neither replaces the other. Teams with sophisticated messaging needs on-premises run RabbitMQ. Teams building new AWS-native services reach for SQS first.


What the 5-Minute Window Does Not Cover

Here is the part that vendor documentation buries. SQS FIFO's exactly-once deduplication window is 5 minutes. Webhook providers do not limit their retries to 5 minutes. The 5-minute window is the most dangerous gap in SQS-based webhook pipelines.

Stripe retries failed webhook deliveries for up to 72 hours on an exponential backoff schedule. Shopify retries for 48 hours. GitHub does not retry automatically, but failed deliveries can be redelivered manually long after the fact. See Stripe's webhook best practices for their official guidance on handling retries. If your handler is down for 10 minutes, you may receive duplicate events 20 minutes later — outside the SQS FIFO deduplication window. The queue cannot help you at that point. The dedup has expired.

This is not a flaw in SQS FIFO's design. The 5-minute window is appropriate for transient duplicates that occur at near-simultaneous delivery. It is not designed to cover a retry arriving hours or days later from an upstream provider.

Shopify's event ID: Shopify sends an X-Shopify-Webhook-Id header with every delivery. That header value is stable across retries of the same event. If your receiver captures that header and stores it in your processed-events table, you can deduplicate at any point in time — hours or days later — by checking that table before processing.

Stripe's event ID: Stripe events have an id field — a string like evt_3OxCQX2eZvKYlo2C1xBzp7Lk. This ID is stable across retries. If you use it as your SQS FIFO deduplication ID, you get protection within the 5-minute window. For protection against 72-hour retries, you need an application-level processed-events store keyed on this ID.
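Capturing those identifiers at the receiver can be sketched like this. The header normalization and the evt_ prefix check are assumptions about your providers (API Gateway presents header names lowercased):

```typescript
type InboundRequest = { headers: Record<string, string | undefined>; body: string };

// Extract a stable dedup ID from the inbound delivery: prefer Shopify's
// webhook ID header, fall back to a Stripe-style event ID in the body.
export function extractEventId(req: InboundRequest): string | null {
  const shopifyId = req.headers["x-shopify-webhook-id"]; // stable across Shopify retries
  if (shopifyId) return shopifyId;
  try {
    const parsed = JSON.parse(req.body);
    if (typeof parsed.id === "string" && parsed.id.startsWith("evt_")) {
      return parsed.id; // Stripe event ID, stable across Stripe retries
    }
  } catch {
    /* body was not JSON */
  }
  return null; // no stable ID available for this delivery
}
```

A null here is the signal that content-based deduplication — with its timing-field caveat — is all you have left.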

The full idempotency stack for production webhook processing has three layers:

// Layer 1: SQS FIFO deduplication (5-minute window)
await sqsClient.send(new SendMessageCommand({
  QueueUrl: process.env.FIFO_QUEUE_URL!,
  MessageBody: JSON.stringify(event),
  MessageGroupId: event.customerId,
  MessageDeduplicationId: event.providerEventId, // stable across provider retries
}));

// Layer 2: Application-level check (any time horizon)
async function processEvent(event: WebhookEvent): Promise<void> {
  const alreadyProcessed = await db.query(
    'SELECT 1 FROM processed_events WHERE event_id = $1',
    [event.providerEventId]
  );
  if (alreadyProcessed.rows.length > 0) return;

  await db.transaction(async (trx) => {
    // Layer 3: DB unique constraint (concurrency safety)
    await trx.query(
      'INSERT INTO processed_events (event_id, processed_at) VALUES ($1, NOW())',
      [event.providerEventId]
      // Throws unique violation if concurrent execution beats us here
    );

    // Business logic — idempotent upserts, not blind inserts
    await trx.query(`
      INSERT INTO orders (stripe_event_id, customer_id, amount, status)
      VALUES ($1, $2, $3, 'confirmed')
      ON CONFLICT (stripe_event_id) DO UPDATE
        SET status = 'confirmed', updated_at = NOW()
    `, [event.providerEventId, event.customerId, event.amount]);
  });
}

The unique constraint on processed_events.event_id is the floor. The application check is a performance optimization (avoid the failed insert). SQS FIFO deduplication catches the first wave of duplicates before they reach your consumer. All three layers together are what "exactly-once" actually looks like in production.


Where HookTunnel Sits in This Picture

SQS FIFO deduplication requires a stable MessageDeduplicationId. For that ID to be stable, you need to capture it from the inbound HTTP request — from the X-Shopify-Webhook-Id header, the Stripe event id field, or whatever stable identifier your provider uses. See HookTunnel's webhook inspection features for how boundary capture works in practice, or review our webhook debugging checklist for the full diagnostic workflow.

If your receiver Lambda does not capture that identifier before enqueuing, you cannot use content-based deduplication effectively (because the message body often differs between retries due to timing fields). You need the inbound HTTP context.

HookTunnel captures the complete inbound HTTP request — every header, the exact body bytes, the delivery timestamp. When you replay an event from HookTunnel's history, it resends the original HTTP request with the original headers intact. Your receiver Lambda sees the same X-Shopify-Webhook-Id header it saw on the first delivery. The dedup ID is consistent. The SQS FIFO guarantee applies.

HookTunnel also implements Shopify event ID deduplication at the HTTP boundary itself. Events with a previously-seen X-Shopify-Webhook-Id are flagged in the capture history. You can inspect which events were duplicates, when they arrived, and how they differed from the original — before they ever reach your queue.

For the 72-hour retry problem — the one SQS FIFO cannot help with — HookTunnel's capture history becomes the forensics layer. When you get a duplicate Stripe event 36 hours after the original, HookTunnel's history shows you the original delivery, the processing status, and whether a receipt was confirmed. You can determine immediately whether the duplicate is safe to process or needs investigation.

The architecture with all layers in place:

Stripe / Shopify / Provider
         │
         ▼
  HookTunnel (captures raw HTTP, deduplicates at boundary by provider event ID)
         │
         ▼
  Receiver Lambda (extracts event ID, enqueues with MessageDeduplicationId)
         │
         ▼
  SQS FIFO (deduplicates within 5-minute window, orders per MessageGroupId)
         │
         ▼
  Worker Lambda (application-level idempotency check + DB unique constraint)
         │
         ▼
  processed_events table (permanent dedup record)

Each layer covers a different failure mode. HookTunnel covers the boundary capture and long-horizon duplicate detection. SQS FIFO covers the near-simultaneous duplicate window. Application idempotency covers the rest.


The Honest Assessment

SQS FIFO's exactly-once guarantee is among the most honest in the industry: the deduplication window is documented, the conditions are specified, and the throughput ceiling is published. AWS does not claim this covers your 72-hour provider retry window. It covers the 5-minute window around initial delivery, which is where most transient duplicates occur.

RabbitMQ is a genuinely excellent broker for teams who need control over their messaging infrastructure. The at-least-once model is well-understood, and the exchange routing flexibility is real. But you operate it. If you are not in a position to own that operational complexity, SQS FIFO is the managed path.

Neither SQS FIFO nor RabbitMQ captures the raw HTTP request at the inbound boundary. That is a different layer — the forensics layer, the replay layer, the "what exactly did Stripe send me at 2:47 AM when my handler was down" layer. That is where HookTunnel sits, at a flat $19/month on the Pro plan: it stores everything that arrives and replays on demand. If you are evaluating webhook tools more broadly, start with the webhook vendor evaluation checklist.

The duplicate problem starts before your queue. Make your own call about which layers you need — but do not leave the boundary unobserved.

Stop guessing. Start proving.

Generate a webhook URL in one click. No signup required.

Get started free →