Why Your Safety Net Is Dropping Messages

by Brad Jolicoeur
02/28/2026

The try/catch That's Quietly Deleting Your Data

There's a moment that happens on nearly every team that adopts Rebus. Errors show up in the logs. A few transactions get dropped. An engineer—experienced, conscientious, doing exactly what worked in their last ten projects—adds a try/catch to the handler. The errors in the log go quiet. The dropped transactions continue. Everyone assumes the problem is solved.

It isn't. It got worse.

If your team is skeptical of Rebus's retry approach, or if you've been burned by messages disappearing without explanation, this article is for you. We're going to walk through why letting exceptions bubble up is not an antipattern—it's the framework contract—and show you how to build genuinely resilient handlers without reintroducing the bugs you were trying to fix.

The code examples here apply to Rebus, but as we'll show, the same contract exists in MassTransit and NServiceBus. This isn't a Rebus quirk. It's the distributed messaging model.


First: Understanding the Middleware Pipeline

Before discussing what to do or not do, you need a mental model of how Rebus actually executes your handler. Without this, the "bubble up" guidance sounds arbitrary.

Rebus operates as an onion pipeline. Your handler is the innermost layer—the core. Wrapped around it are several middleware layers that execute before and after your code runs. One of those outer layers is the Retry Strategy.

Here's the execution order when a message arrives:

Transport Layer          → Pulls the message from the queue (RabbitMQ / Azure Service Bus)
      ↓
Retry Middleware         → Wraps execution in its own try/catch
      ↓
[Other Middleware]       → Logging, unit of work, serialization, etc.
      ↓
Your Handler             → Your code runs here
      ↑
Retry Middleware         → If no exception: tells transport to ACK (delete) the message
                        → If exception: increments retry count, schedules next attempt

This is key: the Retry Middleware is already catching exceptions for you. It knows whether your handler succeeded or failed based on one thing—whether an exception propagated back up to it.

When your handler completes without throwing, the retry middleware signals the transport to acknowledge and delete the message from the queue. Success.

When your handler throws, the retry middleware catches it, keeps the message on the queue, and schedules a retry.
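The whole contract boils down to a few lines. Here is a conceptual sketch of what the retry middleware does; this is illustrative only, not Rebus's actual source, and RetryMiddlewareSketch and InvokeAsync are invented names:

```csharp
using System;
using System.Threading.Tasks;

// Conceptual sketch of the retry middleware contract (NOT Rebus's real implementation).
// The only signal it has is whether the inner pipeline threw an exception.
public static class RetryMiddlewareSketch
{
    // Returns true if the message should be ACKed (deleted),
    // false if it should be kept on the queue and retried.
    public static async Task<bool> InvokeAsync(Func<Task> restOfPipeline)
    {
        try
        {
            await restOfPipeline(); // runs the inner middleware and, finally, your handler
            return true;            // handler returned normally: ACK and delete
        }
        catch (Exception)
        {
            return false;           // handler threw: retain the message, schedule a retry
        }
    }
}
```

Notice there is no third outcome. A handler that catches and swallows looks identical to a handler that succeeded.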

Now here's the problem.


The Trapdoor: What Happens When You Catch Inside the Handler

When you put a try/catch inside your handler and do not re-throw the exception, your handler returns normally. No exception reaches the retry middleware. As far as the framework is concerned, your handler succeeded.

The retry middleware dutifully signals the broker: delete this message.

The message is gone. Permanently.

// ❌ PATTERN TO AVOID: The "Silent Drop"
// This is the most dangerous pattern in Rebus.
public async Task Handle(OrderMessage message)
{
    try
    {
        await _orderRepository.SaveAsync(message);
    }
    catch (Exception ex)
    {
        // You log it. You feel safe. The message is already deleted.
        // There is no retry. There is no recovery. The data is gone.
        _logger.LogError(ex, "Failed to save order {OrderId}", message.OrderId);
    }
}

The log shows an error. The queue shows a success. The database has nothing. The engineer who wrote this code sees "ERROR" in Splunk and thinks: "Good thing I caught that." Meanwhile, the order never processed, and no one knows how to recover it.

This is what we mean when we call it a trapdoor. It looks like a floor. It isn't.


Why Engineers Reach for try/catch (And Why Those Reasons Don't Apply Here)

There are three legitimate instincts driving this pattern. None of them apply in a messaging context.

"I need to log the error before it disappears."

In a console application or a one-shot script, this is true. If you don't log it, it's gone. In Rebus, the framework logs the failure for you, including the full stack trace, the message headers, the delivery count, and the source queue. If you're using Fleet Manager or a similar tool, failed messages are queryable and replayable. Your manual log is redundant at best, misleading at worst.

"I need to handle specific exceptions differently."

This is a valid need. We'll cover how to handle it safely in the patterns section. Hint: you can still catch—you just need to throw afterward.

"I need to prevent the handler from crashing."

Rebus handlers don't "crash" the application on unhandled exceptions. The exception is caught by the retry middleware, the message is retained, and the worker moves on to the next message. Letting the exception bubble up is exactly what keeps the application running and preserves the data.


"But Rebus Is the Only Framework That Does This"

It isn't. Every major .NET service bus uses the same contract: your handler must signal failure through an unhandled exception. Swallowing exceptions breaks them all.

Framework    | How Retries Are Triggered                                      | If You Swallow the Exception
-------------|----------------------------------------------------------------|-----------------------------
Rebus        | Unhandled exception bubbles to SimpleRetryStrategy middleware  | Message is ACKed and deleted
MassTransit  | Unhandled exception bubbles to UseMessageRetry middleware      | Message is ACKed and deleted
NServiceBus  | Unhandled exception bubbles to the Recoverability pipeline     | Message is ACKed and deleted

The "bubble up" contract isn't a Rebus design choice you can debate. It's the fundamental model for how all these frameworks deliver durability guarantees. The framework cannot protect a message it doesn't know is in trouble.

If your team has experience with MassTransit or NServiceBus, point to this table. The critique of Rebus is actually a critique of the entire category.


Patterns to Avoid

❌ The Silent Drop

Already shown above. The most dangerous pattern. Logs the error, deletes the message, no recovery possible.

// ❌ AVOID
public async Task Handle(OrderMessage message)
{
    try { await _repository.SaveAsync(message); }
    catch (Exception ex)
    {
        _logger.LogError(ex, "Error"); // Message silently deleted after this
    }
}

❌ The Manual Retry Loop

This pattern tries to reimplement retries by hand inside the handler. It ties up the handler for the duration of the loop, delaying every other message behind it, and if all five attempts fail, the handler still returns normally, so the framework deletes the message anyway with no recovery.

// ❌ AVOID
public async Task Handle(OrderMessage message)
{
    for (int i = 0; i < 5; i++)
    {
        try
        {
            await _repository.SaveAsync(message);
            return; // "Success," message gets deleted
        }
        catch (Exception ex)
        {
            if (i == 4) _logger.LogError(ex, "Gave up"); // Message deleted, no recovery
            await Task.Delay(1000);
        }
    }
}

❌ The Manual Dead Letter Table

Some engineers respond to dropped messages by building their own "bad message" table or queue. This creates a parallel, non-standard observability channel that doesn't integrate with any tooling and is quickly forgotten.

// ❌ AVOID
public async Task Handle(OrderMessage message)
{
    try { await _repository.SaveAsync(message); }
    catch (Exception ex)
    {
        // Builds a homemade dead-letter mechanism that no one monitors
        await _deadLetterTable.InsertAsync(message, ex.Message);
    }
}

❌ The Stack Trace Destroyer

If your team insists on catching and re-throwing, make sure they know the difference between throw; and throw ex;. Using throw ex; resets the stack trace origin to the throw statement itself, wiping out the line number where the actual failure occurred.

// ❌ AVOID: Destroys the original stack trace
catch (Exception ex)
{
    _logger.LogError(ex, "Caught something");
    throw ex; // Stack trace now points to THIS line, not where it actually failed
}

Patterns to Embrace

✅ Let It Fail (The Default)

The simplest, cleanest pattern. No try/catch needed. If _repository.SaveAsync throws, Rebus catches it, retains the message, and retries automatically. The full stack trace, message payload, and retry count are captured by the framework.

// ✅ PREFER: Clean handler, full retry protection
public async Task Handle(OrderMessage message)
{
    await _repository.SaveAsync(message);
}

✅ Logging Scopes for Context (Without the Catch)

The most common reason engineers want a try/catch is to add business context to the log—"I want to know which OrderId failed." You can do this without catching anything using ILogger.BeginScope. The scope attaches properties to every log entry generated within the block, including the ones Rebus generates automatically on failure.

// ✅ PREFER: Context without swallowing the exception
public async Task Handle(OrderMessage message)
{
    using (_logger.BeginScope(new Dictionary<string, object>
    {
        ["OrderId"] = message.OrderId,
        ["CustomerId"] = message.CustomerId
    }))
    {
        // If this fails, the Rebus error log automatically includes
        // OrderId and CustomerId in its metadata. No catch needed.
        await _repository.SaveAsync(message);
        await _eventBus.PublishAsync(new OrderSavedEvent(message.OrderId));
    }
}

If you're using Serilog, LogContext.PushProperty accomplishes the same thing and may feel more natural:

// ✅ PREFER (Serilog variant)
public async Task Handle(OrderMessage message)
{
    using (LogContext.PushProperty("OrderId", message.OrderId))
    using (LogContext.PushProperty("CustomerId", message.CustomerId))
    {
        await _repository.SaveAsync(message);
    }
}

✅ Catch-and-Rethrow for Wrapping Context

If you need to add context to the exception itself—not just the log—you can catch, wrap, and rethrow. Use throw; (not throw ex;) to preserve the original stack trace.

// ✅ ACCEPTABLE: Wrap with context, preserve stack trace
public async Task Handle(OrderMessage message)
{
    try
    {
        await _repository.SaveAsync(message);
    }
    catch (Exception ex)
    {
        // Adds domain context to the exception without destroying the trace
        throw new OrderProcessingException(message.OrderId, "Failed to persist order", ex);
    }
}
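OrderProcessingException in the example above is a hypothetical domain exception; it does not ship with Rebus. A minimal sketch, assuming you want the order ID carried on the exception and the original failure preserved as InnerException:

```csharp
using System;

// Hypothetical domain exception for the wrap-and-rethrow pattern above.
// Passing the original exception as InnerException keeps its stack trace intact.
public class OrderProcessingException : Exception
{
    public int OrderId { get; }

    public OrderProcessingException(int orderId, string message, Exception inner)
        : base($"{message} (OrderId: {orderId})", inner)
    {
        OrderId = orderId;
    }
}
```

Because the new exception propagates to Rebus, the message is still retained and retried; you've only enriched what the error log and error queue will show.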

✅ Exception Filtering for Specific Types

When you genuinely need different behavior for different exception types—such as skipping retries for validation errors—filter by type and let everything else bubble up untouched.

// ✅ ACCEPTABLE: Type-specific handling with explicit re-throw for unknowns
public async Task Handle(OrderMessage message)
{
    try
    {
        await _orderService.ProcessAsync(message);
    }
    catch (ValidationException ex)
    {
        // Validation errors are not retriable. Log and discard intentionally.
        _logger.LogWarning(ex, "Order {OrderId} failed validation, discarding", message.OrderId);
        // No throw—this is an explicit, intentional discard for a known bad message
    }
    catch (Exception)
    {
        throw; // Everything else: let Rebus handle it
    }
}
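A related trick worth knowing: C# exception filters (the when clause) let you log without catching at all. The filter runs while the exception is still in flight, and returning false means the catch block never executes, so the exception reaches Rebus with its stack completely untouched. This is a sketch; LogAndContinue is a hypothetical helper, and _orderService and _logger are assumed fields as in the example above:

```csharp
// ✅ VARIANT: log via an exception filter without ever catching.
// LogAndContinue always returns false, so the catch block below never runs
// and the exception propagates to Rebus with its original stack trace.
public async Task Handle(OrderMessage message)
{
    try
    {
        await _orderService.ProcessAsync(message);
    }
    catch (Exception ex) when (LogAndContinue(ex, message))
    {
        // Unreachable: the filter never returns true.
    }
}

private bool LogAndContinue(Exception ex, OrderMessage message)
{
    _logger.LogError(ex, "Unexpected failure for order {OrderId}", message.OrderId);
    return false; // false = do not catch; let it bubble up
}
```

This removes even the possibility of forgetting to re-throw, because nothing is ever caught.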

Advanced Recovery: Configuring the Safety Net

Now that your handlers are correctly signaling failures, you can configure Rebus to respond to those failures intelligently.

Delayed and Second-Level Retries

The default retry behavior in Rebus fires immediate retry attempts as fast as possible. If a message rolls back, the broker immediately redelivers it. For transient issues like a momentary database lock, this rapid fire might be fine. But for scenarios where you need to give the infrastructure time to recover—like a database timeout or a rate-limited API—you need to introduce a delay.

In Rebus, configuring an actual delay (like exponential backoff) between retries is typically done using Second-Level Retries (SLR).

(Note: Don't confuse this with o.SetBackoffTimes(...) in the Rebus configuration—that setting controls the transport's polling behavior when the queue is idle, not retry delays!)

With SLR enabled, when a message exhausts its immediate retry attempts, it is not sent directly to the error queue. Instead, Rebus re-dispatches it as an IFailed<TMessage>. Your handler can then defer it for a period of time, dead-letter it intentionally, or apply custom logic such as an escalating backoff delay.

This is an important distinction from what you might expect: SLR is not just a configuration option—it requires you to write a handler that implements IHandleMessages<IFailed<TMessage>>.

Step 1: Enable SLR in configuration

Configure.With(activator)
    .Options(o =>
    {
        o.RetryStrategy(
            maxDeliveryAttempts: 5,       // Immediate retries before promoting to SLR
            secondLevelRetriesEnabled: true
        );
    })
    .Start();

Step 2: Write the IFailed handler in your message handler class

public class OrderHandler : IHandleMessages<OrderMessage>,
                            IHandleMessages<IFailed<OrderMessage>>
{
    readonly IBus _bus;
    readonly ILogger<OrderHandler> _logger;
    readonly IOrderRepository _orderRepository;

    public OrderHandler(IBus bus, ILogger<OrderHandler> logger, IOrderRepository orderRepository)
    {
        _bus = bus;
        _logger = logger;
        _orderRepository = orderRepository;
    }

    // Normal handler — let exceptions bubble up
    public async Task Handle(OrderMessage message)
    {
        await _orderRepository.SaveAsync(message);
    }

    // SLR handler — called after normal retries are exhausted
    public async Task Handle(IFailed<OrderMessage> failedMessage)
    {
        const int maxDeferCount = 5;

        // Rebus automatically tracks how many times this message has been deferred
        var deferCount = Convert.ToInt32(
            failedMessage.Headers.GetValueOrDefault(Headers.DeferCount));

        if (deferCount >= maxDeferCount)
        {
            // Exceeded SLR attempts — dead-letter with context
            await _bus.Advanced.TransportMessage.Deadletter(
                $"Failed after {deferCount} deferrals\n\n{failedMessage.ErrorDescription}");

            _logger.LogError(
                "Order message dead-lettered after {DeferCount} second-level attempts. Error: {Error}",
                deferCount, failedMessage.ErrorDescription);
            return;
        }

        // Defer the transport message for 30 seconds and try again
        // This preserves all original headers including message ID
        await _bus.Advanced.TransportMessage.Defer(TimeSpan.FromSeconds(30));

        _logger.LogWarning(
            "Order message deferred (attempt {DeferCount} of {Max}). Retrying in 30s.",
            deferCount + 1, maxDeferCount);
    }
}

The Headers.DeferCount value (rbs2-defer-count) is automatically maintained by Rebus, incrementing to 1, 2, 3... with each deferral. The example above gives up after 5 deferrals and dead-letters the message with an explanation—rather than silently dropping it.

Note that _bus.Advanced.TransportMessage.Defer is used here rather than the standard _bus.Defer. The transport message API preserves all original headers (including the message ID), which is important for traceability.
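The fixed 30-second delay above can also be turned into the escalating sequence mentioned earlier. A small sketch using the same deferCount value; SlrBackoff and BackoffDelay are hypothetical names, not part of Rebus:

```csharp
using System;

// Hypothetical helper: exponential backoff for the IFailed<T> handler above.
// Yields 30s, 60s, 120s, 240s, 480s, then caps at 10 minutes.
public static class SlrBackoff
{
    public static TimeSpan BackoffDelay(int deferCount)
    {
        var seconds = Math.Min(30 * Math.Pow(2, deferCount), 600);
        return TimeSpan.FromSeconds(seconds);
    }
}
```

Inside the SLR handler, the fixed delay would become _bus.Advanced.TransportMessage.Defer(SlrBackoff.BackoffDelay(deferCount)).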

The threshold question: do you actually need this?

Before reaching for SLR, ask: does the failure scenario require a delay, or is immediate rapid-fire retry enough? With 5 immediate retries, you cover very brief transient failures (like momentary locks).

SLR becomes the right tool when you need a time delay between attempts, or when the recovery window is measured in minutes rather than fractions of a second, and when losing the message to the error queue during that window is unacceptable. Concrete examples:

Scenario                                        | Use SLR? | Why
------------------------------------------------|----------|----------------------------------------------------------------
Momentary network hiccup (< 1s)                 | ❌ No    | Immediate retries handle this
Database connection pool exhaustion             | ✅ Yes   | Needs a delay between attempts to avoid hammering the database
Scheduled maintenance window (5–30 min)         | ✅ Yes   | Immediate retries can't bridge this gap
Downstream service deployment rolling restart   | ✅ Yes   | Minutes of unavailability, data must not be lost
Third-party API rate limit with 1-minute reset  | ✅ Yes   | Needs to pause and retry after the window resets
Persistent bad data / validation failure        | ❌ Never | No amount of waiting fixes bad data; discard intentionally

If your team is considering SLR primarily because "the error queue makes us nervous," that's a sign the real fix is operator confidence—understanding that the error queue is a holding pen, not a graveyard, and that messages can be replayed once the root cause is resolved. SLR should be a deliberate architectural decision for known infrastructure patterns, not a default safety blanket.

Circuit Breaker

The circuit breaker pattern addresses a different problem: "What if the database is down for an hour? I don't want a thousand messages failing and clogging the error queue."

A circuit breaker monitors the failure rate and, when a threshold is exceeded, stops pulling new messages entirely—preventing a flood of failures during a known outage.

Rebus supports this via the Rebus.CircuitBreaker package:

Configure.With(activator)
    .Options(o =>
    {
        o.EnableCircuitBreaker(c =>
        {
            // Open the circuit if we see 10 SqlExceptions within 60 seconds
            c.OpenOn<SqlException>(
                trackingPeriodInSeconds: 60,
                attempts: 10
            );

            // Also trip on general infrastructure exceptions
            c.OpenOn<TimeoutException>(
                trackingPeriodInSeconds: 30,
                attempts: 5
            );

            // After tripping, wait 30 seconds before trying a single message (half-open state)
            c.SetHalfOpenPeriod(30);
        });
    })
    .Start();

When to use a circuit breaker: Extended infrastructure outages where continued processing would only generate noise and fill error queues. Also valuable when your downstream systems have rate limits or cost implications for failed calls.

Important: If you implement a Polly circuit breaker inside your handler (instead of at the Rebus configuration level), the BrokenCircuitException must still bubble up to Rebus. Catching and swallowing BrokenCircuitException is the same trapdoor problem—Rebus sees success and deletes the message.

// ✅ Polly circuit breaker inside handler - exception must still propagate
public async Task Handle(OrderMessage message)
{
    // If the circuit is open, Polly throws BrokenCircuitException.
    // Let it bubble up. Rebus will retain and retry the message.
    await _circuitBreaker.ExecuteAsync(() => _externalService.CallAsync(message));
}
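For completeness, the _circuitBreaker field above might be built with Polly along these lines. This is a sketch under assumptions: the thresholds are arbitrary, and the exception types should match whatever your external service actually throws:

```csharp
using System;
using System.Net.Http;
using Polly;
using Polly.CircuitBreaker;

// Sketch: constructing the Polly circuit breaker used in the handler above.
// Opens after 5 consecutive qualifying failures and stays open for 30 seconds.
// While open, ExecuteAsync throws BrokenCircuitException immediately;
// that exception must be allowed to bubble up to Rebus.
static readonly AsyncCircuitBreakerPolicy CircuitBreaker =
    Policy
        .Handle<HttpRequestException>()
        .Or<TimeoutException>()
        .CircuitBreakerAsync(
            exceptionsAllowedBeforeBreaking: 5,
            durationOfBreak: TimeSpan.FromSeconds(30));
```

Note that a handler-level Polly breaker and the Rebus-level Rebus.CircuitBreaker option solve the same problem at different layers; pick one rather than stacking both.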

Putting It Together: A Decision Guide

Scenario                                       | Tool
-----------------------------------------------|-------------------------------------------
Transient blips (sub-second)                   | Immediate retries (default)
Transient failures lasting seconds or minutes  | Second-level retries (SLR) with deferral
Extended outages, flood prevention             | Circuit breaker
Non-retriable bad data                         | Intentional discard (explicit, documented)

The Observability Payoff

When exceptions bubble up correctly, the framework produces something far more valuable than a log entry—it produces a replayable artifact.

Each failed message in the error queue carries metadata that no manual log can match:

Header               | Value
---------------------|--------------------------------
rbs2-exception       | Full stack trace
rbs2-error-details   | Reason the last attempt failed
rbs2-source-queue    | Origin queue for tracing
rbs2-delivery-count  | Number of attempts made

If you're using Rebus Fleet Manager, this metadata powers a UI where you can search failed messages by exception type, read the formatted stack trace, and replay messages directly to the source queue—after you've fixed the root cause.

A swallowed exception produces a log line. An unhandled exception produces a recoverable message. Only one of those lets you fix the problem and recover the data.


Summary

The try/catch instinct is deeply ingrained, and it's the right instinct in most programming contexts. Distributed messaging is the exception. In Rebus—and in MassTransit and NServiceBus—the exception is the signal. It's the mechanism by which the framework knows your handler needs help.

Catching and swallowing that signal doesn't protect your data. It destroys it while giving you the appearance of safety.

The safer path is counterintuitive until you internalize the middleware model:

  • Don't catch to prevent errors. The framework catches them for you.
  • Don't catch to add log context. Use logging scopes instead.
  • Do catch when you need to filter exception types or wrap with domain context—but always re-throw.
  • Configure SLR for transient failures that need a real delay between attempts, such as infrastructure outages lasting minutes.
  • Configure circuit breakers to prevent flood scenarios.

Once the exceptions are flowing correctly, the retry infrastructure becomes your most powerful reliability tool—and every failed message becomes something you can monitor, investigate, and recover without writing a line of custom recovery code.

