Heisenbug Hunting in Async .NET Systems

You know that feeling when a bug just... vanishes the moment you try to look at it? You fire up the debugger, step through carefully, and everything works perfectly. No exception. No race condition. No problem. Until you run it again in production.

That's a Heisenbug—a bug that changes its behavior (or disappears entirely) when you try to observe it. The name comes from Heisenberg's uncertainty principle, and if you've ever built async message-driven systems in .NET, you know exactly what I'm talking about.

I've shipped async systems built on Rebus, NServiceBus, and Wolverine that worked beautifully in staging and blew up spectacularly under production load. The problem isn't the frameworks—it's that async distributed systems fail in fundamentally different ways than synchronous code, and our debugging instincts from the sync world don't translate.

This article is about a practical methodology for hunting these bugs down, based on Preethi Viswanathan's whitepaper "A Heisenbug Hunting Toolkit". The framework is built around six phases using open-source tools, and I'm going to show you how it applies specifically to .NET async systems using Marten, Wolverine, and NBomber.

What you'll need: .NET 8+ with Marten 7.x, Wolverine 3.x+, NBomber 6.x (via NBomber.Http.CSharp), and WireMock.Net. The chaos engineering sections use Rancher Desktop and LitmusChaos.

Why Async Changes Everything

When you're debugging synchronous code, you can step through a debugger line by line and trust what you see. The order of operations is predictable. A web API request comes in, you process it, you return a response. If something breaks, you can reproduce it locally, throw a breakpoint in, and watch the failure happen.

Async message-driven systems break that mental model completely.

You've got multiple message handlers running concurrently. Messages get retried when handlers fail. They might land in error queues. Competing consumers pull from the same queue. Eventual consistency means different parts of your system see different states at different times. A timing window that only opens when five specific messages arrive within 200 milliseconds of each other isn't something you can step through in a debugger.

Let me give you a concrete example. Here's a Wolverine message handler for reserving inventory that looks perfectly reasonable:

using Marten;
using Wolverine;

namespace TicketingSystem;

public record ReserveInventory(string ItemId, int Quantity, string OrderId);

public class InventoryHandler
{
    public async Task Handle(ReserveInventory command, IDocumentSession session)
    {
        var item = await session.LoadAsync<InventoryItem>(command.ItemId);
        
        if (item == null)
            throw new InvalidOperationException($"Item {command.ItemId} not found");
        
        if (item.Available >= command.Quantity)
        {
            item.Available -= command.Quantity;
            session.Store(item);
            // No explicit SaveChangesAsync here: Wolverine's Marten transaction
            // middleware commits the session when the handler completes
        }
        else
        {
            throw new InvalidOperationException("Insufficient inventory");
        }
    }
}

public class InventoryItem
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public int Available { get; set; }
}

This code will pass your unit tests. It'll work great when you manually test it. It might even work fine in load testing if your load test doesn't create the right kind of concurrency.

But when you get a flash sale and 50 concurrent messages arrive for the same inventory item? You'll oversell. The race condition is right there: load the item, check availability, decrement, save. Between the load and the save, other handlers are doing the exact same thing with stale data.

The debugger won't show you this. Stepping through the code with a breakpoint changes the timing enough that the race condition disappears. That's the Heisenbug.

And here's what makes these bugs so insidious: you can run 10,000 concurrent requests in a standard load test and never see this race condition. A generic load test spreads requests across many endpoints and resources. The timing window where two handlers are both between load and save on the same document might be 5–10 milliseconds. Unless your load test is deliberately targeting that exact contention point with enough concurrency, the window never opens. Your tests pass. Your dashboard is green. And you ship a race condition to production.
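To put rough numbers on that intuition, here's a Poisson back-of-envelope (all figures are illustrative, not from the whitepaper). When a load test spreads requests across many items, the per-item arrival rate is tiny, and the odds of two handlers overlapping inside the race window collapse:

```csharp
using System;

// Back-of-envelope: chance that a second request lands inside the
// load-to-save window for the SAME item. Numbers are illustrative.
double totalRps = 10_000 / 60.0;               // generic test: 10k requests over 60s
int distinctItems = 1_000;                     // spread across many items
double perItemRate = totalRps / distinctItems; // ~0.17 requests/sec per item
double windowSeconds = 0.010;                  // 10 ms load-to-save window

// Poisson arrivals: P(at least one other arrival inside the window)
double pCollision = 1 - Math.Exp(-perItemRate * windowSeconds);
Console.WriteLine($"{pCollision:P3} per request");  // a small fraction of a percent
```

Point every request at one item instead, and perItemRate jumps three orders of magnitude, which is exactly what the targeted stress test in Phase 2 does.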

The Six-Phase Framework

Viswanathan's whitepaper proposes a systematic approach to finding and fixing these bugs before they reach production. The core idea is to move from reactive debugging ("it broke in production, now what?") to proactive chaos testing ("let's break it in controlled ways before shipping").

Here's the framework:

Phase     | Tool            | Purpose
----------|-----------------|--------------------------------------------------------------
1 Predict | MiroFish        | Identify high-risk service boundaries via swarm simulation
2 Stress  | NBomber         | Generate high-concurrency load to manufacture contention
3 Fuzz    | Bogus           | Stochastic edge-case data: extreme values, nulls, boundary conditions
4 Isolate | WireMock        | Inject controlled latency and probabilistic failures into dependencies
5 Contain | Rancher Desktop | Local Kubernetes with CPU throttling and infrastructure-native faults
6 Break   | LitmusChaos     | Chaos injection (pod kills, network lag) to verify fixes hold under pressure

Each phase addresses a different failure mode. Predict helps you find where to look. Stress manufactures the concurrency needed to reproduce race conditions. Fuzz explores edge cases. Isolate lets you control timing. Contain adds infrastructure realism. Break validates that your fix actually works under chaos.

I'm going to focus on Phases 2, 4, and 6—Stress, Isolate, and Break—because those are the phases where .NET-specific tooling matters most and where I've gotten the most value in my own systems. Phases 1 (Predict), 3 (Fuzz), and 5 (Contain) are covered in the whitepaper.

Phase 2: Stress Testing with NBomber

You can't fix a race condition you can't reproduce. NBomber is a .NET load testing framework that lets you generate realistic concurrency patterns. Here's how you'd stress-test the inventory reservation endpoint:

using NBomber.CSharp;
using NBomber.Http.CSharp;

// Recent NBomber.Http versions pass an HttpClient directly to Http.Send,
// so a plain client with a base address is all the setup needed
using var httpClient = new HttpClient { BaseAddress = new Uri("http://localhost:5000") };

var itemId = "concert-ticket-front-row";

var scenario = Scenario.Create("inventory_stress", async context =>
{
    var payload = new
    {
        ItemId = itemId,
        Quantity = 1,
        OrderId = Guid.NewGuid().ToString()
    };

    var request = Http.CreateRequest("POST", "/reserve")
        .WithJsonBody(payload);

    var response = await Http.Send(httpClient, request);

    return response.IsError ? Response.Fail() : Response.Ok();
})
.WithLoadSimulations(
    Simulation.Inject(rate: 100, interval: TimeSpan.FromSeconds(1), during: TimeSpan.FromMinutes(2))
);

NBomberRunner
    .RegisterScenarios(scenario)
    .Run();

This simulates 100 concurrent reservation attempts per second for two minutes, all targeting the same inventory item. If you've got a race condition, this will find it. You'll see inventory go negative, or more reservations succeed than you have inventory for.

The key is that you're manufacturing the exact contention pattern that happens in production during a flash sale, but in a controlled environment where you can observe and measure the failure.
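The failure shows up as a broken invariant in the store, so it's worth asserting it directly after the run. A minimal post-run check (the connection string and item id are placeholders for your environment):

```csharp
using System;
using Marten;

// Illustrative connection string; use whatever your app uses
var store = DocumentStore.For(
    "Host=localhost;Port=5432;Database=ticketing;Username=postgres;Password=postgres");
await using var query = store.QuerySession();

// The item the stress test hammered; Available should never go negative
var item = await query.LoadAsync<InventoryItem>("concert-ticket-front-row");

if (item is { Available: < 0 })
    Console.WriteLine($"Race condition confirmed: Available = {item.Available}");
```

A stronger variant compares the count of successful reservations against the starting inventory, which also catches the subtler case where Available stays non-negative but more orders succeeded than stock existed.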

Phase 4: Isolate with Controlled Latency

Sometimes the race condition only appears when there's latency in your dependencies. Maybe your Marten document store is on a slower connection in production. Maybe there's network jitter. You can inject that latency deliberately to widen the timing window.

For HTTP-based services, use WireMock to virtualize dependencies and control timing:

using WireMock.Server;
using WireMock.RequestBuilders;
using WireMock.ResponseBuilders;

var server = WireMockServer.Start();

server
    .Given(Request.Create().WithPath("/inventory/*").UsingGet())
    .RespondWith(Response.Create()
        .WithStatusCode(200)
        .WithDelay(TimeSpan.FromMilliseconds(800))
        .WithBodyAsJson(new { id = "item1", available = 10 }));

Adding latency widens the timing window. A race condition that happens 0.3% of the time at normal speed might jump to 1.2% with 800ms of latency. That makes it much easier to observe and fix.

Note: WireMock virtualizes HTTP dependencies. If you need to inject latency into database calls or other non-HTTP connections, look at Toxiproxy or network-level throttling tools.
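For the Postgres connection behind Marten, for example, a Toxiproxy sketch might look like this (the proxy name and ports are illustrative, and it assumes toxiproxy-server is already running):

```shell
# Stand up a proxy in front of the real Postgres port
toxiproxy-cli create marten_pg -l localhost:25432 -u localhost:5432

# Add 800ms of latency to every call through the proxy,
# mirroring the WireMock delay above
toxiproxy-cli toxic add marten_pg -t latency -a latency=800
```

Point the application's connection string at port 25432 and the timing window widens for database round-trips just as it did for HTTP.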

The Fix: Optimistic Concurrency

Now that we can reproduce the bug consistently, we can fix it. Marten supports optimistic concurrency control out of the box. The InventoryItem document now implements IVersioned, which enables automatic version tracking:

using Marten;
using Marten.Metadata;

public class InventoryItem : IVersioned
{
    public string Id { get; set; } = string.Empty;
    public string Name { get; set; } = string.Empty;
    public int Available { get; set; }
    public Guid Version { get; set; }  // Marten's IVersioned uses a Guid version stamp
}

public class InventoryHandler
{
    public async Task Handle(ReserveInventory command, IDocumentSession session)
    {
        var item = await session.LoadAsync<InventoryItem>(command.ItemId);
        
        if (item == null)
            throw new InvalidOperationException($"Item {command.ItemId} not found");
        
        if (item.Available >= command.Quantity)
        {
            item.Available -= command.Quantity;
            session.Store(item);
            // Marten automatically checks Version on save
            // ConcurrencyException thrown if version changed
        }
        else
        {
            throw new InvalidOperationException("Insufficient inventory");
        }
    }
}

With IVersioned enabled, Marten checks the document version on save. If another transaction modified the document between your load and your save, you get a ConcurrencyException and the transaction fails. Wolverine will automatically retry the message, loading fresh data and trying again.

Here's why this kills the race condition: remember, the bug was two handlers both reading Available = 10, both decrementing to 9, both saving. With optimistic concurrency, the second save sees that the version changed and throws instead of silently overwriting. The retry loads the updated value of 9, and the system behaves correctly.
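Note that the retry isn't automatic magic; it comes from Wolverine's error-handling policies. A sketch of wiring one up explicitly (the retry cooldowns are illustrative, and the Marten integration may register sensible defaults in your version, so check before duplicating them):

```csharp
using JasperFx.Core;          // Milliseconds() extension
using Marten.Exceptions;      // Marten's ConcurrencyException
using Microsoft.Extensions.Hosting;
using Wolverine;

var host = Host.CreateDefaultBuilder(args)
    .UseWolverine(opts =>
    {
        // Retry with brief cooldowns when the optimistic check fails;
        // after the last attempt, the message falls through to the error queue
        opts.OnException<ConcurrencyException>()
            .RetryWithCooldown(50.Milliseconds(), 100.Milliseconds(), 250.Milliseconds());
    })
    .Build();

await host.RunAsync();
```

The cooldowns matter: retrying instantly under heavy contention just replays the same collision, while a short backoff gives the competing handler time to commit.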

An Even Better Solution: Wolverine Sagas

Optimistic concurrency works, but there's an architecturally superior approach that eliminates the race condition structurally rather than detecting it after the fact: Wolverine sagas.

A saga (also called a process manager) handles stateful workflows as event-driven processes. The key insight: instead of competing for a shared document, you make the saga instance itself the coordination boundary. Here's how it looks:

using Wolverine;
using Marten;

public class InventoryReservationSaga : Saga
{
    // One saga instance per inventory item; Wolverine resolves the saga Id
    // from the matching identity member on the incoming message
    public string Id { get; set; } = string.Empty;

    // StartOrHandle creates the instance on the first ReserveInventory for an
    // ItemId and routes every later command for that item to the same instance,
    // so commands for one item are processed sequentially
    public async Task<object> StartOrHandle(ReserveInventory command, IDocumentSession session)
    {
        Id = command.ItemId;

        var item = await session.LoadAsync<InventoryItem>(command.ItemId);

        if (item == null)
            throw new InvalidOperationException($"Item {command.ItemId} not found");

        if (item.Available >= command.Quantity)
        {
            item.Available -= command.Quantity;
            session.Store(item);
            // Emit a success event downstream via Wolverine's cascading messages
            return new InventoryReserved(command.ItemId, command.Quantity, command.OrderId);
        }

        // Clean rejection: no exception, no retry, just an unavailable event
        return new InventoryUnavailable(command.ItemId, command.Quantity, command.OrderId);
    }
}

public record InventoryReserved(string ItemId, int Quantity, string OrderId);
public record InventoryUnavailable(string ItemId, int Quantity, string OrderId);

Here's what makes this powerful: Wolverine serializes message processing for each unique saga ID. When you set the saga ID to command.ItemId, all ReserveInventory messages for the same inventory item are automatically processed one at a time. The race condition becomes structurally impossible—not because you're detecting conflicts or acquiring locks, but because the concurrent messages simply queue behind the active saga instance.

Why this beats distributed locking:

Under load, distributed locks become a bottleneck and actually make your problems worse. Here's what happens with locks:

  • Lock contention escalates: When 50 concurrent requests hit the same inventory item, 49 of them block waiting for the lock. Under sustained load, you get cascading timeouts as handlers pile up waiting.
  • Thundering herd on release: When the lock releases, all waiting handlers compete again. This creates spikes of contention instead of smooth throughput.
  • Timeout brittleness: If a handler holding the lock crashes or times out, you either hold the lock forever (deadlock) or release it too early (race condition).
  • No graceful degradation: Locks force serialization and blocking. Every handler pays the latency cost of waiting.

Sagas, by contrast, scale with load:

  • No blocking: Messages queue in Wolverine's durable inbox. They don't block threads—they wait their turn.
  • Natural backpressure: If inventory processing slows down, messages queue. The system stays responsive—you're not burning threads on blocked handlers.
  • Failure is clean: If a saga handler fails, Wolverine retries with fresh state. No risk of "stuck" locks.
  • Observability: The saga state is persisted in Marten, so you can inspect what's happening. Locks are invisible—you can't see who's waiting or why.
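That last point is worth making concrete. Because the saga is persisted as an ordinary Marten document, inspecting in-flight state is just a query (the store setup and item id are placeholders):

```csharp
using System;
using Marten;

// Illustrative connection string; use whatever your app uses
var store = DocumentStore.For(
    "Host=localhost;Port=5432;Database=ticketing;Username=postgres;Password=postgres");
await using var query = store.QuerySession();

// Saga state is queryable like any other document, keyed by the item id
var saga = await query.LoadAsync<InventoryReservationSaga>("concert-ticket-front-row");
Console.WriteLine(saga is null
    ? "No saga in flight for this item"
    : "Saga active for this item");
```

Try doing that with a distributed lock.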

This is the async mindset shift I often talk about: don't coordinate around shared mutable state—redesign so the race condition can't happen structurally. Sagas make the ordering explicit in your architecture, not bolted on with locks after the fact.

Phase 6: Validation Under Chaos

You've applied the fix. The NBomber stress test now passes cleanly. Are you done?

Not yet. The fix needs to hold under real-world chaos: pods restarting, network partitions, CPU throttling. That's where LitmusChaos (or similar chaos engineering tools) comes in.

For .NET systems, you might use Rancher Desktop to run your Wolverine app in a local Kubernetes cluster and inject faults:

apiVersion: litmuschaos.io/v1alpha1
kind: ChaosEngine
metadata:
  name: inventory-chaos
spec:
  engineState: active
  appinfo:
    appns: default            # namespace of the target app
    applabel: app=inventory   # adjust to your deployment's labels
    appkind: deployment
  chaosServiceAccount: litmus-admin
  experiments:
    - name: pod-delete
      spec:
        components:
          env:
            - name: TOTAL_CHAOS_DURATION
              value: "1800"   # 30-minute run
            - name: CHAOS_INTERVAL
              value: "10"     # kill a pod every 10 seconds
            - name: FORCE
              value: "false"

This deletes random pods every 10 seconds while your NBomber test runs. If your fix depends on in-memory state, this will expose it. With saga-based coordination (or optimistic concurrency with retries) and Wolverine's durable inbox configured, your system should maintain 100% consistency even with continuous pod churn.

The TicketRush Case Study

The whitepaper walks through a case study—a fictional ticketing platform called TicketRush—that hits exactly the scenario we've been discussing: overselling inventory during flash sales. TicketRush isn't a .NET system, but the race conditions and fix strategies are identical to what we just walked through with Marten and Wolverine.

Here's what makes their approach brilliant: they didn't just run a load test and hope for the best. They intentionally widened the timing window by adding 800ms of latency to database calls and throttling CPU through Rancher Desktop. That single move took a race condition that failed 0.3% of the time and pushed it to 1.2% failure rate. Suddenly the bug wasn't a phantom—it was reproducible.

Then they fixed it with architectural changes—saga-based coordination instead of hoping optimistic concurrency would be fast enough—and validated it under sustained chaos. Pod deletions every 10 seconds. 30-minute test run. 100% consistency maintained. That's not luck. That's evidence.

Metrics That Matter

You can run all the chaos tests you want, but if you're not measuring whether they're actually working, you're just building elaborate tooling.

There are five metrics that actually predict whether your async system will hold up in production. Most teams don't measure any of them:

  1. Mean Time to Reproduce (MTTR_reproduce): How long does it take to reproduce an intermittent bug? The goal is minutes, not weeks.

  2. Heisenbug Escape Rate: How many intermittent failures reach production each quarter? Track this to see if your chaos testing is catching them earlier.

  3. Chaos Test Coverage: What percentage of your service boundaries have active chaos experiments running? If you're not testing a boundary under chaos, you're trusting luck.

  4. Concurrent Load Test Coverage: What percentage of your message handlers and endpoints are tested under realistic concurrency? If the answer is "none," you're going to have a bad time in production.

  5. Fix Verification Rate: When you fix a race condition, do you validate the fix under chaos, or just run your existing tests and ship it? The difference matters.

These aren't vanity metrics. They're leading indicators of whether your async system will survive production load.

The Mindset Shift

Here's the thing I've learned the hard way: the biggest challenge with async message-driven systems isn't technical. It's mental.

If you approach async distributed systems with synchronous debugging intuitions, you're going to build systems that look great in development and fail mysteriously in production. You'll add logs and try to trace execution order, but the logs won't help because the timing changes when you add logging. You'll run load tests, but they won't find the race conditions because you're not generating the right concurrency patterns.

The chaos-first philosophy isn't just a testing strategy—it's an operating principle. Async systems will fail in ways you didn't anticipate. You don't build confidence by hoping your tests are good enough. You build it by deliberately breaking things in a controlled environment before production does it for you. When you're running distributed systems at scale, this isn't optional.

That's what this framework gives you: a systematic way to manufacture the chaos that reveals the bugs, fix them with evidence rather than guesswork, and validate that your fixes actually work under pressure.

When you adopt tools like Wolverine or NServiceBus, you're not just adopting a messaging framework. You're adopting a completely different failure model. The sooner you adjust your testing and debugging approach to match that reality, the fewer 3 AM pages you'll get.

The whitepaper is a practical guide to making that shift. I highly recommend reading the full paper at https://zenodo.org/records/19390360 — especially if you're responsible for async .NET systems in production.

Because the next Heisenbug is already in your code. The question is whether you'll find it in your chaos test environment or in production.
