Keywords: .NET capacity planning, ASP.NET Core performance, load testing, k6, p95 latency, RPS, connection pool, thread pool starvation, scaling thresholds, SLO
Picture the release call. Someone from product asks, "Can the API handle Black Friday?" and the room goes quiet. Then a senior engineer says something like "should be fine, we're on bigger instances now" and everyone moves on. That sentence is a guess wearing a confident voice, and three weeks later it turns into a 2 a.m. incident.
Capacity planning is what you do so that question has a real answer. Not a feeling, not a vibe about instance sizes - a number you measured, with the conditions it was measured under written next to it.
I'll walk through how I actually do this for ASP.NET Core APIs: the metrics that tell the truth, a baseline process you can repeat, the thresholds I reach for, and a small lab project you can run on your own machine to see degradation happen in real time. Everything here is backed by code in the samples/production-scaling-lab folder, so you're not taking my word for any of it.
Plenty of capacity plans are just two graphs: CPU and memory. The server has headroom on both, so the conclusion is "we're fine." Then traffic spikes and the API falls over while CPU sits at 40%.
The reason is that the things that actually break first, such as connection pool exhaustion and thread pool starvation, rarely show up as CPU pressure.
So the question capacity planning answers isn't "do we have spare CPU." It's narrower and more useful:
How much traffic can this API take before the user experience starts to degrade?
Everything else is in service of answering that.
The single biggest mistake I see is tracking average latency and calling it a day. Averages hide the people who are actually suffering. If your average is 80ms but your p99 is 4 seconds, a real slice of your users is having a miserable time and your dashboard is lying to your face about it.
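To make the gap between average and tail latency concrete, here is a minimal sketch in plain JavaScript (it runs under Node, no k6 required). The latency samples are made up for illustration, and the nearest-rank percentile used here is one common definition; your monitoring tool may interpolate differently.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of
// all samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) throw new Error('no samples');
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p * sorted.length) / 100));
  return sorted[rank - 1];
}

function mean(samples) {
  return samples.reduce((sum, value) => sum + value, 0) / samples.length;
}

// Hypothetical data: 985 requests at 20 ms and 15 requests at 4000 ms.
const latencies = [...Array(985).fill(20), ...Array(15).fill(4000)];

const summary = {
  mean: mean(latencies),
  p50: percentile(latencies, 50),
  p95: percentile(latencies, 95),
  p99: percentile(latencies, 99),
};

console.log(summary);
```

With this data the mean comes out just under 80 ms while p99 is 4000 ms. Note that p95 is still 20 ms here, because the slow slice is only 1.5% of requests; how far out in the tail you need to look depends on how large the suffering group is.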
Here's what I track when planning capacity, roughly in order of how often they catch the real problem:
If you only add one thing to your current dashboards, make it p95 per endpoint. It changes the conversation immediately.
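A small sketch of why the "per endpoint" part matters, again in plain JavaScript with invented numbers. The record shape `{ endpoint, ms }` is an assumption for illustration, not a format any tool in this article emits.

```javascript
// Nearest-rank p95 over an array of latencies in milliseconds.
function p95(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((95 * sorted.length) / 100));
  return sorted[rank - 1];
}

// Groups { endpoint, ms } records by endpoint and returns the p95 of each group.
function p95ByEndpoint(records) {
  const groups = new Map();
  for (const { endpoint, ms } of records) {
    if (!groups.has(endpoint)) groups.set(endpoint, []);
    groups.get(endpoint).push(ms);
  }
  const result = {};
  for (const [endpoint, samples] of groups) {
    result[endpoint] = p95(samples);
  }
  return result;
}

// Hypothetical traffic: a busy, fast read endpoint and a rare, slow write endpoint.
const records = [
  ...Array.from({ length: 96 }, () => ({ endpoint: 'GET /api/io-bound', ms: 35 })),
  ...Array.from({ length: 4 }, () => ({ endpoint: 'POST /api/orders', ms: 900 })),
];

const overall = p95(records.map((r) => r.ms));
const perEndpoint = p95ByEndpoint(records);

console.log({ overall, perEndpoint });
```

The overall p95 is 35 ms, which looks healthy, while the write endpoint's own p95 is 900 ms. A global percentile is dominated by whichever endpoint gets the most traffic, so a slow but low-volume endpoint disappears inside it.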
Capacity planning isn't a document you write once. It's a loop you run, and it always looks the same:

The order matters more than it looks.
Define the SLO first. "p95 under 300ms, error rate under 1%" is a target you can test against. "Fast" is not. Pin the number before you run anything, or you'll move the goalposts to wherever the results land.
Run a steady-state test at a load you believe reflects normal traffic. This is your reference point.
Run a spike test that ramps hard and fast. Steady-state tells you the cruising altitude; the spike tells you what happens when marketing sends an email at noon.
Find the first bottleneck and fix only that. When the system buckles, something gives way first. Fix that one thing, then re-test - because the fix usually just moves the ceiling to the next bottleneck, and you want to see it move.
Add headroom, then publish. Once you're inside SLO, give yourself 30-50% margin over expected peak and write down the number. A capacity limit nobody can find is the same as no capacity limit.
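The first and last steps of the loop are mostly arithmetic, so they can be sketched in a few lines of plain JavaScript. The SLO values come from the example above; the test results, the expected peak of 400 RPS, and the 40% margin are assumptions chosen for illustration.

```javascript
// SLO from the text: p95 under 300 ms, error rate under 1%.
const slo = { p95Ms: 300, maxErrorRate: 0.01 };

// A run is inside the SLO only if both targets hold (strictly under).
function meetsSlo(run, target) {
  return run.p95Ms < target.p95Ms && run.errorRate < target.maxErrorRate;
}

// Walks the runs in order of increasing load and returns the last one that
// stayed inside the SLO before the first violation, or null if none did.
function highestLoadInsideSlo(runs, target) {
  let best = null;
  for (const run of [...runs].sort((a, b) => a.rps - b.rps)) {
    if (!meetsSlo(run, target)) break;
    best = run;
  }
  return best;
}

// Expected peak plus margin, using integer percent to avoid float rounding.
function requiredCapacity(expectedPeakRps, marginPercent) {
  return Math.ceil((expectedPeakRps * (100 + marginPercent)) / 100);
}

// Hypothetical load test results, one entry per tested load level.
const runs = [
  { rps: 200, p95Ms: 90, errorRate: 0 },
  { rps: 400, p95Ms: 140, errorRate: 0.001 },
  { rps: 600, p95Ms: 260, errorRate: 0.004 },
  { rps: 800, p95Ms: 520, errorRate: 0.03 },
];

const limit = highestLoadInsideSlo(runs, slo);
const needed = requiredCapacity(400, 40); // assumed peak of 400 RPS, 40% margin

const report = {
  sloLimitRps: limit ? limit.rps : 0,
  neededRps: needed,
  hasHeadroom: limit !== null && limit.rps >= needed,
};

console.log(report);
```

In this made-up data the API stays inside SLO up to 600 RPS and needs 560 RPS to cover peak plus margin, so it passes. The `report` object is the kind of thing worth publishing: the limit, the requirement, and the SLO it was measured against.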
Theory is cheap. The production-scaling-lab project is a small ASP.NET Core 8 API built specifically so you can watch these failure modes happen. It has an I/O-bound endpoint to simulate connection pressure, an order-writing endpoint with deliberate load shedding, and background workers draining an outbox. Run it locally:
    dotnet run --project src/ProductionScalingLab.Api/ProductionScalingLab.Api.csproj
    # API base URL: http://localhost:5080
The read path is an endpoint that just waits, standing in for a downstream call or a slow query - the kind of I/O-bound work that quietly eats threads and connections under load:
    app.MapGet("/api/io-bound", async (int delayMs, CancellationToken ct) =>
    {
        // Clamp the requested delay so a caller can't park a request indefinitely.
        var boundedDelay = Math.Clamp(delayMs, 5, 5000);
        await Task.Delay(boundedDelay, ct);
        return Results.Ok(new { delayMs = boundedDelay, at = DateTime.UtcNow });
    });
The k6 script ramps virtual users up to 200, then on to 1000, then back down to zero, and asserts a p95 target. This is your baseline test - it tells you where latency starts to bend:
    // k6/connections.js
    import http from 'k6/http';
    import { check, sleep } from 'k6';

    export const options = {
      stages: [
        { duration: '30s', target: 200 },  // warm up to 200 VUs
        { duration: '1m', target: 1000 },  // ramp to 1000 VUs
        { duration: '30s', target: 0 }     // ramp down
      ],
      thresholds: {
        http_req_duration: ['p(95)<400'],
        http_req_failed: ['rate<0.01']
      }
    };

    export default function () {
      const response = http.get('http://localhost:5080/api/io-bound?delayMs=30');
      check(response, { 'status is 200': (r) => r.status === 200 });
      sleep(1);
    }
Run it and watch p95:
k6 run k6/connections.js
For the first stretch the line stays flat. Somewhere as concurrency climbs, it bends upward and the p(95)<400 threshold goes red. That bend is your real capacity number for this endpoint - not the point where it errors out, but the point where it stops being fast.
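If you record p95 at each concurrency level, finding the bend is a simple scan. This is a minimal sketch in plain JavaScript; the `{ vus, p95Ms }` shape and the sample values are hypothetical, and the 400 ms target mirrors the threshold in the script above.

```javascript
// Scans load steps in order of increasing concurrency and returns the last
// step that stayed under the p95 target and the first step that did not.
function findBend(steps, targetP95Ms) {
  const ordered = [...steps].sort((a, b) => a.vus - b.vus);
  let lastGood = null;
  for (const step of ordered) {
    if (step.p95Ms >= targetP95Ms) {
      return { lastGood, firstBad: step };
    }
    lastGood = step;
  }
  return { lastGood, firstBad: null }; // never crossed the target
}

// Hypothetical measurements, one per concurrency level.
const steps = [
  { vus: 200, p95Ms: 45 },
  { vus: 400, p95Ms: 48 },
  { vus: 600, p95Ms: 70 },
  { vus: 800, p95Ms: 180 },
  { vus: 1000, p95Ms: 650 },
];

const bend = findBend(steps, 400);
console.log(bend);
```

Here the last level inside the target is 800 VUs and the first one outside it is 1000 VUs, with no request necessarily failing at either. If `firstBad` comes back `null`, the test never pushed hard enough to find the limit, which is also worth knowing.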
Writes are different. You usually can't just let unlimited writes pile into the database and hope. The lab puts a gate in front of the order endpoint so that past a concurrency limit, it returns 429 instead of falling over:
    app.MapPost("/api/orders", async (
        CreateOrderRequest request,
        AppDbContext db,
        WriteGate writeGate,
        CancellationToken ct) =>
    {
        if (string.IsNullOrWhiteSpace(request.CustomerEmail) || request.Amount <= 0)
            return Results.BadRequest("CustomerEmail and positive Amount are required.");

        // Shed the request if no write slot frees up in time.
        if (!await writeGate.TryEnterAsync(ct))
            return Results.StatusCode(StatusCodes.Status429TooManyRequests);

        try
        {
            var order = new Order { /* ... */ };
            var outbox = new OutboxMessage { /* ... */ };

            // Order and outbox message are committed in the same transaction.
            await using var tx = await db.Database.BeginTransactionAsync(ct);
            db.Orders.Add(order);
            db.OutboxMessages.Add(outbox);
            await db.SaveChangesAsync(ct);
            await tx.CommitAsync(ct);

            return Results.Accepted($"/api/orders/{order.Id}", new { orderId = order.Id });
        }
        finally
        {
            writeGate.Exit();
        }
    });
The gate itself is just a SemaphoreSlim with a short wait. If it can't get a slot in 250ms, the request is shed rather than queued forever:
    public sealed class WriteGate
    {
        private readonly SemaphoreSlim _semaphore;
        private int _inflight;

        public WriteGate(IConfiguration configuration)
        {
            var max = configuration.GetValue<int?>("LoadShedding:MaxConcurrentWrites") ?? 64;
            _semaphore = new SemaphoreSlim(max, max);
        }

        public int CurrentInflight => _inflight;

        public async Task<bool> TryEnterAsync(CancellationToken ct)
        {
            // Wait at most 250 ms for a slot; false means the caller should shed.
            var acquired = await _semaphore.WaitAsync(TimeSpan.FromMilliseconds(250), ct);
            if (acquired)
                Interlocked.Increment(ref _inflight);
            return acquired;
        }

        public void Exit()
        {
            Interlocked.Decrement(ref _inflight);
            _semaphore.Release();
        }
    }
This is the part that trips people up the first time: a 429 under extreme load is a good outcome. It means the system chose to protect the requests it can serve instead of accepting everything and serving none of it. Capacity planning is partly about deciding, ahead of time, where that line sits.
The write-spike test pushes a ramping arrival rate up to 800 requests/sec to find that line:
    // k6/write-spike.js
    // Only the options are shown; the exported default function that issues
    // the requests is not part of this excerpt.
    export const options = {
      scenarios: {
        write_spike: {
          executor: 'ramping-arrival-rate',
          startRate: 50,
          timeUnit: '1s',
          preAllocatedVUs: 100,
          maxVUs: 800,
          stages: [
            { target: 100, duration: '30s' },
            { target: 500, duration: '1m' },
            { target: 800, duration: '30s' },
            { target: 0, duration: '20s' }
          ]
        }
      },
      thresholds: {
        http_req_duration: ['p(95)<700'],
        http_req_failed: ['rate<0.02']
      }
    };
While the spike runs, hit the metrics endpoint and watch three numbers:
    curl http://localhost:5080/api/metrics
    # { totalOrders, pendingOutbox, readModelCount, processedInbox, currentInflight }
The goal here is never zero errors. The goal is predictable degradation - knowing precisely what the API does when you push past its limit, so the failure is boring instead of catastrophic.
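One way to check whether degradation is predictable is to separate deliberate shedding from real failures when you read the results. By default k6 counts 4xx and 5xx responses as failed in http_req_failed, so 429s and genuine errors land in the same bucket unless you split them. The sketch below does that split in plain JavaScript over a list of status codes; the function name and the sample counts are hypothetical.

```javascript
// Buckets response status codes into accepted (2xx), shed (429), and
// failed (everything else, including 0 for requests that got no response).
function summarizeOutcomes(statuses) {
  const counts = { accepted: 0, shed: 0, failed: 0 };
  for (const status of statuses) {
    if (status >= 200 && status < 300) counts.accepted += 1;
    else if (status === 429) counts.shed += 1;
    else counts.failed += 1;
  }
  const total = statuses.length;
  return {
    ...counts,
    total,
    shedRate: total === 0 ? 0 : counts.shed / total,
    failureRate: total === 0 ? 0 : counts.failed / total,
  };
}

// Hypothetical spike result: 900 accepted, 90 shed by the gate, 10 real errors.
const statuses = [
  ...Array(900).fill(202),
  ...Array(90).fill(429),
  ...Array(10).fill(500),
];

const outcome = summarizeOutcomes(statuses);
console.log(outcome);
```

In this example 9% of requests were shed and 1% actually failed. A rising shed rate with a flat failure rate is the boring kind of degradation; a rising failure rate means something other than the gate is deciding which requests lose.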
These are starting points, not laws. Your numbers depend on your hardware, your queries, and your SLO. But when I have nothing else to go on, this is roughly where I aim my effort:
The pattern across all four: as load grows, the work moves off the request path. Reads get cached, writes get queued, and the synchronous critical section gets as small as you can make it.
Before I'd call an API "capacity planned," I want all of these to be true:
If even one of those is missing, you're back to guessing on the next release call.
Capacity planning for an ASP.NET Core API is the process of measuring how much load the API can handle while staying inside its SLO targets - then writing that number down with the conditions attached, so scaling decisions are based on evidence instead of intuition.
The metric that matters most is p95 latency, almost always. It's the one that reflects what a real user feels. Pair it with timeout rate and error rate so you can tell "slow" apart from "broken."
Re-run your capacity tests on every meaningful architecture or database change, and at minimum once per release cycle. The fastest way to lose a capacity number is to ship three months of features on top of it and assume it still holds.
Returning 429 under load is not a failure when it's deliberate - it's the system protecting the requests it can serve. The bug is accepting unlimited load and degrading everyone instead of shedding the excess.
Capacity planning isn't a spreadsheet you fill in once and forget. It's a loop: define the SLO, measure against it, find the first thing that breaks, fix that one thing, and measure again. Do that a few times and "can it handle Black Friday?" stops being a scary question and becomes a number you can defend.
Clone the lab, run the two k6 scripts, and watch your own p95 bend. Once you've seen exactly where your API degrades and what it does when it gets there, you've stopped guessing and started measuring - which is the whole point.
You can check out the full source code here: ProductionScalingLab-Demo on GitHub.