Learn ASP.NET Core rate limiting with practical C# examples. Fixed window, sliding window, token bucket — protect your API from abuse today.
Every public API faces the same threat: one badly behaved client can hammer your endpoints, exhaust your database connections, and bring down the service for everyone. ASP.NET Core rate limiting gives you a built-in, production-ready defense against exactly this problem — no third-party packages required.
Since .NET 7, the Microsoft.AspNetCore.RateLimiting middleware has shipped as a first-class feature. In this guide, you'll learn how to configure every built-in algorithm, apply limits per-client and per-endpoint, and avoid the mistakes that quietly break rate limiting in production.
Why Rate Limiting in ASP.NET Core Matters
Rate limiting controls how many requests a client can make within a time window. Without it, your API is vulnerable to:
- Denial-of-service attacks — intentional flooding that crashes your service
- Accidental overload — a client bug that sends thousands of duplicate requests
- Resource exhaustion — database connection pools and memory drained by unchecked traffic
- Unfair usage — one heavy consumer starving out everyone else
Before .NET 7, most teams reached for AspNetCoreRateLimit, a popular NuGet package. That still works, but the built-in middleware is now the recommended starting point: it integrates directly with minimal APIs and controllers, and it is maintained by the .NET team.
Getting Started: Add the Rate Limiter Middleware
The rate limiting middleware lives in Microsoft.AspNetCore.RateLimiting, which is included in the ASP.NET Core shared framework. No extra NuGet install is needed for .NET 7 and later.
Here's the minimal setup:
using Microsoft.AspNetCore.RateLimiting;
using System.Threading.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter("fixed", config =>
    {
        config.PermitLimit = 10;
        config.Window = TimeSpan.FromSeconds(10);
        config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
        config.QueueLimit = 2;
    });

    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;
});

var app = builder.Build();

app.UseRateLimiter();

app.MapGet("/api/data", () => Results.Ok(new { message = "Success" }))
    .RequireRateLimiting("fixed");

app.Run();
This creates a fixed window policy named "fixed" that allows 10 requests every 10 seconds. The policy is not partitioned, so every caller of the endpoint shares that one counter. When the limit is exceeded, the request receives a 429 Too Many Requests response. The QueueLimit of 2 means up to 2 excess requests wait in a queue for the next window instead of being rejected immediately.
Understanding the Four Rate Limiting Algorithms
ASP.NET Core ships four algorithms in the System.Threading.RateLimiting namespace. Each fits different use cases.
1. Fixed Window Rate Limiting
The simplest approach. It counts requests within fixed time segments. When the window resets, the count drops to zero.
Problem: Burst traffic at window boundaries. A client can send 10 requests at second 9, then 10 more at second 11 — effectively 20 requests in 2 seconds.
options.AddFixedWindowLimiter("fixed", config =>
{
    config.PermitLimit = 100;
    config.Window = TimeSpan.FromMinutes(1);
});
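The boundary burst described above can be checked with a few lines of arithmetic. This is a hand-rolled sketch, not the library's FixedWindowRateLimiter: the Allow helper, the 10-per-10-seconds limit, and the timestamps are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical fixed-window counter: 10 permits per 10-second window.
const int permitLimit = 10;
const int windowSeconds = 10;
var counts = new Dictionary<int, int>(); // window index -> requests admitted

bool Allow(double atSecond)
{
    int window = (int)(atSecond / windowSeconds); // 0 for [0,10), 1 for [10,20), ...
    counts.TryGetValue(window, out int used);     // used stays 0 for a new window
    if (used >= permitLimit) return false;
    counts[window] = used + 1;
    return true;
}

int admitted = 0;
// 10 requests at t = 9s (end of window 0), then 10 more at t = 11s (start of window 1).
for (int i = 0; i < 10; i++) if (Allow(9.0)) admitted++;
for (int i = 0; i < 10; i++) if (Allow(11.0)) admitted++;
// An 11th request inside window 1 is still rejected.
bool extra = Allow(11.5);

Console.WriteLine($"Admitted between t=9s and t=11s: {admitted}"); // 20
Console.WriteLine($"Extra request at t=11.5s allowed: {extra}");   // False
```

Each window on its own respects the limit, yet the two-second span across the boundary admits twice as many requests.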
2. Sliding Window Rate Limiting
Divides the window into segments and slides forward, smoothing out the burst problem that fixed windows have.
options.AddSlidingWindowLimiter("sliding", config =>
{
    config.PermitLimit = 100;
    config.Window = TimeSpan.FromMinutes(1);
    config.SegmentsPerWindow = 6; // each segment = 10 seconds
});
With 6 segments, the window slides every 10 seconds. This avoids the boundary burst because capacity used in a segment only comes back when that segment leaves the window, roughly a full window later, rather than all at once when a fixed window resets.
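To see how segments change the outcome, here is a simplified model of a segmented sliding window. It is a sketch for intuition only, not the library's SlidingWindowRateLimiter; the Allow and CountAllowed helpers and the timestamps are hypothetical.

```csharp
using System;

const int permitLimit = 100;
const int segmentSeconds = 10;    // 60-second window / 6 segments
const int segmentsPerWindow = 6;
var used = new int[64];           // requests admitted per segment

bool Allow(double atSecond)
{
    int current = (int)(atSecond / segmentSeconds);
    int oldest = Math.Max(0, current - segmentsPerWindow + 1);
    int inWindow = 0;
    // Sum the requests admitted in the segments the window currently covers.
    for (int s = oldest; s <= current; s++) inWindow += used[s];
    if (inWindow >= permitLimit) return false;
    used[current]++;
    return true;
}

int CountAllowed(double atSecond, int attempts)
{
    int ok = 0;
    for (int i = 0; i < attempts; i++) if (Allow(atSecond)) ok++;
    return ok;
}

int burst = CountAllowed(55, 100);         // uses the whole limit late in the first minute
int afterBoundary = CountAllowed(65, 100); // a fixed window would have reset at t = 60
int afterSlide = CountAllowed(115, 100);   // the segment holding the burst left the window at t = 110

Console.WriteLine($"{burst} {afterBoundary} {afterSlide}"); // 100 0 100
```

The burst at t = 55 keeps blocking new requests at t = 65, where a fixed window would already have granted another 100.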
3. Token Bucket Rate Limiting
Token bucket rate limiting in C# is ideal when you want to allow short bursts while enforcing a steady average rate. Tokens refill at a constant rate, and each request costs one token.
options.AddTokenBucketLimiter("token", config =>
{
    config.TokenLimit = 20;                                 // max burst size
    config.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
    config.TokensPerPeriod = 5;                             // 5 tokens every 10 seconds
    config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    config.QueueLimit = 2;
    config.AutoReplenishment = true;
});
This allows a burst of up to 20 requests, then settles to a sustained rate of roughly 30 requests per minute. It's the best choice for most public APIs.
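The burst and sustained-rate numbers can be checked by driving a TokenBucketRateLimiter directly, outside the middleware. This is a minimal sketch under a few assumptions: it runs as a console program (where the System.Threading.RateLimiting package may need to be referenced explicitly), and QueueLimit is set to 0 so that excess requests are rejected rather than queued.

```csharp
using System;
using System.Threading.RateLimiting;

// Same shape as the "token" policy, used directly without ASP.NET Core.
using var limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
{
    TokenLimit = 20,
    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
    TokensPerPeriod = 5,
    QueueLimit = 0,            // reject instead of queueing, to keep the demo synchronous
    AutoReplenishment = true
});

int accepted = 0, rejected = 0;
for (int i = 0; i < 25; i++)
{
    // The bucket starts full, so the first 20 attempts succeed immediately.
    using RateLimitLease lease = limiter.AttemptAcquire(permitCount: 1);
    if (lease.IsAcquired) accepted++; else rejected++;
}

// Sustained rate implied by the settings: 5 tokens per 10 s = 30 per minute.
double perMinute = 5 * (60.0 / 10);

Console.WriteLine($"accepted={accepted}, rejected={rejected}, sustained={perMinute}/min");
// accepted=20, rejected=5, sustained=30/min
```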
4. Concurrency Limiter
Unlike the others, this doesn't count requests over time — it limits how many requests are in flight simultaneously. When a request completes, its slot opens up immediately.
options.AddConcurrencyLimiter("concurrent", config =>
{
    config.PermitLimit = 5;
    config.QueueProcessingOrder = QueueProcessingOrder.OldestFirst;
    config.QueueLimit = 10;
});
Use this for endpoints that call slow external services or run heavy database queries. It prevents resource exhaustion regardless of request rate.
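The "slot opens up immediately" behavior is easy to observe with a ConcurrencyLimiter used on its own. A minimal sketch, assuming a console program and a limit of 2 so the effect shows up quickly; each lease stands in for one in-flight request.

```csharp
using System;
using System.Threading.RateLimiting;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,
    QueueLimit = 0   // reject instead of waiting, to keep the demo synchronous
});

RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();
RateLimitLease third = limiter.AttemptAcquire();   // both slots are taken
bool thirdAcquired = third.IsAcquired;

first.Dispose();                                   // the "request" finishes and returns its slot
RateLimitLease fourth = limiter.AttemptAcquire();
bool fourthAcquired = fourth.IsAcquired;

Console.WriteLine($"third={thirdAcquired}, fourth={fourthAcquired}"); // third=False, fourth=True

second.Dispose();
third.Dispose();
fourth.Dispose();
```

No time has to pass for the fourth attempt to succeed; only the number of outstanding leases matters.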
Per-Client API Rate Limiting with Partitions
The policies above each use one limiter shared by every caller. In production, you usually need per-client rate limiting, where each user gets their own bucket. You achieve this with partitioned rate limiters.
builder.Services.AddRateLimiter(options =>
{
    options.AddPolicy("per-user", context =>
    {
        // Partition key: the authenticated user's "sub" claim, else the client IP.
        // Some JWT setups map "sub" to ClaimTypes.NameIdentifier; adjust the claim type to match yours.
        var userId = context.User?.FindFirst("sub")?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";

        return RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 20,
            ReplenishmentPeriod = TimeSpan.FromSeconds(10),
            TokensPerPeriod = 5,
            AutoReplenishment = true
        });
    });

    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = async (context, cancellationToken) =>
    {
        context.HttpContext.Response.ContentType = "application/json";
        await context.HttpContext.Response.WriteAsJsonAsync(new
        {
            error = "Too many requests",
            // Seconds until a permit should be available; 60 is a fallback when the lease has no hint.
            retryAfter = context.Lease.TryGetMetadata(
                MetadataName.RetryAfter, out var retryAfter)
                ? retryAfter.TotalSeconds
                : 60
        }, cancellationToken);
    };
});
The partition key — here, the user's sub claim or their IP address — isolates each client's rate limit. User A hitting 20 requests doesn't affect User B at all.
The OnRejected callback customizes the 429 response body with a JSON payload and retryAfter hint, which is critical for well-behaved API clients.
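The isolation between partitions can be demonstrated without a web server by keying a PartitionedRateLimiter on a plain string. This is a sketch under assumptions: the user names are made up, the limit is lowered to 3 tokens so exhaustion is quick, and it runs as a console program with System.Threading.RateLimiting available.

```csharp
using System;
using System.Threading.RateLimiting;

// The resource type is a plain string (a user id) instead of HttpContext.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    userId => RateLimitPartition.GetTokenBucketLimiter(userId, _ => new TokenBucketRateLimiterOptions
    {
        TokenLimit = 3,
        ReplenishmentPeriod = TimeSpan.FromSeconds(10),
        TokensPerPeriod = 1,
        QueueLimit = 0
    }));

// "alice" burns through her own bucket.
int aliceAccepted = 0;
for (int i = 0; i < 5; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire("alice");
    if (lease.IsAcquired) aliceAccepted++;
}

// "bob" has a separate bucket and is unaffected.
using RateLimitLease bobLease = limiter.AttemptAcquire("bob");
bool bobAccepted = bobLease.IsAcquired;

Console.WriteLine($"alice accepted {aliceAccepted} of 5; bob accepted: {bobAccepted}");
// alice accepted 3 of 5; bob accepted: True
```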
Applying Rate Limits to Controllers and Endpoints
You have three options for applying rate limiting policies to your routes.
Minimal API Endpoints
app.MapGet("/api/products", GetProducts)
    .RequireRateLimiting("per-user");

app.MapPost("/api/orders", CreateOrder)
    .RequireRateLimiting("per-user");

// Exempt health checks from rate limiting
app.MapGet("/health", () => Results.Ok())
    .DisableRateLimiting();
Controller Attributes
[ApiController]
[Route("api/[controller]")]
[EnableRateLimiting("per-user")]
public class ProductsController : ControllerBase
{
    [HttpGet]
    public IActionResult GetAll() => Ok(new[] { "Product1", "Product2" });

    [HttpGet("{id}")]
    [DisableRateLimiting] // exempt this specific action
    public IActionResult GetById(int id) => Ok($"Product {id}");
}
Global Rate Limiting
options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(context =>
{
    var ip = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
    return RateLimitPartition.GetFixedWindowLimiter(ip, _ => new FixedWindowRateLimiterOptions
    {
        PermitLimit = 200,
        Window = TimeSpan.FromMinutes(1)
    });
});
The global limiter runs before any per-endpoint policy. Use it as a safety net — a generous limit that catches only truly abusive traffic — and use per-endpoint policies for tighter, targeted limits.
Returning Proper Rate Limit Headers
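Well-designed APIs tell clients when to retry through response headers. The built-in middleware does not add these headers for you, and a middleware that inspects the response after the rejection runs too late, because the response has already started. Set Retry-After inside OnRejected, where the rejected lease is available. If you already write a JSON body in OnRejected, set the header in the same callback before writing the body:
builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.OnRejected = (context, cancellationToken) =>
    {
        // Window and token bucket limiters attach a RetryAfter hint to the rejected lease.
        if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
        {
            context.HttpContext.Response.Headers["Retry-After"] =
                ((int)retryAfter.TotalSeconds).ToString();
        }
        return ValueTask.CompletedTask;
    };
});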
Best Practices for ASP.NET Core Rate Limiting
After seeing rate limiting misconfigured in dozens of production systems, here are the practices that actually matter:
- Use token bucket for public APIs. It handles bursty traffic gracefully while enforcing a steady average — the best default for most scenarios.
- Always partition by user identity, not just IP. Behind corporate NATs and VPNs, thousands of users share one IP address. Rate limiting by IP alone punishes entire offices for one user's behavior.
- Set the global limiter generous, per-endpoint tight. The global limiter is your emergency brake (200-500 requests/minute). Per-endpoint limits are your real policy.
- Return Retry-After headers. Well-behaved clients back off automatically. Without the header, they keep hammering and your rejection rate stays high.
- Don't rate-limit health check endpoints. Load balancers and orchestrators call /health constantly. Rate limiting them causes false-positive service restarts.
- Log rejections. Use the OnRejected callback to log which clients are being throttled. This data is essential for tuning limits and detecting abuse.
- Middleware order matters. Place UseRateLimiter() after UseAuthentication() and UseAuthorization() so you have access to the user's identity for per-user partitioning.
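The ordering and logging advice from the list above can be sketched as a single Program.cs. Treat it as an illustration under assumptions rather than a drop-in file: it assumes a web project with implicit usings enabled, omits authentication scheme registration, and the logger category name "RateLimiting" is made up.

```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;

var builder = WebApplication.CreateBuilder(args);

// Scheme registration (JWT, cookies, ...) is omitted; add your own here.
builder.Services.AddAuthentication();
builder.Services.AddAuthorization();

builder.Services.AddRateLimiter(options =>
{
    options.RejectionStatusCode = StatusCodes.Status429TooManyRequests;

    options.AddPolicy("per-user", context =>
    {
        // The "sub" claim is only available once UseAuthentication() has run.
        var key = context.User.FindFirst("sub")?.Value
            ?? context.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous";
        return RateLimitPartition.GetTokenBucketLimiter(key, _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 20,
            ReplenishmentPeriod = TimeSpan.FromSeconds(10),
            TokensPerPeriod = 5
        });
    });

    options.OnRejected = (context, cancellationToken) =>
    {
        // Log who was throttled and where, so limits can be tuned later.
        var logger = context.HttpContext.RequestServices
            .GetRequiredService<ILoggerFactory>()
            .CreateLogger("RateLimiting");
        logger.LogWarning("Rate limit rejected {Method} {Path} from {Client}",
            context.HttpContext.Request.Method,
            context.HttpContext.Request.Path,
            context.HttpContext.User.FindFirst("sub")?.Value
                ?? context.HttpContext.Connection.RemoteIpAddress?.ToString()
                ?? "anonymous");
        return ValueTask.CompletedTask;
    };
});

var app = builder.Build();

app.UseAuthentication();   // 1. populate HttpContext.User
app.UseAuthorization();    // 2. enforce authorization
app.UseRateLimiter();      // 3. partition by the authenticated user

app.MapGet("/api/data", () => Results.Ok(new { message = "Success" }))
    .RequireRateLimiting("per-user");

// Health checks stay exempt so orchestrator probes are never throttled.
app.MapGet("/health", () => Results.Ok())
    .DisableRateLimiting();

app.Run();
```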
Common Pitfalls to Avoid
These are the mistakes that silently break rate limiting:
- Forgetting UseRateLimiter(). Registering policies in AddRateLimiter does nothing without the middleware call. The code compiles fine, but you have zero protection.
- Wrong middleware order. Placing UseRateLimiter() before UseAuthentication() means context.User has not been populated when your partition function runs. The sub claim lookup returns null, so every request falls back to the IP address or the shared "anonymous" partition.
- Using only in-memory rate limiting behind a load balancer. The built-in middleware stores state in memory. If you have 4 app instances, each client can get up to 4x the intended limit. For distributed scenarios, use Redis-backed rate limiting or move enforcement to your API gateway.
- Queue limits too high. A QueueLimit of 100 means 100 requests sitting in memory waiting for a permit. Under sustained load, this ties up connections and memory and adds latency; it's often better to reject immediately with a low queue limit (0-5).
- Not testing under load. Rate limiting logic that looks correct in unit tests can fail under concurrency. Use tools like k6 or bombardier to verify behavior at scale.
Distributed Rate Limiting with Redis
For multi-instance deployments, you need shared state. While the built-in middleware doesn't include a Redis provider out of the box, you can implement a custom IRateLimiterPolicy backed by Redis or use a library like RedisRateLimiting:
// Install: dotnet add package RedisRateLimiting
using RedisRateLimiting;
using StackExchange.Redis;

var redis = ConnectionMultiplexer.Connect("localhost:6379");

builder.Services.AddRateLimiter(options =>
{
    options.AddRedisTokenBucketLimiter("redis-token", config =>
    {
        config.ConnectionMultiplexerFactory = () => redis;
        config.TokenLimit = 20;
        config.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        config.TokensPerPeriod = 5;
    });
});
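This makes all instances of your application share one counter in Redis instead of keeping separate in-memory counts, so enforcement stays accurate regardless of how many pods or servers are running. A named policy like this one typically uses a single shared counter; for per-client limits it still needs a partition key, as in the per-user policy earlier.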
Testing Your Rate Limiting Configuration
Always verify your limits work before deploying. Here's a quick integration test using WebApplicationFactory:
[Fact]
public async Task RateLimiter_Returns429_WhenLimitExceeded()
{
    await using var app = new WebApplicationFactory<Program>();
    using var client = app.CreateClient();

    var tasks = Enumerable.Range(0, 15)
        .Select(_ => client.GetAsync("/api/data"));
    var responses = await Task.WhenAll(tasks);

    var tooMany = responses.Count(r => r.StatusCode ==
        System.Net.HttpStatusCode.TooManyRequests);

    Assert.True(tooMany > 0,
        "Expected at least one 429 response when exceeding rate limit");
}
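One assumption the test above makes: WebApplicationFactory<Program> needs the Program class to be visible to the test project, which also has to reference the Microsoft.AspNetCore.Mvc.Testing package. With top-level statements, a common way to expose Program is a partial class declaration at the end of Program.cs:

```csharp
// At the very end of Program.cs, after app.Run():
// exposes the compiler-generated Program class to the test assembly.
public partial class Program { }
```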
Conclusion
ASP.NET Core rate limiting is no longer optional for production APIs — it's a first-class middleware that takes minutes to set up and prevents hours of downtime. Here are the key takeaways:
- Use the built-in Microsoft.AspNetCore.RateLimiting middleware: it ships with .NET 7+ and requires no extra packages.
- Choose token bucket for public APIs, sliding window for strict quotas, and concurrency limiter for protecting slow downstream services.
- Always partition by authenticated user identity, falling back to IP address for anonymous clients.
- Return Retry-After headers and meaningful JSON error bodies on 429 responses.
- For multi-instance deployments, back your rate limiting with Redis to get accurate distributed counters.
Start with the token bucket example above, tune the limits based on your actual traffic patterns, and add OnRejected logging from day one. Your future self — and your on-call team — will thank you.