Django performance behind Nginx and uWSGI rarely comes from a single switch. It comes from four levers working together: a concurrency model sized to your CPU and workload, keeping Python off the hot path (Nginx handles static files, media, TLS, and compression), recycling workers so memory leaks and slow requests cannot snowball, and caching plus offloading slow work to Redis and Celery. Tune those four and a modest VPS will comfortably serve traffic that an untuned box chokes on.
This is the tuning deep-dive. If you just need to get Django live for the first time -- the virtualenv, the systemd unit, certbot HTTPS, and production settings.py -- start with our step-by-step deploy guide, Deploy Django in Production with Nginx and uWSGI, and come back here to make it fast. To automate the same stack across many servers, see deploying Django with uWSGI and Nginx using Ansible.
Heads-up for 2026: uWSGI is in maintenance mode. The maintainers announced in late 2022 that the project would get security and maintenance fixes only, with no new feature development. It is still fast, stable, and running in countless production systems, so there is no need to rip it out -- but for brand-new projects most teams now reach for Gunicorn, and Granian (a Rust-based server) is the modern choice when raw throughput matters. The comparison below is honest about the trade-offs; the rest of the guide then tunes the full uWSGI path.
We have built, deployed, and performance-tuned Django on Linux for 12+ years across 50+ projects, so the numbers and config here mirror what we actually run. If you want a hand, see our Django development services.
Where Django performance actually comes from
Every request takes this path, and each hop is a place to win or lose latency:
Browser --HTTP/2, TLS, gzip/brotli--> Nginx --uwsgi proto (Unix socket)--> uWSGI workers --WSGI--> Django --> Postgres/Redis
   ^                                    |
   +---- /static/, /media/ -------------+  served from disk, never touches Python
The app server (uWSGI) is only one lever. Before you add workers, make sure Nginx is doing its job: terminating TLS, serving static and media straight from disk, compressing responses, and buffering slow clients so a worker is never tied up streaming bytes to a phone on a train. After that, the biggest wins are usually in the application itself -- a cache in front of an expensive query beats any worker-count tweak. Tune in this order: Nginx offload -> worker sizing and recycling -> application caching/pooling/offload -> measure and repeat.
uWSGI vs Gunicorn vs Granian for throughput
All three sit in the same slot behind Nginx. Pick the application server before you tune anything else.
| Server | Language / protocol | Concurrency model | Throughput | 2026 status |
|---|---|---|---|---|
| uWSGI | C / WSGI | Prefork processes + optional threads; rich tuning (cheaper, Emperor, harakiri) | High when tuned | Maintenance mode -- security/fixes only |
| Gunicorn | Python / WSGI (+ ASGI via Uvicorn workers) | Prefork sync, gthread, or uvicorn.workers | Good; simplest to run | Actively developed; de-facto Django default |
| Granian | Rust / WSGI, ASGI, RSGI | Multi-process + async runtime, HTTP/1.1, HTTP/2, no extra reverse proxy needed for TLS | Highest in many benchmarks | Actively developed; modern high-performance pick |
Practical guidance: keep uWSGI if you already run it well -- it is not slow and the tuning knobs below are excellent. Choose Gunicorn for a new synchronous project that values simplicity. Reach for Granian when you want the best raw numbers or a single binary that speaks both WSGI and ASGI. Whatever you pick, measure on your workload -- generic benchmarks rarely match your query mix.
WSGI vs ASGI: match the server to the workload
uWSGI is a WSGI server -- it shines at classic request/response Django. The moment you need long-lived connections, you want ASGI.
| Workload | Best fit | Why |
|---|---|---|
| CRUD pages, REST/JSON APIs, server-rendered templates | WSGI (uWSGI / Gunicorn) | Short, synchronous request/response; prefork workers are ideal |
| WebSockets, Django Channels, server-sent events, long-polling | ASGI (Uvicorn / Granian / Daphne) | One worker holds thousands of idle connections without a process each |
| Mostly I/O-bound async views (async def, external API fan-out) | ASGI, or WSGI + threads | The event loop overlaps waits; WSGI threads can help modestly |
| CPU-bound rendering, PDF generation, image processing | WSGI with more processes | The GIL means threads/async do not parallelise CPU; use processes |
If your app is purely synchronous today, a tuned WSGI server is the simplest fast option. If you are adding real-time features, run an ASGI server pointed at asgi.py rather than trying to bolt websockets onto uWSGI.
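To make the difference concrete, here is a minimal sketch of the two callable shapes. Django generates the real entry points in wsgi.py and asgi.py; the functions below (wsgi_app, asgi_app, call_asgi_once) are hypothetical bare-bones stand-ins that only show what each kind of server calls.

```python
import asyncio


def wsgi_app(environ, start_response):
    """WSGI: one synchronous call per request; the worker is busy until it returns."""
    body = b"hello from WSGI\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


async def asgi_app(scope, receive, send):
    """ASGI: one coroutine per connection; every await hands the event loop to others."""
    if scope["type"] != "http":
        return  # this sketch ignores lifespan and websocket scopes
    await send({"type": "http.response.start", "status": 200,
                "headers": [(b"content-type", b"text/plain")]})
    await send({"type": "http.response.body", "body": b"hello from ASGI\n"})


def call_asgi_once(app):
    """Drive one fake HTTP request through an ASGI app and collect what it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    asyncio.run(app({"type": "http", "method": "GET", "path": "/"}, receive, send))
    return sent
```

A WSGI worker is occupied for the whole call, which is why long-lived connections need one process or thread each. The ASGI coroutine can sit on an await for minutes while the same worker serves other connections.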
Sizing uWSGI workers: processes vs threads
This is the question everyone gets wrong. uWSGI gives you two dials:
- processes -- separate OS processes, each with its own Python interpreter. They run truly in parallel across cores and sidestep the GIL. This is your primary throughput dial.
- threads -- threads inside each worker process. Because of the GIL only one thread runs Python at a time, so threads do not speed up CPU-bound code. They help when workers spend time waiting on I/O (a slow DB query, an external API) by letting another thread serve a request during the wait.
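A small experiment shows why threads help only while a worker is waiting. This is a sketch, not a benchmark: fake_io_request is a hypothetical stand-in for a view that mostly waits, and time.sleep plays the role of the slow query because, like real socket I/O, it releases the GIL.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_request(delay: float = 0.2) -> str:
    """Stand-in for a view that mostly waits (DB query, external API)."""
    time.sleep(delay)  # sleeping releases the GIL, as blocking I/O does
    return "ok"


def run_sequential(n: int) -> float:
    """Serve n requests one after another, like a single-threaded worker."""
    start = time.perf_counter()
    for _ in range(n):
        fake_io_request()
    return time.perf_counter() - start


def run_threaded(n: int, threads: int) -> float:
    """Serve n requests on a thread pool, like one worker with several threads."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(lambda _: fake_io_request(), range(n)))
    return time.perf_counter() - start


if __name__ == "__main__":
    print(f"1 thread : {run_sequential(4):.2f}s")
    print(f"4 threads: {run_threaded(4, 4):.2f}s")
```

Swap the sleep for a pure-Python loop and the threaded run stops winning, which is the CPU-bound case where only more processes help.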
A solid starting point for CPU-bound or mixed sync apps:
processes = (2 x CPU cores) + 1
For I/O-bound apps, add a few threads per process (e.g. processes = cores, threads = 4-8) and measure. Two hard constraints keep you honest:
- Memory. Each process loads your whole app. If a worker uses ~150 MB and you have 2 GB free, you can afford roughly 10-12 workers, not 50. Over-provisioning workers causes swapping, which is far slower than queueing.
- The database. N workers x M threads can each open a DB connection. Make sure Postgres max_connections (or your PgBouncer pool) covers it, or you will trade web stalls for DB connection errors.
When in doubt, start near the formula, load test, and raise workers only while p95 latency keeps improving without memory pressure.
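The formula and the two constraints can be folded into one starting-point calculation. This is a rough sketch: suggest_workers and its parameters are hypothetical names, and the 0.8 RAM headroom factor is an assumption of this example rather than a uWSGI rule.

```python
def suggest_workers(cpu_cores, free_ram_mb, worker_rss_mb, threads=1,
                    db_max_connections=None, ram_headroom=0.8):
    """Return (worker_count, binding_limit) as a starting point for load testing.

    ram_headroom is an assumed safety margin: only that fraction of free RAM
    is handed to workers so the box does not start swapping.
    """
    limits = {
        "cpu": 2 * cpu_cores + 1,                               # (2 x cores) + 1
        "ram": int(free_ram_mb * ram_headroom // worker_rss_mb),
    }
    if db_max_connections is not None:
        # every thread in every worker may hold its own DB connection
        limits["db"] = db_max_connections // threads
    bound = min(limits, key=limits.get)                         # tightest constraint wins
    return max(1, limits[bound]), bound


if __name__ == "__main__":
    print(suggest_workers(4, 2048, 150))   # CPU formula is the tighter limit
    print(suggest_workers(8, 2048, 150))   # memory is the tighter limit
```

With 150 MB workers and 2 GB free, the memory limit comes out at 10 workers, in line with the 10-12 estimate above. Treat the result as the first value to load test, not the answer.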
; /home/django/myproject/myproject_uwsgi.ini -- tuned for performance
[uwsgi]
; --- paths ---
chdir = /home/django/myproject
module = myproject.wsgi:application
home = /home/django/myproject/.venv
env = DJANGO_SETTINGS_MODULE=myproject.settings
; --- master + dynamic worker scaling (the "cheaper" subsystem) ---
master = true
processes = 12 ; HARD CAP on workers (size to RAM + cores)
cheaper = 3 ; keep at least 3 idle-ish workers running
cheaper-initial = 4 ; start with 4 on boot
cheaper-algo = spare ; spawn/retire workers based on spare capacity
cheaper-step = 2 ; add workers 2 at a time under load
threads = 2 ; only useful for I/O-bound work (GIL-bound CPU won't scale)
; --- the socket Nginx talks to (Unix socket = fastest, local only) ---
socket = /run/uwsgi/myproject.sock
chmod-socket = 660
chown-socket = django:www-data
listen = 1024 ; accept() backlog; must be <= net.core.somaxconn
thunder-lock = true ; serialise accept() -> avoids the thundering herd
; --- recycle workers: bound memory leaks + clear slow-leak cruft ---
max-requests = 5000 ; respawn a worker after N requests
max-requests-delta = 500 ; +/- jitter so workers don't all respawn at once
reload-on-rss = 512 ; respawn any worker whose RSS exceeds 512 MB
worker-reload-mercy = 30 ; grace period before a reloading worker is killed
; --- timeouts + request buffering ---
harakiri = 30 ; kill a worker stuck > 30s (keep < nginx uwsgi_read_timeout)
harakiri-verbose = true ; log a traceback when harakiri fires
buffer-size = 16384 ; max request header size (default 4096 is small)
post-buffering = 8192 ; buffer request bodies for predictable reads
; --- hygiene + observability ---
vacuum = true ; remove socket/pidfile on exit
die-on-term = true ; honour systemd SIGTERM cleanly
disable-logging = true ; let Nginx own the access log; keep uWSGI logs lean
log-4xx = true
log-5xx = true
stats = 127.0.0.1:9191 ; stats server (JSON)
stats-http = true ; serve the stats over HTTP; read with: uwsgitop http://127.0.0.1:9191
What each knob is buying you
- cheaper + cheaper-algo turn a fixed worker pool into a dynamic one: uWSGI runs the minimum (cheaper) when idle and scales up to processes under load. The spare algorithm is always available; busyness (a plugin) scales on real busy-time and is worth a look at higher scale. This saves RAM off-peak without hurting peak throughput.
- max-requests + max-requests-delta periodically recycle workers so a slow memory leak never grows unbounded. The delta adds per-worker jitter so they do not all respawn in the same second and drop a chunk of capacity at once. reload-on-rss is the safety net for a fast leak.
- harakiri caps how long any single request can hold a worker. Set it below Nginx's uwsgi_read_timeout so uWSGI kills the stuck worker first and frees the slot. For genuinely slow work, do not raise harakiri -- push the job to Celery instead.
- thunder-lock stops every worker waking up for one connection (the "thundering herd"); on Linux it is a clear win.
- listen sets the kernel accept backlog -- raise it and net.core.somaxconn together, or the OS silently clamps it.
- Unix socket vs TCP: prefer the Unix socket (/run/...sock) when Nginx and uWSGI share a host -- it avoids the TCP/IP stack entirely. Use a TCP socket (127.0.0.1:8001) only when they run on different machines.
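Several of these knobs only make sense relative to each other, so a small pre-deploy check can catch mismatches. The sketch below is an assumption-laden helper, not a uWSGI tool: check_uwsgi_ini is a hypothetical function that parses an ini written in the style above with Python's configparser and compares three relationships named in this section. The Nginx timeout and the kernel somaxconn value are passed in by the caller.

```python
import configparser


def check_uwsgi_ini(ini_text, nginx_read_timeout_s, somaxconn):
    """Return a list of human-readable problems; an empty list means the knobs agree."""
    parser = configparser.ConfigParser(
        delimiters=("=",),               # values may contain ':' (module paths, owners)
        inline_comment_prefixes=(";",),  # tolerate "key = value ; note"
        interpolation=None,              # leave any '%' in values untouched
        strict=False,                    # repeated keys such as "env" are allowed
    )
    parser.read_string(ini_text)
    cfg = parser["uwsgi"]
    problems = []

    harakiri = cfg.getint("harakiri", fallback=0)
    if harakiri and harakiri >= nginx_read_timeout_s:
        problems.append(
            f"harakiri ({harakiri}s) should be below uwsgi_read_timeout "
            f"({nginx_read_timeout_s}s)")

    listen = cfg.getint("listen", fallback=100)  # uWSGI's default backlog is 100
    if listen > somaxconn:
        problems.append(
            f"listen ({listen}) exceeds net.core.somaxconn ({somaxconn})")

    processes = cfg.getint("processes", fallback=1)
    cheaper = cfg.getint("cheaper", fallback=0)
    if cheaper and cheaper >= processes:
        problems.append(
            f"cheaper ({cheaper}) must be lower than processes ({processes})")

    return problems
```

On Linux the current kernel limit can be read from /proc/sys/net/core/somaxconn and passed in as somaxconn.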
Nginx tuning: keep Python off the hot path
The fastest request is the one uWSGI never sees. Every byte of CSS, JS, and media that Nginx serves from disk is a request your Python workers are free to skip. Get these right:
- sendfile + tcp_nopush + tcp_nodelay -- zero-copy static delivery, full packets for bulk data, low latency for small dynamic responses.
- Upstream keepalive -- reuse connections to the app server instead of opening a new one per request. The Nginx documentation notes that the uwsgi protocol has no notion of keepalive connections, so expect the benefit only when you proxy over HTTP (proxy_pass), not over uwsgi_pass.
- gzip and brotli -- compress text responses; brotli (via ngx_brotli) beats gzip on HTML/CSS/JS.
- expires + immutable Cache-Control on hashed static assets so browsers and CDNs stop re-fetching them.
- HTTP/2 -- multiplex many assets over one connection; enable once TLS is on.
- uwsgi_buffering -- let Nginx absorb the response and free the worker immediately.
# ---- /etc/nginx/nginx.conf (http {} block, global) ----
sendfile on;
tcp_nopush on; # send full packets (pairs with sendfile)
tcp_nodelay on; # don't delay small packets (low latency)
keepalive_timeout 65;
gzip on;
gzip_comp_level 5;
gzip_min_length 256;
gzip_types text/plain text/css application/json application/javascript text/xml application/xml image/svg+xml;
# brotli on; brotli_comp_level 5; brotli_types text/css application/javascript application/json image/svg+xml; # needs ngx_brotli
# ---- /etc/nginx/sites-available/myproject ----
upstream myproject_uwsgi {
server unix:/run/uwsgi/myproject.sock;
keepalive 16; # reuses upstream connections for HTTP upstreams; the uwsgi protocol has no keepalive
}
server {
listen 443 ssl;
http2 on; # nginx 1.25+; multiplex assets over one connection
server_name example.com www.example.com;
client_max_body_size 75M;
# static + media: served by Nginx, immutable + long cache for hashed files
location /static/ {
alias /home/django/myproject/staticfiles/;
expires 30d;
add_header Cache-Control "public, immutable";
access_log off;
}
location /media/ {
alias /home/django/myproject/media/;
expires 7d;
}
# dynamic -> uWSGI
location / {
uwsgi_pass myproject_uwsgi;
include uwsgi_params;
uwsgi_param HTTP_X_FORWARDED_PROTO $scheme;
uwsgi_read_timeout 60s; # keep > uWSGI harakiri (30s)
uwsgi_buffering on;
uwsgi_buffers 16 16k; # absorb responses, free the worker fast
uwsgi_buffer_size 16k;
}
}
A few of these reach below Nginx. The upstream keepalive and the listen/backlog tuning only pay off if the kernel agrees, so raise the system limits to match:
# /etc/sysctl.d/99-tuning.conf (then: sudo sysctl --system)
# somaxconn must be >= uWSGI "listen"; comments sit on their own lines here
net.core.somaxconn = 1024
net.ipv4.tcp_tw_reuse = 1
fs.file-max = 200000
The single most common mistake here is proxying static files through Django. If you see your Python workers serving /static/, fix the alias paths and run collectstatic -- you are paying full WSGI overhead to hand back a file Nginx could stream in microseconds.
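On the Django side, the settings have to point at the same directories the Nginx alias lines use, and the immutable header is only safe when file names carry a content hash. A minimal sketch, assuming the paths from the config above, Django 4.2+ for the STORAGES setting, and Django's bundled ManifestStaticFilesStorage for hashed names. BASE_DIR is hard-coded here for illustration; a real settings.py normally derives it from __file__.

```python
from pathlib import Path

# Assumed project root, matching the Nginx alias paths in this guide.
BASE_DIR = Path("/home/django/myproject")

STATIC_URL = "/static/"
STATIC_ROOT = BASE_DIR / "staticfiles"   # collectstatic writes here; Nginx aliases it
MEDIA_URL = "/media/"
MEDIA_ROOT = BASE_DIR / "media"          # user uploads; Nginx aliases it

# Django 4.2+: hashed file names (e.g. app.3f2a1c.css) make long-lived,
# immutable caching safe because a changed file gets a new URL.
STORAGES = {
    "default": {
        "BACKEND": "django.core.files.storage.FileSystemStorage",
    },
    "staticfiles": {
        "BACKEND": "django.contrib.staticfiles.storage.ManifestStaticFilesStorage",
    },
}
```

After changing these, run collectstatic again so the hashed files exist where Nginx looks for them.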
Application levers: cache, pool, offload
Once the servers are tuned, the biggest remaining wins are in Django itself.
- Cache with Redis. Django ships a Redis backend (django.core.cache.backends.redis.RedisCache). Cache expensive querysets, rendered fragments, or whole views; even a short timeout absorbs traffic spikes. Move sessions to the cache too.
- Reuse database connections. Set CONN_MAX_AGE so each worker keeps its Postgres connection between requests instead of reconnecting every time. At higher scale, put PgBouncer in front of Postgres for real pooling (use transaction pooling and set DISABLE_SERVER_SIDE_CURSORS = True).
- Offload slow work to Celery. Anything over a few hundred milliseconds -- email, PDFs, third-party API calls, image processing -- belongs in a background task, not in the request/response cycle. This is also why you keep harakiri low: slow work should never live in a web worker.
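The reason even a short timeout absorbs a spike is the cache-aside pattern: the first request pays for the expensive query and everyone else within the window gets the stored result. The sketch below is a single-process, in-memory illustration only. TTLCache is a hypothetical class written for this example; in a real project Django's cache API (for instance cache.get_or_set) backed by Redis plays this role and is shared across all workers.

```python
import time


class TTLCache:
    """Tiny in-memory stand-in for a shared cache, to show the pattern only."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the behaviour can be tested
        self._store = {}         # key -> (expires_at, value)

    def get_or_set(self, key, compute, timeout):
        """Return the cached value, or call compute() and cache it for timeout seconds."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]                        # fresh entry: skip the expensive call
        value = compute()                        # miss or expired: do the slow work once
        self._store[key] = (now + timeout, value)
        return value


if __name__ == "__main__":
    calls = 0

    def expensive_query():
        global calls
        calls += 1
        return ["row-1", "row-2"]

    cache = TTLCache()
    for _ in range(1000):                        # a burst of 1000 requests
        cache.get_or_set("pricing", expensive_query, timeout=5)
    print(f"expensive_query ran {calls} time(s)")
```

The timeout is the trade-off dial: a longer value absorbs more traffic but serves staler data.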
# myproject/settings.py -- performance levers
# 1) Redis cache (built into Django 4.0+); also back sessions with it
CACHES = {
"default": {
"BACKEND": "django.core.cache.backends.redis.RedisCache",
"LOCATION": "redis://127.0.0.1:6379/1",
}
}
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
# 2) Persistent DB connections -- avoid reconnecting on every request.
# Use a number with bare Postgres; use 0 (or None) when PgBouncer pools for you.
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": "myproject",
"CONN_MAX_AGE": 60, # seconds; None = persistent
"CONN_HEALTH_CHECKS": True, # drop dead connections (Django 4.1+)
# behind PgBouncer transaction pooling, also set:
# "DISABLE_SERVER_SIDE_CURSORS": True,
}
}
# 3) Cache a hot, rarely-changing view (this part lives in views.py, not settings.py)
from django.views.decorators.cache import cache_page
@cache_page(60 * 5) # 5 minutes
def pricing_table(request):
    ...
Measure, don't guess: benchmarking
Worker counts, thread counts, and cache timeouts are empirical -- the only honest way to set them is to load test against a copy of production. Pick one tool and watch the right numbers:
- wrk -- tiny, brutal HTTP benchmarker for a single endpoint.
- k6 -- scriptable scenarios (login, browse, submit) with thresholds in CI.
- Locust -- Python-defined user behaviour, great for realistic mixed traffic.
Watch p95/p99 latency (not the average -- averages hide the pain), requests/sec at a fixed error rate, and memory per worker. On the uWSGI side, uwsgitop shows per-worker busyness and the listen queue: a growing listen queue or rising avg-rt means you are out of workers; OOM/swap means you have too many. Change one variable at a time, re-run, keep what helps.
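To see why the average hides the pain, it helps to summarise a skewed sample by hand. This is a minimal sketch using Python's statistics module; summarize is a hypothetical helper and the latency numbers are made up for illustration, not measurements.

```python
import statistics


def summarize(latencies_ms):
    """Return mean, p50, p95 and p99 for a list of request latencies in ms."""
    cuts = statistics.quantiles(latencies_ms, n=100)  # 99 percentile cut points
    return {
        "mean": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


if __name__ == "__main__":
    # Illustrative sample: 95 fast requests and 5 that hit a slow path.
    sample = [50] * 95 + [2000] * 5
    for name, value in summarize(sample).items():
        print(f"{name}: {value:.1f} ms")
```

In this sample the median is 50 ms and the mean is 147.5 ms, both of which look healthy, while p95 and p99 sit near the 2000 ms outliers that real users actually hit.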
# 1) Hammer one endpoint: 4 threads, 100 connections, 30 seconds
wrk -t4 -c100 -d30s https://example.com/api/items/
# 2) k6 scenario with a p95 threshold (fails CI if too slow)
cat > load.js <<'JS'
import http from 'k6/http';
import { check } from 'k6';
export const options = {
vus: 50, duration: '30s',
thresholds: { http_req_duration: ['p(95)<400'] }, // p95 under 400ms
};
export default function () {
const res = http.get('https://example.com/api/items/');
check(res, { 'status 200': (r) => r.status === 200 });
}
JS
k6 run load.js
# 3) Watch uWSGI workers live (needs "stats = 127.0.0.1:9191" in the ini; with "stats-http = true" use the http:// form)
uwsgitop http://127.0.0.1:9191
A practical tuning workflow
- Offload first. Confirm Nginx serves /static/ and /media/, terminates TLS, and gzips/brotlis text. No Python should touch a static asset.
- Baseline. Load test the untuned app and record p95, RPS, and memory. You cannot improve what you have not measured.
- Size workers. Start at (2 x cores) + 1 processes; add threads only for I/O-bound paths. Re-test; raise workers while p95 falls and memory holds.
- Recycle + cap. Set max-requests, max-requests-delta, reload-on-rss, and a sane harakiri. Add cheaper for dynamic scaling.
- Cache and pool. Add Redis caching, CONN_MAX_AGE/PgBouncer, and push slow work to Celery. Re-test.
- Repeat until latency and cost meet your target.
When one box is no longer enough, the next step is horizontal scale -- multiple app servers behind a load balancer, a managed Postgres, and a CDN in front of static. Our AWS consulting and cloud migration teams handle the autoscaling groups, read replicas, and zero-downtime cut-over. We have run this playbook on Django for 12+ years across 50+ projects.
Frequently Asked Questions
Is uWSGI still a good choice in 2026?
Yes, with eyes open. uWSGI is in maintenance mode -- it receives security and bug fixes but no new features -- yet it remains fast, stable, and very widely deployed. If you already run it well, the tuning knobs in this guide (cheaper scaling, worker recycling, harakiri, thunder-lock) make it an excellent server. For brand-new projects, most teams now pick Gunicorn for simplicity or Granian when they want the highest raw throughput.
How many uWSGI workers (processes) should I run?
Start near (2 x CPU cores) + 1 for CPU-bound or mixed apps, then load test and adjust. The real ceiling is memory: each process loads your whole app, so divide free RAM by per-worker RSS to find your maximum. Adding more workers than RAM allows causes swapping, which is slower than simply queueing requests. Also confirm your database can handle workers x threads connections.
When should I add threads instead of more processes?
Add threads only for I/O-bound work -- requests that spend time waiting on the database or an external API. Because of Python's GIL, only one thread per process runs Python at a time, so threads do nothing for CPU-bound code; for that you need more processes. A common I/O-bound setup is processes equal to cores with 4-8 threads each, tuned by measurement.
What does max-requests do and why set it?
max-requests tells uWSGI to recycle (restart) a worker after it has served that many requests, which bounds slow memory leaks and clears accumulated cruft. Pair it with max-requests-delta so workers do not all respawn in the same instant and drop capacity together, and add reload-on-rss as a safety net that restarts any worker whose memory crosses a hard limit.
Should Nginx talk to uWSGI over a Unix socket or TCP?
Use a Unix socket when Nginx and uWSGI run on the same host -- it skips the TCP/IP stack and is faster, and you can lock it down with filesystem permissions. Use a TCP socket such as 127.0.0.1:8001 only when the two run on different machines. Do not count on upstream keepalive for this hop: the Nginx documentation notes that the uwsgi protocol has no notion of keepalive connections, so that directive only helps HTTP (proxy_pass) upstreams.
How do I serve async Django or WebSockets at speed?
uWSGI is a WSGI server, so it cannot hold long-lived async connections efficiently. For WebSockets, Django Channels, server-sent events, or heavily async views, run an ASGI server -- Uvicorn, Granian, or Daphne -- pointed at your project's asgi.py, with Nginx still in front as the reverse proxy adding the WebSocket upgrade headers.
How do I find the right settings without guessing?
Benchmark a production-like copy with wrk, k6, or Locust and change one variable at a time. Watch p95/p99 latency (not the average), requests per second at a fixed error rate, and memory per worker, and use uwsgitop to read per-worker busyness and the listen queue. A growing listen queue means too few workers; swapping means too many.