Gunicorn and Uvicorn in production — the worker tuning we actually apply

Python web services in 2026 split into two camps: traditional WSGI (Django, Flask) served via Gunicorn sync workers, and modern ASGI (FastAPI, Starlette, Django 5 async) served via Uvicorn workers under a Gunicorn master. Both can be production-grade. Neither works well at defaults.

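The single most common misconfiguration we see: the sync-worker count of (2 × CPU) + 1 applied to an async ASGI app, which should run one Uvicorn worker per core with no concurrency multiplier. The multiplier exists for sync workers, where the CPU sits idle while a worker blocks on I/O. An async worker is single-threaded but keeps its core busy by switching between requests on one event loop, so the extra workers mostly add memory use and context switching.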
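To make the two sizing rules concrete, here is a minimal sketch. The function name worker_count and the "sync"/"asgi" labels are our own illustration, not part of Gunicorn, and the numbers are starting points rather than tuned values.

```python
import os


def worker_count(cores: int, kind: str) -> int:
    """Return a starting worker count for the given worker kind.

    "sync" applies the (2 x CPU) + 1 rule of thumb for blocking WSGI workers.
    "asgi" uses one event-loop worker per core, with no multiplier.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if kind == "sync":
        return 2 * cores + 1
    if kind == "asgi":
        return cores
    raise ValueError(f"unknown worker kind: {kind!r}")


if __name__ == "__main__":
    cores = os.cpu_count() or 1  # cpu_count() can return None
    print("sync workers:", worker_count(cores, "sync"))
    print("asgi workers:", worker_count(cores, "asgi"))
```

On an 8-core host the sync rule gives 17 workers and the ASGI rule gives 8, which is the gap the misconfiguration above introduces.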

The Gunicorn + Uvicorn pattern for FastAPI

gunicorn app.main:app \
  --workers "$(nproc)" \
  --worker-class uvicorn.workers.UvicornWorker \
  --bind 0.0.0.0:8000 \
  --timeout 30 \
  --graceful-timeout 30 \
  --keep-alive 5 \
  --max-requests 10000 \
  --max-requests-jitter 1000 \
  --access-logfile -

One Uvicorn worker per core. Each worker runs a single-threaded event loop. Async concurrency comes from awaiting I/O, not from threading.
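The point about awaiting I/O can be seen without any web framework. The sketch below uses only the standard library; fake_io and run_batch are hypothetical names, and asyncio.sleep stands in for a real network or database call.

```python
import asyncio
import threading
import time


async def fake_io(delay: float) -> int:
    """Stand-in for an I/O call; returns the id of the thread it ran on."""
    await asyncio.sleep(delay)  # hands control back to the event loop while waiting
    return threading.get_ident()


async def run_batch(n: int = 100, delay: float = 0.1) -> tuple[float, int]:
    """Run n simulated I/O calls concurrently.

    Returns (elapsed seconds, number of distinct threads used).
    """
    start = time.perf_counter()
    thread_ids = await asyncio.gather(*(fake_io(delay) for _ in range(n)))
    elapsed = time.perf_counter() - start
    return elapsed, len(set(thread_ids))


if __name__ == "__main__":
    elapsed, threads = asyncio.run(run_batch())
    print(f"{elapsed:.2f}s on {threads} thread(s)")
```

All the calls share one thread, and the batch should finish in roughly the time of a single call rather than the sum of all of them. That is the concurrency each Uvicorn worker already provides, and it is why the worker count does not need a multiplier.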

The full write-up covers:

The sync-worker formula (2 × CPU) + 1 and where it's wrong
Why --max-requests matters (slow memory creep, GC fragmentation)
--timeout vs --graceful-timeout — and the order they fire
When gthread (sync with threads) is the right pick
ProxyHeadersMiddleware for the X-Forwarded-For chain
Reading gunicorn --print-config to verify what's actually loaded
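As a rough illustration of the jitter setting used in the command above: as far as we understand Gunicorn's behavior, each worker adds a random integer between 0 and --max-requests-jitter to --max-requests when it starts, so workers do not all recycle at the same moment. The sketch mimics that in plain Python. restart_thresholds is a hypothetical helper, not a Gunicorn API.

```python
import random


def restart_thresholds(workers, max_requests, jitter, seed=None):
    """Return one simulated restart threshold per worker.

    Each threshold is max_requests plus a random integer in [0, jitter],
    which is our reading of how Gunicorn staggers worker restarts.
    """
    rng = random.Random(seed)  # seeded only to make the demo repeatable
    return [max_requests + rng.randint(0, jitter) for _ in range(workers)]


if __name__ == "__main__":
    for i, limit in enumerate(restart_thresholds(4, 10000, 1000)):
        print(f"worker {i}: restarts after {limit} requests")
```

With the values from the command above, every worker restarts somewhere between 10000 and 11000 requests instead of all at exactly 10000.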

We ship this configuration on every managed Python service.
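The same settings can also live in a Gunicorn config file, which is plain Python. This is a sketch that mirrors the flags in the command above. The setting names are Gunicorn's documented ones as we know them, and wsgi_app assumes Gunicorn 20.1 or later; on older versions, pass app.main:app on the command line instead.

```python
# gunicorn.conf.py: a sketch mirroring the command-line flags shown earlier
import os

# Application target, same as on the command line (assumes Gunicorn 20.1+).
wsgi_app = "app.main:app"

# One Uvicorn worker per core, no multiplier.
# os.cpu_count() can return None, and may report a different number than
# nproc, for example when a CPU affinity mask is set.
workers = os.cpu_count() or 1
worker_class = "uvicorn.workers.UvicornWorker"

bind = "0.0.0.0:8000"
timeout = 30
graceful_timeout = 30
keepalive = 5

# Recycle workers periodically, staggered by the jitter.
max_requests = 10000
max_requests_jitter = 1000

accesslog = "-"  # "-" sends the access log to stdout
```

Start it with gunicorn -c gunicorn.conf.py, and add --print-config to check the values Gunicorn actually resolved.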