Gunicorn + Uvicorn Workers¶
Gunicorn is a battle-tested WSGI/ASGI process manager. Uvicorn provides the ASGI worker class. Together they give you:
- Pre-fork worker pool managed by a supervisor process
- Automatic worker restart on crash
- Graceful rolling restarts with zero downtime
- OS signal-based configuration reload
When to use this pattern
A single uvicorn process already handles thousands of concurrent connections
via async I/O. Add Gunicorn when you need multi-core CPU utilisation without
Kubernetes/Docker orchestration, or when your infrastructure team requires Gunicorn
for consistency with other Python services.
Install¶
Quickstart¶
gunicorn main:app \
--worker-class uvicorn.workers.UvicornWorker \
--workers 4 \
--bind 0.0.0.0:8000
Worker count guidelines¶
| vCPUs | Recommended workers |
|---|---|
| 1 | 3 |
| 2 | 5 |
| 4 | 9 |
| 8 | 17 |
Each Uvicorn worker is an async event loop, so it handles many concurrent connections. More workers = more parallelism for CPU-bound code but also more memory.
Check your core count:
gunicorn.conf.py¶
Keep all configuration in a file rather than long command-line flags:
# gunicorn.conf.py
import multiprocessing
# Server socket
bind = "0.0.0.0:8000"
backlog = 2048
# Worker processes
worker_class = "uvicorn.workers.UvicornWorker"
workers = multiprocessing.cpu_count() * 2 + 1
worker_connections = 1000
max_requests = 10_000 # restart worker after N requests (prevents memory leaks)
max_requests_jitter = 1_000 # randomise to prevent thundering-herd restarts
# Timeouts
timeout = 30 # worker killed if no response within 30 s
graceful_timeout = 30 # time allowed for in-flight requests on SIGTERM
keepalive = 5 # seconds to keep idle connections open
# Logging
accesslog = "-" # stdout
errorlog = "-" # stderr
loglevel = "info"
access_log_format = '%(h)s "%(r)s" %(s)s %(b)s %(D)sμs'
# Process naming
proc_name = "fasterapi"
# Security: drop privileges after binding
user = "fasterapi"
group = "fasterapi"
# Preload app for faster worker fork and shared memory
preload_app = True
Run with the config file:
UvicornH11Worker vs UvicornWorker¶
| Worker class | HTTP/2 | WebSocket | When to use |
|---|---|---|---|
uvicorn.workers.UvicornWorker |
No (HTTP/1.1 only) | Yes | Behind a reverse proxy that terminates HTTP/2 (recommended) |
uvicorn.workers.UvicornH11Worker |
No | Yes | Explicit h11 parser; slightly more strict |
Always terminate HTTP/2 at Nginx/Traefik and use plain HTTP/1.1 between the proxy and Gunicorn.
systemd integration¶
Combine with the systemd guide:
[Unit]
Description=FasterAPI via Gunicorn
After=network.target
[Service]
Type=notify
User=fasterapi
Group=fasterapi
WorkingDirectory=/opt/myapp/app
EnvironmentFile=-/etc/fasterapi.env
Environment="PATH=/opt/myapp/.venv/bin:/usr/bin:/bin"
ExecStart=/opt/myapp/.venv/bin/gunicorn main:app \
--config /opt/myapp/gunicorn.conf.py
ExecReload=/bin/kill -s HUP $MAINPID
TimeoutStopSec=30
KillMode=mixed
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
KillMode=mixed sends SIGTERM to the master (triggers graceful drain) and
SIGKILL to any remaining workers after TimeoutStopSec.
Zero-downtime rolling restart¶
# Signal master to spawn new workers then kill old ones gracefully
kill -USR2 $(cat /var/run/gunicorn.pid)
# Old master exits after new master is healthy
kill -WINCH $(cat /var/run/gunicorn.pid.2)
Or via systemd:
Monitoring worker health¶
# List all gunicorn processes
ps aux | grep gunicorn
# Check master PID
cat /var/run/gunicorn.pid
# Worker restarts (high count → memory leak or crash loop)
journalctl -u fasterapi | grep "Worker exiting"
Pre-loading and shared state¶
preload_app = True imports your app once in the master before forking.
Workers share the code segment (copy-on-write), reducing total memory use.
Warning
Do not open database connections or asyncio event loops at import time
when preload_app = True. Each forked worker needs its own connection pool.
Use a lifespan handler instead.
# main.py — safe with preload_app
from contextlib import asynccontextmanager
from FasterAPI import Faster
import databases
DATABASE_URL = "postgresql+asyncpg://..."
@asynccontextmanager
async def lifespan(app):
# Opens AFTER fork — each worker gets its own pool
app.state.db = databases.Database(DATABASE_URL)
await app.state.db.connect()
yield
await app.state.db.disconnect()
app = Faster(lifespan=lifespan)
Performance tuning¶
| Option | Value | Effect |
|---|---|---|
worker_connections |
1000–4000 | Max simultaneous connections per worker |
max_requests |
5000–20000 | Restart worker after N requests (memory hygiene) |
keepalive |
2–75 | Reuse TCP connections; match your load balancer timeout |
timeout |
30–120 | Kill unresponsive workers; set > your slowest endpoint |
preload_app |
True |
Share code pages across workers |
Docker + Gunicorn¶
FROM python:3.13-slim
WORKDIR /app
COPY pyproject.toml gunicorn.conf.py ./
RUN pip install --no-cache-dir ".[all]" gunicorn
COPY app/ app/
RUN useradd -r -u 1001 fasterapi
USER fasterapi
EXPOSE 8000
CMD ["gunicorn", "app.main:app", "--config", "gunicorn.conf.py"]
Next steps¶
- systemd Service — process supervision on Linux.
- HTTPS — Let's Encrypt — TLS in front of Gunicorn.
- Docker — containerising the app.