Deploying with an External Postgres
SentriKat's docker-compose ships with a colocated postgres:15-alpine container for quick start. Production deployments typically point the application at a managed Postgres or a dedicated VM.
Supported topologies
| Topology | DATABASE_URL example |
|---|---|
| Same-host (default) | postgresql://sentrikat:pwd@db:5432/sentrikat |
| Same-VPC VM | postgresql://user:pwd@DB_HOST:5432/sentrikat |
| AWS RDS | postgresql://user:pwd@DB_HOST:5432/sentrikat?sslmode=require |
| GCP Cloud SQL (TCP) | postgresql://user:pwd@DB_HOST:5432/sentrikat?sslmode=require |
| GCP Cloud SQL (Unix socket) | postgresql://user:pwd@/sentrikat?host=/cloudsql/PROJECT:REGION:INSTANCE |
| Azure Database | postgresql://user:pwd@DB_HOST:5432/sentrikat?sslmode=require |
| PgBouncer in front | Same URL pattern, point at PgBouncer endpoint |
To switch to an external DB, drop the db service from your docker-compose.yml and set:
environment:
  DATABASE_URL: postgresql://user:pwd@DB_HOST:5432/sentrikat?sslmode=require
TLS / SSL setup
sslmode=require is the minimum for managed clouds. For mutual TLS:
Mount the CA bundle as a read-only file:
sslmode levels:
| Mode | Verifies CA? | Verifies hostname? | Use when |
|---|---|---|---|
| disable | no | no | local dev only |
| require | no | no | basic encryption |
| verify-ca | yes | no | trusted private CA |
| verify-full | yes | yes | production with managed cloud |
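As an illustration of moving a URL from sslmode=require to verify-full, the following minimal sketch rewrites the query string of a DATABASE_URL. It assumes a libpq-based driver (for example psycopg2), which accepts the standard `sslmode` and `sslrootcert` URI parameters. The helper name `with_tls`, the host `db.example.internal`, and the file name `/app/data/ca.pem` are hypothetical and not part of SentriKat.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tls(database_url: str, sslmode: str = "verify-full",
             ca_path: Optional[str] = None) -> str:
    """Return database_url with sslmode (and optionally sslrootcert) set.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(database_url)
    # Keep existing query parameters, overriding only the TLS-related ones.
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["sslmode"] = sslmode
    if ca_path is not None:
        params["sslrootcert"] = ca_path
    # safe="/:" leaves filesystem paths and socket paths unescaped.
    query = urlencode(params, safe="/:")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


# Hypothetical host and CA path, used only to show the resulting URL shape.
print(with_tls(
    "postgresql://user:pwd@db.example.internal:5432/sentrikat?sslmode=require",
    ca_path="/app/data/ca.pem",
))
```

The printed URL can then be used as the DATABASE_URL value, provided the CA file is actually mounted at that path inside the container.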
Connection pool tuning
Default settings:
DB_POOL_SIZE = 10 # base connections per worker
DB_POOL_MAX_OVERFLOW = 20 # extra above pool_size on burst
DB_POOL_TIMEOUT = 30 # seconds to wait for free conn
DB_POOL_RECYCLE = 1800 # seconds before recycling
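For orientation, the sketch below shows how these four settings could map onto SQLAlchemy's `create_engine()` keyword arguments (`pool_size`, `max_overflow`, `pool_timeout`, `pool_recycle`, `pool_pre_ping`). It assumes the settings can be overridden through environment variables of the same name; that mechanism and the `engine_options` helper are assumptions for illustration, not SentriKat's confirmed implementation.

```python
import os

# Defaults copied from the settings listed above.
DEFAULTS = {
    "DB_POOL_SIZE": 10,
    "DB_POOL_MAX_OVERFLOW": 20,
    "DB_POOL_TIMEOUT": 30,
    "DB_POOL_RECYCLE": 1800,
}


def engine_options(env=None) -> dict:
    """Build keyword arguments suitable for sqlalchemy.create_engine().

    Hypothetical helper: reads overrides from `env` (os.environ by default).
    """
    env = os.environ if env is None else env

    def read(name: str) -> int:
        return int(env.get(name, DEFAULTS[name]))

    return {
        "pool_size": read("DB_POOL_SIZE"),            # base connections per worker
        "max_overflow": read("DB_POOL_MAX_OVERFLOW"),  # extra connections on burst
        "pool_timeout": read("DB_POOL_TIMEOUT"),       # seconds to wait for a free connection
        "pool_recycle": read("DB_POOL_RECYCLE"),       # seconds before a connection is recycled
        "pool_pre_ping": True,                         # test each connection on checkout
    }


print(engine_options({"DB_POOL_SIZE": "20", "DB_POOL_MAX_OVERFLOW": "30"}))
```

The resulting dictionary would be passed as `create_engine(database_url, **engine_options())`.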
Sizing guidance:
| Fleet size | Workers × Threads | Recommended DB_POOL_SIZE | DB_POOL_MAX_OVERFLOW |
|---|---|---|---|
| Up to 100 agents | 4 × 4 (16 conc) | 5 | 10 |
| 100-1000 agents | 8 × 4 (32 conc) | 10 (default) | 20 (default) |
| 1000-5000 agents | 12 × 8 (96 conc) | 20 | 30 |
| 5000-10000 agents | 16 × 8 (128 conc) | 30 | 50 |
Total connections to the DB ≈ (pool_size + max_overflow) × workers. Make sure your Postgres max_connections is high enough, or use PgBouncer in transaction-pooling mode in front to multiplex.
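The formula above can be applied to each row of the sizing table to see how quickly the worst-case connection count grows. This is a small worked example; the function name `total_connections` is illustrative only.

```python
def total_connections(pool_size: int, max_overflow: int, workers: int) -> int:
    """Worst-case connections opened against Postgres by all workers."""
    return (pool_size + max_overflow) * workers


# (fleet size, workers, pool_size, max_overflow) taken from the sizing table.
SIZING = [
    ("Up to 100 agents", 4, 5, 10),
    ("100-1000 agents", 8, 10, 20),
    ("1000-5000 agents", 12, 20, 30),
    ("5000-10000 agents", 16, 30, 50),
]

for fleet, workers, pool_size, max_overflow in SIZING:
    print(f"{fleet}: up to {total_connections(pool_size, max_overflow, workers)} connections")
# Up to 100 agents: up to 60 connections
# 100-1000 agents: up to 240 connections
# 1000-5000 agents: up to 600 connections
# 5000-10000 agents: up to 1280 connections
```

Since stock Postgres defaults to max_connections = 100, every row except the smallest requires either raising that limit or putting PgBouncer in front.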
Failover behavior
pool_pre_ping=True is enabled, so SQLAlchemy verifies each connection before checkout. After a managed-DB failover (RDS Multi-AZ takes 30-60 seconds, Cloud SQL HA similar):
- Connections in use at failover raise an error to the request handler — users see HTTP 503.
- New requests trigger pre_ping, which finds the pooled connection dead and transparently reconnects to the new primary.
- End-to-end recovery typically completes within 60-90 seconds of failover start.
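To mask the brief outage from end users, place PgBouncer in front: it keeps client connections open and queues new queries while it reconnects to the new primary. Queries already in flight at failover still fail.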
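Callers that receive HTTP 503 during the failover window can also retry on their own side. The following is a minimal sketch of retrying with exponential backoff inside a 90-second budget, matching the recovery time quoted above. The function `retry_with_backoff` and its parameters are hypothetical, and the choice of `ConnectionError` as the retriable exception is an assumption; a real client would catch whatever its HTTP or database library raises.

```python
import time


def retry_with_backoff(operation, max_elapsed=90.0, base_delay=1.0,
                       max_delay=15.0, retriable=(ConnectionError,),
                       sleep=time.sleep, clock=time.monotonic):
    """Call operation() until it succeeds or the time budget is spent.

    Delays double after each failure (1s, 2s, 4s, ...) up to max_delay.
    `sleep` and `clock` are injectable so the logic can be tested quickly.
    """
    start = clock()
    delay = base_delay
    while True:
        try:
            return operation()
        except retriable:
            # Give up if waiting again would exceed the overall budget.
            if clock() - start + delay > max_elapsed:
                raise
            sleep(delay)
            delay = min(delay * 2, max_delay)


# Example usage (fetch_report is a hypothetical caller-defined function):
# result = retry_with_backoff(fetch_report)
```

Only idempotent operations should be wrapped this way, because a request that failed mid-flight may already have been partially applied.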
Connection resilience for transient errors
What SentriKat handles automatically:
- Stale connection (idle timeout, killed by DB): pre_ping detects, reconnects.
- Primary failover (managed cloud HA): pre_ping detects, reconnects. In-flight requests fail.
- Brief network blip (under 1 second): pre_ping detects the dropped connection and reconnects.
What requires operator intervention:
- Sustained DB outage > 5 minutes: SentriKat returns 503 errors. The "Background Health Checks" job (interval 30 min) detects FAIL and emits an alert via SMTP/webhook. Recovery is automatic when DB returns.
- Schema migration mid-failover: do not deploy a new SentriKat release during an active DB failover.
Network requirements
- Egress from SentriKat container to DB host on port 5432 (or custom).
- VPC peering / security group / firewall rules as appropriate.
- DNS resolution from inside the container. Verify with: docker exec sentrikat getent hosts $DB_HOST
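The DNS and egress requirements above can be checked together with a short standard-library script run inside the container. This is a sketch only; the function name `check_db_reachable` is hypothetical, and a successful TCP connect confirms reachability but not that Postgres authentication or TLS will succeed.

```python
import socket


def check_db_reachable(host: str, port: int = 5432, timeout: float = 3.0):
    """Return (ok, detail) after resolving host and opening a TCP connection."""
    # Step 1: DNS resolution, equivalent to `getent hosts`.
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return False, f"DNS resolution failed: {exc}"

    # Step 2: try each resolved address until one accepts a TCP connection.
    last_error = None
    for family, socktype, proto, _canonname, sockaddr in infos:
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            return True, f"connected to {sockaddr[0]}:{sockaddr[1]}"
        except OSError as exc:
            last_error = exc
    return False, f"TCP connect failed: {last_error}"


# Example usage, with the host taken from your own DATABASE_URL:
# print(check_db_reachable("db", 5432))
```

A DNS failure points at resolver or service-discovery configuration, while a TCP failure after successful resolution usually points at security groups, firewall rules, or VPC peering.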
Diagnostics
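# Confirm DB connectivity (text() keeps this working on SQLAlchemy 2.x, where engine.execute() was removed)
docker exec sentrikat python3 -c \
  "from sqlalchemy import text; from app import db, create_app; app=create_app(); ctx=app.app_context(); ctx.push(); print(db.session.execute(text('SELECT version()')).scalar())"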
# Check active pool size
docker exec sentrikat python3 -c \
"from app import db, create_app; app=create_app(); ctx=app.app_context(); ctx.push(); print('size=', db.engine.pool.size(), 'checked_out=', db.engine.pool.checkedout())"
See also
- Logging & Observability: how DB-down is surfaced via the health-check log stream.
- Container User & Privilege Model: UID 999 ownership for the /app/data volume that holds the CA bundle.