Blog
Notes from the cloud floor
Practical writing on managing AWS, GCP and Azure at scale.
May 30, 2026 · 1 min read
MySQL to Postgres migration — when it's worth it, the gotchas, and the pgloader workflow
We've migrated databases both ways. Here's the honest decision framework, the data-type traps that bite every team, and the migration runbook we've refined across customer engagements.
May 28, 2026 · 1 min read
RHEL 9 to RHEL 10 with Leapp — the pre-flight checks and the gotchas we hit
In-place major version upgrades are now genuinely viable on RHEL. They are not, however, fire-and-forget. Here's the Leapp workflow we run, the issues we surface, and when we still prefer fresh installs.
May 28, 2026 · 1 min read
OpenTelemetry for Node.js — the wiring that actually works in production
OpenTelemetry has won the distributed tracing argument. Here's how we instrument Node services, how we export to OTLP, and the mistakes we've already made so you don't have to.
May 27, 2026 · 1 min read
WP-CLI operations at scale — running 200 WordPress sites from one terminal
The WP-CLI patterns we use to operate hundreds of WordPress sites without losing our minds — the multi-site loop, the dry-run discipline, and the audit script.
May 27, 2026 · 1 min read
Postgres replication patterns in 2026 — Patroni, managed services, and the failover story
When to roll your own Patroni cluster, when to use managed Postgres, and the failover semantics nobody explains until production breaks.
May 27, 2026 · 1 min read
The Laravel migrations that break production — and the safe patterns we use instead
Rename column, drop column, change type, add NOT NULL — every one of these has a 'works on staging, breaks at midnight on production' failure mode.
May 26, 2026 · 1 min read
SELinux in production — the workflow that actually works, and the AVC denials we keep finding
setenforce 0 is not a strategy. Here's the SELinux workflow we use on every RHEL host we manage, including the custom policy modules and the debugging steps in order.
May 26, 2026 · 1 min read
Upgrading to PHP 8.3 in production — the migration playbook for Laravel, Symfony, and WordPress
PHP 8.3 is mature, fast, and the deprecation surface from 8.1/8.2 is small but sharp. Here is the staged playbook we use to move customer fleets across without an incident.
May 26, 2026 · 1 min read
The npm supply chain in 2026 — lockfiles, sigstore, Socket, and the attacks we've seen
npm is the largest software supply chain in history and the most attacked. Here's the threat model in 2026 and the controls we ship on every managed Node.js stack.
May 26, 2026 · 1 min read
MySQL slow query tuning — the EXPLAIN-driven workflow we use on customer databases
Slow query logs, EXPLAIN ANALYZE, performance_schema, and the seven antipatterns we find on almost every audit.
May 26, 2026 · 1 min read
Multi-arch Docker builds in 2026 — shipping ARM and x86 from the same pipeline
Graviton, Ampere, and Apple Silicon make ARM real in production. Here's how we build multi-arch images that work everywhere, without 3x the build time.
May 24, 2026 · 1 min read
Nginx vs HAProxy vs Envoy — an honest 2026 comparison
Three excellent proxies, three different sweet spots. Where we deploy each one for customers, and the failure modes that decide which to pick.
May 24, 2026 · 1 min read
EKS vs GKE vs AKS in 2026 — an honest field comparison
We run all three for customers. Here's where each one quietly wins, where it loses, and the decision framework we actually use.
May 23, 2026 · 1 min read
Postgres autovacuum, demystified — the tuning that prevents 3am wraparound panic
Autovacuum failures are silent until they aren't. Here's how it actually works, the metrics that matter, and the per-table tuning we apply on customer databases.
May 23, 2026 · 1 min read
PM2 vs cluster vs containers — how we run Node.js in 2026
PM2 was the right answer in 2018. The cluster module was the right answer before that. In 2026 the answer depends on what you're optimising for.
May 22, 2026 · 1 min read
Migrating a 50GB WooCommerce site with zero downtime — the runbook we use
Big WooCommerce migrations fail in predictable ways. Here's the runbook we follow, the gotchas to plan for, and the cutover script that ties it together.
May 22, 2026 · 1 min read
Composer supply chain in 2026 — the audits, locks, and signing controls we ship by default
Composer is the single largest entry point into PHP applications. After three years of attacks on Packagist, the controls every PHP shop should have are no longer optional.
May 22, 2026 · 1 min read
Nginx, HTTP/3, and a TLS config that's actually current for 2026
QUIC support, TLS 1.3, OCSP stapling, cipher hardening, and the small details that decide whether your edge gets an A+ or a C on every TLS scanner.
May 22, 2026 · 1 min read
From Docker Compose to Kubernetes — the migration that doesn't have to be painful
A staged migration playbook from docker-compose to Kubernetes, including the patterns that translate cleanly and the ones that need rethinking.
May 21, 2026 · 1 min read
kpatch on RHEL — patching kernel CVEs without the reboot
Live kernel patching is real, supported, and useful. It's also not a silver bullet. Here's how we use kpatch in production and where we still reboot.
May 21, 2026 · 1 min read
Postgres connection pooling with PgBouncer — the patterns we run in production
Transaction mode, session mode, prepared statements, and the cluster topology decisions that determine whether PgBouncer helps or hurts.
May 21, 2026 · 1 min read
The Node.js memory leak playbook — heap snapshots, clinic.js, and the four patterns we keep finding
Most Node.js memory leaks aren't exotic. They're the same handful of patterns appearing in production again and again. Here's how we diagnose them.
May 20, 2026 · 1 min read
Layered rate limiting in Nginx — from limit_req_zone to Cloudflare and back
How we stack edge, perimeter, and origin rate limiting to absorb scrapers, brute-force attempts, and the occasional volumetric DDoS without paging the on-call.
May 20, 2026 · 1 min read
Zero-downtime Laravel deploys — the atomic-symlink pipeline that keeps queues honest
What it actually takes to deploy Laravel without dropping requests or losing jobs: atomic releases, the queue worker dance, OPcache reset timing, and the Envoyer-style pipeline we ship.
May 20, 2026 · 1 min read
The CIS-aligned Kubernetes security baseline we ship on day one
Pod Security Standards, Kyverno policies, NetworkPolicies, audit logging — the controls we apply to every customer cluster before workloads arrive.
May 19, 2026 · 1 min read
OPcache and JIT in PHP 8.3 production — what actually moves the needle
OPcache is mandatory. JIT is conditional. Here is the production config we ship, the JIT mode debate settled with numbers, and the workloads where JIT genuinely hurts.
May 19, 2026 · 1 min read
MySQL backups that actually restore — XtraBackup, binlogs, and the quarterly drill
mysqldump is not a production backup strategy. Here's the Percona XtraBackup + binlog PITR setup we deploy and the restore drill that keeps it honest.
May 19, 2026 · 1 min read
Hardening WordPress in 2026 — the checklist we actually run on customer sites
Most WordPress security guides are 80% noise. Here are the controls that actually stop the attacks we see every week.
May 19, 2026 · 1 min read
Dockerfile best practices in 2026 — the patterns that actually matter
Most Dockerfile guides are stale. Here are the patterns that pay off in production: multi-stage builds, build cache mounts, distroless bases, and the rootless story.
May 18, 2026 · 1 min read
The Nginx reverse proxy patterns we actually run in production
Upstream blocks, keepalive tuning, header forwarding, and the X-Forwarded-For chain. The reverse-proxy config we copy onto every customer's edge.
May 18, 2026 · 1 min read
A pragmatic Argo CD setup — GitOps that survives contact with reality
GitOps is sold as magic. In practice the magic happens when your repo structure, sync waves, and secrets strategy all work together. Here's the layout we run.
May 17, 2026 · 1 min read
Migrating Apache to Nginx — the translation patterns and playbook we use
Most Apache-to-Nginx migrations get stuck on .htaccess. Here's the translation table, the gotchas, and the playbook that gets a real site cut over without surprises.
May 16, 2026 · 1 min read
Ubuntu Server 24.04 fresh-install hardening checklist
The exact steps we run on every new Ubuntu 24.04 host before any workload arrives — SSH, UFW, fail2ban, AppArmor, auditd, and the small details that actually matter.
May 15, 2026 · 1 min read
FastAPI on Kubernetes — the production deployment we ship by default
Pydantic v2, ASGI server choice, OpenAPI in CI, health checks, and the Kubernetes manifests we apply to every new FastAPI service.
May 15, 2026 · 1 min read
Applying the CIS Ubuntu benchmark — the controls that matter and the ones we skip
A pragmatic walk through CIS Ubuntu 22.04 and 24.04 Level 1 and Level 2: which controls move attacker economics, which produce yellow ticks for auditors, and how to audit at scale.
May 15, 2026 · 1 min read
Apache TLS hardening in 2026 — ciphers, OCSP stapling, and the cert renewal pipeline
TLS 1.3 is the default, but most Apache installs still have config from the 2018 ciphersuite wars. Here's what actually belongs in your SSL config today.
May 14, 2026 · 1 min read
Canonical Livepatch in production — patching kernel CVEs without rebooting
How Livepatch actually works, what it can and can't patch, the Pro subscription economics, and the alternatives if you can't or won't use it.
May 14, 2026 · 1 min read
Managing RHEL at scale — Satellite, content views, and the lifecycle we actually ship
subscription-manager is fine until you have 300 hosts. Here's the Satellite layout that keeps RHEL fleets sane, patched and auditable.
May 14, 2026 · 1 min read
Python dependency security in 2026 — pip-audit, lockfiles, and the PyPI attacks we keep seeing
Supply-chain attacks on PyPI are routine now. Here is the toolchain we run, the lockfile discipline we enforce, and the alerts we actually act on.
May 14, 2026 · 1 min read
ModSecurity and the OWASP CRS — the WAF rules we actually ship on Apache
Most ModSecurity installs are either off by default or so noisy nobody reads the logs. Here's how we tune it to be useful without drowning in false positives.
May 13, 2026 · 1 min read
Configuring unattended-upgrades on Ubuntu the way production actually needs it
Which security patches you want auto-applied, which ones you don't, and how we handle reboots across a fleet of thousands of servers.
May 13, 2026 · 1 min read
Celery in production — broker choice, retry semantics, and what Flower actually tells you
Redis vs RabbitMQ, idempotent tasks, the retry backoff we apply by default, and the monitoring that catches problems before users notice.
May 13, 2026 · 1 min read
MySQL high availability in 2026 — Galera, InnoDB Cluster, or async replicas?
Three real HA approaches for MySQL, what each actually buys you, and the decision tree we use when standing up customer clusters.
May 13, 2026 · 1 min read
Laravel Octane in production — RoadRunner vs Swoole vs FrankenPHP
We've migrated dozens of Laravel apps to Octane on three different runtimes. Here's how RoadRunner, Swoole, and FrankenPHP compare on throughput, memory, deployment ergonomics, and failure modes.
May 13, 2026 · 1 min read
Apache MPM event in 2026 — sizing the thread pool we actually run
Prefork is a museum piece. Worker is fine. Event is what you want — and most Apache installs we audit have it tuned wrong.
May 12, 2026 · 1 min read
The WordPress cache stack that actually survives Black Friday
Page cache, object cache, opcode cache, edge cache — what each one buys you, where they fight, and the layering order that holds up at peak.
May 12, 2026 · 1 min read
Gunicorn and Uvicorn in production — the worker tuning we actually apply
Sync vs async, CPU count math, the Gunicorn-with-Uvicorn-workers pattern, and the timeouts that keep Python services healthy under load.
May 12, 2026 · 1 min read
PHP-FPM pool tuning in production — the static vs dynamic vs ondemand decision
Most PHP performance problems aren't PHP. They're pool sizing, process manager mode, and a slowlog nobody reads. Here's how we tune PHP-FPM on real workloads.
May 10, 2026 · 1 min read
Run a PostgreSQL PITR restore drill every week (here's our runbook)
Backups you've never restored are not backups. Our weekly point-in-time recovery drill — what it tests, what we automate, and what we still do by hand.
May 8, 2026 · 1 min read
Six Kubernetes cost leaks we find on almost every cluster
Idle namespaces, oversized requests, EBS snapshot sprawl, NAT egress bills — the recurring ways K8s burns 25-40% of your compute budget.
May 6, 2026 · 1 min read
Laravel Horizon in production — sizing workers, surviving Redis, and the retry strategy
What we've learned running Horizon for Laravel customers handling millions of jobs a day: worker autoscaling, queue isolation, retry semantics, and the configuration mistakes that quietly burn money.
May 6, 2026 · 2 min read
A practical Docker image supply chain: signed, scanned, attested
Cosign, Trivy, SBOMs and admission policies — the minimum container supply-chain setup we ship on every customer cluster.