Docker Best Practices for Production in 2026: 15 Rules That Matter

Getting Docker to work on your laptop is straightforward. Getting it to work reliably in production — under load, with secrets, after a team member makes an unreviewed change at 11pm — is a different problem entirely. Most Docker tutorials stop at "it runs." Production requires that it runs correctly, securely, and predictably for months without manual intervention.

These 15 practices come from the gap between "works locally" and "works in production." They're grouped by concern: image efficiency, security, observability, and deployment hygiene. Skip the ones irrelevant to your stack, but understand why each one exists before you dismiss it.

Image Size and Build Efficiency (Rules 1–5)

Rule 1: Use multi-stage builds to keep production images small

A naive Node.js Dockerfile that installs all dependencies (including devDependencies), runs the TypeScript compiler, and copies the whole project directory can produce images of around 1.2GB. A multi-stage Dockerfile uses a separate builder stage: stage 1 installs all dependencies and runs the build, stage 2 starts from a clean base image, installs only production dependencies, and copies in the compiled output. The resulting image is typically 150–200MB. Smaller images pull faster, start faster, consume less disk on your servers, and have a smaller attack surface. For teams deploying to AWS ECS or a Kubernetes cluster, smaller images directly reduce deployment time: pulling a 200MB image over a Mumbai region connection is meaningfully faster than pulling 1.2GB.

# Stage 1: Build
FROM node:20.11-alpine3.19 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: Production
FROM node:20.11-alpine3.19
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/index.js"]

Rule 2: Use specific version tags, never latest

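Pinning to node:20.11-alpine3.19 instead of node:latest keeps the base image on the same Node.js and Alpine release line across every environment: local, CI, and production. When latest moves (a new Node.js release, a new Alpine release), your image builds change without any code changes, which breaks reproducibility and makes debugging regressions much harder. Specific tags also make security scanning more meaningful: you know which version you scanned. Note that tags are mutable: a publisher can rebuild and re-push node:20.11-alpine3.19 with patched packages. If you need byte-for-byte reproducibility, pin the image digest as well (FROM node:20.11-alpine3.19@sha256:<digest>).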
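
Because a tag can be re-pushed by its publisher, some teams record the content digest behind the tag and pin that too. The following is a minimal sketch of how the digest could be looked up, assuming a local Docker CLI; the function name resolve_digest is hypothetical.

```shell
# Hypothetical helper: print the repository digest behind a tag.
resolve_digest() {
  image="$1"
  # Pull first so the local copy matches what the registry currently serves
  docker pull --quiet "$image" >/dev/null || return 1
  # RepoDigests holds "name@sha256:..." entries for images pulled from a registry
  docker image inspect --format '{{index .RepoDigests 0}}' "$image"
}
```

Running resolve_digest node:20.11-alpine3.19 prints a name@sha256 reference that can be pasted into a FROM line; updating it then becomes a deliberate, reviewable change.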

Rule 3: Use Alpine or Distroless base images

Alpine Linux is approximately 5MB as a base image, versus 120MB for Debian-based images. Alpine includes only essential packages, which reduces the number of CVEs (vulnerabilities) present in the image. For Go binaries and Java applications, consider Google's Distroless images — they contain only the runtime and the application, with no shell, no package manager, and no tools an attacker could exploit after gaining container access. The trade-off: Alpine uses musl libc instead of glibc, which can cause compatibility issues with some native Node.js modules. Test Alpine compatibility early in the project, not after you've built six months of code.

Rule 4: Order Dockerfile layers by change frequency

Docker caches each layer. When a layer changes, all subsequent layers are rebuilt. This means the order of your Dockerfile instructions determines how much of the cache is invalidated on each build. The correct order for a Node.js application: copy package.json and package-lock.json first, run npm ci, then copy your application source code. With this order, changing application code (which happens constantly) only invalidates the COPY . . layer and onwards — the npm ci layer, which takes the most time, is reused from cache. Reversing this order (copying all source first) means every code change triggers a full npm ci, turning 30-second builds into 4-minute builds.

Rule 5: Use a .dockerignore file

Without a .dockerignore, the build context sent to the Docker daemon includes everything in your project directory: node_modules (potentially 500MB), .git history, .env files, test fixtures, and documentation. The .dockerignore file plays the same role as .gitignore and excludes these from the build context, with one difference worth knowing: patterns are matched relative to the context root, so a bare *.md only matches files at the top level and you need a **/ prefix to match at any depth. A basic .dockerignore for a Node.js project: node_modules, .git, .env*, **/*.test.ts, coverage/, **/*.md. In a project like this, that can cut the build context from around 500MB to under 5MB, and it keeps .env files from being copied into image layers by a COPY . . instruction.
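
As a rough sketch, the exclusions listed above could be written out with a small shell helper. The function name write_dockerignore is hypothetical, and the **/ prefixes are an assumption added here so that the patterns also match in subdirectories, since .dockerignore patterns are anchored to the context root.

```shell
# Hypothetical helper: write a starter .dockerignore into the given
# directory (defaults to the current directory).
write_dockerignore() {
  target_dir="${1:-.}"
  cat > "$target_dir/.dockerignore" <<'EOF'
# Dependencies are reinstalled inside the image by npm ci
**/node_modules
# Version control history and local secrets
.git
**/.env*
# Tests, coverage reports, and docs are not needed at runtime
**/*.test.ts
coverage/
**/*.md
EOF
}
```

Call write_dockerignore from the project root, then review the file before committing: a project that needs a README or fixture inside the image would have to remove or narrow the matching pattern.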

Security in Production Containers (Rules 6–10)

Rule 6: Never run containers as root

By default, Docker containers run as root. If an attacker exploits a vulnerability in your application and gains code execution, they execute as root inside the container and, depending on the container configuration, may be able to escape to the host. Adding USER node (for the official Node.js images, which ship with a node user) or USER 1001 before your CMD instruction drops to a non-privileged user. Many container-escape techniques depend on the process running as root inside the container, so this single line removes a common precondition for them. It is not a complete defense on its own, but it is one of the cheapest available.

# Create a non-root user and switch to it
RUN addgroup --system appgroup && adduser --system --ingroup appgroup appuser
USER appuser
CMD ["node", "dist/index.js"]

Rule 7: Inject secrets at runtime, never bake them into the image

A Dockerfile with ENV API_KEY=abc123xyz or ARG DATABASE_PASSWORD=secretpass embeds those values in the image metadata and layer history, where anyone who can pull the image can read them with docker history or docker inspect. The correct approach: pass secrets at runtime via environment variables (docker run -e API_KEY="$API_KEY"), Docker Secrets (for Swarm), or Kubernetes Secrets. For applications in production, use a secrets manager: AWS Parameter Store (free tier available), HashiCorp Vault, or Doppler. These supply secrets to the container at startup, typically as environment variables, without storing them in the image.
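
One way to apply the runtime approach is sketched below. It relies on the fact that docker run -e NAME (a name with no =value) copies the variable from the calling environment. The wrapper name run_with_secret and the API_KEY variable are illustrative, and how API_KEY reaches the calling shell (a secrets manager CLI, a CI secret store) is left open.

```shell
# Hypothetical wrapper: start a container with API_KEY taken from the
# calling shell's environment instead of from the image.
run_with_secret() {
  image="$1"
  # Refuse to start if the secret is missing, rather than booting a broken app
  if [ -z "${API_KEY:-}" ]; then
    echo "API_KEY is not set; refusing to start $image" >&2
    return 1
  fi
  # docker only sees exported variables
  export API_KEY
  # "-e API_KEY" passes the value through without writing it on the command line
  docker run --rm -e API_KEY "$image"
}
```

Because the value never appears in the Dockerfile or in the command text, it stays out of image layers and shell history. It is still visible to anyone who can run docker inspect on the running container, so access to the Docker host needs the same care as the secret itself.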

Rule 8: Scan images for vulnerabilities before deploying

Docker Scout (built into Docker Desktop) and Trivy (open source, free) both scan container images against known CVE databases. Integrate the scan into your CI/CD pipeline so a vulnerability above a severity threshold blocks the deployment. Command: trivy image --exit-code 1 --severity HIGH,CRITICAL myapp:1.4.2. This fails the CI job if any HIGH or CRITICAL CVEs are found. Run this after building the image, before pushing to the registry. Many teams skip this step entirely, yet adding it to a GitHub Actions workflow takes only a few minutes and can catch vulnerabilities inherited from base images.
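
The build, scan, push ordering described above could look roughly like this as a CI shell step. This is a sketch, assuming the docker and trivy CLIs are installed on the CI runner; the function name build_scan_push is hypothetical.

```shell
# Hypothetical CI step: build the image, scan it, and push only if the
# scan finds nothing at HIGH or CRITICAL severity.
build_scan_push() {
  image="$1"
  docker build -t "$image" . || return 1
  # --exit-code 1 makes trivy exit non-zero when findings match --severity
  if ! trivy image --exit-code 1 --severity HIGH,CRITICAL "$image"; then
    echo "Blocking push: HIGH or CRITICAL vulnerabilities in $image" >&2
    return 1
  fi
  docker push "$image"
}
```

The important design choice is that the push happens last, so an image that fails the scan never reaches the registry that production pulls from.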

Rule 9: Set memory and CPU limits on every container

Without resource limits, a memory leak or CPU-intensive bug in one container can starve other containers on the same host, potentially bringing down your entire application stack. Set limits at the container level: --memory 512m --cpus 0.5 (Docker CLI) or the equivalent in your docker-compose.yml under deploy.resources.limits. When a container exceeds its memory limit, the kernel's OOM killer terminates it rather than letting it consume the host's memory; Docker restarts it only if you have set a restart policy (for example --restart on-failure). CPU limits behave differently: the container is throttled, not killed. This containment is exactly what you want: controlled failure of one component instead of cascading failure across all services.
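
For the Docker CLI case, the limits above might be wrapped as follows. The function name run_limited is hypothetical, and the restart policy is an addition made here so that a container killed for exceeding its memory limit comes back automatically.

```shell
# Hypothetical wrapper: run a container with the memory and CPU limits
# mentioned above.
run_limited() {
  image="$1"
  # --memory: hard cap; exceeding it gets the container OOM-killed
  # --cpus: at most half of one CPU core; excess demand is throttled
  # --restart on-failure: restart after a non-zero exit, including OOM kills
  docker run -d \
    --memory 512m \
    --cpus 0.5 \
    --restart on-failure \
    "$image"
}
```

The 512m and 0.5 values are the ones used in the text, not recommendations; measure real usage under load and set the limits with some headroom above it.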

Rule 10: Use a read-only filesystem where possible

Starting a container with --read-only prevents any process inside from writing to the container filesystem at runtime. This means a compromised container cannot modify application code, install tools, or write backdoors to disk. For directories that legitimately need writes (log files, temp files), mount a tmpfs volume at those specific paths: --tmpfs /tmp:rw,size=64m. Most well-written applications, particularly stateless APIs, function perfectly with a read-only root filesystem. It usually takes about 30 minutes to test and configure, and it eliminates an entire category of post-exploitation techniques.
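
A minimal sketch of the read-only setup, using the flags from the paragraph above. The wrapper name run_read_only is hypothetical, and /tmp is assumed to be the only path the application writes to.

```shell
# Hypothetical wrapper: run a container with a read-only root filesystem
# and a small in-memory scratch area at /tmp.
run_read_only() {
  image="$1"
  # --read-only: the root filesystem cannot be written at runtime
  # --tmpfs: writable, memory-backed mount capped at 64MB
  docker run -d \
    --read-only \
    --tmpfs /tmp:rw,size=64m \
    "$image"
}
```

A quick way to find paths that still need to be writable is to run the application this way in a staging environment and watch its logs for "Read-only file system" errors, then add a tmpfs or volume mount for each legitimate one.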

Health, Logging, and Observability (Rules 11–13)

Rule 11: Add a HEALTHCHECK to every Dockerfile

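Without a health check, Docker treats a container as "healthy" the moment its main process starts, even if the application inside is deadlocked or failing to respond. A HEALTHCHECK instruction tells Docker how to verify that the application is actually responding, and the result shows up in docker ps and docker inspect. What happens next depends on the platform: Docker Swarm replaces unhealthy tasks, while plain docker run only reports the status. Kubernetes ignores the Dockerfile HEALTHCHECK and uses its own liveness and readiness probes, and ECS uses the healthCheck in the task definition, so define the equivalent check there. With a check in place, the orchestrator can take failing containers out of the load balancer rotation and restart them. Without one, traffic is routed to broken containers.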

# Alpine-based images do not include curl by default; BusyBox wget is available
HEALTHCHECK --interval=30s --timeout=5s --start-period=15s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1

Rule 12: Log to stdout and stderr, not to files

Docker automatically captures stdout and stderr from each container and routes them to the configured logging driver (CloudWatch, Loki, Datadog, or just docker logs). Applications that write logs to files inside the container require a separate volume mount, log rotation configuration, and a log shipping agent — all of which add complexity without adding value. Configure your application to write logs to stdout and let Docker handle the rest. In Node.js: console.log() and console.error() write to stdout and stderr by default. In Python: configure the logging module to use a StreamHandler pointing to sys.stdout.

Rule 13: Set the container timezone for Indian deployments

Docker containers default to UTC. Without explicit timezone configuration, log timestamps, cron schedules, and any time-based business logic inside the container run on UTC, which is 5 hours and 30 minutes behind IST. A transaction logged at "03:30 UTC" happened at "09:00 IST": workable, but confusing for Indian operations teams debugging incidents at midnight IST. Add one line to your Dockerfile: ENV TZ=Asia/Kolkata. Processes that honor the TZ variable then use IST, provided the image contains the timezone database. Minimal bases such as Alpine omit it, so install it there as well (RUN apk add --no-cache tzdata). For scheduled tasks and logs, IST timestamps then align with what your team expects.

Orchestration and Deployment (Rules 14–15)

Rule 14: Use Docker Compose for local development consistency

A docker-compose.yml checked into the repository should mirror the production service topology: the application container, a database, a Redis cache, and any background workers. When a developer clones the repository and runs docker compose up, they get an environment identical to production in terms of service dependencies and networking. This eliminates "works on my machine" problems caused by differences in locally installed database versions, Redis configurations, or environment variables. Version-control the docker-compose.yml alongside the application code and treat it as part of the codebase, not a personal configuration file.

Rule 15: Tag images semantically, not with :latest

Deploying with a tag like myapp:latest means you cannot roll back to exactly the version that was running before a bad deployment. If something goes wrong, "the previous version" is ambiguous. Tag images with the git commit SHA (myapp:abc123def) or a semantic version (myapp:1.4.2). Your CI/CD pipeline should automatically tag each build with the commit SHA and push it to the container registry. To roll back: re-deploy the previous SHA tag. This is deterministic — you know exactly what code is in myapp:abc123def because you can check out that commit in git.
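
The tagging step a CI pipeline performs could be sketched like this. The function name build_and_push_sha and the default repository registry.example.com/myapp are placeholders, and the short SHA format is one convention among several.

```shell
# Hypothetical CI helper: build and push an image tagged with the short
# git commit SHA, then print the full image reference.
build_and_push_sha() {
  repo="${1:-registry.example.com/myapp}"
  # The tag is derived from the commit being built, so it maps back to code
  sha="$(git rev-parse --short HEAD)" || return 1
  docker build -t "$repo:$sha" . || return 1
  docker push "$repo:$sha" || return 1
  echo "$repo:$sha"
}
```

Recording the printed reference in the deployment log gives you the rollback target for the next release: redeploying means pointing the service back at the previous reference, with no rebuild involved.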

Frequently Asked Questions

Should I use Docker Compose in production?

Docker Compose is a solid choice for single-server production deployments where all services run on one machine — a DigitalOcean Droplet at ₹1,500 per month running a web application alongside PostgreSQL and Redis, for example. The setup is straightforward, the configuration is version-controlled, and restarts after reboots are handled with a simple restart policy. For multi-server deployments or requirements for horizontal auto-scaling, you need either Kubernetes (more operational complexity, more power) or AWS ECS (easier to operate, integrates well with AWS services). For most Indian startup MVPs with moderate and predictable load, Docker Compose on a single VPS is the pragmatic choice — add orchestration when you actually need it, not before.

How do I handle database migrations with Docker in production?

The safest pattern is to run migrations as a distinct step before the application containers start, not as part of the application's startup sequence. In Docker Compose, this means a migration service that the app service depends on with condition: service_completed_successfully, so the app container starts only after the migration exits with status 0. In ECS, run the migration as a one-off task before updating the service; in Kubernetes, use a Job that completes before the rollout proceeds. An init container is a weaker fit because it runs once per pod, which brings back the concurrency problem. Running migrations inside the application's startup command is dangerous when multiple replicas start simultaneously: every replica hits the database with the same migration at the same time, causing conflicts. Recommended tools by stack: Flyway and Liquibase for Java, golang-migrate for Go, Prisma Migrate for Node.js. Integrate the migration run into your CI/CD pipeline as a named step so you can see it succeed or fail separately from the deployment.
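
For the Docker Compose case, the ordering could be expressed roughly as below. This sketch writes an example docker-compose.yml from a shell helper; the helper name, the myapp:1.4.2 image, the npm run migrate script, and the PostgreSQL service are all assumptions for illustration, and database connection settings for the app are omitted.

```shell
# Hypothetical helper: write a compose file in which the app starts only
# after a one-shot migration service has exited successfully.
write_compose_with_migration() {
  target_dir="${1:-.}"
  # Quoted 'EOF' keeps ${POSTGRES_PASSWORD} literal so Compose resolves it
  cat > "$target_dir/docker-compose.yml" <<'EOF'
services:
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 5s
      retries: 10
  migrate:
    image: myapp:1.4.2
    command: ["npm", "run", "migrate"]
    depends_on:
      db:
        condition: service_healthy
  app:
    image: myapp:1.4.2
    depends_on:
      migrate:
        condition: service_completed_successfully
EOF
}
```

With this file in place, docker compose up -d starts the database, waits for it to report healthy, runs the migration once, and starts the app only if the migration exited with status 0. A failed migration leaves the app unstarted, which is the visible failure you want.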

What's the right Docker setup for a Kerala-based development team with mixed Windows and Mac laptops?

Install Docker Desktop on all machines; on Windows, use the WSL2 backend rather than Hyper-V for better performance and compatibility. The Dockerfile and docker-compose.yml work identically on Windows and Mac, provided you avoid hardcoded absolute host paths and use relative paths. The single most common cross-platform issue is line endings: shell scripts written or modified on Windows use CRLF line endings, which cause the container's Linux shell to fail with cryptic errors such as "no such file or directory" or "bad interpreter," even though the script is clearly present. Solve this permanently by adding a .gitattributes file to the repository with the line *.sh text eol=lf. This instructs git to always use LF line endings for shell scripts regardless of the developer's operating system. Treat the CI/CD pipeline (which always runs Linux) as the authoritative environment: if it builds and runs there, it's correct.
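
The .gitattributes fix could be applied with a helper along these lines. This is a sketch: the function name enforce_lf_for_scripts is hypothetical, and the renormalize step is an addition here, meant to convert scripts that were already committed with CRLF endings (it assumes a reasonably recent git).

```shell
# Hypothetical helper: make git store and check out *.sh files with LF
# endings, then re-stage existing files so the rule applies to them too.
enforce_lf_for_scripts() {
  repo_dir="${1:-.}"
  # Append the rule only if it is not already present
  grep -qxF '*.sh text eol=lf' "$repo_dir/.gitattributes" 2>/dev/null ||
    echo '*.sh text eol=lf' >> "$repo_dir/.gitattributes"
  # Re-apply line-ending normalization to files already tracked
  git -C "$repo_dir" add --renormalize .
}
```

Run it once from a clean working tree, review the staged changes, and commit them together with the .gitattributes file so every clone picks up the rule.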