Docker: Dockerfiles & CI/CD

Build optimized container images and integrate Docker into your continuous integration and deployment pipelines.

Writing Dockerfiles

A Dockerfile is a recipe for building a Docker image. It defines exactly what goes into your container: the operating system, libraries, configuration, and application code. When you share a Dockerfile, anyone can reproduce your exact environment.

Why Dockerfiles Matter

Consider the following benefits:

Reproducibility: The same Dockerfile always produces the same image (given the same build context)
Documentation: The Dockerfile serves as documentation for your environment
Automation: CI/CD pipelines can automatically build images from Dockerfiles
Version control: Track environment changes alongside code changes

Your First Dockerfile

A Dockerfile consists of instructions, each creating a layer in the final image. Here is a minimal but complete example:

FROM python:3.12-slim    # Start from an official base image
WORKDIR /app             # Set working directory
COPY requirements.txt .  # Copy dependency file first (caching)
RUN pip install -r requirements.txt  # Install dependencies
COPY . .                 # Copy application code
CMD ["python", "app.py"] # Default command when container starts

Build and run this with:

docker build -t my-app .
docker run -p 8080:80 my-app

Dockerfile Instructions Reference

Each instruction serves a specific purpose. Here is a quick reference:

Instruction	Purpose	Example
FROM	Base image to start from	`FROM python:3.12-slim`
WORKDIR	Set working directory	`WORKDIR /app`
COPY	Copy files from host to image	`COPY . /app`
RUN	Execute command during build	`RUN pip install -r requirements.txt`
CMD	Default command when container starts	`CMD ["python", "app.py"]`
ENTRYPOINT	Configure container as executable	`ENTRYPOINT ["./start.sh"]`
EXPOSE	Document which port the app uses	`EXPOSE 8080`
ENV	Set environment variable	`ENV NODE_ENV=production`
ARG	Build-time variable	`ARG VERSION=1.0`
USER	Run as specific user	`USER appuser`

CMD vs ENTRYPOINT

These two instructions are often confused. Here is when to use each:

CMD: Use when you want to provide defaults that can be easily overridden
ENTRYPOINT: Use when the container should always run a specific executable

# CMD: user can override with "docker run my-app /bin/bash"
CMD ["python", "app.py"]

# ENTRYPOINT: container always runs this, CMD provides default arguments
ENTRYPOINT ["python"]
CMD ["app.py"]  # User can override: docker run my-app other.py

COPY vs ADD

Prefer COPY unless you specifically need ADD’s features:

COPY: Simple file copy (recommended)
ADD: Also extracts tar files and downloads URLs (use sparingly)

Multistage Builds

Multi-stage builds solve a common problem: build tools make images large. Your Go compiler, Node.js build tools, or Java SDK add hundreds of megabytes that are not needed at runtime.

The solution: Use one stage to build, another to run. The final image only contains what is needed to run your application.

# Stage 1: Build (large image with all build tools)
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: Run (small image with only runtime)
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html

Result: Final image is ~25MB instead of ~1GB.

Scenario	Without Multi-stage	With Multi-stage
Node.js app	~1 GB	~100 MB
Go binary	~800 MB	~10 MB
Java app	~500 MB	~200 MB

Best Practices

These practices will make your images smaller, more secure, and faster to build.

Image Size and Security

Practice	Why It Matters
Use minimal base images (alpine, slim)	Smaller attack surface, faster pulls
Run as non-root user	Prevents container escape attacks
Use specific version tags	Avoids surprise breaking changes
Scan images for vulnerabilities	Catches known security issues

Build Speed

Order your Dockerfile to maximize cache hits. Put things that change rarely at the top:

# Good: dependencies change less often than code
COPY package.json package-lock.json ./
RUN npm ci
COPY . .  # Code changes invalidate only this layer

# Bad: any code change reinstalls all dependencies
COPY . .
RUN npm ci

Layer Optimization

Combine related commands to reduce layers and image size:

# Good: single layer, cleanup in same layer
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*

# Bad: cleanup in separate layer does not reduce size
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*

Essential .dockerignore

Always create a .dockerignore file to exclude unnecessary files:

node_modules
.git
*.log
.env

Performance Optimization

Container performance depends on resource allocation, storage configuration, and network setup. Here are the key levers you can adjust.

Resource Limits

Always set resource limits in production to prevent runaway containers from affecting other services.

# Memory: set limit and reservation
docker run -d --memory="1g" --memory-reservation="750m" my-app

# CPU: limit to 2 cores
docker run -d --cpus="2" my-app

Build Performance

Use BuildKit cache mounts to dramatically speed up builds:

# syntax=docker/dockerfile:1
FROM golang:1.20 AS builder
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o app .

Network Performance

Need	Solution
Maximum performance	`--network host` (no isolation)
Good performance + isolation	Custom bridge network
Multi-host communication	Overlay network

Monitoring

# Real-time resource usage
docker stats

# Container-specific metrics
docker stats container-name --no-stream

For production monitoring, consider cAdvisor with Prometheus for detailed metrics and alerting.

Docker Swarm: Native Orchestration

Docker Swarm turns multiple Docker hosts into a single cluster. It handles service deployment, scaling, and rolling updates with built-in load balancing.

When to use Swarm vs Kubernetes:

Factor	Docker Swarm	Kubernetes
Complexity	Simple, quick to learn	Complex, steep learning curve
Setup time	Minutes	Hours to days
Scalability	Good for small/medium	Excellent for large scale
Feature set	Essential features	Comprehensive ecosystem
Best for	Small teams, simpler apps	Large teams, complex apps

Quick Start

# Initialize swarm on first manager
docker swarm init

# Join workers (run this on each worker node)
docker swarm join --token <token> <manager-ip>:2377

# Deploy a service with 3 replicas
docker service create --name web --replicas 3 -p 80:80 nginx

# Scale up
docker service scale web=5

Stack Deployment

For multi-service applications, use stack files (compose format):

# stack.yml
version: '3.8'
services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
    ports:
      - "80:80"
  api:
    image: my-api:latest
    deploy:
      replicas: 2

# Deploy and manage stack
docker stack deploy -c stack.yml myapp
docker stack services myapp
docker stack rm myapp

High Availability Tips

Use an odd number of managers (3, 5, or 7) for quorum
Distribute managers across availability zones
Use node labels and constraints for placement control

CI/CD Integration with Docker

Docker enables consistent builds across all CI/CD platforms. The pattern is always the same: build image, test, scan for vulnerabilities, push to registry, deploy.

GitHub Actions

The most common approach for GitHub-hosted projects:

# .github/workflows/docker.yml
name: Docker Build
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v4
    - uses: docker/setup-buildx-action@v3
    - uses: docker/login-action@v3
      with:
        registry: ghcr.io
        username: $
        password: $
    - uses: docker/build-push-action@v5
      with:
        push: true
        tags: ghcr.io/$:latest
        cache-from: type=gha
        cache-to: type=gha,mode=max

GitLab CI/CD

# .gitlab-ci.yml
build:
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

Key CI/CD Practices

Practice	Why
Cache layers	Faster builds
Scan images	Catch vulnerabilities before deployment
Tag with commit SHA	Traceable deployments
Use multi-stage builds	Smaller production images
Avoid `latest` tag in production	Reproducible deployments