Docker: Dockerfiles & CI/CD
Build optimized container images and integrate Docker into your continuous integration and deployment pipelines.
Writing Dockerfiles
A Dockerfile is a recipe for building a Docker image. It defines exactly what goes into your container: the operating system, libraries, configuration, and application code. When you share a Dockerfile, anyone can reproduce your exact environment.
Why Dockerfiles Matter
Consider the following benefits:
- Reproducibility: The same Dockerfile always produces the same image (given the same build context)
- Documentation: The Dockerfile serves as documentation for your environment
- Automation: CI/CD pipelines can automatically build images from Dockerfiles
- Version control: Track environment changes alongside code changes
Your First Dockerfile
A Dockerfile consists of instructions, each creating a layer in the final image. Here is a minimal but complete example:
# Start from an official base image
FROM python:3.12-slim

# Set the working directory inside the image
WORKDIR /app

# Copy the dependency file first (better layer caching)
COPY requirements.txt .

# Install dependencies
RUN pip install -r requirements.txt

# Copy the application code
COPY . .

# Default command when the container starts
CMD ["python", "app.py"]
Build and run this with (assuming the app listens on port 8080):
docker build -t my-app .
docker run -p 8080:8080 my-app
Dockerfile Instructions Reference
Each instruction serves a specific purpose. Here is a quick reference:
| Instruction | Purpose | Example |
|---|---|---|
| FROM | Base image to start from | FROM python:3.12-slim |
| WORKDIR | Set working directory | WORKDIR /app |
| COPY | Copy files from host to image | COPY . /app |
| RUN | Execute command during build | RUN pip install -r requirements.txt |
| CMD | Default command when container starts | CMD ["python", "app.py"] |
| ENTRYPOINT | Configure container as executable | ENTRYPOINT ["./start.sh"] |
| EXPOSE | Document which port the app uses | EXPOSE 8080 |
| ENV | Set environment variable | ENV NODE_ENV=production |
| ARG | Build-time variable | ARG VERSION=1.0 |
| USER | Run as specific user | USER appuser |
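To see several of these instructions working together, here is a hypothetical sketch combining ARG, ENV, USER, and EXPOSE (the image contents, user name, and port are illustrative):

```dockerfile
FROM python:3.12-slim

# Build-time variable, overridable with: docker build --build-arg VERSION=2.0 .
ARG VERSION=1.0
LABEL version=$VERSION

# Runtime environment variable, visible to the application
ENV APP_ENV=production

WORKDIR /app
COPY . .

# Create and switch to an unprivileged user
RUN useradd --create-home appuser
USER appuser

# Document the listening port; publish it with -p at run time
EXPOSE 8080
CMD ["python", "app.py"]
```

Note that ARG values exist only during the build, while ENV values persist into the running container.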
CMD vs ENTRYPOINT
These two instructions are often confused. Here is when to use each:
- CMD: Use when you want to provide defaults that can be easily overridden
- ENTRYPOINT: Use when the container should always run a specific executable
# CMD alone: user can override the whole command with "docker run my-app /bin/bash"
CMD ["python", "app.py"]

# ENTRYPOINT: container always runs this; CMD supplies default arguments
ENTRYPOINT ["python"]
# User can override just the argument: docker run my-app other.py
CMD ["app.py"]
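To make the override behavior concrete, here is how the two forms behave at run time (assuming an image named my-app built with the ENTRYPOINT/CMD pair above):

```shell
# Default: ENTRYPOINT and CMD combine into "python app.py"
docker run my-app

# Arguments after the image name replace CMD, so this runs "python other.py"
docker run my-app other.py

# Overriding ENTRYPOINT itself requires an explicit flag
docker run --entrypoint /bin/sh my-app
```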
COPY vs ADD
Prefer COPY unless you specifically need ADD’s features:
- COPY: Simple file copy (recommended)
- ADD: Also extracts tar files and downloads URLs (use sparingly)
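For illustration, a sketch of ADD's extra behaviors next to a plain COPY (the file and archive names are hypothetical):

```dockerfile
# COPY: plain file copy (preferred for ordinary files)
COPY config.yml /etc/app/config.yml

# ADD: automatically extracts a local tar archive into the target directory
ADD vendor-libs.tar.gz /opt/libs/

# ADD can also fetch a remote URL (no extraction for URLs; use sparingly)
ADD https://example.com/data.json /opt/data.json
```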
Multi-stage Builds
Multi-stage builds solve a common problem: build tools make images large. Your Go compiler, Node.js build tools, or Java SDK add hundreds of megabytes that are not needed at runtime.
The solution: Use one stage to build, another to run. The final image only contains what is needed to run your application.
# Stage 1: Build (large image with all build tools)
FROM node:18 AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build
# Stage 2: Run (small image with only runtime)
FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
Result: The final image is a few tens of megabytes instead of roughly 1 GB.
| Scenario | Without Multi-stage | With Multi-stage |
|---|---|---|
| Node.js app | ~1 GB | ~100 MB |
| Go binary | ~800 MB | ~10 MB |
| Java app | ~500 MB | ~200 MB |
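As a sketch of the Go row above (module and binary names are illustrative), a minimal two-stage build might look like:

```dockerfile
# Stage 1: compile with the full Go toolchain
FROM golang:1.20 AS builder
WORKDIR /src
COPY . .
# Static binary so it can run on a minimal base image
RUN CGO_ENABLED=0 go build -o /app .

# Stage 2: copy only the binary into a tiny runtime image
FROM alpine:3.19
COPY --from=builder /app /app
ENTRYPOINT ["/app"]
```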
Best Practices
These practices will make your images smaller, more secure, and faster to build.
Image Size and Security
| Practice | Why It Matters |
|---|---|
| Use minimal base images (alpine, slim) | Smaller attack surface, faster pulls |
| Run as non-root user | Prevents container escape attacks |
| Use specific version tags | Avoids surprise breaking changes |
| Scan images for vulnerabilities | Catches known security issues |
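A common way to apply the non-root practice from the table (user and group names are illustrative):

```dockerfile
FROM python:3.12-slim
# Create an unprivileged system user and group
RUN groupadd --system app && useradd --system --gid app appuser
WORKDIR /app
# Give the new user ownership of the application files
COPY --chown=appuser:app . .
# All later instructions and the running container use this user
USER appuser
CMD ["python", "app.py"]
```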
Build Speed
Order your Dockerfile to maximize cache hits. Put things that change rarely at the top:
# Good: dependencies change less often than code
COPY package.json package-lock.json ./
RUN npm ci
COPY . . # Code changes invalidate only this layer
# Bad: any code change reinstalls all dependencies
COPY . .
RUN npm ci
Layer Optimization
Combine related commands to reduce layers and image size:
# Good: single layer, cleanup in same layer
RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*
# Bad: cleanup in separate layer does not reduce size
RUN apt-get update
RUN apt-get install -y curl
RUN rm -rf /var/lib/apt/lists/*
Essential .dockerignore
Always create a .dockerignore file to exclude unnecessary files:
node_modules
.git
*.log
.env
Performance Optimization
Container performance depends on resource allocation, storage configuration, and network setup. Here are the key levers you can adjust.
Resource Limits
Always set resource limits in production to prevent runaway containers from affecting other services.
# Memory: set limit and reservation
docker run -d --memory="1g" --memory-reservation="750m" my-app
# CPU: limit to 2 cores
docker run -d --cpus="2" my-app
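The same limits can be declared declaratively in a Compose file; a sketch (the service and image names are illustrative):

```yaml
services:
  my-app:
    image: my-app
    deploy:
      resources:
        limits:
          cpus: "2"
          memory: 1g
        reservations:
          memory: 750m
```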
Build Performance
Use BuildKit cache mounts to dramatically speed up builds:
# syntax=docker/dockerfile:1
FROM golang:1.20 AS builder
WORKDIR /src
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    --mount=type=cache,target=/root/.cache/go-build \
    go build -o app .
Network Performance
| Need | Solution |
|---|---|
| Maximum performance | --network host (no isolation) |
| Good performance + isolation | Custom bridge network |
| Multi-host communication | Overlay network |
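For the middle row, a custom bridge network is created and attached like this (the network and container names are illustrative):

```shell
# Create an isolated bridge network with built-in DNS between containers
docker network create app-net

# Containers on the same network can reach each other by name
docker run -d --network app-net --name db postgres:16
docker run -d --network app-net --name api my-api
```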
Monitoring
# Real-time resource usage
docker stats
# Container-specific metrics
docker stats container-name --no-stream
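The docker stats output can be narrowed to the columns you care about with a Go-template format string:

```shell
# Show only name, CPU, and memory, once, without streaming
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
```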
For production monitoring, consider cAdvisor with Prometheus for detailed metrics and alerting.
Docker Swarm: Native Orchestration
Docker Swarm turns multiple Docker hosts into a single cluster. It handles service deployment, scaling, and rolling updates with built-in load balancing.
When to use Swarm vs Kubernetes:
| Factor | Docker Swarm | Kubernetes |
|---|---|---|
| Complexity | Simple, quick to learn | Complex, steep learning curve |
| Setup time | Minutes | Hours to days |
| Scalability | Good for small/medium | Excellent for large scale |
| Feature set | Essential features | Comprehensive ecosystem |
| Best for | Small teams, simpler apps | Large teams, complex apps |
Quick Start
# Initialize swarm on first manager
docker swarm init
# Join workers (run this on each worker node)
docker swarm join --token <token> <manager-ip>:2377
# Deploy a service with 3 replicas
docker service create --name web --replicas 3 -p 80:80 nginx
# Scale up
docker service scale web=5
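Once a service is running, rolling updates and rollbacks follow the same CLI pattern (the image tag is illustrative):

```shell
# Update the image; Swarm replaces replicas one at a time by default
docker service update --image nginx:1.25 web

# Roll back to the previous service definition if the update misbehaves
docker service rollback web
```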
Stack Deployment
For multi-service applications, use stack files (compose format):
# stack.yml
version: '3.8'
services:
  web:
    image: nginx:alpine
    deploy:
      replicas: 3
      update_config:
        parallelism: 1
        delay: 10s
    ports:
      - "80:80"
  api:
    image: my-api:latest
    deploy:
      replicas: 2
# Deploy and manage stack
docker stack deploy -c stack.yml myapp
docker stack services myapp
docker stack rm myapp
High Availability Tips
- Use an odd number of managers (3, 5, or 7) for quorum
- Distribute managers across availability zones
- Use node labels and constraints for placement control
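The labels-and-constraints tip looks like this in practice (the label key, value, and node name are illustrative):

```shell
# Label a node (run on a manager)
docker node update --label-add zone=us-east-1 worker-1

# Constrain a service to nodes carrying that label
docker service create --name web --constraint 'node.labels.zone == us-east-1' nginx
```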
CI/CD Integration with Docker
Docker enables consistent builds across all CI/CD platforms. The pattern is always the same: build image, test, scan for vulnerabilities, push to registry, deploy.
GitHub Actions
The most common approach for GitHub-hosted projects:
# .github/workflows/docker.yml
name: Docker Build
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v5
        with:
          push: true
          tags: ghcr.io/${{ github.repository }}:latest
          cache-from: type=gha
          cache-to: type=gha,mode=max
GitLab CI/CD
# .gitlab-ci.yml
build:
  image: docker:latest
  services:
    - docker:dind
  before_script:
    - docker login -u $CI_REGISTRY_USER -p $CI_REGISTRY_PASSWORD $CI_REGISTRY
  script:
    - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
Key CI/CD Practices
| Practice | Why |
|---|---|
| Cache layers | Faster builds |
| Scan images | Catch vulnerabilities before deployment |
| Tag with commit SHA | Traceable deployments |
| Use multi-stage builds | Smaller production images |
| Avoid latest tag in production | Reproducible deployments |
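The commit-SHA tagging practice can be applied locally or in any CI system; a sketch assuming a git checkout (the registry host is illustrative):

```shell
# Tag the image with the short commit SHA for traceable deployments
SHA=$(git rev-parse --short HEAD)
docker build -t registry.example.com/my-app:"$SHA" .
docker push registry.example.com/my-app:"$SHA"
```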