Container Security: A Principal Engineer's Reference

How to harden a Docker container at runtime and at build time. This document assumes you are running workloads inside the Lima guest provisioned by lima-qemu-dockerd.yaml (Ubuntu + rootless dockerd), but every docker run flag and pattern here applies on any Linux host with Docker Engine.

For AppArmor profile mechanics, see apparmor.md. For why the daemon itself runs unprivileged inside a VM, see Rootless Docker in lima.md.

0. Where security lives in the stack
1. Threat model
2. Image and supply chain
3. Runtime hardening
4. Network isolation
5. Secrets and configuration
6. Compose and orchestrator manifests
7. Verification and auditing
8. Hardened example
9. Design-review checklist
See also

0. Where security lives in the stack

Containers are not VMs. Namespaces and cgroups provide process isolation, but the kernel is still shared. Security is layered — each outer ring contains and constrains everything inside it:

flowchart TB
  subgraph macos["macOS host — no direct container kernel access"]
    subgraph lima["Lima guest VM (Ubuntu) — AppArmor enabled · own kernel · own disk"]
      subgraph dockerd["dockerd (rootless in this repo) — socket access ≠ host root; daemon runs in user namespace"]
        subgraph runc["Container runtime (runc) — user namespace · seccomp · AppArmor · dropped caps"]
          app["Your application process"]
        end
      end
    end
  end

Each layer shrinks blast radius. A hardened container inside rootless dockerd inside a Lima VM is materially safer than --privileged on a rootful daemon on bare metal — but no layer replaces the others. Harden the container itself regardless of outer wrappers.

1. Threat model

Know what you are defending against:

| Threat | What helps |
| --- | --- |
| Compromised application (RCE, SSRF) | Non-root user, read-only rootfs, seccomp, AppArmor, network policies |
| Container escape to host | Drop `CAP_SYS_ADMIN`, no `--privileged`, keep kernel patched, LSM profiles |
| Lateral movement between containers | Custom bridge networks, no `--network host`, firewall rules in guest |
| Supply-chain compromise (malicious image/layer) | Pin digests, scan images, minimal base images, verify signatures |
| Data exfiltration via mounted volumes | Mount only what is needed, read-only where possible, never mount `docker.sock` |
| Denial of service | `--memory`, `--cpus`, `--pids-limit` |

Non-goal: containers cannot sandbox a workload that requires full kernel access (e.g. a kernel module loader). Those belong on bare metal or a dedicated VM with a narrow scope — not in a shared Docker host.

2. Image and supply chain

Security starts before docker run.

Use minimal, pinned images

# Pin by digest, not just tag
docker pull nginx@sha256:abc123…

# Prefer slim or distroless bases over full OS images
# Good: gcr.io/distroless/static, debian:bookworm-slim, alpine:3.20
# Avoid: ubuntu:latest with 200+ packages you never use
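
One way to enforce digest pinning in CI is a small check on image references before they reach docker run or a manifest. The sketch below is illustrative only: the function name is_digest_pinned is hypothetical, and it assumes SHA-256 digests written as 64 lowercase hex characters.

```shell
# Hypothetical helper: succeed only if the image reference ends in a
# full sha256 digest (64 lowercase hex characters).
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Usage in a CI step (IMAGE is an assumed variable holding the reference):
#   is_digest_pinned "$IMAGE" || { echo "image is not pinned by digest" >&2; exit 1; }
```

A tag-only reference such as nginx:1.27 fails the check, as does a truncated digest.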

Build as non-root

# Dockerfile — run the process as an unprivileged user
FROM debian:bookworm-slim
RUN groupadd -r app && useradd -r -g app -d /app -s /sbin/nologin app
COPY --chown=app:app . /app
USER app
WORKDIR /app
ENTRYPOINT ["./chatbot-app"]

Scan before deploy

# inside the Lima guest
docker scout cves chatbot-app:latest          # Docker Scout (if enabled)
trivy image chatbot-app:latest                # Aqua Trivy — common in CI

Do not bake secrets into images

Environment variables and ARG values end up in image history and layer caches. Inject secrets at runtime (see §5).
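
Leaked build-time values can often be spotted by reading the layer history. A rough sketch, assuming secrets follow common naming patterns such as PASSWORD=, SECRET=, TOKEN= or API_KEY=. The function name find_secret_like is hypothetical, and the pattern list is an assumption rather than a complete detector.

```shell
# Hypothetical helper: read layer-creation commands on stdin and print
# the lines that contain a secret-looking NAME=value assignment.
# Returns 0 when at least one such line is found, non-zero otherwise.
# The name patterns are an assumption; extend them for your project.
find_secret_like() {
  grep -Ei '(PASSWORD|SECRET|TOKEN|API_?KEY)[A-Z0-9_]*=[^ ]+'
}

# Usage (inside the Lima guest, requires a local image):
#   docker history --no-trunc --format '{{.CreatedBy}}' chatbot-app:latest | find_secret_like
```

Any line printed is a candidate for review. An empty result does not prove the image is clean.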

3. Runtime hardening

These are the highest-leverage docker run flags. Apply every one your workload can tolerate, and omit a flag only after testing shows the workload cannot run with it.

Run as a non-root user

docker run --user 1000:1000 chatbot-app:latest
# or rely on the image's USER directive (preferred — enforced at build time)

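Root inside a container is still UID 0 within the container's user namespace, and without user-namespace remapping (the default on a rootful daemon) that is UID 0 on the host as well. Combined with a misconfigured capability or a kernel bug, that is unnecessary risk.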

Read-only root filesystem

docker run --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --tmpfs /run:rw,nosuid,size=16m \
  chatbot-app:latest

A read-only root filesystem confines an attacker who gains write access to the ephemeral tmpfs mounts; binaries on the overlay layer cannot be patched.

Drop Linux capabilities

Docker grants a default capability set. Drop everything you do not need, then add back surgically:

# Drop all, add only what the workload requires
docker run \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  chatbot-app:latest

| Capability | Risk if kept unnecessarily |
| --- | --- |
| `CAP_SYS_ADMIN` | Near-equivalent to root; mount, namespace manipulation |
| `CAP_NET_RAW` | Raw sockets; ARP/DNS spoofing |
| `CAP_SYS_PTRACE` | Debug/trace other processes |
| `CAP_DAC_READ_SEARCH` | Bypass file read permission checks |

Never use --privileged. It grants every capability, disables seccomp, and disables AppArmor confinement. It is for one-off debugging on a throwaway VM, not production.
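
To confirm which capabilities a container process actually holds, read the CapEff line from /proc/1/status inside it and test individual bits. A minimal sketch: the function name cap_is_set is hypothetical, and the bit numbers are those defined in linux/capability.h (for example CAP_NET_BIND_SERVICE is 10, CAP_NET_RAW is 13, CAP_SYS_ADMIN is 21).

```shell
# Hypothetical helper: succeed if capability bit $2 is set in the
# hexadecimal mask $1 (the value of CapEff in /proc/<pid>/status).
cap_is_set() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Usage (requires a running container whose image ships grep):
#   mask=$(docker exec mycontainer grep CapEff /proc/1/status | awk '{print $2}')
#   cap_is_set "$mask" 21 && echo "CAP_SYS_ADMIN is present"
```

For a root process under Docker's default capability set the mask is typically 00000000a80425fb, which includes CAP_NET_RAW but not CAP_SYS_ADMIN. A non-root process usually shows an empty effective set, so inspect CapBnd instead to see what --cap-drop and --cap-add left in the bounding set.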

Seccomp

Seccomp — short for secure computing mode — is a Linux kernel facility that filters which system calls a process may invoke. Docker applies a default seccomp profile (default.json) that blocks ~40 dangerous syscalls. Keep it unless you have a tested replacement:

# Default — always prefer this
docker run chatbot-app:latest

# Custom profile: a JSON file readable at this path where the docker CLI runs (here, inside the Lima guest)
docker run --security-opt seccomp=/path/to/custom.json chatbot-app:latest

# Unconfined — blocks nothing; avoid in production
docker run --security-opt seccomp=unconfined chatbot-app:latest

If a container fails with Operation not permitted on a specific syscall, write a custom profile that extends the default to allow that one syscall; do not jump to unconfined.
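
Whether a seccomp filter is actually in force can be read from the Seccomp field of /proc/1/status inside the container. The kernel reports 0 (disabled), 1 (strict) or 2 (filter). The helper name seccomp_mode_name below is hypothetical.

```shell
# Hypothetical helper: translate the numeric Seccomp field from
# /proc/<pid>/status into a readable name.
seccomp_mode_name() {
  case "$1" in
    0) echo "disabled" ;;
    1) echo "strict" ;;
    2) echo "filter" ;;
    *) echo "unknown" ;;
  esac
}

# Usage (requires a running container whose image ships grep):
#   mode=$(docker exec mycontainer grep '^Seccomp:' /proc/1/status | awk '{print $2}')
#   seccomp_mode_name "$mode"
```

A container started with the default profile or a custom one is expected to report filter; one started with seccomp=unconfined is expected to report disabled.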

AppArmor

Docker applies the docker-default profile automatically inside the Lima guest. Override only with a purpose-built profile:

docker run --security-opt apparmor=my-custom-profile chatbot-app:latest

See apparmor.md for profile authoring and the docker-default rule set.

Resource limits

Prevent one container from starving the guest (or triggering OOM kills that take down neighbors):

docker run \
  --memory 512m \
  --memory-swap 512m \
  --cpus 1.0 \
  --pids-limit 100 \
  chatbot-app:latest
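
Setting --memory-swap to the same value as --memory, as above, leaves the container with no swap. docker inspect reports HostConfig.Memory in bytes, so verifying a limit means converting the flag value first. A small sketch, assuming a single-letter binary suffix (b, k, m or g) or a bare byte count; the function name to_bytes is hypothetical.

```shell
# Hypothetical helper: convert a docker-style size (512m, 1g, 64k, 100b
# or a bare byte count) to bytes, using binary multiples.
# The body runs in a subshell so num and unit do not leak to the caller.
to_bytes() (
  num=${1%[bkmgBKMG]}          # strip one trailing unit letter, if any
  unit=${1#"$num"}             # whatever was stripped is the unit
  case "$unit" in
    ''|b|B) echo "$num" ;;
    k|K)    echo $(( $num * 1024 )) ;;
    m|M)    echo $(( $num * 1024 * 1024 )) ;;
    g|G)    echo $(( $num * 1024 * 1024 * 1024 )) ;;
  esac
)

# Usage (requires a running container):
#   [ "$(docker inspect --format '{{.HostConfig.Memory}}' mycontainer)" = "$(to_bytes 512m)" ]
```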

Disable unnecessary kernel features

docker run \
  --security-opt no-new-privileges:true \
  chatbot-app:latest

no-new-privileges prevents a process from gaining more privileges via setuid binaries or file capabilities — a common escalation path after an initial compromise.

Volume mounts — least privilege

# Read-only application data
docker run -v /data/app:/data:ro chatbot-app:latest

# Never do this in production
docker run -v /var/run/docker.sock:/var/run/docker.sock chatbot-app:latest   # host takeover
docker run -v /:/host:rw chatbot-app:latest                                   # full host FS

Bind-mounting the project source from Lima's shared macOS mount (/Users/...) into a container is fine for development; for anything resembling production, keep data on the guest's native ext4 disk (see file-sharing.md).
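
A pre-flight check on -v arguments can catch the two dangerous mounts shown above before a container starts. This is a sketch only: is_risky_mount is a hypothetical name, and it recognizes just the Docker socket and a bind mount of the host root, not every sensitive path.

```shell
# Hypothetical helper: succeed (exit 0) if a -v SOURCE:TARGET[:OPTS] spec
# bind-mounts the Docker socket or the host root directory.
is_risky_mount() {
  # The host-side path is everything before the first colon.
  case "${1%%:*}" in
    /|*/docker.sock) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage in a wrapper script (SPEC is an assumed variable):
#   is_risky_mount "$SPEC" && { echo "refusing mount: $SPEC" >&2; exit 1; }
```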

4. Network isolation

Use a dedicated bridge network

docker network create app-net
docker run --network app-net --name api api:latest
docker run --network app-net --name db  postgres:16

Containers on the default bridge can reach each other by IP. A custom network gives you name-based DNS and explicit membership.

Avoid host networking unless required

# Exposes the container directly on the guest's network stack — no NAT, no port mapping
docker run --network host chatbot-app:latest   # rarely justified

Host networking removes network-namespace isolation. A compromised process can bind to ports and sniff traffic on the guest's interfaces.

Publish only the ports you need

docker run -p 127.0.0.1:8080:8080 chatbot-app:latest   # bind to guest loopback only

Inside Lima, published ports are forwarded to macOS via the Lima control plane. Binding to 127.0.0.1 in the guest keeps the port off the guest's other interfaces, so it is reachable only from the guest itself and through whatever Lima forwards to the macOS host.
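
A -p value without a host address, such as 8080:8080, publishes on all interfaces by default. The sketch below flags such specs; is_loopback_publish is a hypothetical name, and it only handles the ADDRESS:HOST_PORT:CONTAINER_PORT form.

```shell
# Hypothetical helper: succeed if a -p spec names a loopback host address
# (IPv4 127.x.x.x or IPv6 [::1]).
is_loopback_publish() {
  case "$1" in
    127.*:*:*|"[::1]":*:*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage in a wrapper script (PORT_SPEC is an assumed variable):
#   is_loopback_publish "$PORT_SPEC" || echo "warning: $PORT_SPEC is not bound to loopback" >&2
```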

Inter-container egress

For strict environments, attach containers to an internal network with no default route to the internet, and route outbound traffic through an explicit proxy or sidecar.

5. Secrets and configuration

| Approach | Guidance |
| --- | --- |
| `docker run -e SECRET=…` | Visible in `docker inspect`, process env, `/proc/1/environ`. OK for local dev only. |
| Docker secrets (Swarm) | Encrypted at rest, mounted as files. Not available in plain `docker run`. |
| Bind-mounted secret files | `docker run -v /run/secrets/db-password:/run/secrets/db-password:ro` with mode `0400` on the host file. |
| External secret managers | Vault, AWS SM, 1Password Connect: fetch at entrypoint, never persist to disk. |

Pattern: read secrets from a file at startup, not from environment variables:

docker run \
  -v /run/secrets/token:/run/secrets/token:ro \
  -e SECRET_FILE=/run/secrets/token \
  chatbot-app:latest
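
On the container side, the entrypoint reads the file named by SECRET_FILE. A minimal sketch of that step in shell: read_secret_file is a hypothetical name, and how the value is handed to the application afterwards is left open.

```shell
# Hypothetical entrypoint helper: print the contents of the file named by
# SECRET_FILE, or fail if the variable is unset or the file is unreadable.
read_secret_file() {
  if [ -z "${SECRET_FILE:-}" ] || [ ! -r "$SECRET_FILE" ]; then
    echo "SECRET_FILE is unset or not readable" >&2
    return 1
  fi
  cat "$SECRET_FILE"
}

# Usage inside an entrypoint script:
#   token=$(read_secret_file) || exit 1
```

Failing at startup when the file is missing avoids running the service with an empty credential.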

6. Compose and orchestrator manifests

Translate the same principles into declarative config.

Docker Compose (v3)

services:
  api:
    image: myorg/api@sha256:abc123…
    read_only: true
    tmpfs:
      - /tmp:rw,noexec,nosuid,size=64m
    security_opt:
      - no-new-privileges:true
      - apparmor:docker-default
    cap_drop:
      - ALL
    cap_add:
      - NET_BIND_SERVICE
    user: "1000:1000"
    pids_limit: 100
    deploy:
      resources:
        limits:
          cpus: "1.0"
          memory: 512M
    networks:
      - backend
    # No published ports unless the service is an edge API

networks:
  backend:
    internal: false   # set to true to remove the route to external networks (see §4)

Kubernetes (sketch)

The same fields exist as securityContext, resources.limits, readOnlyRootFilesystem, allowPrivilegeEscalation: false, capabilities.drop: [ALL], and seccompProfile: {type: RuntimeDefault}. AppArmor is covered in apparmor.md §3.

7. Verification and auditing

Run these inside the Lima guest after starting a container.

Inspect effective security options

docker inspect --format '{{.HostConfig.SecurityOpt}}' mycontainer
docker inspect --format '{{.HostConfig.CapDrop}} {{.HostConfig.CapAdd}}' mycontainer
docker inspect --format '{{.HostConfig.ReadonlyRootfs}}' mycontainer
docker inspect --format '{{.Config.User}}' mycontainer

Confirm AppArmor profile

docker inspect --format '{{.AppArmorProfile}}' mycontainer
# Expected: docker-default (or your custom profile name)

Watch for LSM denials

# AppArmor denials land in the kernel log
sudo journalctl -k | grep -i apparmor | tail -20

# Or use auditd if installed
sudo ausearch -m AVC -ts recent

If a container works with --privileged but fails without it, do not ship --privileged. Find the specific denial (AppArmor, seccomp, or missing capability) and fix it narrowly.

Runtime process check

docker top mycontainer
# UID should not be 0 unless you have a documented reason

8. Hardened example

A production-shaped docker run for a stateless HTTP API inside the Lima guest:

docker run -d \
  --name api \
  --restart unless-stopped \
  --user 1000:1000 \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid,size=64m \
  --security-opt no-new-privileges:true \
  --security-opt apparmor=docker-default \
  --cap-drop ALL \
  --cap-add NET_BIND_SERVICE \
  --memory 512m \
  --memory-swap 512m \
  --cpus 1.0 \
  --pids-limit 200 \
  -p 127.0.0.1:8080:8080 \
  --network app-net \
  myorg/api@sha256:abc123…

Adjust capabilities and tmpfs mounts after testing against the real workload. The structure — non-root, read-only, dropped caps, resource limits, pinned digest, loopback bind — should be the default you justify away from, not toward.

9. Design-review checklist

Use this as a gate before merging a container spec:
