Add per-job data transfer quota (AWS) #3685

Open
peterschmidt85 wants to merge 3 commits into master from feature/data-transfer-quota

Conversation

@peterschmidt85
Contributor

Summary

  • Adds DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_AWS server setting (bytes, 0=unlimited) to terminate jobs exceeding outbound data transfer limits
  • Metering uses iptables byte counters on the shim (host-level), hooked into both OUTPUT and FORWARD chains to cover host and bridge Docker network modes
  • Private/VPC traffic (10.0.0.0/8, 172.16.0.0/12, etc.) is excluded — only external/billable traffic counts
  • Shim notifies the runner via a new POST /api/terminate endpoint, so the server reads the termination reason through the existing /api/pull flow (same pattern as log quota)
  • Job fails with data_transfer_quota_exceeded termination reason, visible in CLI and UI
  • AWS-only for now; other backends can be added via DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_GCP, etc.
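
The private-traffic exclusion above can be illustrated with a small Python sketch. This is not the shim's Go code, which does the filtering with iptables rules; it only shows which destinations would count as billable under the two ranges the summary names. The names `EXCLUDED_NETWORKS` and `is_billable` are hypothetical.

```python
import ipaddress

# Only the two ranges the summary names. The ranges behind its "etc." are not
# spelled out there, so any further ranges would have to be appended here.
EXCLUDED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
]


def is_billable(destination: str) -> bool:
    """Return True if traffic to `destination` would count toward the quota."""
    addr = ipaddress.ip_address(destination)
    return not any(addr in net for net in EXCLUDED_NETWORKS)


if __name__ == "__main__":
    for dest in ("10.1.2.3", "172.31.255.1", "8.8.8.8"):
        print(dest, is_billable(dest))
```

Note that 172.16.0.0/12 ends at 172.31.255.255, so an address such as 172.32.0.1 is outside the excluded range and would be metered.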

Files changed

Go (shim/runner):

  • runner/internal/shim/netmeter/: new package with iptables chain setup, byte counter polling, and quota check
  • runner/internal/shim/docker.go: waitContainerWithQuota() method wired into Run()
  • runner/internal/runner/api/: new POST /api/terminate endpoint
  • runner/internal/common/types/types.go: TerminationReasonDataTransferQuotaExceeded
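
For reviewers unfamiliar with iptables byte counters, here is a rough Python sketch of the kind of parsing the netmeter package has to do. It relies on the fact that `iptables -L <chain> -v -n -x` prints the packet and byte counters as the first two columns of every rule row. The chain name `DSTACK-METER`, the rule layout, and the sample output are illustrative assumptions, not taken from the PR or captured from a real host.

```python
# Illustrative output of `iptables -L DSTACK-METER -v -n -x` (hypothetical chain).
# Assumed layout: RETURN rules for excluded ranges first, then a catch-all
# rule with no target whose counter reflects external traffic.
SAMPLE_OUTPUT = """\
Chain DSTACK-METER (2 references)
    pkts      bytes target     prot opt in     out     source               destination
      12     1024 RETURN     all  --  *      *       0.0.0.0/0            10.0.0.0/8
     340 10485760            all  --  *      *       0.0.0.0/0            0.0.0.0/0
"""


def parse_rule_bytes(output: str) -> list[int]:
    """Return the byte counter of every rule row, in chain order."""
    counters = []
    for line in output.splitlines():
        fields = line.split()
        # Rule rows start with two integers (pkts, bytes); the "Chain ..."
        # line and the column header row do not.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            counters.append(int(fields[1]))
    return counters


if __name__ == "__main__":
    # Under the assumed layout, the last rule is the catch-all counter.
    print(parse_rule_bytes(SAMPLE_OUTPUT)[-1])
```

Reading only the first two fields sidesteps the column shift that occurs when a rule has an empty target.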

Python (server):

  • settings.py: SERVER_DATA_TRANSFER_QUOTA_PER_JOB_AWS env var
  • runs.py: DATA_TRANSFER_QUOTA_EXCEEDED termination reason
  • jobs_running.py / running_jobs.py: pass quota to shim for AWS backend
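
A minimal sketch of how a per-backend quota lookup could work, following the `DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_<BACKEND>` naming described in the summary. The helper `quota_for_backend` is hypothetical, and defaulting to 0 (unlimited) when the variable is unset is an assumption; the actual settings.py may read the value differently.

```python
import os
from typing import Mapping

ENV_PREFIX = "DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_"


def quota_for_backend(backend: str, environ: Mapping[str, str] = os.environ) -> int:
    """Return the quota in bytes for `backend`; 0 means unlimited.

    Assumption: a backend without a configured variable is treated as unlimited.
    """
    return int(environ.get(ENV_PREFIX + backend.upper(), "0"))


if __name__ == "__main__":
    env = {ENV_PREFIX + "AWS": str(10 * 1024 * 1024)}
    print(quota_for_backend("aws", env))     # quota configured for AWS
    print(quota_for_backend("crusoe", env))  # no variable set for this backend
```

With this shape, supporting another backend is a matter of setting one more environment variable, with no code change in the lookup.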

Test plan

  • Unit tests for iptables output parsing (netmeter_test.go)
  • Integration test on AWS: verified iptables metering works in both Docker host and bridge modes
  • E2E test on AWS: task terminated at ~10MB with Error (Data transfer quota exceeded) in dstack ps
  • E2E test on Crusoe (non-AWS): task ran past 10MB without metering — quota correctly not enforced

🤖 Generated with Claude Code

…limits

Adds a configurable per-job outbound data transfer quota (AWS only) that
terminates jobs when the total external traffic exceeds the threshold.
Metering uses iptables byte counters on the shim (host-level), excluding
private/VPC traffic. The shim notifies the runner via a new /api/terminate
endpoint so the server reads the termination reason through the existing
/api/pull flow — same pattern as log quota.

Configured via DSTACK_SERVER_DATA_TRANSFER_QUOTA_PER_JOB_AWS (bytes, 0=unlimited).
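
The `0=unlimited` semantics can be sketched as follows. This is a Python illustration of the check only, not the shim's Go implementation. The function names are hypothetical, and using a strict "greater than" comparison is an assumption based on the word "exceeds" in the commit message.

```python
from typing import Iterable, Optional


def quota_exceeded(bytes_sent: int, quota: int) -> bool:
    """A quota of 0 disables the check; otherwise compare against the limit."""
    return quota > 0 and bytes_sent > quota


def first_violation(readings: Iterable[int], quota: int) -> Optional[int]:
    """Return the index of the first counter reading over quota, or None."""
    for i, value in enumerate(readings):
        if quota_exceeded(value, quota):
            return i
    return None


if __name__ == "__main__":
    ten_mib = 10 * 1024 * 1024
    polled = [0, 4_000_000, 9_000_000, 11_000_000]  # successive counter reads
    print(first_violation(polled, ten_mib))
    print(first_violation(polled, 0))
```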

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Andrey Cheptsov and others added 2 commits March 22, 2026 23:41
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uota

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>