1 change: 1 addition & 0 deletions .github/copilot-instructions.md
@@ -34,6 +34,7 @@ make docs-gen # regenerate AI docs from source
- Pod builder is a pure function in internal/podbuilder/ (no k8s client)
- Pacing logic lives exclusively in internal/pacing/
- Don't manually edit generated files — run make docs-gen
- Documentation must never contain unverified information — verify all examples against a real cluster before merging

## Testing Patterns

32 changes: 30 additions & 2 deletions api/v1alpha1/discoverypolicy_types.go
@@ -53,6 +53,27 @@ type DiscoverySource struct {
SecretRef *corev1.LocalObjectReference `json:"secretRef,omitempty"`
}

// AggregationMethod defines how range query values are aggregated into a score.
// +kubebuilder:validation:Enum=sum;count;avg;max
type AggregationMethod string

const (
// AggregationSum adds all data-point values over the lookback window.
// Use when the query returns a gauge/counter and the total magnitude matters
// (e.g., total memory usage across the window).
AggregationSum AggregationMethod = "sum"
// AggregationCount counts the number of non-zero data points over the lookback window.
// Use when you want to rank by how frequently an image appears
// (e.g., number of sample intervals where the image was running).
AggregationCount AggregationMethod = "count"
// AggregationAvg computes the arithmetic mean of all data-point values.
// Use when you want the average magnitude regardless of how many samples exist.
AggregationAvg AggregationMethod = "avg"
// AggregationMax takes the highest single data-point value.
// Use when peak usage is more relevant than cumulative usage.
AggregationMax AggregationMethod = "max"
)

// PrometheusSource defines Prometheus query configuration for image discovery.
type PrometheusSource struct {
// Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics).
@@ -66,13 +87,20 @@ type PrometheusSource struct {
// +kubebuilder:validation:MinLength=1
Query string `json:"query"`
// Lookback is the time window for aggregation. When set, the operator uses query_range
// (start=now-lookback, end=now) and sums all returned values per image to produce a score.
// (start=now-lookback, end=now) and aggregates all returned values per image to produce a score.
// The aggregation function is controlled by the aggregationMethod field.
// When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score.
// Example: "168h" (7 days), "24h", "72h"
// +optional
Lookback *metav1.Duration `json:"lookback,omitempty"`
// AggregationMethod controls how data points from a range query are combined into a single score.
// Only used when lookback is set. Ignored for instant queries.
// Default: "sum". Options: "sum", "count", "avg", "max"
// +kubebuilder:default="sum"
// +optional
AggregationMethod AggregationMethod `json:"aggregationMethod,omitempty"`
// Step is the resolution step for range queries (only used when lookback is set).
// Smaller steps = more data points = more accurate sums but higher Prometheus load.
// Smaller steps = more data points = more accurate aggregation but higher Prometheus load.
// Default: "5m". Example: "1m", "15m"
// +kubebuilder:default="5m"
// +optional
17 changes: 15 additions & 2 deletions config/crd/bases/drop.corewire.io_discoverypolicies.yaml
@@ -86,6 +86,18 @@ spec:
prometheus:
description: Prometheus contains the configuration when type=prometheus.
properties:
aggregationMethod:
default: sum
description: |-
AggregationMethod controls how data points from a range query are combined into a single score.
Only used when lookback is set. Ignored for instant queries.
Default: "sum". Options: "sum", "count", "avg", "max"
enum:
- sum
- count
- avg
- max
type: string
endpoint:
description: |-
Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics).
@@ -95,7 +107,8 @@ spec:
lookback:
description: |-
Lookback is the time window for aggregation. When set, the operator uses query_range
(start=now-lookback, end=now) and sums all returned values per image to produce a score.
(start=now-lookback, end=now) and aggregates all returned values per image to produce a score.
The aggregation function is controlled by the aggregationMethod field.
When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score.
Example: "168h" (7 days), "24h", "72h"
type: string
@@ -111,7 +124,7 @@ spec:
default: 5m
description: |-
Step is the resolution step for range queries (only used when lookback is set).
Smaller steps = more data points = more accurate sums but higher Prometheus load.
Smaller steps = more data points = more accurate aggregation but higher Prometheus load.
Default: "5m". Example: "1m", "15m"
type: string
required:
21 changes: 18 additions & 3 deletions docs/content/docs/discovery.md
@@ -66,7 +66,14 @@ count(container_memory_working_set_bytes{

Hand-maintained image lists do not keep up in environments where automation (for example Renovate) ships new image versions every day. A practical pattern is to rank images by observed CI usage over a rolling window.

The `lookback` field tells Drop to use Prometheus `query_range` API over that time window and sum all returned values per image to produce a total usage score:
The `lookback` field tells Drop to use the Prometheus `query_range` API over that time window. The `aggregationMethod` field controls how the returned data points are combined into a single score per image:

| Method | Behavior | Use when |
|--------|----------|----------|
| `sum` (default) | Adds all data-point values over the window | Total cumulative usage matters (e.g. total memory consumed) |
| `count` | Counts the number of non-zero data points over the window | You want to rank by how frequently an image appears |
| `avg` | Arithmetic mean of all data-point values | Average magnitude matters regardless of sample count |
| `max` | Highest single data-point value | Peak usage is more relevant than cumulative |
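
The four methods in the table can be illustrated with a minimal Go sketch. It is not the operator's actual implementation: the `aggregate` helper and the sample values are hypothetical, and `count` follows the Go doc comment on `AggregationCount` (non-zero data points only). Returning 0 for an empty series is also an assumption.

```go
package main

import "fmt"

// AggregationMethod mirrors the enum values declared in the CRD.
type AggregationMethod string

const (
	AggregationSum   AggregationMethod = "sum"
	AggregationCount AggregationMethod = "count"
	AggregationAvg   AggregationMethod = "avg"
	AggregationMax   AggregationMethod = "max"
)

// aggregate is a hypothetical helper that reduces the data points of one
// image's series to a single score. Unknown methods fall back to sum,
// matching the documented default.
func aggregate(method AggregationMethod, values []float64) float64 {
	if len(values) == 0 {
		return 0 // assumption: an empty series scores zero
	}
	sum, nonZero, peak := 0.0, 0, values[0]
	for _, v := range values {
		sum += v
		if v != 0 {
			nonZero++
		}
		if v > peak {
			peak = v
		}
	}
	switch method {
	case AggregationCount:
		return float64(nonZero)
	case AggregationAvg:
		return sum / float64(len(values))
	case AggregationMax:
		return peak
	default:
		return sum
	}
}

func main() {
	// Hypothetical container counts for one image at four sample steps.
	values := []float64{3, 0, 5, 4}
	methods := []AggregationMethod{AggregationSum, AggregationCount, AggregationAvg, AggregationMax}
	for _, m := range methods {
		fmt.Printf("%s=%g\n", m, aggregate(m, values))
	}
}
```

With a `count(...) by (image)` query, Prometheus returns no sample at all for steps where no container matched, so counting non-zero points and counting returned points should give the same ranking in that case.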

```yaml
apiVersion: drop.corewire.io/v1alpha1
@@ -82,6 +89,7 @@ spec:
endpoint: https://mimir.example.com
lookback: 168h # 7 days
step: 5m
aggregationMethod: sum # default — rank by total usage over 7 days
query: |
count(
container_memory_working_set_bytes{
@@ -95,7 +103,8 @@ Use this when you want DiscoveryPolicy to continuously follow what your GitLab r

#### Field-by-field explanation

- `lookback: 168h` — Drop uses `query_range` with start=now-7d, end=now, and sums all returned values per image to rank by total usage over the window.
- `lookback: 168h` — Drop uses `query_range` with start=now-7d, end=now, and aggregates all returned values per image using the chosen `aggregationMethod` (default: `sum`).
- `aggregationMethod: sum` — sums all data-point values to rank by total usage. Use `count` to rank by number of appearances, `avg` for average magnitude, or `max` for peak value.
- `step: 5m` — resolution step for the range query (controls how many data points Prometheus returns).
- `count(...) by (image)` — counts the number of running containers per image to rank by popularity.
- `container_memory_working_set_bytes{...}` — source metric used to observe running containers.
@@ -108,7 +117,12 @@ Use this when you want DiscoveryPolicy to continuously follow what your GitLab r

For each unique `image` label, Drop uses the Prometheus query result value as the score.

When `lookback` is not set (the default), Drop sends an instant query (`/api/v1/query`) and uses the returned value directly. When `lookback` is set (e.g. `lookback: 168h`), Drop uses a range query (`/api/v1/query_range`) over that window and **sums all returned values** to produce the score. This means images that appear more frequently over the window get a higher score.
When `lookback` is not set (the default), Drop sends an instant query (`/api/v1/query`) and uses the returned value directly. When `lookback` is set (e.g. `lookback: 168h`), Drop uses a range query (`/api/v1/query_range`) over that window and aggregates data points using the `aggregationMethod`:

- `sum` (default): adds all data-point values — images with higher cumulative usage score higher
- `count`: counts the number of non-zero data points, so images that appear more frequently score higher
- `avg`: averages data-point values — images with higher average value score higher
- `max`: takes the peak value — images with the highest single observation score higher

The example above uses `lookback: 168h` so Drop handles the 7-day windowing via the API — no need to embed `[7d]` in PromQL.

@@ -156,6 +170,7 @@ spec:
- type: prometheus
prometheus:
endpoint: https://mimir.example.com
aggregationMethod: count # rank by number of appearances
query: |
count(container_memory_working_set_bytes{
container!="", container!="POD",
5 changes: 3 additions & 2 deletions docs/content/docs/reference/_generated_crds.md
@@ -207,8 +207,9 @@ PrometheusSource defines Prometheus query configuration for image discovery.
|-------|------|----------|---------|-------------|
| `endpoint` | `string` | Yes | — | Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics). Example: "http://prometheus.monitoring.svc:9090", "https://mimir.example.com" |
| `query` | `string` | Yes | — | Query is the PromQL expression. It MUST return results with an "image" label — that label value is used as the discovered image reference. The query result value is used as the ranking score (higher = more relevant). Example: count(container_memory_working_set_bytes{container!="",container!="POD",namespace="gitlab-runner"}) by (image) |
| `lookback` | `*metav1.Duration` | No | — | Lookback is the time window for aggregation. When set, the operator uses query_range (start=now-lookback, end=now) and sums all returned values per image to produce a score. When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score. Example: "168h" (7 days), "24h", "72h" |
| `step` | `string` | No | 5m | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate sums but higher Prometheus load. Default: "5m". Example: "1m", "15m" |
| `lookback` | `*metav1.Duration` | No | — | Lookback is the time window for aggregation. When set, the operator uses query_range (start=now-lookback, end=now) and aggregates all returned values per image to produce a score. The aggregation function is controlled by the aggregationMethod field. When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score. Example: "168h" (7 days), "24h", "72h" |
| `aggregationMethod` | `AggregationMethod` | No | sum | AggregationMethod controls how data points from a range query are combined into a single score. Only used when lookback is set. Ignored for instant queries. Default: "sum". Options: "sum", "count", "avg", "max" |
| `step` | `string` | No | 5m | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate aggregation but higher Prometheus load. Default: "5m". Example: "1m", "15m" |

### RegistrySource

2 changes: 0 additions & 2 deletions docs/go.mod
@@ -1,5 +1,3 @@
module github.com/corewire/drop/docs

go 1.26.0

require github.com/imfing/hextra v0.12.3 // indirect
2 changes: 0 additions & 2 deletions docs/go.sum
@@ -1,2 +0,0 @@
github.com/imfing/hextra v0.12.3 h1:DZHY2rUWYteyzjlHi9r4n7Bb5e2Q+6LXe4C1Dqn0ZjM=
github.com/imfing/hextra v0.12.3/go.mod h1:vi+yhpq8YPp/aghvJlNKVnJKcPJ/VyAEcfC1BSV9ARo=
6 changes: 4 additions & 2 deletions docs/static/llms-full.txt
@@ -181,8 +181,9 @@ PrometheusSource defines Prometheus query configuration for image discovery.
|-------|------|------|----------|---------|-------------|
| Endpoint | `endpoint` | `string` | ✓ | | Endpoint is the Prometheus-compatible API URL (Prometheus, Thanos, Mimir, VictoriaMetrics). Example: "http://prometheus.monitoring.svc:9090", "https://mimir.example.com" |
| Query | `query` | `string` | ✓ | | Query is the PromQL expression. It MUST return results with an "image" label — that label value is used as the discovered image reference. The query result value is used as the ranking score (higher = more relevant). Example: count(container_memory_working_set_bytes{container!="",container!="POD",namespace="gitlab-runner"}) by (image) |
| Lookback | `lookback` | `*metav1.Duration` | — | | Lookback is the time window for aggregation. When set, the operator uses query_range (start=now-lookback, end=now) and sums all returned values per image to produce a score. When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score. Example: "168h" (7 days), "24h", "72h" |
| Step | `step` | `string` | — | `5m` | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate sums but higher Prometheus load. Default: "5m". Example: "1m", "15m" |
| Lookback | `lookback` | `*metav1.Duration` | — | | Lookback is the time window for aggregation. When set, the operator uses query_range (start=now-lookback, end=now) and aggregates all returned values per image to produce a score. The aggregation function is controlled by the aggregationMethod field. When unset, uses an instant query (/api/v1/query) and the point-in-time value is the score. Example: "168h" (7 days), "24h", "72h" |
| AggregationMethod | `aggregationMethod` | `AggregationMethod` | — | `sum` | AggregationMethod controls how data points from a range query are combined into a single score. Only used when lookback is set. Ignored for instant queries. Default: "sum". Options: "sum", "count", "avg", "max" |
| Step | `step` | `string` | — | `5m` | Step is the resolution step for range queries (only used when lookback is set). Smaller steps = more data points = more accurate aggregation but higher Prometheus load. Default: "5m". Example: "1m", "15m" |

### RegistrySource

@@ -332,6 +333,7 @@ spec:
query: 'count(container_memory_working_set_bytes{container!="", container!="POD", namespace="build-stuff", pod=~"runner-.*"}) by (image)'
lookback: 24h
step: 5m
aggregationMethod: sum
syncInterval: 30s
maxImages: 10
---
22 changes: 0 additions & 22 deletions go.sum
@@ -66,8 +66,6 @@ github.com/google/go-cmp v0.7.0/go.mod h1:pXiqmnSA92OHEEa9HXL2W4E7lf9JzCmGVUdgjX
github.com/google/gofuzz v1.0.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/gofuzz v1.2.0 h1:xRy4A+RhZaiKjJ1bPfwQ8sedCA+YS2YcCHW6ec7JMi0=
github.com/google/gofuzz v1.2.0/go.mod h1:dBl0BpW6vV/+mYPU4Po3pmUjxk6FQPldtuIdl/M65Eg=
github.com/google/pprof v0.0.0-20250403155104-27863c87afa6 h1:BHT72Gu3keYf3ZEu2J0b1vyeLSOYI8bm5wbJM/8yDe8=
github.com/google/pprof v0.0.0-20250403155104-27863c87afa6/go.mod h1:boTsfXsheKC2y+lKOCMpSfarhxDeIzfZG1jqGcPl3cA=
github.com/google/pprof v0.0.0-20260402051712-545e8a4df936 h1:EwtI+Al+DeppwYX2oXJCETMO23COyaKGP6fHVpkpWpg=
github.com/google/pprof v0.0.0-20260402051712-545e8a4df936/go.mod h1:MxpfABSjhmINe3F1It9d+8exIHFvUqtLIRCdOGNXqiI=
github.com/google/uuid v1.6.0 h1:NIvaJDMOsjHA8n1jAhLSgzrAzy1Hgr+hNrb57e+94F0=
@@ -107,14 +105,8 @@ github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee h1:W5t00kpgFd
github.com/modern-go/reflect2 v1.0.3-0.20250322232337-35a7c28c31ee/go.mod h1:yWuevngMOJpCy52FWWMvUC8ws7m/LJsjYzDa0/r8luk=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 h1:C3w9PqII01/Oq1c1nUAm88MOHcQC9l5mIlSMApZMrHA=
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822/go.mod h1:+n7T8mK8HuQTcFwEeznm/DIxMOiR9yIdICNftLE1DvQ=
github.com/onsi/ginkgo/v2 v2.27.4 h1:fcEcQW/A++6aZAZQNUmNjvA9PSOzefMJBerHJ4t8v8Y=
github.com/onsi/ginkgo/v2 v2.27.4/go.mod h1:ArE1D/XhNXBXCBkKOLkbsb2c81dQHCRcF5zwn/ykDRo=
github.com/onsi/ginkgo/v2 v2.29.0 h1:rfh+ZFjgJhYWRoIqVf3Uwx/W20yLrcrE2h2GmYVRaag=
github.com/onsi/ginkgo/v2 v2.29.0/go.mod h1:+aXOY+vzZ5mu2iI2HpTZUPmM//oQfsNFX6gU9kNcA44=
github.com/onsi/gomega v1.39.0 h1:y2ROC3hKFmQZJNFeGAMeHZKkjBL65mIZcvrLQBF9k6Q=
github.com/onsi/gomega v1.39.0/go.mod h1:ZCU1pkQcXDO5Sl9/VVEGlDyp+zm0m1cmeG5TOzLgdh4=
github.com/onsi/gomega v1.40.0 h1:Vtol0e1MghCD2ZVIilPDIg44XSL9l2QAn8ZNaljWcJc=
github.com/onsi/gomega v1.40.0/go.mod h1:M/Uqpu/8qTjtzCLUA2zJHX9Iilrau25x1PdoSRbWh5A=
github.com/onsi/gomega v1.41.0 h1:OwKp4pXNgVxf6sCplzYo794OFNuoL2q2SBMU5NSWOjA=
github.com/onsi/gomega v1.41.0/go.mod h1:M/Uqpu/8qTjtzCLUA2zJHX9Iilrau25x1PdoSRbWh5A=
github.com/pkg/errors v0.9.1 h1:FEBLx1zS214owpjy7qsBeixbURkuhQAwrK5UwLGTwt4=
@@ -192,36 +184,22 @@ go.yaml.in/yaml/v3 v3.0.4 h1:tfq32ie2Jv2UxXFdLJdh3jXuOzWiL1fo0bu/FbuKpbc=
go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
golang.org/x/exp v0.0.0-20251219203646-944ab1f22d93 h1:fQsdNF2N+/YewlRZiricy4P1iimyPKZ/xwniHj8Q2a0=
golang.org/x/exp v0.0.0-20251219203646-944ab1f22d93/go.mod h1:EPRbTFwzwjXj9NpYyyrvenVh9Y+GFeEvMNh7Xuz7xgU=
golang.org/x/mod v0.32.0 h1:9F4d3PHLljb6x//jOyokMv3eX+YDeepZSEo3mFJy93c=
golang.org/x/mod v0.32.0/go.mod h1:SgipZ/3h2Ci89DlEtEXWUk/HteuRin+HHhN+WbNhguU=
golang.org/x/mod v0.35.0 h1:Ww1D637e6Pg+Zb2KrWfHQUnH2dQRLBQyAtpr/haaJeM=
golang.org/x/mod v0.35.0/go.mod h1:+GwiRhIInF8wPm+4AoT6L0FA1QWAad3OMdTRx4tFYlU=
golang.org/x/net v0.49.0 h1:eeHFmOGUTtaaPSGNmjBKpbng9MulQsJURQUAfUwY++o=
golang.org/x/net v0.49.0/go.mod h1:/ysNB2EvaqvesRkuLAyjI1ycPZlQHM3q01F02UY/MV8=
golang.org/x/net v0.53.0 h1:d+qAbo5L0orcWAr0a9JweQpjXF19LMXJE8Ey7hwOdUA=
golang.org/x/net v0.53.0/go.mod h1:JvMuJH7rrdiCfbeHoo3fCQU24Lf5JJwT9W3sJFulfgs=
golang.org/x/oauth2 v0.34.0 h1:hqK/t4AKgbqWkdkcAeI8XLmbK+4m4G5YeQRrmiotGlw=
golang.org/x/oauth2 v0.34.0/go.mod h1:lzm5WQJQwKZ3nwavOZ3IS5Aulzxi68dUSgRHujetwEA=
golang.org/x/sync v0.19.0 h1:vV+1eWNmZ5geRlYjzm2adRgW2/mcpevXNg50YZtPCE4=
golang.org/x/sync v0.19.0/go.mod h1:9KTHXmSnoGruLpwFjVSX0lNNA75CykiMECbovNTZqGI=
golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4=
golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0=
golang.org/x/sys v0.40.0 h1:DBZZqJ2Rkml6QMQsZywtnjnnGvHza6BTfYFWY9kjEWQ=
golang.org/x/sys v0.40.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/sys v0.43.0 h1:Rlag2XtaFTxp19wS8MXlJwTvoh8ArU6ezoyFsMyCTNI=
golang.org/x/sys v0.43.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/term v0.39.0 h1:RclSuaJf32jOqZz74CkPA9qFuVTX7vhLlpfj/IGWlqY=
golang.org/x/term v0.39.0/go.mod h1:yxzUCTP/U+FzoxfdKmLaA0RV1WgE0VY7hXBwKtY/4ww=
golang.org/x/term v0.42.0 h1:UiKe+zDFmJobeJ5ggPwOshJIVt6/Ft0rcfrXZDLWAWY=
golang.org/x/term v0.42.0/go.mod h1:Dq/D+snpsbazcBG5+F9Q1n2rXV8Ma+71xEjTRufARgY=
golang.org/x/text v0.33.0 h1:B3njUFyqtHDUI5jMn1YIr5B0IE2U0qck04r6d4KPAxE=
golang.org/x/text v0.33.0/go.mod h1:LuMebE6+rBincTi9+xWTY8TztLzKHc/9C1uBCG27+q8=
golang.org/x/text v0.36.0 h1:JfKh3XmcRPqZPKevfXVpI1wXPTqbkE5f7JA92a55Yxg=
golang.org/x/text v0.36.0/go.mod h1:NIdBknypM8iqVmPiuco0Dh6P5Jcdk8lJL0CUebqK164=
golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
golang.org/x/tools v0.41.0 h1:a9b8iMweWG+S0OBnlU36rzLp20z1Rp10w+IY2czHTQc=
golang.org/x/tools v0.41.0/go.mod h1:XSY6eDqxVNiYgezAVqqCeihT4j1U2CCsqvH3WhQpnlg=
golang.org/x/tools v0.44.0 h1:UP4ajHPIcuMjT1GqzDWRlalUEoY+uzoZKnhOjbIPD2c=
golang.org/x/tools v0.44.0/go.mod h1:KA0AfVErSdxRZIsOVipbv3rQhVXTnlU6UhKxHd1seDI=
gomodules.xyz/jsonpatch/v2 v2.4.0 h1:Ci3iUJyx9UeRx7CeFN8ARgGbkESwJK+KB9lLcWxY/Zw=