
feat(curtailment): add preview persistence and selection #141

Closed
rongxin-liu wants to merge 3 commits into main from feat/issue-140-curtailment-persistence-preview

Contributor

@rongxin-liu rongxin-liu commented Apr 30, 2026

Closes #140.

Summary

Adds the curtailment persistence and preview-planning slice. PreviewCurtailmentPlan is now backed by database-scoped candidate loading, full device-set scope validation, active-target conflict detection, a kW-only selector, and fleetd config for candidate eligibility thresholds and post-event cooldowns.

This keeps start/update/stop/read curtailment RPCs intentionally stubbed. The persistence schema is laid down for later event lifecycle, dispatch, reconciliation, and restore work, while this PR makes preview selection usable end to end.

What changed

Persistence schema and sqlc

  • Added curtailment_event, curtailment_target, and curtailment_reconciler_heartbeat tables with constraints, active-event indexes, target-work indexes, idempotency/external-reference uniqueness, and reconciler singleton state.
  • Added the ListCurtailmentPreviewDevices sqlc query for whole-org, device-set, and explicit-device preview scopes.
  • Preview candidate loading joins pairing/status data, latest 15-minute power telemetry, 5-minute average power/hashrate, latest completed-hour efficiency, active curtailment target conflicts, and recent terminal-target cooldown state.
  • Added a partial terminal-target cooldown index on curtailment_target keyed for device/state/time lookups, while preserving the event ended_at fallback.
  • Added a curtailment store interface and SQL store implementation that maps nullable status/telemetry into domain preview-device records and converts stored J/H efficiency into J/TH display values.
  • Added device-set resolution through the store by reusing the generated GetDeviceSetTypesBatch query.
  • Regenerated sqlc outputs for the new migration/query surface.
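The J/H to J/TH mapping described in the store bullet above can be sketched as follows. This is a minimal illustration, not the store code from this PR: the function names are hypothetical, and the only assumption is that one terahash is 1e12 hashes and that a missing completed-hour sample arrives as a nil pointer.

```go
package main

import "fmt"

// joulesPerHashToJoulesPerTerahash converts an efficiency stored in J/H
// into the J/TH display convention (1 TH = 1e12 hashes).
func joulesPerHashToJoulesPerTerahash(jPerHash float64) float64 {
	return jPerHash * 1e12
}

// displayEfficiency maps a nullable stored value (hypothetical shape).
// A nil input means no completed-hour efficiency sample is available.
func displayEfficiency(stored *float64) (float64, bool) {
	if stored == nil {
		return 0, false
	}
	return joulesPerHashToJoulesPerTerahash(*stored), true
}

func main() {
	stored := 25e-12 // a miner running at roughly 25 J/TH, expressed in J/H
	if v, ok := displayEfficiency(&stored); ok {
		fmt.Printf("%.1f J/TH\n", v)
	}
}
```

Returning a separate boolean keeps "no sample" distinct from a genuine zero, which matters when the value later drives least-efficient-first ranking.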

Preview selector and plan math

  • Implemented the curtailment domain service for PreviewCurtailmentPlan, including session org scoping, request normalization, v1 validation, scope conversion, explicit-device resolution, device-set validation, and store-backed candidate loading.
  • v1 preview supports the FIXED_KW mode with the LEAST_EFFICIENT_FIRST strategy, the FULL curtailment level, and NORMAL / EMERGENCY priorities.
  • Device-set previews now validate the full requested scope before listing candidates. Mixed valid/invalid sets fail with InvalidArgument instead of silently previewing a smaller scope; duplicate requested IDs do not cause false failures.
  • FIXED_KW selection ranks candidates least-efficient first, accumulates current-power snapshots until target_kw is met, allows explicitly positive tolerance for near-misses, and rejects insufficient curtailable load with structured details.
  • Candidate filtering covers stale telemetry, pairing state, missing/offline/unavailable/explicit UNKNOWN device status, maintenance acknowledgement, full-curtailment capability support, active curtailment target conflicts, post-event cooldown, current/recent power thresholds, and recent hashrate.
  • Active curtailment conflicts are skipped for both normal and emergency previews. Emergency priority can bypass terminal-target cooldown only.
  • Selected candidates are ordered least-efficient first with deterministic device-identifier tie-breaking, and preview reduction/remaining power are calculated from current power snapshots.

Handler and fleetd wiring

  • Wired PreviewCurtailmentPlan through the curtailment Connect handler while leaving the remaining Fleet curtailment RPCs as Unimplemented stubs.
  • Added fleetd curtailment config under curtailment- / CURTAILMENT_, including candidate-min-power-w and post-event-cooldown with the existing 1500 W and 10m defaults.
  • Constructed the SQL curtailment store and preview service in fleetd with the loaded config and existing capability provider.
  • Refreshed generated Go and TypeScript curtailment outputs.

Behavior changes

  • PreviewCurtailmentPlan is available in fleetd when the service is configured.
  • Whole-org, device-set, and explicit-device previews can return selected candidates plus skipped-candidate reasons.
  • Device-set previews fail with InvalidArgument when any requested set is outside the caller org or does not exist.
  • Explicit device previews fail with InvalidArgument when a requested device is outside the caller org or does not exist.
  • Devices already owned by an active curtailment target are skipped with active_curtailment for both normal and emergency previews.
  • Normal-priority previews skip devices with recent terminal targets; emergency previews can bypass that cooldown.
  • Devices with missing or explicit UNKNOWN status are treated as unreachable, matching the rest of the fleet behavior for paired miners without current status.
  • Miners must have current power, recent average power, and recent hashrate signals that agree they are currently curtailable before being selected.
  • Returned preview efficiency values use the J/TH display convention even though the existing proto field name remains efficiency_jh.
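The priority-dependent skip behavior above can be sketched as follows. This is an illustration only: the types, the check order, and the "unreachable" and "cooldown" reason strings are assumptions. Only active_curtailment is a reason name taken from this PR, and the real selector applies more filters than shown here.

```go
package main

import "fmt"

type priority int

const (
	priorityNormal priority = iota
	priorityEmergency
)

// previewDevice is a hypothetical subset of the preview-device record.
type previewDevice struct {
	Status          string // empty means no status row was found
	HasActiveTarget bool   // owned by an active curtailment target
	InCooldown      bool   // has a recent terminal target
}

// skipReasonFor returns an empty string when the device stays eligible.
func skipReasonFor(d previewDevice, p priority) string {
	switch {
	case d.Status == "" || d.Status == "UNKNOWN":
		// Missing and explicit UNKNOWN status are both treated as unreachable.
		return "unreachable"
	case d.HasActiveTarget:
		// Active conflicts are skipped for every priority.
		return "active_curtailment"
	case d.InCooldown && p != priorityEmergency:
		// Only emergency previews bypass the terminal-target cooldown.
		return "cooldown"
	}
	return ""
}

func main() {
	cooling := previewDevice{Status: "ONLINE", InCooldown: true}
	fmt.Printf("normal=%q emergency=%q\n",
		skipReasonFor(cooling, priorityNormal),
		skipReasonFor(cooling, priorityEmergency))
}
```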

Reviewer guide

Start with the persistence and candidate-loading path:

  • server/migrations/000039_create_curtailment.up.sql
  • server/sqlc/queries/curtailment.sql
  • server/internal/domain/stores/interfaces/curtailment.go
  • server/internal/domain/stores/sqlstores/curtailment.go
  • server/internal/domain/stores/sqlstores/curtailment_integration_test.go

Then review the preview behavior:

  • server/internal/domain/curtailment/service.go
  • server/internal/domain/curtailment/selector.go
  • server/internal/domain/curtailment/modes/mode.go
  • server/internal/domain/curtailment/modes/fixed_kw.go
  • server/internal/handlers/curtailment/handler.go
  • server/cmd/fleetd/config.go
  • server/cmd/fleetd/main.go

Generated outputs live under server/generated/sqlc, server/generated/grpc/curtailment/v1, and client/src/protoFleet/api/generated/curtailment/v1.

Out of scope

  • Creating durable curtailment events or targets from preview results.
  • StartCurtailment, UpdateCurtailmentEvent, StopCurtailment, GetActiveCurtailment, and ListCurtailmentEvents business logic.
  • Command dispatch integration, command preflight blocking, active reconciliation, drift detection, and restore behavior.
  • Read APIs for active/historical curtailment events.
  • Audit events, metrics, alerts, frontend UI, and end-to-end virtual-fleet coverage.
  • Webhook ingestion, closed-loop modes, Fleet-level efficiency events, smart PDU / PDU-outlet / rack target controls, and maintenance-hold primitives.

Test coverage

Added or updated tests cover:

  • Preview request validation for supported v1 mode, strategy, level, priority, maintenance force acknowledgement, explicit-device resolution, device-set scope validation, and duplicate device-set IDs.
  • Whole-org/device-set/device-list store parameter conversion and preview-device query mapping.
  • FIXED_KW selection math, tolerance echoing, least-efficient-first ordering, deterministic tie-breaking, last-miner overshoot, explicit tolerance near-miss, strict omitted/zero tolerance, and insufficient curtailable load.
  • Skipped-candidate behavior for stale telemetry, unpaired devices, missing/offline/unavailable/explicit UNKNOWN status, maintenance, missing full-curtailment capability, active curtailment conflicts, cooldown, unreliable power telemetry, phantom load, and non-hashing miners.
  • Normal vs emergency cooldown behavior, with active curtailment conflicts skipped in both priorities.
  • SQL store integration coverage for active-target detection, terminal-target cooldown detection, and J/H to J/TH efficiency conversion.
  • Handler preview service wiring plus continued stub behavior for the remaining curtailment RPCs.
  • Fleetd config/wiring build coverage.

Focused checks run locally:

  • go test ./server/internal/domain/curtailment
  • go test ./server/internal/domain/stores/sqlstores -short
  • cd server && golangci-lint fmt --diff -c .golangci.yaml
  • cd server && just lint
  • cd server && just build
  • git diff --check

GitHub PR Gate is passing on latest head 4a14467.

@github-actions github-actions Bot added the javascript, client, server, and shared labels Apr 30, 2026

github-actions Bot commented Apr 30, 2026

🔐 Codex Security Review

Note: This is an automated security-focused code review generated by Codex.
It should be used as a supplementary check alongside human review.
False positives are possible - use your judgment.

Scope summary

  • Reviewed pull request diff only (fb12730479631adb4bc973182d42d5767a54a9a0...4e703bd20acd68016466f4b0b3fa30c3f601f210, exact PR three-dot diff)
  • Model: gpt-5.4



Review Summary

Overall Risk: MEDIUM

Findings

[MEDIUM] Curtailment preview can read stale telemetry from a previous device/org after soft-delete reuse

  • Category: SQLi/Database
  • Location: server/sqlc/queries/curtailment.sql:81
  • Description: The new preview query joins device_metrics and device_metrics_hourly only on device_identifier. In this repo, identifiers are reusable after soft-delete, so a device that is deleted and later re-paired under a new row can inherit recent power/hash data and historical efficiency from the old row. The same issue also exists in the hourly efficiency lookup later in this query.
  • Impact: Preview results can leak telemetry across device lifetimes, including cross-org leakage if the same miner is moved between organizations. It also makes curtailment planning incorrect because candidate ranking and estimated reduction can be based on someone else’s historical data.
  • Recommendation: Bound telemetry lookups to the current device lifetime, e.g. include the active device row’s creation timestamp in scoped_devices and require dm.time >= d.created_at / dmh.bucket >= date_trunc('hour', d.created_at), or move telemetry to an immutable device key instead of bare identifiers.

[MEDIUM] Whole-org preview does uncached capability lookups for every device

  • Category: Reliability
  • Location: server/internal/domain/curtailment/selector.go:258
  • Description: filterCandidates() walks every scoped device, and skipReason() can call supportsFullCurtailment() for each one. In production this service is wired to pluginService, so large previews repeatedly fetch the same capability information for identical (driver, manufacturer, model) combinations instead of caching it per request.
  • Impact: Whole-org previews scale with both fleet size and plugin lookup count, so repeated preview calls from the UI/API can create avoidable latency spikes and extra load on plugin processes.
  • Recommendation: Memoize capability results by (driver_name, manufacturer, model) for the duration of the request and reuse them inside the selection loop. Consider also capping or summarizing skipped_candidates for very large scopes.
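A per-request memoization along these lines could look like the sketch below. The types and names are hypothetical and do not reflect the plugin service API. The cache is meant to live for a single preview request and is not safe for concurrent use.

```go
package main

import "fmt"

// capabilityKey identifies one (driver, manufacturer, model) combination.
type capabilityKey struct {
	Driver, Manufacturer, Model string
}

// capabilityLookup stands in for the expensive plugin-backed check.
type capabilityLookup func(capabilityKey) (bool, error)

// requestCapabilityCache memoizes lookup results for one request.
type requestCapabilityCache struct {
	lookup capabilityLookup
	seen   map[capabilityKey]bool
}

func newRequestCapabilityCache(lookup capabilityLookup) *requestCapabilityCache {
	return &requestCapabilityCache{lookup: lookup, seen: make(map[capabilityKey]bool)}
}

// supportsFullCurtailment calls the underlying lookup at most once per
// key. Errors are not cached, so a later call may retry.
func (c *requestCapabilityCache) supportsFullCurtailment(k capabilityKey) (bool, error) {
	if v, ok := c.seen[k]; ok {
		return v, nil
	}
	v, err := c.lookup(k)
	if err != nil {
		return false, err
	}
	c.seen[k] = v
	return v, nil
}

func main() {
	calls := 0
	cache := newRequestCapabilityCache(func(capabilityKey) (bool, error) {
		calls++
		return true, nil
	})
	k := capabilityKey{Driver: "drv", Manufacturer: "acme", Model: "m1"}
	for i := 0; i < 3; i++ {
		cache.supportsFullCurtailment(k)
	}
	fmt.Println("lookups:", calls)
}
```

Using a comparable struct as the map key avoids building and parsing a concatenated string for each device.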

[LOW] New curtailment store paths return raw database errors to clients

  • Category: gRPC
  • Location: server/internal/domain/stores/sqlstores/curtailment.go:55
  • Description: The store wraps SQL failures with fleeterror.NewInternalErrorf("...: %v", err). The error interceptor sends FleetError.DebugMessage back to the caller, so Postgres relation names, constraint names, and migration state leak through preview failures. The same pattern is also used in ListValidDeviceSetIDs().
  • Impact: Authenticated callers can learn internal schema and deployment details from backend failures that should stay server-side.
  • Recommendation: Return a sanitized internal error message to the client and log the underlying DB error separately.

Notes

I did not find an auth bypass, SQL injection, command injection, or pool/wallet redirection path in the reviewed diff.

I did not run tests in this review environment.


Generated by Codex Security Review |
Triggered by: @rongxin-liu |
Review workflow run

@github-actions github-actions Bot removed the javascript, client, and shared labels May 1, 2026
Base automatically changed from feat/issue-116-curtailment-foundation to main May 1, 2026 13:06
@rongxin-liu rongxin-liu marked this pull request as ready for review May 1, 2026 21:14
@rongxin-liu rongxin-liu requested a review from a team as a code owner May 1, 2026 21:14
Copilot AI review requested due to automatic review settings May 1, 2026 21:14
Contributor

Copilot AI left a comment


Pull request overview

Adds the first end-to-end usable curtailment “preview” path to the Go backend by laying down persistence tables + sqlc query surface, a SQL-backed store, and a domain selector/service that filters/ranks miners and returns a PreviewCurtailmentPlan response. This is wired into fleetd while leaving the remaining curtailment RPCs intentionally Unimplemented.

Changes:

  • Added curtailment persistence schema (events/targets/reconciler heartbeat) and a sqlc query to load preview candidates with telemetry + conflict/cooldown signals.
  • Implemented curtailment preview domain service/selector (v1 FIXED_KW + least-efficient-first selection) and a SQL store adapter.
  • Wired PreviewCurtailmentPlan through the Connect handler and fleetd config/service construction.

Reviewed changes

Copilot reviewed 15 out of 18 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| server/sqlc/queries/curtailment.sql | Adds ListCurtailmentPreviewDevices query to load scoped preview candidates + telemetry/conflict/cooldown flags. |
| server/migrations/000039_create_curtailment.up.sql | Introduces curtailment event/target/heartbeat tables, constraints, and indexes. |
| server/migrations/000039_create_curtailment.down.sql | Drops new curtailment tables/trigger on rollback. |
| server/internal/handlers/curtailment/handler.go | Implements PreviewCurtailmentPlan delegation (keeps other RPCs unimplemented). |
| server/internal/handlers/curtailment/handler_test.go | Updates handler construction + adds delegation test for preview. |
| server/internal/domain/stores/interfaces/curtailment.go | Adds curtailment store interface + preview device DTO/params. |
| server/internal/domain/stores/sqlstores/curtailment.go | Implements SQL curtailment store (device-set validation + preview candidate mapping). |
| server/internal/domain/stores/sqlstores/curtailment_integration_test.go | DB integration coverage for preview candidate flags + efficiency conversion. |
| server/internal/domain/curtailment/service.go | Implements PreviewCurtailmentPlan orchestration (scope validation + store-backed candidate load). |
| server/internal/domain/curtailment/selector.go | Implements filtering + ranking + plan building and response mapping. |
| server/internal/domain/curtailment/modes/mode.go | Adds mode abstraction and candidate struct + helpers. |
| server/internal/domain/curtailment/modes/fixed_kw.go | Implements FIXED_KW selection logic + structured insufficient-load error. |
| server/internal/domain/curtailment/service_test.go | Adds unit tests for preview validation/filtering/selection semantics. |
| server/cmd/fleetd/config.go | Adds curtailment config block to fleetd config. |
| server/cmd/fleetd/main.go | Wires SQL curtailment store + preview service into the curtailment handler. |
| server/generated/sqlc/models.go | Regenerated sqlc models for new curtailment tables. |
| server/generated/sqlc/db.go | Regenerated sqlc prepared statement plumbing for new query. |
| server/generated/sqlc/curtailment.sql.go | Regenerated sqlc query implementation for ListCurtailmentPreviewDevices. |

Comment thread server/migrations/000039_create_curtailment.up.sql Outdated
@rongxin-liu rongxin-liu force-pushed the feat/issue-140-curtailment-persistence-preview branch from 3b4640f to 40ff61f Compare May 1, 2026 21:56
