Skip to content

fix(audit): resolve #161, #162, #177 + close 13 stale audit issues#209

Merged
ajitpratap0 merged 2 commits intomainfrom
fix/audit-161-162-177
Apr 9, 2026
Merged

fix(audit): resolve #161, #162, #177 + close 13 stale audit issues#209
ajitpratap0 merged 2 commits intomainfrom
fix/audit-161-162-177

Conversation

@ajitpratap0
Copy link
Copy Markdown
Owner

Summary

Third pass at the 2026-03-25 audit backlog. Of the 17 remaining issues triaged this session, 13 were stale (already fixed on main or described code that no longer exists) and closed with evidence links. The 3 genuinely broken items are fixed here.

Fixes

CODE-005 (#161) — request-context-aware orchestrator HTTP client

`OrchestratorClient` now exposes ctx-aware variants (`PauseCtx`, `ResumeCtx`, `IsPausedCtx`, `GetActiveAgentCountCtx`) that inherit a 5s timeout from the caller's context. Dashboard handlers pick them up through a narrow `contextualOrchestrator` type assertion and pass `c.Request.Context()` so an HTTP client disconnect aborts the outbound orchestrator RPC instead of pinning a handler goroutine for the full 5s timeout window.

Rationale for the type-assertion approach: the existing `OrchestratorInterface` is implemented by both `*OrchestratorClient` (cross-process HTTP) and `*orchestrator.Orchestrator` (in-process, used from tests and e2e). Adding ctx params to the interface would churn ~25 test sites for no real benefit — in-process calls don't need cancellation. Keeping the non-ctx methods as wrappers around the Ctx variants avoids that churn entirely while still fixing the real bug for the real code path.

CODE-006 (#162) — fail-fast DB pool init

`cmd/api/main.go` wraps `db.New` in a 30s `context.WithTimeout` derived from the signal context. Previously the signal context had no deadline, so if Postgres was slow or unreachable the process blocked indefinitely on startup instead of failing fast.

FE-010 (#177) — performance page error state

The performance page aggregates errors from the three primary queries (summary metrics, equity history, pair performance) and shows an accessible error banner with a Retry button that refetches only the failed queries. Candlestick and drawdown are excluded intentionally — candlestick has its own `NEXT_PUBLIC_USE_MOCK_CANDLES` fallback and drawdown is derived from equity.

Closed as stale (13)

# Reason
#189 SRE-010 config.yaml:248 already has `port: 8080` with explanatory comment
#188 SRE-009 Alert rule already references `cryptofunk_max_drawdown_threshold`
#187 SRE-008 main.go:101 uses `signal.NotifyContext` — atomic signal setup
#181 FE-014 Fixed in PR #208
#180 FE-013 react-query exposes `isError`/`error` natively
#178 FE-011 Risk page already uses `useRiskAlerts()` hook
#170 CODE-014 `default_quantity` is fully config-driven
#168 CODE-012 `verifyAPIKeysOld` not in Go source
#167 CODE-011 No `fmt.Errorf.*%v.*err` matches in internal/
#166 CODE-010 `ptrStr`/`ptrF64` not in Go source
#165 CODE-009 No `DefaultInitialCapital` const; session init reads config
#169 CODE-013 Strategy validation goroutine is per-request + ctx-bounded
#163 CODE-007 Telegram rate limiter `Stop()` closes stopCleanup + ticker

Deferred

#179 FE-012 — no tests for risk/performance/agents pages. Writing these needs vitest route-level harness scaffolding; best done in a dedicated PR.

Test Plan

  • `go build ./...` — clean
  • `go test -race -timeout 120s ./internal/api/... ./cmd/api/...` — pass
  • `npm run type-check` — clean
  • Manual: kill -STOP a postgres container and confirm `api` exits with a timeout error after 30s (not hangs)
  • Manual: pause trading from dashboard, disconnect client mid-request, confirm API handler goroutine releases quickly instead of waiting the full 5s

Breaking Changes

None.

🤖 Generated with Claude Code

- CODE-005 (#161): OrchestratorClient now exposes ctx-aware variants
  (PauseCtx/ResumeCtx/IsPausedCtx/GetActiveAgentCountCtx) that inherit
  a 5s timeout from the caller's context. Dashboard handlers detect
  these via a narrow contextualOrchestrator type assertion and pass
  c.Request.Context() so an HTTP client disconnect aborts the
  outbound orchestrator RPC instead of pinning a goroutine for 5s.
  The existing non-ctx methods are retained as wrappers so the
  OrchestratorInterface and MockOrchestrator stay unchanged — zero
  churn to tests and the in-process orchestrator implementation.

- CODE-006 (#162): cmd/api/main.go wraps the DB pool initialization
  in a 30s context.WithTimeout so the process fails fast when
  Postgres is unreachable instead of blocking indefinitely on the
  long-lived signal-derived context.

- FE-010 (#177): Performance page now surfaces backend failures with
  an accessible error banner + Retry button that refetches the three
  primary queries (summary metrics, equity history, pair performance).
  Candlestick and drawdown are deliberately excluded — candlestick
  has its own NEXT_PUBLIC_USE_MOCK_CANDLES fallback and drawdown is
  derived from equity.
@ajitpratap0 ajitpratap0 added the bug Something isn't working label Apr 8, 2026
@claude
Copy link
Copy Markdown

claude bot commented Apr 8, 2026

PR Review — fix(audit): #161, #162, #177

Overall this is a focused, well-reasoned PR. The design decisions are clearly documented and the scope is appropriately narrow. A few items worth addressing before merge:


cmd/api/main.go — DB pool init timeout (CODE-006)

Minor — misleading comment.
The comment says "The DB itself keeps using the signal-derived context for the rest of its lifetime" but the signal context isn't actually passed to the pool at all — pgxpool.New uses the context only for the initial connection attempt, not for pool lifetime management. Subsequent queries each supply their own context. The code is correct; the comment overstates what the signal context does.

// Suggested wording:
// Bound the connection attempt by a dedicated 30s timeout. The signal ctx
// is only used as the cancellation parent; the pool itself is managed by
// pool.Close() on shutdown, not by any stored context.

Nitpick — dbInitCancel() placement.
Calling dbInitCancel() immediately (not deferred) is correct and intentional — good. Worth a one-line comment since readers often expect defer cancel() and the deviation is surprising.


internal/api/orchestrator_client.go — Ctx variants (CODE-005)

Clean refactor. Extracting orchestratorRequestTimeout as a named constant is the right call. No issues here.

One observation: when the incoming context is already canceled before context.WithTimeout is called, the resulting context is also immediately canceled. GetActiveAgentCountCtx returns -1 in that case, and IsPausedCtx returns false — both are the correct graceful-degradation values. Worth a test case if coverage is desired.


internal/api/dashboard.go — type-assertion dispatch (CODE-005)

The contextualOrchestrator optional-interface pattern is a pragmatic solution and the rationale in both the code comment and PR description is solid. The four helper functions keep each call-site lean.

Potential concern — nil safety in mutation handlers.
PauseTrading and ResumeTrading call pauseOrchestrator/resumeOrchestrator without an explicit h.orchestrator != nil guard (the guard visible in the diff is only in getTradingStatus and getSystemStatus). If there is a code path where a DashboardHandler is constructed without an orchestrator and PauseTrading is reachable, this will panic. If there's an existing guard earlier in the handler (not visible in the diff), this is fine — worth confirming.

Design note — context cancellation on mutation operations.
Passing c.Request.Context() to Pause/Resume means a mid-flight client disconnect will abort the outbound HTTP call to the orchestrator. Because HTTP is request/response, the orchestrator may have already processed the request by the time the context fires, so the cancel is safe (the mutation either happened or it didn't). Just worth keeping in mind if the orchestrator side ever adds long-running side effects.


web/dashboard/app/performance/page.tsx — error banner (FE-010)

Likely missing import — AlertTriangle.
The diff adds <AlertTriangle ... /> but does not show a corresponding import added for it. If it wasn't already used elsewhere in this file, this will cause a build/type-check error. Please verify import { AlertTriangle } from 'lucide-react' (or equivalent) is present.

Destructuring lines are very wide.

// Before (108 chars+)
const { data: equityData, isLoading: equityLoading, isError: equityError, error: equityErrorObj, refetch: refetchEquity } = ...

Consider splitting across lines for readability:

const {
  data: equityData,
  isLoading: equityLoading,
  isError: equityError,
  error: equityErrorObj,
  refetch: refetchEquity,
} = useEquityHistory(selectedTimeRange)

retryAll recreation on every render.
retryAll closes over metricsError, equityError, pairError, and the refetch* fns. Since it's passed as an onClick, wrapping it in useCallback would prevent an unnecessary new function reference each render:

const retryAll = useCallback(() => {
  if (metricsError) refetchMetrics()
  if (equityError) refetchEquity()
  if (pairError) refetchPair()
}, [metricsError, equityError, pairError, refetchMetrics, refetchEquity, refetchPair])

Not a correctness issue, but it is the idiomatic React approach.

firstErrorMessage is complex but correct.
The chained &&/ternary expression works. A useMemo with an early-return pattern would be easier to follow, but this is style preference rather than a bug.


Test Coverage

The PR description notes FE-012 is deferred (correct call for that scope). On the Go side:

  • The new *Ctx methods in orchestrator_client.go lack unit tests. The non-ctx variants likely have existing coverage that now exercises them indirectly, but a focused test for the context-propagation behavior (e.g., a canceled context returns -1 / false immediately) would be valuable given this is the primary fix for CODE-005.
  • The 30s DB init timeout in main.go is marked as a manual test item in the checklist. An integration test or at least a note about how to verify it automatically would be useful.

Summary

Area Status
CODE-005 (orchestrator_client.go) ✅ Clean, no issues
CODE-005 (dashboard.go dispatch helpers) ✅ Good pattern, verify nil safety in PauseTrading/ResumeTrading
CODE-006 (main.go timeout) ✅ Correct, comment is slightly misleading
FE-010 (performance/page.tsx) ⚠️ Verify AlertTriangle import; minor style items

The three bugs being fixed are real and the fixes are appropriate. The main blocking item to confirm before merge is the AlertTriangle import.

🤖 Generated with Claude Code

- main.go: rewrite the DB init timeout comment to accurately describe
  pgxpool.New's context usage (initial AcquireConn only) and explain
  why dbInitCancel is called immediately instead of deferred.
- performance/page.tsx: split the wide useQuery destructurings across
  lines for readability and wrap retryAll in useCallback so the
  onClick handler doesn't get a fresh function reference every render.

Note: AlertTriangle is already imported on line 17 (pre-existing use
for the Max Drawdown StatCard icon at line 121), so the review's
concern about a missing import was a false alarm.

Note: h.orchestrator nil-safety in PauseTrading/ResumeTrading is
already guarded by the 'if h.orchestrator == nil' checks at lines 545
and 580 that existed before this PR — not shown in the diff.
@ajitpratap0
Copy link
Copy Markdown
Owner Author

Round 2 addressed in a6efa16 — thanks @claude.

main.go comment (CODE-006)
✅ Rewrote to accurately describe pgxpool.New's context usage (initial AcquireConn only) and to explain why dbInitCancel is called immediately instead of deferred.

performance/page.tsx (FE-010)
✅ Split the wide useQuery destructurings across lines.
✅ Wrapped retryAll in useCallback with the correct dependency array.

False alarms verified, not blocking

  • `AlertTriangle` is already imported on line 17 — pre-existing usage as the Max Drawdown StatCard icon at line 121. Not a new import.
  • `h.orchestrator == nil` guards already exist in PauseTrading (line 545) and ResumeTrading (line 580) — they aren't shown in the diff because they weren't modified. The `pauseOrchestrator`/`resumeOrchestrator` dispatch helpers are only called after those existing nil checks, so no new panic surface.

Deferred

  • Context-cancellation unit tests for the new `*Ctx` methods — agreed they're worth having, but adding them properly needs a small test harness for the HTTP fake (context-canceled transport injection). Better as a focused follow-up than bolted onto this PR.

Re-ran `go build ./cmd/api/...` and `npm run type-check`; both clean.

@claude
Copy link
Copy Markdown

claude bot commented Apr 8, 2026

Claude finished @ajitpratap0's task in 52s —— View job


PR Review — Round 2 Verification

All items from the previous review round confirmed clean.

Verified in a6efa16

cmd/api/main.go — comment rewrite (CODE-006)
Lines 104–110 now accurately describe pgxpool.New's context scope (initial AcquireConn only) and explain why dbInitCancel() is called immediately rather than deferred. The comment is technically correct and will prevent future confusion.

web/dashboard/app/performance/page.tsx — style fixes (FE-010)

  • useCallback is imported on line 3.
  • All three useQuery destructurings (lines 28–48) are split across multiple lines — readable.
  • retryAll is wrapped in useCallback (lines 67–71) with the correct dependency array: [metricsError, equityError, pairError, refetchMetrics, refetchEquity, refetchPair].

False alarms confirmed

  • AlertTriangle — already imported on line 17 (pre-existing; used as the Max Drawdown StatCard icon at line 181). Not a new import, no issue.
  • Nil guards — PauseTrading (line 597) and ResumeTrading (line 632) both have if h.orchestrator == nil checks before any pauseOrchestrator/resumeOrchestrator dispatch. No panic surface.

Outstanding (deferred, not blocking)

  • Context-cancellation unit tests for *Ctx methods — agreed the HTTP fake harness is better scaffolded in a focused follow-up PR.
  • FE-012 (route-level vitest harness) — already tracked for a dedicated PR.

This PR is ready to merge. All three genuine bugs are fixed, all review feedback has been addressed, and the two deferred items are properly tracked for follow-up.

@claude
Copy link
Copy Markdown

claude bot commented Apr 8, 2026

Code Review — PR #209

Overall this is a clean, well-scoped bug-fix pass. The rationale in the PR description is clear and the three fixes are all straightforward. A few observations below.


CODE-006 · cmd/api/main.go — DB pool init timeout ✅

dbInitCtx, dbInitCancel := context.WithTimeout(ctx, 30*time.Second)
database, err := db.New(dbInitCtx, &cfg.Database)
dbInitCancel()

Calling dbInitCancel() immediately (not deferred) is correct and the comment explaining why is excellent — pgxpool only uses the context for the initial AcquireConn; the pool itself is torn down by database.Close(), not by any stored context. This is non-obvious behaviour and documenting it here prevents a future reviewer from reflexively adding defer. The signal context as the parent also handles SIGTERM-during-startup correctly.


CODE-005 · Context-aware orchestrator calls ✅

The type-assertion bridge via contextualOrchestrator is the right trade-off: it fixes the real goroutine-leak path (cross-process HTTP) without churning ~25 in-process test sites. A few notes:

orchestrator_client.go — wrapping non-ctx variants:

func (c *OrchestratorClient) Pause() error {
    return c.PauseCtx(context.Background())
}

Clean. The context.Background() parent is correct here since there's no request context in scope.

dashboard.go — helper dispatch functions:

func pauseOrchestrator(ctx context.Context, orch OrchestratorInterface) error {
    if ctxOrch, ok := orch.(contextualOrchestrator); ok {
        return ctxOrch.PauseCtx(ctx)
    }
    return orch.Pause()
}

This is solid. One edge case worth confirming: if the request context cancels (client disconnect) while PauseCtx is in-flight, the orchestrator may still successfully process the pause command but the API handler will return a context-cancelled error to the (now-disconnected) client. The orchestrator state will be paused even though no acknowledgement is sent. This is the correct behavior for cross-process mutations but worth noting in a future runbook — a retry from a reconnected client might attempt to pause an already-paused system.

Missing tests — this is the one meaningful gap. PauseCtx, ResumeCtx, IsPausedCtx, and GetActiveAgentCountCtx have no dedicated unit tests. The existing integration tests cover the non-ctx wrappers, but a table-driven test verifying that:

  1. A cancelled parent context aborts the outbound HTTP call before 5s
  2. The 5s ceiling is still enforced when the parent context has no deadline

...would prevent regressions if this code is touched later. Not a blocker for merge but worth a follow-up issue.


FE-010 · Performance page error state ✅

Accessibility: role="alert" on the error banner is correct — screen readers will announce it when it appears.

retryAll dependency array — minor simplification opportunity:

// Current
const retryAll = useCallback(() => {
    if (metricsError) refetchMetrics()
    if (equityError) refetchEquity()
    if (pairError) refetchPair()
}, [metricsError, equityError, pairError, refetchMetrics, refetchEquity, refetchPair])

refetch* functions from react-query are stable references (guaranteed by TanStack Query). Calling refetch on a query that isn't in error state is a no-op (it just triggers a background refresh which react-query will handle correctly). You could simplify:

const retryAll = useCallback(() => {
    refetchMetrics()
    refetchEquity()
    refetchPair()
}, [refetchMetrics, refetchEquity, refetchPair])

This removes the boolean error flags from the closure, making the callback stable across error-state transitions. Not a bug as written — the current conditional logic is intentional and produces correct UX — but the simpler version is easier to maintain.

firstErrorMessage chaining is verbose but correct. No issue.

Candlestick/drawdown exclusion is well-justified in the comment. Good call.


Summary

Area Verdict
DB init timeout (CODE-006) ✅ Correct and well-documented
Context-aware orchestrator (CODE-005) ✅ Sound design; missing Ctx unit tests
Performance error banner (FE-010) ✅ Accessible; retryAll deps can be simplified
Breaking changes None
Security concerns None

The only actionable follow-up is unit tests for the *Ctx orchestrator methods — everything else is nits. Looks good to merge.

🤖 Generated with Claude Code

@ajitpratap0 ajitpratap0 merged commit 1f9b999 into main Apr 9, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant