feat(discovery+openshell): DNS-AID agent discovery with OpenShell policy enforcement#469
feat(discovery+openshell): DNS-AID agent discovery with OpenShell policy enforcement#469IngmarVG-IB wants to merge 6 commits into
Conversation
|
Welcome to AICR, @IngmarVG-IB! Thanks for your first pull request. Before review, please ensure:
A maintainer will review this soon. |
mchmarny
left a comment
There was a problem hiding this comment.
Well-structured PR — clean separation between discovery, policy, and API layers. The code follows project conventions (functional options, pkg/errors, pkg/defaults, slog, table-driven tests, create-or-update K8s semantics). Feature gating via env vars is a good rollout strategy.
Key issues to address:
-
SSRF bypass via hostname (fetcher.go) — The private IP check only catches raw IP literals in URLs. A hostname resolving to
127.0.0.1or a link-local address passes validation. Since policy URIs come from DNS records (untrusted input), this needs aDialContext-level check or at minimum a documented limitation. -
Silent data loss on context cancellation (resolver.go) —
DiscoverAllswallows all errors including context cancellation, returning an empty list withnilerror. Callers can't distinguish "no agents" from "timed out mid-discovery." -
Confusing dual-return on fail-open (guard.go) —
Checkreturns(allowed=true, err!=nil)on fetch failure. This contract is easy to misuse. -
bapVersionsv-prefix handling (types.go) — Av-prefixed version likev2.1.0producesmcp/v2instead ofmcp/2.
Minor/cleanup:
- Dead code:
g.Wait()error check inDiscoverAllcan never trigger - Unnecessary loop variable capture (Go 1.22+ scoping)
- Duplicated
startMockDNStest helper across packages PORTenv var parsed in two places.claude/agents/k8s-discovery.mdmay be unintentional
Overall the architecture is solid. The SSRF issue is the most important one to fix before merge.
|
@IngmarVG-IB this PR now has merge conflicts with |
3b66bf9 to
ab2b0f4
Compare
Address reviewer (mchmarny) findings on PR NVIDIA#469: Security: - openshell/fetcher: close SSRF hostname bypass by adding a safeDialContext on the HTTP transport that rejects connections to private/loopback IPs even when the URL hostname resolves to one. Adds isBlockedIP helper and a regression test that fails closed against an httptest TLS server. Correctness: - discovery/resolver: propagate context cancellation in DiscoverAll so callers can distinguish "no agents" from "timed out mid-discovery". Remove the dead g.Wait() error path now that goroutines genuinely return ctx.Err() on cancellation. - openshell/guard: simplify the Check contract — fetch failures fail open with (Allowed=true, nil error) rather than the dual-return that was easy to misuse. Return an error only on programmer errors (nil record). Update agents.go to match. - openshell/guard: remove panic from NewGuard; document that callers must validate via ValidMode before construction (server.go already does this). - discovery/types: strip leading semver "v" prefix in bapVersions so "v2.1.0" produces "mcp/2" rather than "mcp/v2". Cleanup: - discovery/resolver: remove redundant `entry := entry` loop variable capture (Go 1.22+ scoping makes it unnecessary). - openshell/evaluator: restructure rate_limits to a clearer switch (violation vs. warning) so future map changes do the right thing. - server/config + api/server: extract a single ResolvePort helper so agent registration and the HTTP server agree on the bound port. - internal/testdns: extract the duplicated startMockDNS helper into a shared internal package; pkg/discovery and pkg/api now both import it. - openshell/fetcher: document the singleflight + context coalescing trade-off (3s timeout makes shared cancellation acceptable). - Remove .claude/agents/k8s-discovery.md (local Claude Code config, unintentionally committed). Tests: - New: TestFetcherSSRFDialBlocksPrivateIP, TestIsBlockedIP, TestGuardNilRecordReturnsError, expanded TestBapVersions cases. - Updated: TestGuardFailOpenOnFetchError now asserts (Allowed=true, nil) rather than the old dual-return; TestGuardInvalidModePanics removed (no longer panics).
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Enterprise Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (31)
📝 Walkthrough<review_stack_artifact> </review_stack_artifact> 🚥 Pre-merge checks | ✅ 4✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 24
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/server/server.go (1)
163-172:⚠️ Potential issue | 🟡 MinorReadiness flips to
truebefore the listener is actually bound — small but real race.
s.httpServer.ListenAndServe()runs in a goroutine, and the listener may not yet be bound whens.setReady(true)runs on Line 172. In a Kubernetes readiness-probe scenario,/readycould return 200 while connections are still being refused, causing flapping or 502s during a rolling restart. Notably, the newlifecycle_test.gotests poll TCP withnet.DialTimeoutrather than checking readiness state, which suggests this gap is already known.Bind the listener synchronously before going async, e.g.:
♻️ Proposed fix sketch
- // Start server in goroutine - errChan := make(chan error, 1) - go func() { - if err := s.httpServer.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) { - errChan <- err - } - }() - - // Mark ready after hooks and listener are up - s.setReady(true) + // Bind listener synchronously so readiness reflects actual accept state + ln, err := net.Listen("tcp", s.httpServer.Addr) + if err != nil { + return aicrerrors.Wrap(aicrerrors.ErrCodeInternal, "listen failed", err) + } + + errChan := make(chan error, 1) + go func() { + if err := s.httpServer.Serve(ln); err != nil && !errors.Is(err, http.ErrServerClosed) { + errChan <- err + } + }() + + // Mark ready after hooks succeeded and the listener is actually bound + s.setReady(true)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/server/server.go` around lines 163 - 172, The readiness is set before the HTTP listener is bound; change startup to synchronously create/bind the listener and only set readiness after bind succeeds, then serve in a goroutine: create a net.Listener (or call s.httpServer.Listen on a previously created listener) and check for bind errors before calling s.setReady(true), then start serving with s.httpServer.Serve(listener) inside the existing goroutine and forward Serve errors to errChan (preserve the existing ListenAndServe error handling logic but use Serve to avoid the race). Ensure you reference and modify the existing s.httpServer.ListenAndServe usage, errChan, and s.setReady calls so readiness flips only after successful listener bind.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/api/agents_test.go`:
- Line 52: Replace the needless fmt.Sprintf call used for the Target field —
change Target: fmt.Sprintf("aicrd.default.svc.cluster.local.") to use the string
literal directly and remove the fmt import if that was its last usage; look for
the fmt.Sprintf usage in the test (Target field) and the fmt import at the top
of the file to update accordingly.
In `@pkg/api/agents.go`:
- Around line 93-128: The per-agent serial policy checks in the loop over
records (where resp := agentResponse{...} and guard.Check(ctx, rec) is called)
can block under fanout; refactor to perform guard.Check concurrently using
errgroup.WithContext(ctx) and a small concurrency semaphore (e.g., buffered
channel) to cap parallelism, collect results back into the corresponding
agentResponse (including building policyResultSummary from result.Violations)
and attach resp.PolicyResult; ensure you propagate context cancellation and
handle individual check errors by logging them (as currently done) without
failing the whole handler so the outer handler still respects
DiscoveryHandlerTimeout.
In `@pkg/api/server.go`:
- Line 109: The current check if os.Getenv("AICR_DISCOVERY_ENABLED") == "true"
rejects other common truthy values; replace this strict comparison with
strconv.ParseBool(os.Getenv("AICR_DISCOVERY_ENABLED")) and use the returned bool
to gate the discovery block (the same if that currently encloses discovery
setup). If ParseBool returns an error, log a warning (using the existing logger
used in this file) that the env value could not be parsed and treat it as false;
proceed only when the parsed bool is true. Ensure you import strconv and keep
the AICR_DISCOVERY_ENABLED env var name and the existing discovery setup code
intact.
- Around line 134-151: podNamespace currently returns the raw bytes from the
serviceaccount namespace file which can include trailing newlines and
whitespace; update podNamespace to trim whitespace from the file contents (use
strings.TrimSpace on the string(data)) before returning and ensure the "strings"
package is added to imports so discoveryDomain (which calls podNamespace)
receives a clean namespace string.
In `@pkg/discovery/publisher.go`:
- Around line 192-204: The create-then-update path in ensureAgentConfigMap
currently does an Update on IsAlreadyExists without re-fetching, which can race
and cause Conflict or clobbering; change the IsAlreadyExists branch so you first
Get the existing ConfigMap via
p.clientset.CoreV1().ConfigMaps(p.namespace).Get(ctx, cmName,
metav1.GetOptions{}), merge the labels/annotations onto that existing object,
copy its ResourceVersion into your update object, then call Update; wrap the
Update in a bounded retry loop that retries on k8serrors.IsConflict (similar to
the retry logic used by updateIndex) to handle concurrent modifications and
avoid lost updates.
In `@pkg/discovery/resolver_test.go`:
- Line 237: Replace the unnecessary call
fmt.Sprintf("aicrd.default.svc.cluster.local.") used as the Target value with
the plain string literal "aicrd.default.svc.cluster.local." (i.e., remove
fmt.Sprintf usage) and then remove the unused fmt import from
pkg/discovery/resolver_test.go so the test no longer imports fmt for no reason.
In `@pkg/discovery/resolver.go`:
- Around line 147-184: In ListIndex, add context cancellation checks inside the
outer loop over resp.Answer and the inner loop over txt.Txt so parsing can abort
promptly if ctx is cancelled: use a non-blocking select on ctx.Done() (or check
ctx.Err()) at the start of each loop iteration and return nil, ctx.Err() (or an
appropriate wrapped error) when cancelled; update handling around variables
resp, rr, txt, and s to bail out cleanly instead of continuing to parse.
- Around line 42-48: The doc comment on NewDiscoverer is misleading: it claims
the default DNS server is "the system resolver at 127.0.0.1:53" but the code
hard-codes dnsServer = "127.0.0.1:53"; either update the comment to say it
defaults to localhost (127.0.0.1:53) or change initialization to parse the
system resolver via miekg/dns.ClientConfigFromFile("/etc/resolv.conf") and set
Discoverer.dnsServer from the first nameserver with a fallback to
"127.0.0.1:53"; locate NewDiscoverer and the Discoverer.dnsServer field to apply
the fix and keep defaults.DiscoveryDNSTimeout and dns.Client usage unchanged.
- Around line 153-161: After the ExchangeContext call, add a defensive nil check
for resp before reading resp.Rcode: if resp is nil, return a wrapped
errors.ErrCodeUnavailable (with a message like "nil DNS response for <fqdn>") to
avoid a panic; apply the same nil-check pattern in querySVCB where
ExchangeContext is used so both places guard against nil responses from
d.dnsClient.ExchangeContext before accessing resp.Rcode.
In `@pkg/discovery/svcb.go`:
- Around line 41-60: ParseSVCBAnswer currently selects the first-seen dns.SVCB
with the lowest svcb.Priority and leaves ties non-deterministic; update
ParseSVCBAnswer to break ties when svcb.Priority == best.Priority by applying a
deterministic tiebreaker (e.g., compare svcb.TargetName lexicographically or
compare a canonicalized set of SvcParams) or implement RFC 9460 §2.4.3
randomized selection (seeded per-query) before calling parseSVCBRecord; ensure
the comparison logic references msg.Answer, dns.SVCB, svcb.Priority,
best.Priority and still skips alias mode (Priority == 0).
- Around line 97-98: The BAP parsing currently does params.BAP =
strings.Split(val, ",") which makes an empty string produce []string{""}; change
the logic in the SvcParamKeyBAP handling so that when val == "" (or
strings.TrimSpace(val) == "") you set params.BAP to nil or an empty slice,
otherwise split the value; update the SvcParamKeyBAP branch that assigns
params.BAP to perform this check so downstream checks like
AgentRegistration.bapVersions() using len(r.BAP) behave correctly.
In `@pkg/errors/errors.go`:
- Around line 41-44: The constant ErrCodePolicyDenied is defined but unused;
either remove ErrCodePolicyDenied from pkg/errors/errors.go or add a consumer
that returns/checks it (e.g., in the OpenShell policy evaluation path). If
removing, delete the ErrCodePolicyDenied line and keep ErrCodePolicyFetch which
is used by pkg/openshell/fetcher.go; if adding a consumer, update the OpenShell
policy evaluation or HTTP strict-mode handler to return an errors.Error with
Code set to ErrCodePolicyDenied when a policy evaluation explicitly denies a
request, and ensure callers check for ErrCodePolicyDenied where denial semantics
are handled.
In `@pkg/openshell/evaluator.go`:
- Around line 106-121: The current appliesToLayer("rate_limits", layer) branch
unconditionally appends a deny PolicyViolation for LayerTarget; change this so
we don't silently deny: either implement a real comparison between
rules.RateLimits and runtime counters on PolicyContext (e.g.,
requestsPerMinute/requestsPerHour fields) inside Evaluate (so you compare
rules.RateLimits thresholds to the PolicyContext counters and only append a
violation when exceeded), or if those counters aren't available yet, replace the
unconditional violation with an explicit “not implemented” outcome (e.g., append
a PolicyViolation or warning with Rule:"rate_limits", Detail:"rate limits
enforcement not implemented for LayerTarget" and/or log a clear TODO) so
Guard.Check/Evaluate won't silently deny once LayerTarget enforcement is wired.
- Around line 273-287: In parseTLSVersion, avoid shadowing predeclared
identifiers by assigning directly to the named return variables (major, minor,
ok) instead of creating local vars like maj/min; change "maj, err :=
strconv.Atoi(parts[0])" to "major, err = strconv.Atoi(parts[0])" and "min, err
:= strconv.Atoi(parts[1])" to "minor, err = strconv.Atoi(parts[1])", keep the
existing error checks, and return the named returns (major, minor, true) so no
local min/maj variables are introduced.
In `@pkg/openshell/fetcher_test.go`:
- Around line 151-173: The test currently fails at validateFetchURL because
srv.URL uses a raw IP; change the test so the request URL contains a hostname
(not an IP) that will resolve to a private/loopback address at dial time so
safeDialContext is exercised. Concretely: keep using NewFetcher() and the custom
transport from newSafeTransport(false), but set f.client.Transport.DialContext
to a custom DialContext that resolves the test hostname (e.g., "localtest.me" or
"example.test") to the httptest server's loopback address/port
(srv.Listener.Addr()) so DNS resolution happens at dial and safeDialContext is
invoked; ensure the URL passed to f.Fetch uses that hostname (not 127.0.0.1) so
validateFetchURL does not reject it up front. Reference functions/values:
TestFetcherSSRFDialBlocksPrivateIP, validateFetchURL, safeDialContext,
NewFetcher, newSafeTransport, f.client.Transport.DialContext, and srv.
In `@pkg/openshell/fetcher.go`:
- Around line 81-88: The http.Client created in the fetcher (where f.client is
set) follows redirects without re-validating the redirected URL, allowing TLS
downgrade or host changes; update the client initialization to set CheckRedirect
to a function that calls validateFetchURL (or at minimum enforces scheme ==
"https") for the redirect Location, and return an error to stop the redirect if
validation fails. Implement the CheckRedirect closure alongside
Timeout/Transport in the f.client initialization so redirects are rejected when
validateFetchURL or an https-only check fails; keep
safeDialContext/newSafeTransport as-is for post-resolution checks.
- Around line 99-117: The current t.DialContext checks IPs via
net.DefaultResolver.LookupIPAddr then calls dialer.DialContext(ctx, network,
addr) which re-resolves the hostname (TOCTOU); instead, after filtering ips with
isBlockedIP and allowPrivate, attempt to dial each validated IP literal directly
(use ipa.IP.String() + the original port via net.JoinHostPort) with the existing
dialer.DialContext to avoid a second DNS lookup; on first successful connection
return it, otherwise try the next IP and finally return the last error if all
fail; keep using the same ctx and network and preserve the original hostname for
TLS/SNI handling (do not alter request Host elsewhere).
In `@pkg/openshell/guard.go`:
- Around line 132-136: Guard.Check() is building a PolicyContext (pctx) with
only CallerID, CallerDomain and Protocol which means Layer 1 evaluation will see
zero-values for AuthType, HasMutualTLS, TLSVersion, DNSSECValidated,
CallerTrustScore, Method, Intent, GeoCountry and ConsentToken and thus
ModeStrict will unconditionally deny policies that require those fields; update
the evaluator in Guard.Check()/the policy evaluation path to treat
unavailable/zero-valued signals as "unknown" instead of outright
failing—specifically, detect when pctx fields are not populated and either skip
those specific checks (emit a warning/log when ModePermissive or when skipping
in ModeStrict) or convert them to non-failing tri-state checks so
required_auth_types, require_dnssec, require_mutual_tls, min_tls_version,
required_caller_trust_score, allowed_methods, allowed_intents, geo_restrictions
and consent_required do not cause automatic deny at Layer 1; include clear log
messages referencing PolicyContext and ModeStrict/ModePermissive to aid
debugging.
In `@pkg/openshell/types.go`:
- Around line 60-80: The struct comment for PolicyRules currently states "All
fields are pointers or slices to distinguish 'not set' from zero values" but
RequireDNSSEC, RequireMutualTLS, and ConsentRequired are plain bools; either
update the comment to call out these three fields as intentional plain booleans
(and briefly state that "not set" == false semantics) or change those three
fields (RequireDNSSEC, RequireMutualTLS, ConsentRequired) to *bool and add
`omitempty` to their json tags so consumers can distinguish unset vs false;
apply the chosen change consistently in pkg/openshell/types.go near the
PolicyRules type.
In `@pkg/server/config.go`:
- Around line 39-51: Replace the tolerant fmt.Sscanf parsing in ResolvePort with
strict numeric parsing: import strconv, use strconv.Atoi(portStr) to parse the
env var (handle the returned error and the parsed port bounds as before), and
keep logging the error and returning defaultPort on any parse failure; retain
fmt import only if other functions (e.g., parseConfig/SHUTDOWN_TIMEOUT_SECONDS)
still need it.
In `@pkg/server/lifecycle_test.go`:
- Around line 202-230: The test TestMultipleHooksExecuteInOrder currently
invokes hooks directly; change it to register two OnStart hooks that append to a
shared slice and then call s.Start(context.Background()) (using the same s
created with WithOnStart) to drive hook execution via Start(); after Start
returns assert the slice length and ordering (1 then 2) and fail the test if
Start returns an error. Also add a companion test that registers multiple
WithOnShutdown hooks, calls s.Shutdown(ctx) (or the appropriate shutdown entry
point) and asserts shutdown hooks run in registration order and that execution
stops on the first error where applicable. Ensure you reference s.config.OnStart
/ OnShutdown only to confirm registration counts if needed but rely on
Start()/Shutdown() for actual execution verification.
- Around line 27-36: The helper freePort currently ignores the error returned by
l.Close(); modify freePort to capture the Close() error from the net.Listener
variable l and fail the test if Close() returns an error (e.g., call err =
l.Close() and t.Fatalf/t.Errorf with the error). Update the freePort function so
the listener close is checked and reported, referencing the freePort function,
the listener variable l, and the l.Close() call.
In `@pkg/server/server.go`:
- Around line 156-161: The OnStart hook loop (iterating s.config.OnStart and
calling each hook(ctx)) can block indefinitely; wrap each hook invocation in a
bounded context using context.WithTimeout and cancel (e.g., derive ctx with a
startup timeout constant like defaults.StartupTimeout) and pass that derived
context into hook(ctxWithTimeout), then handle the case where the timeout causes
the hook to return a context.DeadlineExceeded error and wrap/return it as
before; additionally update the LifecycleHook documentation to state that hooks
must respect the provided context (or, if you prefer the alternate contract,
require each hook to set its own timeout and document that) so callers know the
enforced timeout policy.
---
Outside diff comments:
In `@pkg/server/server.go`:
- Around line 163-172: The readiness is set before the HTTP listener is bound;
change startup to synchronously create/bind the listener and only set readiness
after bind succeeds, then serve in a goroutine: create a net.Listener (or call
s.httpServer.Listen on a previously created listener) and check for bind errors
before calling s.setReady(true), then start serving with
s.httpServer.Serve(listener) inside the existing goroutine and forward Serve
errors to errChan (preserve the existing ListenAndServe error handling logic but
use Serve to avoid the race). Ensure you reference and modify the existing
s.httpServer.ListenAndServe usage, errChan, and s.setReady calls so readiness
flips only after successful listener bind.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 25f44d89-fc1e-4cb9-afe0-c7e85f6ea6d0
⛔ Files ignored due to path filters (271)
go.sumis excluded by!**/*.sumvendor/github.com/miekg/dns/.codecov.ymlis excluded by!vendor/**vendor/github.com/miekg/dns/.gitignoreis excluded by!vendor/**vendor/github.com/miekg/dns/AUTHORSis excluded by!vendor/**vendor/github.com/miekg/dns/CODEOWNERSis excluded by!vendor/**vendor/github.com/miekg/dns/CONTRIBUTORSis excluded by!vendor/**vendor/github.com/miekg/dns/COPYRIGHTis excluded by!vendor/**vendor/github.com/miekg/dns/LICENSEis excluded by!vendor/**vendor/github.com/miekg/dns/Makefile.fuzzis excluded by!vendor/**vendor/github.com/miekg/dns/Makefile.releaseis excluded by!vendor/**vendor/github.com/miekg/dns/README.mdis excluded by!vendor/**vendor/github.com/miekg/dns/acceptfunc.gois excluded by!vendor/**vendor/github.com/miekg/dns/client.gois excluded by!vendor/**vendor/github.com/miekg/dns/clientconfig.gois excluded by!vendor/**vendor/github.com/miekg/dns/dane.gois excluded by!vendor/**vendor/github.com/miekg/dns/defaults.gois excluded by!vendor/**vendor/github.com/miekg/dns/dns.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec_keygen.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec_keyscan.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec_privkey.gois excluded by!vendor/**vendor/github.com/miekg/dns/doc.gois excluded by!vendor/**vendor/github.com/miekg/dns/duplicate.gois excluded by!vendor/**vendor/github.com/miekg/dns/edns.gois excluded by!vendor/**vendor/github.com/miekg/dns/format.gois excluded by!vendor/**vendor/github.com/miekg/dns/fuzz.gois excluded by!vendor/**vendor/github.com/miekg/dns/generate.gois excluded by!vendor/**vendor/github.com/miekg/dns/hash.gois excluded by!vendor/**vendor/github.com/miekg/dns/labels.gois excluded by!vendor/**vendor/github.com/miekg/dns/listen_no_socket_options.gois excluded by!vendor/**vendor/github.com/miekg/dns/listen_socket_options.gois excluded by!vendor/**vendor/github.com/miekg/dns/msg.gois excluded by!vendor/**vendor/github.com/miekg/dns/msg_helpers.gois excluded by!vendor/**vendor/github.com/miekg/dns/msg_truncate.gois excluded by!vendor/**vendor/github.com/miekg/dns/nsecx.gois excluded by!vendor/**vendor/github.com/miekg/dns/privaterr.gois excluded by!vendor/**vendor/github.com/miekg/dns/reverse.gois excluded by!vendor/**vendor/github.com/miekg/dns/sanitize.gois excluded by!vendor/**vendor/github.com/miekg/dns/scan.gois excluded by!vendor/**vendor/github.com/miekg/dns/scan_rr.gois excluded by!vendor/**vendor/github.com/miekg/dns/serve_mux.gois excluded by!vendor/**vendor/github.com/miekg/dns/server.gois excluded by!vendor/**vendor/github.com/miekg/dns/sig0.gois excluded by!vendor/**vendor/github.com/miekg/dns/smimea.gois excluded by!vendor/**vendor/github.com/miekg/dns/svcb.gois excluded by!vendor/**vendor/github.com/miekg/dns/tlsa.gois excluded by!vendor/**vendor/github.com/miekg/dns/tools.gois excluded by!vendor/**vendor/github.com/miekg/dns/tsig.gois excluded by!vendor/**vendor/github.com/miekg/dns/types.gois excluded by!vendor/**vendor/github.com/miekg/dns/udp.gois excluded by!vendor/**vendor/github.com/miekg/dns/udp_no_control.gois excluded by!vendor/**vendor/github.com/miekg/dns/update.gois excluded by!vendor/**vendor/github.com/miekg/dns/version.gois excluded by!vendor/**vendor/github.com/miekg/dns/xfr.gois excluded by!vendor/**vendor/github.com/miekg/dns/zduplicate.gois excluded by!vendor/**vendor/github.com/miekg/dns/zmsg.gois excluded by!vendor/**vendor/github.com/miekg/dns/ztypes.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/doc.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/forward_requirements.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/require.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/require.go.tmplis excluded by!vendor/**vendor/github.com/stretchr/testify/require/require_forward.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/require_forward.go.tmplis excluded by!vendor/**vendor/github.com/stretchr/testify/require/requirements.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/asm.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/constants.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/doc.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/instructions.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/setter.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/vm.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/vm_instructions.gois excluded by!vendor/**vendor/golang.org/x/net/internal/iana/const.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_linux_32bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_linux_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_solaris_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/complete_dontwait.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/complete_nodontwait.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/empty.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/error_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/error_windows.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_32bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_solaris_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/mmsghdr_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/mmsghdr_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_bsdvar.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_linux.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_linux_32bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_linux_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_openbsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_solaris_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/norace.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/race.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_mmsg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_msg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_nommsg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_nomsg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/socket.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_const_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_386.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_loong64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mips.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mips64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mipsle.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_ppc.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_ppc64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_s390x.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_netbsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_posix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_windows.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_zos_s390x.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_aix_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_darwin_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_darwin_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_dragonfly_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_loong64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mips.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mips64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mipsle.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_ppc.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_ppc64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_solaris_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/batch.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_pktinfo.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_unix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_windows.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_zos.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/dgramopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/doc.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/endpoint.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/genericopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/header.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/helper.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/iana.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/icmp.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/icmp_linux.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/icmp_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/packet.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/payload.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/payload_cmsg.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/payload_nocmsg.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sockopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sockopt_posix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sockopt_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_aix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreq.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreq_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreqn.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreqn_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_bpf.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_bpf_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_darwin.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_dragonfly.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_freebsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_linux.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_solaris.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_ssmreq.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_ssmreq_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_windows.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_zos.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_aix_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_darwin.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_dragonfly.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_386.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_arm.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_loong64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mips.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mips64le.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mipsle.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_ppc.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_ppc64le.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_netbsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_openbsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_solaris.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/batch.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control_rfc2292_unix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control_rfc3542_unix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control_unix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control_windows.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/dgramopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/doc.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/endpoint.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/genericopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/header.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/helper.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/iana.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp_linux.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp_solaris.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp_windows.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/icmp_zos.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/payload.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/payload_cmsg.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/payload_nocmsg.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sockopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sockopt_posix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sockopt_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sys_aix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sys_asmreq.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sys_asmreq_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sys_bpf.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/sys_bpf_stub.gois excluded by!vendor/**
📒 Files selected for processing (29)
go.modinternal/testdns/testdns.gopkg/api/agents.gopkg/api/agents_test.gopkg/api/server.gopkg/defaults/timeouts.gopkg/discovery/doc.gopkg/discovery/options.gopkg/discovery/publisher.gopkg/discovery/publisher_test.gopkg/discovery/resolver.gopkg/discovery/resolver_test.gopkg/discovery/svcb.gopkg/discovery/svcb_test.gopkg/discovery/testhelper_test.gopkg/discovery/types.gopkg/errors/errors.gopkg/openshell/doc.gopkg/openshell/evaluator.gopkg/openshell/evaluator_test.gopkg/openshell/fetcher.gopkg/openshell/fetcher_test.gopkg/openshell/guard.gopkg/openshell/guard_test.gopkg/openshell/options.gopkg/openshell/types.gopkg/server/config.gopkg/server/lifecycle_test.gopkg/server/server.go
|
@mchmarny thanks for the thorough review. All findings addressed in Key issues
Cleanup
Acknowledged but not changed
All tests pass with |
|
@CodeRabbit thanks for the deep dive. All 24 threads addressed in Security (must-fix)
Correctness
Cleanup / nits
All 6 affected packages pass |
|
Only users with a collaborator, contributor, member, or owner role can interact with CodeRabbit. |
|
@IngmarVG-IB looks like most of the issues are now addressed. Only the readiness-bind race (outside-diff) is fully unaddressed; the Layer 1 strict-mode issue has been downgraded to documented limitation rather than fixed. So once you rebase (dirty state) we should be good to merge. Thanks for your work on this. |
a864d66 to
be4e034
Compare
Address reviewer (mchmarny) findings on PR NVIDIA#469: Security: - openshell/fetcher: close SSRF hostname bypass by adding a safeDialContext on the HTTP transport that rejects connections to private/loopback IPs even when the URL hostname resolves to one. Adds isBlockedIP helper and a regression test that fails closed against an httptest TLS server. Correctness: - discovery/resolver: propagate context cancellation in DiscoverAll so callers can distinguish "no agents" from "timed out mid-discovery". Remove the dead g.Wait() error path now that goroutines genuinely return ctx.Err() on cancellation. - openshell/guard: simplify the Check contract — fetch failures fail open with (Allowed=true, nil error) rather than the dual-return that was easy to misuse. Return an error only on programmer errors (nil record). Update agents.go to match. - openshell/guard: remove panic from NewGuard; document that callers must validate via ValidMode before construction (server.go already does this). - discovery/types: strip leading semver "v" prefix in bapVersions so "v2.1.0" produces "mcp/2" rather than "mcp/v2". Cleanup: - discovery/resolver: remove redundant `entry := entry` loop variable capture (Go 1.22+ scoping makes it unnecessary). - openshell/evaluator: restructure rate_limits to a clearer switch (violation vs. warning) so future map changes do the right thing. - server/config + api/server: extract a single ResolvePort helper so agent registration and the HTTP server agree on the bound port. - internal/testdns: extract the duplicated startMockDNS helper into a shared internal package; pkg/discovery and pkg/api now both import it. - openshell/fetcher: document the singleflight + context coalescing trade-off (3s timeout makes shared cancellation acceptable). - Remove .claude/agents/k8s-discovery.md (local Claude Code config, unintentionally committed). Tests: - New: TestFetcherSSRFDialBlocksPrivateIP, TestIsBlockedIP, TestGuardNilRecordReturnsError, expanded TestBapVersions cases. - Updated: TestGuardFailOpenOnFetchError now asserts (Allowed=true, nil) rather than the old dual-return; TestGuardInvalidModePanics removed (no longer panics).
Address the May 11 PR NVIDIA#469 follow-up review. Readiness-bind race (the only fully-unaddressed item from round 2): pkg/server/server.go Start() previously kicked off ListenAndServe in a goroutine and immediately flipped readiness on the main goroutine, opening a window where /ready returned 200 while the kernel was still binding the socket. Probes during that window would 200 then get connection-refused on the next request. Bind the listener synchronously (net.ListenConfig.Listen — context- aware, satisfies noctx) BEFORE setReady(true), then start http.Server.Serve(ln) in a goroutine. If the bind fails the server never advertises ready. New regression test TestReadyImpliesListenerBound asserts that a dial against the port succeeds the moment ready flips to true. golangci-lint failures cleared (the CI gate this PR currently fails on): - exhaustive (3 sites): pkg/errors/exitcode.go, pkg/server/errors.go — ErrCodePolicyFetch is now covered in the exit-code, HTTP-status, and retryable switches. Treated as upstream-unavailable (ExitUnavailable / 503 / retryable=true) since policy fetch hits an external URI. - gosec G115 (int→uint16): pkg/server/config.go ResolvePort now returns uint16 directly, matching net's port representation. Single cast still needed in parseConfig because config.Port is int for back-compat with existing tests that set cfg.Port = port; gated with a nolint comment since ResolvePort guarantees the range. pkg/api/server.go drops the int→uint16 cast at the registration site. - gosec G704 (SSRF taint): pkg/openshell/fetcher.go Fetch annotated with nolint pointing at the three layers already mitigating the finding (URL pre-flight, post-resolution dial guard, redirect check) and the regression test that exercises the dial path. Also unblocks the CodeQL job which was firing on the same taint analysis. - noctx: internal/testdns/testdns.go Start switches from net.ListenPacket to (*net.ListenConfig).ListenPacket with context.Background() per noctx lint requirement. Layer 1 strict-mode policy fields remain a documented limitation in Guard's doc comment (per the reviewer's "downgraded to documented limitation rather than fixed" note). Tracked in a follow-up issue.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/agents.go`:
- Around line 40-41: The dnsNameRegex currently allows underscores in labels
(character class `[a-z0-9_]`) which is non‑standard; update the regex named
dnsNameRegex so label character classes use `[a-z0-9-]` (and ensure the
leading-character class likewise disallows `_`) to enforce RFC 1035 rules for
domain labels, or if underscores were intentionally permitted (e.g., for
Kubernetes SRV/prefixed names), add a concise inline comment above dnsNameRegex
explaining that deliberate deviation and why; then run/update any unit tests
that validate domain parsing to reflect the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: f736c478-2c31-4e82-9f11-69a42bff6271
⛔ Files ignored due to path filters (1)
go.sumis excluded by!**/*.sum
📒 Files selected for processing (5)
go.modinternal/testdns/testdns.gopkg/api/agents.gopkg/api/agents_test.gopkg/api/server.go
| // Each label: 1-63 chars of [a-z0-9_-], total <= 253 chars. | ||
| var dnsNameRegex = regexp.MustCompile(`^[a-z0-9_][a-z0-9_\-]{0,62}(\.[a-z0-9_][a-z0-9_\-]{0,62})*\.?$`) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Clarify underscore allowance in DNS label validation.
The regex permits underscores ([a-z0-9_]) in DNS labels, which is non-standard per RFC 1035 (labels should be [a-z0-9-] only). Underscores are valid in SRV-style prefixed names (e.g., _svc._tcp) but not in general domain labels. Since this validates the user-supplied domain query parameter (a domain suffix, not an underscore-prefixed service name), the leniency may be unintentional. If this is deliberate for Kubernetes compatibility or another reason, a brief inline comment would clarify the intent.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/api/agents.go` around lines 40 - 41, The dnsNameRegex currently allows
underscores in labels (character class `[a-z0-9_]`) which is non‑standard;
update the regex named dnsNameRegex so label character classes use `[a-z0-9-]`
(and ensure the leading-character class likewise disallows `_`) to enforce RFC
1035 rules for domain labels, or if underscores were intentionally permitted
(e.g., for Kubernetes SRV/prefixed names), add a concise inline comment above
dnsNameRegex explaining that deliberate deviation and why; then run/update any
unit tests that validate domain parsing to reflect the change.
|
@mchmarny thanks — pushed round-2 fixes as Readiness-bind race (the only fully-unaddressed item)
Lint failures cleared (this PR's
Layer 1 strict-mode — kept as a documented limitation in Verified locally
The pre-existing |
Address reviewer (mchmarny) findings on PR NVIDIA#469: Security: - openshell/fetcher: close SSRF hostname bypass by adding a safeDialContext on the HTTP transport that rejects connections to private/loopback IPs even when the URL hostname resolves to one. Adds isBlockedIP helper and a regression test that fails closed against an httptest TLS server. Correctness: - discovery/resolver: propagate context cancellation in DiscoverAll so callers can distinguish "no agents" from "timed out mid-discovery". Remove the dead g.Wait() error path now that goroutines genuinely return ctx.Err() on cancellation. - openshell/guard: simplify the Check contract — fetch failures fail open with (Allowed=true, nil error) rather than the dual-return that was easy to misuse. Return an error only on programmer errors (nil record). Update agents.go to match. - openshell/guard: remove panic from NewGuard; document that callers must validate via ValidMode before construction (server.go already does this). - discovery/types: strip leading semver "v" prefix in bapVersions so "v2.1.0" produces "mcp/2" rather than "mcp/v2". Cleanup: - discovery/resolver: remove redundant `entry := entry` loop variable capture (Go 1.22+ scoping makes it unnecessary). - openshell/evaluator: restructure rate_limits to a clearer switch (violation vs. warning) so future map changes do the right thing. - server/config + api/server: extract a single ResolvePort helper so agent registration and the HTTP server agree on the bound port. - internal/testdns: extract the duplicated startMockDNS helper into a shared internal package; pkg/discovery and pkg/api now both import it. - openshell/fetcher: document the singleflight + context coalescing trade-off (3s timeout makes shared cancellation acceptable). - Remove .claude/agents/k8s-discovery.md (local Claude Code config, unintentionally committed). Tests: - New: TestFetcherSSRFDialBlocksPrivateIP, TestIsBlockedIP, TestGuardNilRecordReturnsError, expanded TestBapVersions cases. - Updated: TestGuardFailOpenOnFetchError now asserts (Allowed=true, nil) rather than the old dual-return; TestGuardInvalidModePanics removed (no longer panics).
Address the May 11 PR NVIDIA#469 follow-up review. Readiness-bind race (the only fully-unaddressed item from round 2): pkg/server/server.go Start() previously kicked off ListenAndServe in a goroutine and immediately flipped readiness on the main goroutine, opening a window where /ready returned 200 while the kernel was still binding the socket. Probes during that window would 200 then get connection-refused on the next request. Bind the listener synchronously (net.ListenConfig.Listen — context- aware, satisfies noctx) BEFORE setReady(true), then start http.Server.Serve(ln) in a goroutine. If the bind fails the server never advertises ready. New regression test TestReadyImpliesListenerBound asserts that a dial against the port succeeds the moment ready flips to true. golangci-lint failures cleared (the CI gate this PR currently fails on): - exhaustive (3 sites): pkg/errors/exitcode.go, pkg/server/errors.go — ErrCodePolicyFetch is now covered in the exit-code, HTTP-status, and retryable switches. Treated as upstream-unavailable (ExitUnavailable / 503 / retryable=true) since policy fetch hits an external URI. - gosec G115 (int→uint16): pkg/server/config.go ResolvePort now returns uint16 directly, matching net's port representation. Single cast still needed in parseConfig because config.Port is int for back-compat with existing tests that set cfg.Port = port; gated with a nolint comment since ResolvePort guarantees the range. pkg/api/server.go drops the int→uint16 cast at the registration site. - gosec G704 (SSRF taint): pkg/openshell/fetcher.go Fetch annotated with nolint pointing at the three layers already mitigating the finding (URL pre-flight, post-resolution dial guard, redirect check) and the regression test that exercises the dial path. Also unblocks the CodeQL job which was firing on the same taint analysis. - noctx: internal/testdns/testdns.go Start switches from net.ListenPacket to (*net.ListenConfig).ListenPacket with context.Background() per noctx lint requirement. Layer 1 strict-mode policy fields remain a documented limitation in Guard's doc comment (per the reviewer's "downgraded to documented limitation rather than fixed" note). Tracked in a follow-up issue.
be4e034 to
02f7453
Compare
There was a problem hiding this comment.
Actionable comments posted: 5
♻️ Duplicate comments (2)
pkg/api/server.go (1)
219-219:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift
/v1/agentsis hard-wired to query127.0.0.1:53.
setupDiscovery()builds the discoverer with noWithDNSServeroverride, so the handler always dials localhost DNS. In most Kubernetes pods/etc/resolv.confpoints at the cluster DNS service IP, not a listener on127.0.0.1, so discovery will fail as soon as this feature is enabled unless every pod also runs a local DNS forwarder.Does github.com/miekg/dns Client.ExchangeContext use the explicit server address passed to it (for example "127.0.0.1:53"), or does it consult /etc/resolv.conf? Also, what nameserver is typically written to /etc/resolv.conf inside a Kubernetes pod?🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/api/server.go` at line 219, The discoverer is being created without a DNS server override so discovery always dials 127.0.0.1:53; update setupDiscovery() to pass a realistic DNS server via discovery.WithDNSServer by reading the nameserver from the host's resolver (e.g., parse /etc/resolv.conf for the first "nameserver" entry) or by exposing a configurable option/environment var and fall back to the parsed resolver; call discovery.NewDiscoverer(..., discovery.WithDNSServer(resolverAddr)) so the discoverer uses the cluster DNS IP (as found in pod /etc/resolv.conf) instead of hard-coded 127.0.0.1:53.pkg/api/agents.go (1)
39-41:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winTighten
domainvalidation to real DNS label syntax.This regex still accepts invalid suffixes like
_foo.exampleandfoo-.example. Sincedomainis a user-supplied domain suffix, not an underscore-prefixed service label, letting those through means we do discovery lookups for names we should reject at the handler boundary.🔧 Suggested regex
-var dnsNameRegex = regexp.MustCompile(`^[a-z0-9_][a-z0-9_\-]{0,62}(\.[a-z0-9_][a-z0-9_\-]{0,62})*\.?$`) +var dnsNameRegex = regexp.MustCompile(`^(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?)(?:\.(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?))*\.?$`)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/api/agents.go` around lines 39 - 41, The current dnsNameRegex accepts labels starting/ending with underscores or hyphens which allows invalid domains; update the dnsNameRegex constant used to validate the user-supplied domain (the `domain` check in pkg/api/agents.go) to a stricter DNS label regex that: forbids underscores, requires each label to start and end with an alphanumeric character, allows hyphens only internally, limits labels to 1–63 chars, allows dot-separated labels up to the 253-char total and an optional trailing dot; replace the existing regexp.MustCompile pattern for dnsNameRegex with that stricter pattern and ensure all places that reference dnsNameRegex continue to use it for handler-side validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/api/server.go`:
- Around line 120-123: The current error handling when calling setupDiscovery()
wraps all errors as ErrCodeInternal and loses existing structured error codes;
change the return to use errors.PropagateOrWrap so that setupDiscovery's coded
errors (e.g., invalid OPENSHELL_MODE) are propagated and only fallback to
ErrCodeInternal when necessary — replace the errors.Wrap call in the
setupDiscovery() error branch with errors.PropagateOrWrap(err,
errors.ErrCodeInternal, "failed to setup discovery") so existing structured
errors are preserved.
In `@pkg/discovery/options.go`:
- Around line 46-52: WithLabels currently writes into p.labels without guarding
against a nil map which can panic if Publisher wasn't initialized; update the
WithLabels(PublisherOption) closure to check if p.labels == nil and if so
allocate a new map before the for loop, ensuring p.labels is always non-nil when
assigning keys (reference the WithLabels function and the Publisher.labels
field).
In `@pkg/discovery/publisher.go`:
- Around line 261-271: ConfigMap listing order is unstable so build a
deterministic index by sorting the entries before joining; after accumulating
entries into the lines slice in the loop that iterates cmList.Items (the block
that sets name/proto and appends fmt.Sprintf("%s:%s", name, proto)), call
sort.Strings(lines) and then compute agentsData := strings.Join(lines, "\n");
also add the required import for the sort package to ensure compilation.
In `@pkg/discovery/resolver.go`:
- Around line 109-153: The current concurrent collection appends discoveries
under a mutex which yields non-deterministic order; change to allocate a
per-input result slot and compact after g.Wait(): iterate with for i, entry :=
range entries and capture i in the closure, create records :=
make([]AgentRecord, len(entries)) plus a parallel filled := make([]bool,
len(entries)) (or store pointers/nils) and inside the g.Go closure write
records[i] = *record and set filled[i] = true under mu after a successful
d.Discover call; after g.Wait() build the returned slice by iterating the filled
flags in order and appending only successful entries so the final order matches
the original entries order.
In `@pkg/openshell/options.go`:
- Around line 77-83: The exported option WithAllowPrivate exposes a test-only
SSRF bypass; remove it from the production API by either making it unexported
(rename WithAllowPrivate -> withAllowPrivate) or moving its implementation into
test-only code (a test helper) so callers cannot disable SSRF checks at runtime;
update any references to WithAllowPrivate in tests to use the new unexported
variant via test package or the test helper, and ensure the Fetcher struct field
allowPrivate remains internal (Fetcher.allowPrivate) with no exported setters.
---
Duplicate comments:
In `@pkg/api/agents.go`:
- Around line 39-41: The current dnsNameRegex accepts labels starting/ending
with underscores or hyphens which allows invalid domains; update the
dnsNameRegex constant used to validate the user-supplied domain (the `domain`
check in pkg/api/agents.go) to a stricter DNS label regex that: forbids
underscores, requires each label to start and end with an alphanumeric
character, allows hyphens only internally, limits labels to 1–63 chars, allows
dot-separated labels up to the 253-char total and an optional trailing dot;
replace the existing regexp.MustCompile pattern for dnsNameRegex with that
stricter pattern and ensure all places that reference dnsNameRegex continue to
use it for handler-side validation.
In `@pkg/api/server.go`:
- Line 219: The discoverer is being created without a DNS server override so
discovery always dials 127.0.0.1:53; update setupDiscovery() to pass a realistic
DNS server via discovery.WithDNSServer by reading the nameserver from the host's
resolver (e.g., parse /etc/resolv.conf for the first "nameserver" entry) or by
exposing a configurable option/environment var and fall back to the parsed
resolver; call discovery.NewDiscoverer(...,
discovery.WithDNSServer(resolverAddr)) so the discoverer uses the cluster DNS IP
(as found in pod /etc/resolv.conf) instead of hard-coded 127.0.0.1:53.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Enterprise
Run ID: 85ab687d-edf7-4916-945a-434f44cd615d
⛔ Files ignored due to path filters (242)
go.sumis excluded by!**/*.sumvendor/github.com/miekg/dns/.codecov.ymlis excluded by!vendor/**vendor/github.com/miekg/dns/.gitignoreis excluded by!vendor/**vendor/github.com/miekg/dns/AUTHORSis excluded by!vendor/**vendor/github.com/miekg/dns/CODEOWNERSis excluded by!vendor/**vendor/github.com/miekg/dns/CONTRIBUTORSis excluded by!vendor/**vendor/github.com/miekg/dns/COPYRIGHTis excluded by!vendor/**vendor/github.com/miekg/dns/LICENSEis excluded by!vendor/**vendor/github.com/miekg/dns/Makefile.fuzzis excluded by!vendor/**vendor/github.com/miekg/dns/Makefile.releaseis excluded by!vendor/**vendor/github.com/miekg/dns/README.mdis excluded by!vendor/**vendor/github.com/miekg/dns/acceptfunc.gois excluded by!vendor/**vendor/github.com/miekg/dns/client.gois excluded by!vendor/**vendor/github.com/miekg/dns/clientconfig.gois excluded by!vendor/**vendor/github.com/miekg/dns/dane.gois excluded by!vendor/**vendor/github.com/miekg/dns/defaults.gois excluded by!vendor/**vendor/github.com/miekg/dns/dns.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec_keygen.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec_keyscan.gois excluded by!vendor/**vendor/github.com/miekg/dns/dnssec_privkey.gois excluded by!vendor/**vendor/github.com/miekg/dns/doc.gois excluded by!vendor/**vendor/github.com/miekg/dns/duplicate.gois excluded by!vendor/**vendor/github.com/miekg/dns/edns.gois excluded by!vendor/**vendor/github.com/miekg/dns/format.gois excluded by!vendor/**vendor/github.com/miekg/dns/fuzz.gois excluded by!vendor/**vendor/github.com/miekg/dns/generate.gois excluded by!vendor/**vendor/github.com/miekg/dns/hash.gois excluded by!vendor/**vendor/github.com/miekg/dns/labels.gois excluded by!vendor/**vendor/github.com/miekg/dns/listen_no_socket_options.gois excluded by!vendor/**vendor/github.com/miekg/dns/listen_socket_options.gois excluded by!vendor/**vendor/github.com/miekg/dns/msg.gois excluded by!vendor/**vendor/github.com/miekg/dns/msg_helpers.gois excluded by!vendor/**vendor/github.com/miekg/dns/msg_truncate.gois excluded by!vendor/**vendor/github.com/miekg/dns/nsecx.gois excluded by!vendor/**vendor/github.com/miekg/dns/privaterr.gois excluded by!vendor/**vendor/github.com/miekg/dns/reverse.gois excluded by!vendor/**vendor/github.com/miekg/dns/sanitize.gois excluded by!vendor/**vendor/github.com/miekg/dns/scan.gois excluded by!vendor/**vendor/github.com/miekg/dns/scan_rr.gois excluded by!vendor/**vendor/github.com/miekg/dns/serve_mux.gois excluded by!vendor/**vendor/github.com/miekg/dns/server.gois excluded by!vendor/**vendor/github.com/miekg/dns/sig0.gois excluded by!vendor/**vendor/github.com/miekg/dns/smimea.gois excluded by!vendor/**vendor/github.com/miekg/dns/svcb.gois excluded by!vendor/**vendor/github.com/miekg/dns/tlsa.gois excluded by!vendor/**vendor/github.com/miekg/dns/tools.gois excluded by!vendor/**vendor/github.com/miekg/dns/tsig.gois excluded by!vendor/**vendor/github.com/miekg/dns/types.gois excluded by!vendor/**vendor/github.com/miekg/dns/udp.gois excluded by!vendor/**vendor/github.com/miekg/dns/udp_no_control.gois excluded by!vendor/**vendor/github.com/miekg/dns/update.gois excluded by!vendor/**vendor/github.com/miekg/dns/version.gois excluded by!vendor/**vendor/github.com/miekg/dns/xfr.gois excluded by!vendor/**vendor/github.com/miekg/dns/zduplicate.gois excluded by!vendor/**vendor/github.com/miekg/dns/zmsg.gois excluded by!vendor/**vendor/github.com/miekg/dns/ztypes.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/doc.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/forward_requirements.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/require.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/require.go.tmplis excluded by!vendor/**vendor/github.com/stretchr/testify/require/require_forward.gois excluded by!vendor/**vendor/github.com/stretchr/testify/require/require_forward.go.tmplis excluded by!vendor/**vendor/github.com/stretchr/testify/require/requirements.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/asm.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/constants.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/doc.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/instructions.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/setter.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/vm.gois excluded by!vendor/**vendor/golang.org/x/net/bpf/vm_instructions.gois excluded by!vendor/**vendor/golang.org/x/net/internal/iana/const.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_linux_32bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_linux_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_solaris_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/cmsghdr_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/complete_dontwait.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/complete_nodontwait.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/empty.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/error_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/error_windows.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_32bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_solaris_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/iovec_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/mmsghdr_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/mmsghdr_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_bsdvar.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_linux.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_linux_32bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_linux_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_openbsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_solaris_64bit.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/msghdr_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/norace.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/race.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_mmsg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_msg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_nommsg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/rawconn_nomsg.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/socket.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_const_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_386.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_loong64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mips.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mips64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_mipsle.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_ppc.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_ppc64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_linux_s390x.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_netbsd.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_posix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_stub.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_unix.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_windows.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/sys_zos_s390x.sis excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_aix_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_darwin_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_darwin_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_dragonfly_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_freebsd_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_loong64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mips.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mips64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_mipsle.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_ppc.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_ppc64le.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_linux_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_netbsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_openbsd_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_solaris_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/internal/socket/zsys_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/batch.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_pktinfo.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_unix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_windows.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/control_zos.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/dgramopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/doc.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/endpoint.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/genericopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/header.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/helper.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/iana.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/icmp.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/icmp_linux.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/icmp_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/packet.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/payload.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/payload_cmsg.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/payload_nocmsg.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sockopt.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sockopt_posix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sockopt_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_aix.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreq.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreq_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreqn.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_asmreqn_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_bpf.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_bpf_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_bsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_darwin.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_dragonfly.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_freebsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_linux.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_solaris.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_ssmreq.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_ssmreq_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_stub.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_windows.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/sys_zos.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_aix_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_darwin.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_dragonfly.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_386.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_arm.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_freebsd_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_386.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_amd64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_arm.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_arm64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_loong64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mips.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mips64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mips64le.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_mipsle.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_ppc.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_ppc64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_ppc64le.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_riscv64.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_linux_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_netbsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_openbsd.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_solaris.gois excluded by!vendor/**vendor/golang.org/x/net/ipv4/zsys_zos_s390x.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/batch.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control.gois excluded by!vendor/**vendor/golang.org/x/net/ipv6/control_rfc2292_unix.gois excluded by!vendor/**
📒 Files selected for processing (31)
go.modinternal/testdns/testdns.gopkg/api/agents.gopkg/api/agents_test.gopkg/api/server.gopkg/defaults/timeouts.gopkg/discovery/doc.gopkg/discovery/options.gopkg/discovery/publisher.gopkg/discovery/publisher_test.gopkg/discovery/resolver.gopkg/discovery/resolver_test.gopkg/discovery/svcb.gopkg/discovery/svcb_test.gopkg/discovery/testhelper_test.gopkg/discovery/types.gopkg/errors/errors.gopkg/errors/exitcode.gopkg/openshell/doc.gopkg/openshell/evaluator.gopkg/openshell/evaluator_test.gopkg/openshell/fetcher.gopkg/openshell/fetcher_test.gopkg/openshell/guard.gopkg/openshell/guard_test.gopkg/openshell/options.gopkg/openshell/types.gopkg/server/config.gopkg/server/errors.gopkg/server/lifecycle_test.gopkg/server/server.go
| opts, agentHandler, err := setupDiscovery() | ||
| if err != nil { | ||
| return errors.Wrap(errors.ErrCodeInternal, "failed to setup discovery", err) | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Don’t squash structured discovery setup errors to internal.
setupDiscovery() already returns coded errors (for example invalid OPENSHELL_MODE). Wrapping everything here as ErrCodeInternal loses that classification; errors.PropagateOrWrap(...) preserves structured errors and only applies the fallback when needed.
♻️ Minimal fix
- return errors.Wrap(errors.ErrCodeInternal, "failed to setup discovery", err)
+ return errors.PropagateOrWrap(err, errors.ErrCodeInternal, "failed to setup discovery")Based on learnings: use errors.PropagateOrWrap(err, errCode, message) so existing structured errors are propagated as-is.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/api/server.go` around lines 120 - 123, The current error handling when
calling setupDiscovery() wraps all errors as ErrCodeInternal and loses existing
structured error codes; change the return to use errors.PropagateOrWrap so that
setupDiscovery's coded errors (e.g., invalid OPENSHELL_MODE) are propagated and
only fallback to ErrCodeInternal when necessary — replace the errors.Wrap call
in the setupDiscovery() error branch with errors.PropagateOrWrap(err,
errors.ErrCodeInternal, "failed to setup discovery") so existing structured
errors are preserved.
| // WithLabels sets additional labels on created Kubernetes resources. | ||
| func WithLabels(labels map[string]string) PublisherOption { | ||
| return func(p *Publisher) { | ||
| for k, v := range labels { | ||
| p.labels[k] = v | ||
| } | ||
| } |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider defensive initialization of p.labels.
WithLabels writes directly to p.labels[k] without checking if the map is nil. While the constructor likely initializes this field (tests confirm it works), defensive coding would add a nil-check guard to prevent panics if the option is ever applied to an improperly initialized Publisher.
🛡️ Optional defensive guard
func WithLabels(labels map[string]string) PublisherOption {
return func(p *Publisher) {
+ if p.labels == nil {
+ p.labels = make(map[string]string)
+ }
for k, v := range labels {
p.labels[k] = v
}
}
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/discovery/options.go` around lines 46 - 52, WithLabels currently writes
into p.labels without guarding against a nil map which can panic if Publisher
wasn't initialized; update the WithLabels(PublisherOption) closure to check if
p.labels == nil and if so allocate a new map before the for loop, ensuring
p.labels is always non-nil when assigning keys (reference the WithLabels
function and the Publisher.labels field).
| // Build index as "name:protocol" entries, one per line | ||
| var lines []string | ||
| for _, cm := range cmList.Items { | ||
| name := cm.Labels[labelName] | ||
| proto := cm.Labels[labelProtocol] | ||
| if name != "" && proto != "" { | ||
| lines = append(lines, fmt.Sprintf("%s:%s", name, proto)) | ||
| } | ||
| } | ||
|
|
||
| agentsData := strings.Join(lines, "\n") |
There was a problem hiding this comment.
Sort index entries before writing the shared ConfigMap.
ConfigMaps().List(...) does not guarantee a stable order, so equivalent agent sets can produce different agents payloads and trigger needless index updates/conflicts. Sorting lines first keeps the published TXT index deterministic.
♻️ Minimal fix
+ sort.Strings(lines)
agentsData := strings.Join(lines, "\n")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/discovery/publisher.go` around lines 261 - 271, ConfigMap listing order
is unstable so build a deterministic index by sorting the entries before
joining; after accumulating entries into the lines slice in the loop that
iterates cmList.Items (the block that sets name/proto and appends
fmt.Sprintf("%s:%s", name, proto)), call sort.Strings(lines) and then compute
agentsData := strings.Join(lines, "\n"); also add the required import for the
sort package to ensure compilation.
| var ( | ||
| mu sync.Mutex | ||
| records []AgentRecord | ||
| ) | ||
|
|
||
| g, gctx := errgroup.WithContext(ctx) | ||
| g.SetLimit(maxDiscoverConcurrency) | ||
|
|
||
| // Loop variables are per-iteration scoped under Go 1.22+; no shadowing needed. | ||
| for _, entry := range entries { | ||
| g.Go(func() error { | ||
| record, err := d.Discover(gctx, entry.Name, entry.Protocol, domain) | ||
| if err != nil { | ||
| // Treat context cancellation as fatal so the caller learns the | ||
| // result is incomplete; everything else is logged and skipped. | ||
| if gctx.Err() != nil { | ||
| return gctx.Err() | ||
| } | ||
| slog.Warn("failed to resolve agent from index", | ||
| slog.String("name", entry.Name), | ||
| slog.String("protocol", string(entry.Protocol)), | ||
| slog.String("error", err.Error()), | ||
| ) | ||
| return nil | ||
| } | ||
| mu.Lock() | ||
| records = append(records, *record) | ||
| mu.Unlock() | ||
| return nil | ||
| }) | ||
| } | ||
|
|
||
| // g.Wait() may return ctx.Err() if any goroutine propagated cancellation. | ||
| if err := g.Wait(); err != nil { | ||
| return nil, errors.Wrap(errors.ErrCodeTimeout, | ||
| "agent discovery interrupted before completion", err) | ||
| } | ||
| // Defensive: if the parent context was canceled after all goroutines | ||
| // returned successfully but before we got here, surface that too. | ||
| if err := ctx.Err(); err != nil { | ||
| return nil, errors.Wrap(errors.ErrCodeTimeout, | ||
| "agent discovery interrupted before completion", err) | ||
| } | ||
|
|
||
| return records, nil |
There was a problem hiding this comment.
Preserve index order when collecting concurrent discoveries.
append under a mutex makes the returned slice depend on DNS response timing, so /v1/agents order will flap between identical requests. Keep a result slot per input index and compact after g.Wait() instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/discovery/resolver.go` around lines 109 - 153, The current concurrent
collection appends discoveries under a mutex which yields non-deterministic
order; change to allocate a per-input result slot and compact after g.Wait():
iterate with for i, entry := range entries and capture i in the closure, create
records := make([]AgentRecord, len(entries)) plus a parallel filled :=
make([]bool, len(entries)) (or store pointers/nils) and inside the g.Go closure
write records[i] = *record and set filled[i] = true under mu after a successful
d.Discover call; after g.Wait() build the returned slice by iterating the filled
flags in order and appending only successful entries so the final order matches
the original entries order.
| // WithAllowPrivate disables SSRF private/loopback IP checks. | ||
| // This must only be used in tests. | ||
| func WithAllowPrivate(allow bool) FetcherOption { | ||
| return func(f *Fetcher) { | ||
| f.allowPrivate = allow | ||
| } | ||
| } |
There was a problem hiding this comment.
Keep the private-IP bypass out of the production API surface.
The fetcher is otherwise positioned as SSRF-protected, but exporting WithAllowPrivate(true) lets any non-test caller disable those guards with one option. If this is truly test-only, move it to a test helper or make the option unexported.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/openshell/options.go` around lines 77 - 83, The exported option
WithAllowPrivate exposes a test-only SSRF bypass; remove it from the production
API by either making it unexported (rename WithAllowPrivate -> withAllowPrivate)
or moving its implementation into test-only code (a test helper) so callers
cannot disable SSRF checks at runtime; update any references to WithAllowPrivate
in tests to use the new unexported variant via test package or the test helper,
and ensure the Fetcher struct field allowPrivate remains internal
(Fetcher.allowPrivate) with no exported setters.
|
@IngmarVG-IB this PR now has merge conflicts with |
02f7453 to
ff4b714
Compare
Address reviewer (mchmarny) findings on PR NVIDIA#469: Security: - openshell/fetcher: close SSRF hostname bypass by adding a safeDialContext on the HTTP transport that rejects connections to private/loopback IPs even when the URL hostname resolves to one. Adds isBlockedIP helper and a regression test that fails closed against an httptest TLS server. Correctness: - discovery/resolver: propagate context cancellation in DiscoverAll so callers can distinguish "no agents" from "timed out mid-discovery". Remove the dead g.Wait() error path now that goroutines genuinely return ctx.Err() on cancellation. - openshell/guard: simplify the Check contract — fetch failures fail open with (Allowed=true, nil error) rather than the dual-return that was easy to misuse. Return an error only on programmer errors (nil record). Update agents.go to match. - openshell/guard: remove panic from NewGuard; document that callers must validate via ValidMode before construction (server.go already does this). - discovery/types: strip leading semver "v" prefix in bapVersions so "v2.1.0" produces "mcp/2" rather than "mcp/v2". Cleanup: - discovery/resolver: remove redundant `entry := entry` loop variable capture (Go 1.22+ scoping makes it unnecessary). - openshell/evaluator: restructure rate_limits to a clearer switch (violation vs. warning) so future map changes do the right thing. - server/config + api/server: extract a single ResolvePort helper so agent registration and the HTTP server agree on the bound port. - internal/testdns: extract the duplicated startMockDNS helper into a shared internal package; pkg/discovery and pkg/api now both import it. - openshell/fetcher: document the singleflight + context coalescing trade-off (3s timeout makes shared cancellation acceptable). - Remove .claude/agents/k8s-discovery.md (local Claude Code config, unintentionally committed). Tests: - New: TestFetcherSSRFDialBlocksPrivateIP, TestIsBlockedIP, TestGuardNilRecordReturnsError, expanded TestBapVersions cases. - Updated: TestGuardFailOpenOnFetchError now asserts (Allowed=true, nil) rather than the old dual-return; TestGuardInvalidModePanics removed (no longer panics).
Address the May 11 PR NVIDIA#469 follow-up review. Readiness-bind race (the only fully-unaddressed item from round 2): pkg/server/server.go Start() previously kicked off ListenAndServe in a goroutine and immediately flipped readiness on the main goroutine, opening a window where /ready returned 200 while the kernel was still binding the socket. Probes during that window would 200 then get connection-refused on the next request. Bind the listener synchronously (net.ListenConfig.Listen — context- aware, satisfies noctx) BEFORE setReady(true), then start http.Server.Serve(ln) in a goroutine. If the bind fails the server never advertises ready. New regression test TestReadyImpliesListenerBound asserts that a dial against the port succeeds the moment ready flips to true. golangci-lint failures cleared (the CI gate this PR currently fails on): - exhaustive (3 sites): pkg/errors/exitcode.go, pkg/server/errors.go — ErrCodePolicyFetch is now covered in the exit-code, HTTP-status, and retryable switches. Treated as upstream-unavailable (ExitUnavailable / 503 / retryable=true) since policy fetch hits an external URI. - gosec G115 (int→uint16): pkg/server/config.go ResolvePort now returns uint16 directly, matching net's port representation. Single cast still needed in parseConfig because config.Port is int for back-compat with existing tests that set cfg.Port = port; gated with a nolint comment since ResolvePort guarantees the range. pkg/api/server.go drops the int→uint16 cast at the registration site. - gosec G704 (SSRF taint): pkg/openshell/fetcher.go Fetch annotated with nolint pointing at the three layers already mitigating the finding (URL pre-flight, post-resolution dial guard, redirect check) and the regression test that exercises the dial path. Also unblocks the CodeQL job which was firing on the same taint analysis. - noctx: internal/testdns/testdns.go Start switches from net.ListenPacket to (*net.ListenConfig).ListenPacket with context.Background() per noctx lint requirement. Layer 1 strict-mode policy fields remain a documented limitation in Guard's doc comment (per the reviewer's "downgraded to documented limitation rather than fixed" note). Tracked in a follow-up issue.
Introduce pkg/discovery for DNS-AID based agent discovery using SVCB records (RFC 9460) with private-use SvcParamKeys per draft-mozleywilliams-dnsop-dnsaid-01. Agents publish themselves via K8s ConfigMaps and discover peers through DNS lookups. - pkg/discovery: Discoverer (DNS SVCB resolver), Publisher (K8s ConfigMap-based registration), SVCB parser for keys 65400-65408 - pkg/server: WithOnStart/WithOnShutdown lifecycle hooks as functional options, executed during server start/shutdown respectively - pkg/api: /v1/agents endpoint listing discovered agents in a domain, with input validation and feature-gated via AICR_DISCOVERY_ENABLED - pkg/defaults: discovery timeout constants and ServerDefaultPort
Strip rdatapolicy plugin mentions from pkg/discovery doc comments and the k8s-discovery agent definition. These features are under embargo and should not be referenced in public code.
OpenShell evaluates target agents' policy documents (served at their dns-aid-core policy URI, SvcParamKey 65403) as Layer 1 caller-side enforcement before allowing connections. This completes the security story: "OpenShell secures the agent boundary; DNS-AID resolves what's inside it." New pkg/openshell package implements: - All 16 native policy rules matching dns-aid-core's PolicyDocument schema - Three enforcement modes: strict (deny), permissive (log, default), disabled - Fail-open on policy fetch errors (matches dns-aid-core behavior) - SSRF-protected fetcher with HTTPS-only, private IP rejection - TTL-based cache with bounded eviction and singleflight coalescing - Enforcement layer filtering (Layer 0/1/2 per rule) - Realm isolation detection with cross-realm access logging Integration: - /v1/agents endpoint evaluates OpenShell policy per discovered agent - OPENSHELL_MODE env var controls enforcement (strict/permissive/disabled) - Guard accepts nil safely — zero behavioral change when discovery disabled New error codes: POLICY_DENIED, POLICY_FETCH New defaults: PolicyFetchTimeout (3s), PolicyCacheTTL (5m), PolicyMaxBytes (64KB) Test coverage: 84.2% (30 tests across evaluator, fetcher, guard)
Address reviewer (mchmarny) findings on PR NVIDIA#469: Security: - openshell/fetcher: close SSRF hostname bypass by adding a safeDialContext on the HTTP transport that rejects connections to private/loopback IPs even when the URL hostname resolves to one. Adds isBlockedIP helper and a regression test that fails closed against an httptest TLS server. Correctness: - discovery/resolver: propagate context cancellation in DiscoverAll so callers can distinguish "no agents" from "timed out mid-discovery". Remove the dead g.Wait() error path now that goroutines genuinely return ctx.Err() on cancellation. - openshell/guard: simplify the Check contract — fetch failures fail open with (Allowed=true, nil error) rather than the dual-return that was easy to misuse. Return an error only on programmer errors (nil record). Update agents.go to match. - openshell/guard: remove panic from NewGuard; document that callers must validate via ValidMode before construction (server.go already does this). - discovery/types: strip leading semver "v" prefix in bapVersions so "v2.1.0" produces "mcp/2" rather than "mcp/v2". Cleanup: - discovery/resolver: remove redundant `entry := entry` loop variable capture (Go 1.22+ scoping makes it unnecessary). - openshell/evaluator: restructure rate_limits to a clearer switch (violation vs. warning) so future map changes do the right thing. - server/config + api/server: extract a single ResolvePort helper so agent registration and the HTTP server agree on the bound port. - internal/testdns: extract the duplicated startMockDNS helper into a shared internal package; pkg/discovery and pkg/api now both import it. - openshell/fetcher: document the singleflight + context coalescing trade-off (3s timeout makes shared cancellation acceptable). - Remove .claude/agents/k8s-discovery.md (local Claude Code config, unintentionally committed). Tests: - New: TestFetcherSSRFDialBlocksPrivateIP, TestIsBlockedIP, TestGuardNilRecordReturnsError, expanded TestBapVersions cases. - Updated: TestGuardFailOpenOnFetchError now asserts (Allowed=true, nil) rather than the old dual-return; TestGuardInvalidModePanics removed (no longer panics).
Security:
- openshell/fetcher: close TOCTOU between LookupIPAddr and dialer.DialContext
by pinning the dial to the validated IP literal so the stdlib dialer cannot
re-resolve the hostname to a now-private address (DNS rebinding mitigation).
- openshell/fetcher: add CheckRedirect that re-validates each redirect target
scheme so a 30x cannot downgrade the connection from HTTPS.
- server/config: switch ResolvePort and SHUTDOWN_TIMEOUT_SECONDS parsing from
fmt.Sscanf to strconv.Atoi — Sscanf accepts trailing garbage ("8080abc"
parses as 8080), Atoi rejects it.
- api/server: trim whitespace on SA-namespace file read (CRI runtimes append
trailing newline) and on AICR_DISCOVERY_NAMESPACE; switch
AICR_DISCOVERY_ENABLED to strconv.ParseBool so "1"/"True"/"TRUE" work.
- discovery/svcb: guard BAP parsing against empty string so "" yields nil
(treat as unset) rather than [""], which would shadow protocol defaults.
- discovery/resolver: nil-check resp before reading resp.Rcode in both
ListIndex and querySVCB; miekg/dns can return (nil, nil) on certain header
failures.
Correctness:
- api/agents: fan out per-agent OpenShell policy checks via errgroup with a
concurrency cap (8). Sequential checks would scale as N×PolicyFetchTimeout
(3s) on a cold cache and risk blowing past DiscoveryHandlerTimeout.
- discovery/publisher: replace Create-then-blind-Update with Create →
AlreadyExists → Get/Update with bounded retries on Conflict so concurrent
publishers (or admission-controller mutations) produce a clean retry
rather than silent clobber.
- discovery/resolver: add ctx.Done() check inside the TXT parse loop in
ListIndex — pathologically large index responses must remain interruptible.
- server/server: bound each OnStart and OnShutdown lifecycle hook in a
per-hook context (defaults.ServerHookTimeout = 30s) so a misbehaving hook
cannot stall startup or consume the whole shutdown budget.
- openshell/evaluator: parseTLSVersion no longer shadows the Go 1.21+
predeclared identifier "min".
- openshell/evaluator: rate_limits violation path now reports
"not yet implemented at this layer" with a TODO; the previous "exceeded"
message would have unconditionally denied at LayerTarget once Layer 2
shipped because PolicyContext has no request counters.
Cleanup / docs:
- openshell/guard: document Layer 1 limitations explicitly — many policy
fields cannot be populated pre-connection, so ModeStrict against
connection-state rules will deny by zero value.
- openshell/types: clarify the doc comment on PolicyRules that the three
bool fields are an intentional exception to the pointer-or-slice convention.
- errors: drop unused ErrCodePolicyDenied (no consumer in this PR).
- discovery/svcb: comment the equal-priority SVCB tie-break (RFC 9460 §2.4.3).
- discovery/resolver: replace misleading "system resolver" comment on the
127.0.0.1:53 default — that's the K8s pod resolver, not the host's.
- api/server: extract saNamespacePath as a package var so tests can redirect
it; TestDiscoveryDomain is now host-independent and asserts the exact
expected default ("default.svc.cluster.local."). Adds
TestPodNamespaceTrimsWhitespace.
- server/lifecycle_test: TestMultipleOnStartHooksExecuteInOrder now drives
Start() rather than calling hooks manually so it actually covers the
lifecycle dispatcher; pair test added for OnShutdown ordering.
- server/lifecycle_test: check Close() error in freePort.
- agents_test, resolver_test: drop staticcheck S1039 fmt.Sprintf wrappers.
- openshell/fetcher_test: replace stub SSRF test with one that points at the
hostname "localhost" so it actually exercises the dial-time guard rather
than URL-level validation. Add TestFetcherRejectsNonHTTPSRedirect.
Tests: all packages pass with -race. No behavior change when discovery is
disabled (the AICR_DISCOVERY_ENABLED gate still defaults to off).
Address the May 11 PR NVIDIA#469 follow-up review. Readiness-bind race (the only fully-unaddressed item from round 2): pkg/server/server.go Start() previously kicked off ListenAndServe in a goroutine and immediately flipped readiness on the main goroutine, opening a window where /ready returned 200 while the kernel was still binding the socket. Probes during that window would 200 then get connection-refused on the next request. Bind the listener synchronously (net.ListenConfig.Listen — context- aware, satisfies noctx) BEFORE setReady(true), then start http.Server.Serve(ln) in a goroutine. If the bind fails the server never advertises ready. New regression test TestReadyImpliesListenerBound asserts that a dial against the port succeeds the moment ready flips to true. golangci-lint failures cleared (the CI gate this PR currently fails on): - exhaustive (3 sites): pkg/errors/exitcode.go, pkg/server/errors.go — ErrCodePolicyFetch is now covered in the exit-code, HTTP-status, and retryable switches. Treated as upstream-unavailable (ExitUnavailable / 503 / retryable=true) since policy fetch hits an external URI. - gosec G115 (int→uint16): pkg/server/config.go ResolvePort now returns uint16 directly, matching net's port representation. Single cast still needed in parseConfig because config.Port is int for back-compat with existing tests that set cfg.Port = port; gated with a nolint comment since ResolvePort guarantees the range. pkg/api/server.go drops the int→uint16 cast at the registration site. - gosec G704 (SSRF taint): pkg/openshell/fetcher.go Fetch annotated with nolint pointing at the three layers already mitigating the finding (URL pre-flight, post-resolution dial guard, redirect check) and the regression test that exercises the dial path. Also unblocks the CodeQL job which was firing on the same taint analysis. - noctx: internal/testdns/testdns.go Start switches from net.ListenPacket to (*net.ListenConfig).ListenPacket with context.Background() per noctx lint requirement. Layer 1 strict-mode policy fields remain a documented limitation in Guard's doc comment (per the reviewer's "downgraded to documented limitation rather than fixed" note). Tracked in a follow-up issue.
ff4b714 to
6f2d22f
Compare
Summary
/v1/agentsendpoint with feature gatingStory: "OpenShell secures the agent boundary; DNS-AID resolves what's inside it."
Motivation / Context
AICR agents in Kubernetes need a standard discovery layer to find peers and a security boundary to control which connections are permitted. DNS-AID provides discovery via DNS SVCB lookups; OpenShell evaluates target agents' published policy documents before allowing the calling agent to connect.
Type of Change
Component(s) Affected
/v1/agentsendpoint)pkg/discovery,pkg/openshell,pkg/errors,pkg/defaults)Implementation Notes
DNS-AID Discovery (
pkg/discovery/)Discoverer: DNS SVCB queries for_{name}._{protocol}._agents.{domain}Publisher: K8s ConfigMap-based agent registration with create-or-update semantics_agents.{domain}for listing all agentsOnStart/OnShutdown) for auto-publish/deregisterAICR_DISCOVERY_ENABLED=trueOpenShell Policy Enforcement (
pkg/openshell/)strict(deny on violation),permissive(log + allow, default),disabledOPENSHELL_MODEenv var with startup validationIntegration
/v1/agentsevaluates OpenShell policy for each discovered agent with a policy URInil— zero behavioral change when discovery is disabledPOLICY_DENIED,POLICY_FETCHPolicyFetchTimeout(3s),PolicyCacheTTL(5m),PolicyMaxBytes(64KB)Key Design Decisions
major.minoras integers (not lexicographic) to correctly handle versions like1.12.Testing
pkg/openshell: 84.2% coverage (30 tests)pkg/discovery: Mock DNS server tests for SVCB parsing, index listing, publisherpkg/api: Handler tests with nil guard, domain validation, empty indexRisk Assessment
Medium — New packages with no existing consumers beyond the
/v1/agentsendpoint. Feature-gated behindAICR_DISCOVERY_ENABLEDandOPENSHELL_MODEenv vars, so zero impact on existing functionality when disabled. Default mode ispermissive(log-only), providing a safe rollout path.Checklist
-racego vetclean,gofmtclean)pkg/errors, slog,pkg/defaults)git commit -S)