Error: × K8s namespace not ready ╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server (NotFound): namespaces "openshell" not found #722

@xiaobai012910

Agent Diagnostic

I don't know.

Description

I ran the command curl -fsSL https://nvidia.com/nemoclaw.sh | bash, and it failed at step 2 of onboarding ([2/7] Starting OpenShell gateway).

Reproduction Steps

Run the installer: curl -fsSL https://nvidia.com/nemoclaw.sh | bash
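To attach the complete installer output instead of a partial copy, the installer can be downloaded to a file first and then run with everything it prints copied into a log. This is a minimal sketch assuming bash, curl and tee; run_installer and the log file name are hypothetical and not part of NemoClaw. Running from a file keeps the installer's stdin on the terminal (in the piped form, the log below shows it reattaching onboarding to /dev/tty); because stdout goes through tee, the output may render somewhat differently.

```shell
#!/usr/bin/env bash
# run_installer URL LOGFILE  (hypothetical helper, not part of NemoClaw)
# Downloads the installer to a temporary file, runs it with bash, and copies
# everything it prints (stdout and stderr) into LOGFILE.
# Returns the installer's own exit status, not tee's.
run_installer() {
  local url=$1 logfile=$2
  local tmp_script status
  tmp_script=$(mktemp) || return 1
  if ! curl -fsSL "$url" -o "$tmp_script"; then
    echo "download failed: $url" >&2
    rm -f "$tmp_script"
    return 1
  fi
  # Running from a file instead of `curl ... | bash` leaves stdin on the terminal.
  bash "$tmp_script" 2>&1 | tee "$logfile"
  status=${PIPESTATUS[0]}
  rm -f "$tmp_script"
  return "$status"
}
```

Usage: run_installer https://nvidia.com/nemoclaw.sh nemoclaw-install.log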

Environment

Ubuntu 24.04 running in a VMware virtual machine
Docker version: 29.3.1
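The environment details above can be gathered in one pass. This is a sketch assuming bash on Linux; probe, os_name, mem_swap_mb and env_report are hypothetical helper names, and `openshell --version` is an assumption rather than a confirmed flag (the probe prints "unavailable" if it is not supported). The memory line mirrors the RAM + swap total reported by the preflight check (7893 MB + 4095 MB = 11988 MB in the log below); run after onboarding, it will also count the 4 GB swap file the installer created if that file is still active.

```shell
#!/usr/bin/env bash
# probe LABEL COMMAND [ARGS...]
# Prints "LABEL: <first line of output>", or "LABEL: unavailable" if the
# command is missing or prints nothing.
probe() {
  local label=$1
  shift
  local out
  out=$("$@" 2>/dev/null | head -n 1) || true
  printf '%s: %s\n' "$label" "${out:-unavailable}"
}

# OS name from /etc/os-release (sourced inside probe's subshell).
os_name() {
  . /etc/os-release && printf '%s\n' "${PRETTY_NAME:-}"
}

# RAM + swap in MB, converted from the kB values in /proc/meminfo.
mem_swap_mb() {
  [ -r /proc/meminfo ] || return 1
  awk '/^MemTotal:/ {m = int($2 / 1024)}
       /^SwapTotal:/ {s = int($2 / 1024)}
       END {printf "%d MB RAM + %d MB swap = %d MB total\n", m, s, m + s}' /proc/meminfo
}

env_report() {
  probe "OS" os_name
  probe "Virtualization" systemd-detect-virt
  probe "Docker server" docker version --format '{{.Server.Version}}'
  probe "Node.js" node --version
  probe "npm" npm --version
  # Assumption: the openshell CLI accepts --version.
  probe "OpenShell CLI" openshell --version
  probe "Memory" mem_swap_mb
}
```

Running env_report prints one "label: value" line per probe, which can be pasted into the Environment section as-is.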

Logs

   ███╗   ██╗███████╗███╗   ███╗ ██████╗  ██████╗██╗      █████╗ ██╗    ██╗
   ████╗  ██║██╔════╝████╗ ████║██╔═══██╗██╔════╝██║     ██╔══██╗██║    ██║
   ██╔██╗ ██║█████╗  ██╔████╔██║██║   ██║██║     ██║     ███████║██║ █╗ ██║
   ██║╚██╗██║██╔══╝  ██║╚██╔╝██║██║   ██║██║     ██║     ██╔══██║██║███╗██║
   ██║ ╚████║███████╗██║ ╚═╝ ██║╚██████╔╝╚██████╗███████╗██║  ██║╚███╔███╔╝
   ╚═╝  ╚═══╝╚══════╝╚═╝     ╚═╝ ╚═════╝  ╚═════╝╚══════╝╚═╝  ╚═╝ ╚══╝╚══╝

  Launch OpenClaw in an OpenShell sandbox.  v0.1.0


[1/3] Node.js
  ──────────────────────────────────────────────────
[INFO]  Node.js found: v22.22.2
[INFO]  Runtime OK: Node.js v22.22.2, npm 10.9.7

[2/3] NemoClaw CLI
  ──────────────────────────────────────────────────
[INFO]  Installing NemoClaw from GitHub…
[INFO]  Resolved install ref: latest
  ✓  Cloning NemoClaw source
  ✓  Preparing OpenClaw package
  ✓  Installing NemoClaw dependencies
  ✓  Building NemoClaw plugin
  ✓  Linking NemoClaw CLI
[INFO]  Created user-local shim at /home/baibai/.local/bin/nemoclaw
[INFO]  Created user-local shim at /home/baibai/.local/bin/nemoclaw
[INFO]  Verified: nemoclaw is available at /home/baibai/.local/bin/nemoclaw

[3/3] Onboarding
  ──────────────────────────────────────────────────
[INFO]  Running nemoclaw onboard…
[INFO]  Installer stdin is piped; attaching onboarding to /dev/tty…

  NemoClaw Onboarding
  ===================

  [1/7] Preflight checks
  ──────────────────────────────────────────────────
  ✓ Docker is running
  ✓ Container runtime: docker
  ✓ openshell CLI: openshell 0.0.19
  ✓ Port 8080 available (OpenShell gateway)
  ✓ Port 18789 available (NemoClaw dashboard)
  ⓘ No GPU detected — will use cloud inference
  ⚠ Low memory detected (7893 MB RAM + 4095 MB swap = 11988 MB total)
  Create a 4 GB swap file to prevent OOM during sandbox build? (requires sudo) [y/N]: y
  Creating 4 GB swap file to prevent OOM during sandbox build...
  ✓ Swap file created and activated

  [2/7] Starting OpenShell gateway
  ──────────────────────────────────────────────────
  Using pinned OpenShell gateway image: ghcr.io/nvidia/openshell/cluster:0.0.19
✓ Checking Docker
✓ Downloading gateway
x Initializing environment
x Gateway failed: nemoclaw

Gateway failed to start

  The gateway encountered an unexpected error during startup.

  To fix:

  1. Check container logs for details

     openshell doctor logs --name nemoclaw

  2. Run diagnostics

     openshell doctor check --name nemoclaw

  3. Try destroying and recreating the gateway

     openshell gateway destroy --name nemoclaw && openshell gateway start

  4. If the issue persists, report it at https://github.com/nvidia/openshell/issues

Error:   × K8s namespace not ready
  ╰─▶ timed out waiting for namespace 'openshell' to exist: Error from server (NotFound): namespaces "openshell" not found
      
      container logs:
        I0401 05:03:41.973089     114 iptables.go:212] Changing default FORWARD chain policy to ACCEPT
        time="2026-04-01T05:03:41Z" level=info msg="Wrote flannel subnet file to /run/flannel/subnet.env"
        time="2026-04-01T05:03:41Z" level=info msg="Running flannel backend."
        I0401 05:03:41.980274     114 vxlan_network.go:68] watching for new subnet leases
        I0401 05:03:41.980308     114 vxlan_network.go:115] starting vxlan device watcher
        I0401 05:03:41.997404     114 iptables.go:358] bootstrap done
        I0401 05:03:42.010541     114 iptables.go:358] bootstrap done
        time="2026-04-01T05:03:43Z" level=info msg="Started tunnel to 172.18.0.2:6443"
        time="2026-04-01T05:03:43Z" level=info msg="Stopped tunnel to 127.0.0.1:6443"
        time="2026-04-01T05:03:43Z" level=info msg="Connecting to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
        time="2026-04-01T05:03:43Z" level=info msg="Proxy done" err="context canceled" url="wss://127.0.0.1:6443/v1-k3s/connect"
        time="2026-04-01T05:03:43Z" level=info msg="error in remotedialer server [400]: websocket: close 1006 (abnormal closure): unexpected EOF"
        time="2026-04-01T05:03:43Z" level=info msg="Handling backend connection request [f2ff8b107638]"
        time="2026-04-01T05:03:43Z" level=info msg="Connected to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
        time="2026-04-01T05:03:43Z" level=info msg="Remotedialer connected to proxy" url="wss://172.18.0.2:6443/v1-k3s/connect"
        time="2026-04-01T05:04:00Z" level=info msg="Starting network policy controller version v2.6.3-k3s1, built on 2026-03-04T22:29:48Z, go1.25.7"
        I0401 05:04:00.878638     114 network_policy_controller.go:164] Starting network policy controller
        I0401 05:04:01.098114     114 network_policy_controller.go:179] Starting network policy controller full sync goroutine
        I0401 05:04:02.627279     114 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="agent-sandbox-system/agent-sandbox-controller-0" podStartSLOduration=19.688508689 podStartE2EDuration="29.627238691s" podCreationTimestamp="2026-04-01 05:03:33 +0000 UTC" firstStartedPulling="2026-04-01 05:03:52.06258159 +0000 UTC m=+30.969660287" lastFinishedPulling="2026-04-01 05:04:02.00131159 +0000 UTC m=+40.908390289" observedRunningTime="2026-04-01 05:04:02.624180336 +0000 UTC m=+41.531259147" watchObservedRunningTime="2026-04-01 05:04:02.627238691 +0000 UTC m=+41.534317498"
        E0401 05:04:03.936234     114 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0401 05:04:04.507837     114 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"
        E0401 05:04:25.086577     114 handler_proxy.go:143] error resolving kube-system/metrics-server: no endpoints available for service "metrics-server"
        I0401 05:04:28.712626     114 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="kube-system/local-path-provisioner-6bc6568469-pw7vs" podStartSLOduration=18.5120312 podStartE2EDuration="54.712594252s" podCreationTimestamp="2026-04-01 05:03:34 +0000 UTC" firstStartedPulling="2026-04-01 05:03:52.061248595 +0000 UTC m=+30.968327277" lastFinishedPulling="2026-04-01 05:04:28.261811596 +0000 UTC m=+67.168890329" observedRunningTime="2026-04-01 05:04:28.712209597 +0000 UTC m=+67.619288349" watchObservedRunningTime="2026-04-01 05:04:28.712594252 +0000 UTC m=+67.619672969"
        E0401 05:04:33.947118     114 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0401 05:04:34.514808     114 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"
        W0401 05:04:38.909980     114 handler_proxy.go:99] no RequestInfo found in the context
        E0401 05:04:38.910096     114 controller.go:102] "Unhandled Error" err=<
        loading OpenAPI spec for "v1beta1.metrics.k8s.io" failed with: failed to download v1beta1.metrics.k8s.io: failed to retrieve openAPI spec, http error: ResponseCode: 503, Body: service unavailable
        , Header: map[Content-Type:[text/plain; charset=utf-8] X-Content-Type-Options:[nosniff]]
        >
        I0401 05:04:38.910109     114 controller.go:109] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
        W0401 05:04:38.911389     114 handler_proxy.go:99] no RequestInfo found in the context
        E0401 05:04:38.911440     114 controller.go:113] "Unhandled Error" err="loading OpenAPI spec for \"v1beta1.metrics.k8s.io\" failed with: Error, could not get list of group versions for APIService"
        I0401 05:04:38.911452     114 controller.go:126] OpenAPI AggregationController: action for item v1beta1.metrics.k8s.io: Rate Limited Requeue.
        I0401 05:04:44.814697     114 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="kube-system/coredns-7566b5ff58-vfqbv" podStartSLOduration=18.792296332 podStartE2EDuration="1m10.814679645s" podCreationTimestamp="2026-04-01 05:03:34 +0000 UTC" firstStartedPulling="2026-04-01 05:03:52.06089448 +0000 UTC m=+30.967973167" lastFinishedPulling="2026-04-01 05:04:44.083277795 +0000 UTC m=+82.990356480" observedRunningTime="2026-04-01 05:04:44.78691886 +0000 UTC m=+83.693997554" watchObservedRunningTime="2026-04-01 05:04:44.814679645 +0000 UTC m=+83.721758333"
        E0401 05:05:03.964111     114 resource_quota_controller.go:460] "Error during resource discovery" err="unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: stale GroupVersion discovery: metrics.k8s.io/v1beta1"
        I0401 05:05:04.521947     114 garbagecollector.go:792] "failed to discover some groups" groups="map[\"metrics.k8s.io/v1beta1\":\"stale GroupVersion discovery: metrics.k8s.io/v1beta1\"]"
        I0401 05:05:08.930175     114 pod_startup_latency_tracker.go:108] "Observed pod startup duration" pod="kube-system/metrics-server-786d997795-c4q7f" podStartSLOduration=15.936317969 podStartE2EDuration="1m31.930153884s" podCreationTimestamp="2026-04-01 05:03:37 +0000 UTC" firstStartedPulling="2026-04-01 05:03:52.660143122 +0000 UTC m=+31.567221808" lastFinishedPulling="2026-04-01 05:05:08.653979022 +0000 UTC m=+107.561057723" observedRunningTime="2026-04-01 05:05:08.929511411 +0000 UTC m=+107.836590173" watchObservedRunningTime="2026-04-01 05:05:08.930153884 +0000 UTC m=+107.837232597"
        E0401 05:05:25.085345     114 handler_proxy.go:143] error resolving kube-system/metrics-server: no endpoints available for service "metrics-server"
        I0401 05:05:25.976750     114 handler.go:304] Adding GroupVersion metrics.k8s.io v1beta1 to ResourceManager
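The first two "To fix" steps in the gateway error message are diagnostics whose output would help maintainers investigate the missing 'openshell' namespace. A minimal sketch, assuming bash and the openshell commands exactly as printed in the message: collect_diagnostics and the output file names are hypothetical, and the extra docker ps -a call is an addition not suggested by the message. It deliberately leaves out step 3, since that removes the existing gateway.

```shell
#!/usr/bin/env bash
# collect_diagnostics GATEWAY_NAME OUTPUT_DIR  (hypothetical helper)
# Runs the two diagnostic steps from the "To fix" message and saves each
# command's output (stdout and stderr) to its own file in OUTPUT_DIR.
# Returns non-zero if any of the commands failed.
collect_diagnostics() {
  local name=$1 outdir=$2
  local failures=0
  mkdir -p "$outdir" || return 1
  # Step 1 from the message: gateway container logs.
  openshell doctor logs --name "$name" >"$outdir/doctor-logs.txt" 2>&1 || failures=$((failures + 1))
  # Step 2 from the message: built-in diagnostics.
  openshell doctor check --name "$name" >"$outdir/doctor-check.txt" 2>&1 || failures=$((failures + 1))
  # Not from the message: Docker's list of running and stopped containers.
  docker ps -a >"$outdir/docker-ps.txt" 2>&1 || failures=$((failures + 1))
  echo "Saved diagnostics to $outdir ($failures command(s) failed)"
  [ "$failures" -eq 0 ]
}
```

Usage: collect_diagnostics nemoclaw ./nemoclaw-diagnostics. If the saved output does not explain the failure, step 3 from the message (openshell gateway destroy --name nemoclaw && openshell gateway start) can still be run by hand.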

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why
