[TCL-2751] Slinky v1.0: fix chart rendering, reconfigure deadlock, login auth, and SLURM_CONF_SERVER#18

Merged
jhu-svg merged 4 commits into slurm-1.0-together-changes from jhu/fix-slurm-login-dnsconfig
Mar 11, 2026

@jhu-svg jhu-svg commented Mar 2, 2026

Summary

Fixes 6 issues blocking Slinky v1.0 Slurm clusters from provisioning correctly.

Chart rendering fixes

  • Add slurm.dnsConfig helper (TCL-4404) — login template called undefined helper
  • Remove broken authcred initContainer (TCL-4404) — referenced templates from upstream that don't exist in our fork
  • Add SACKD_OPTIONS env var (TCL-4408) — login entrypoint needs this for sackd to connect to controller

Reconfigure sidecar deadlock fix

  • Initialize lastHash in reconfigure.sh (TCL-4401): the sidecar started with lastHash="", so its first poll always saw a changed hash and ran scontrol reconfigure while slurmctld was still initializing, which deadlocks slurmctld 25.11.2. The sidecar now captures the current config hash before it starts polling.
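
The seeding fix can be sketched roughly as follows. This is a hypothetical illustration, not the actual reconfigure.sh from this PR: the function names (config_hash, poll_once, watch_config), the RECONFIGURE_CMD override, the use of sha256sum, and the flat config directory layout are all assumptions made for the example.

```shell
#!/bin/sh
# Hypothetical sketch of a config-watching sidecar; names and layout are
# assumptions, not the real reconfigure.sh.

# Command run when the config changes. Overridable so it can be stubbed.
RECONFIGURE_CMD="${RECONFIGURE_CMD:-scontrol reconfigure}"

# Print one hash covering every regular file directly inside directory $1.
# The glob expands in sorted order, so the result is stable between polls.
config_hash() {
    for f in "$1"/*; do
        if [ -f "$f" ]; then
            cat "$f"
        fi
    done | sha256sum | cut -d' ' -f1
}

# Compare the current hash against $last_hash and reconfigure on a change.
# last_hash is only advanced when the reconfigure command succeeds.
poll_once() {
    current_hash=$(config_hash "$1")
    if [ "$current_hash" != "$last_hash" ]; then
        if $RECONFIGURE_CMD; then
            last_hash=$current_hash
        fi
    fi
}

# Poll directory $1 every $2 seconds (default 5).
watch_config() {
    # The fix: seed last_hash with the current hash BEFORE the first poll,
    # so an unchanged config never triggers a reconfigure at startup.
    last_hash=$(config_hash "$1")
    while true; do
        sleep "${2:-5}"
        poll_once "$1"
    done
}
```

A sidecar entrypoint would end with something like `watch_config /etc/slurm 5` (the path is an assumption). With last_hash left empty instead, the first poll_once call always sees a mismatch, which is the startup behavior this PR removes.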

Login auth fixes

  • Mount auth secrets as projected volume (TCL-4402): the login Deployment mounted only the configmap and was missing the slurm-auth-slurm and slurm-auth-jwths256 secrets that sackd needs for bootstrap auth. The fix mounts all three through a projected volume with defaultMode: 0600.
  • Fix SLURM_CONF_SERVER env var (TCL-4403): it pointed to slurm.slurm.svc, which doesn't exist; it now points to slurm-controller.slurm.svc.

Test

  • helm template passes
  • Staging cluster reached Ready with all fixes (cluster t-f7eeaf3a)

jhu-svg added 4 commits March 2, 2026 15:21
1. reconfigure sidecar: initialize lastHash to current config hash on
   startup so no spurious scontrol-reconfigure fires while slurmctld is
   still initializing (avoids a deadlock in slurm 25.11.2).

2. login deployment: mount slurm-auth-slurm and slurm-auth-jwths256
   secrets alongside the slurm-config configmap using a projected volume
   with mode 0600. sackd needs the slurm.key for bootstrap auth.

3. login SLURM_CONF_SERVER env: point to the controller service
   (slurm-controller) instead of the non-existent "slurm" service.

Made-with: Cursor
@jhu-svg jhu-svg changed the title from "slurm: fix login template helpers" to "[slinky1.0] slurm: fix login template helpers" Mar 6, 2026
@jhu-svg jhu-svg changed the title from "[slinky1.0] slurm: fix login template helpers" to "[TCL-2751] Slinky v1.0: fix chart rendering, reconfigure deadlock, login auth, and SLURM_CONF_SERVER" Mar 7, 2026
@jhu-svg jhu-svg requested a review from eb3095 March 10, 2026 00:58
@jhu-svg jhu-svg merged commit b95314c into slurm-1.0-together-changes Mar 11, 2026
1 check passed