fix(lightspeed): fix rag-content folder ownership in init container#460

Merged
openshift-merge-bot[bot] merged 1 commit into
redhat-developer:mainfrom
rm3l:cherry-pick/main/RHDHBUGS-3371--e2e-lightspeed-core-sidecar-crashes-with-permissionerror-on-k8s-eks-aks-deployments
Jul 1, 2026
@rm3l rm3l commented Jul 1, 2026

Member

manual cherry-pick of #449

@rm3l rm3l requested a review from a team as a code owner July 1, 2026 08:45
@openshift-ci openshift-ci Bot requested review from gazarenkov and zdrapela July 1, 2026 08:45
…tebooks in init … (redhat-developer#449)

* On EKS/AKS, the RAG init container populates /rag-content/ but never creates the notebooks subdirectory. At runtime, llama-stack tries to write /rag-content/vector_db/notebooks/faiss_store.db and fails with PermissionError because it cannot create the directory on a volume it doesn't own. OCP avoids this via fsGroup/supplemental group defaults.

The fix pre-creates the directory and widens permissions before the sidecar starts, matching the fix the operator already applies via chmod -R 777 for the rest of vector_db.

Signed-off-by: Lucas <lyoon@redhat.com>

* Apply suggestions from code review

Co-authored-by: Armel Soro <armel@rm3l.org>

---------

Signed-off-by: Lucas <lyoon@redhat.com>
Co-authored-by: Armel Soro <armel@rm3l.org>
@rm3l rm3l force-pushed the cherry-pick/main/RHDHBUGS-3371--e2e-lightspeed-core-sidecar-crashes-with-permissionerror-on-k8s-eks-aks-deployments branch from d097331 to ed2a1af Compare July 1, 2026 08:46
sonarqubecloud Bot commented Jul 1, 2026

@rm3l rm3l added the lgtm label Jul 1, 2026
PR Summary by Qodo

Fix Lightspeed RAG init container ownership/permissions for /rag-content notebooks

🐞 Bug fix ⚙️ Configuration changes 📝 Documentation 🕐 10-20 Minutes

AI Description

• Keep Lightspeed RAG assets writable on EKS/AKS by copying them into /rag-content without preserving the source mode/ownership.
• Pre-create /rag-content/vector_db/notebooks and chmod vector_db/model paths before sidecar starts.
• Bump Helm chart version and align CI/test chart securityContext overrides.
Diagram

```mermaid
graph TD
  A["Helm values (Lightspeed RAG)"] --> B["RAG init container"] --> C[("/rag-content volume")] --> D["llama-stack sidecar"]
  B --> E["Copy RAG assets"] --> C
  B --> F["Create notebooks dir"] --> C
  B --> G["chmod a+rwX"] --> C
  D --> H["Write faiss_store.db"] --> C
  subgraph Legend
    direction LR
    _cfg["Config"] ~~~ _pod["Container"] ~~~ _vol[("Volume")]
  end
```

High-Level Assessment

The following are alternative approaches to this PR:

1. Rely on pod securityContext (runAsUser/runAsGroup/fsGroup) only
  • ➕ Avoids chmod/changing permissions broadly
  • ➕ Centralizes access control via Kubernetes security context
  • ➖ Behavior varies by platform/storage driver; does not reliably fix EKS/AKS volume ownership gaps
  • ➖ Doesn't address missing notebooks directory creation
2. Use init container chown to a specific UID/GID (least-privilege)
  • ➕ More restrictive than a+rwX/777-style permissions
  • ➕ Makes ownership explicit and predictable
  • ➖ Requires knowing the runtime UID/GID contract across images and deployments
  • ➖ chown can be slow on large volumes and may require elevated privileges/capabilities

Recommendation: Keep the PR’s approach: copying without preserving mode/ownership, explicitly creating the notebooks directory, and chmodding the target paths is a pragmatic cross-platform fix for EKS/AKS where fsGroup defaults don’t match OCP behavior. If tightening permissions becomes a requirement later, consider a targeted chown/chmod to the runtime UID/GID rather than a+rwX, but that likely needs stronger guarantees about the sidecar’s execution identity.
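
For comparison, the least-privilege alternative mentioned above (a targeted chown/chmod to the runtime UID/GID) might look roughly like the sketch below. It is not part of this PR. The helper name `restrict_rag_content` and its UID/GID arguments are hypothetical, and the sketch assumes the sidecar's runtime identity is known in advance, which is exactly the guarantee the recommendation says is missing.

```shell
#!/bin/sh
# Hypothetical least-privilege variant (not what this PR does): hand the
# writable paths to one known runtime identity instead of opening them
# to every user.
restrict_rag_content() {
  dest="$1"          # mount point of the shared volume, e.g. /rag-content
  runtime_uid="$2"   # assumed UID the sidecar runs as
  runtime_gid="$3"   # assumed GID the sidecar runs as

  # Pre-create the directory the sidecar writes its FAISS store into.
  mkdir -p "$dest/vector_db/notebooks" || return 1

  # Changing ownership to another user normally needs root or CAP_CHOWN.
  chown -R "$runtime_uid:$runtime_gid" "$dest/vector_db" || return 1

  # Owner and group get read/write (plus traverse on directories);
  # everyone else gets no access.
  chmod -R u+rwX,g+rwX,o-rwx "$dest/vector_db" || return 1
}
```

An init container could call it as `restrict_rag_content /rag-content 1001 1001`, where `1001` is only a placeholder. The trade-off is the one listed above: tighter permissions, but the init container needs enough privilege to chown and the UID/GID contract must hold on every platform.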

Files changed (5) +8 / -8

Bug fix (2) +5 / -3
values.schema.json +1/-1

Fix Lightspeed RAG init container args for permissions and notebooks path

• Updates the init container command to copy RAG assets without preserving mode/ownership, pre-create /rag-content/vector_db/notebooks, and chmod the relevant directories. Prevents PermissionError when the sidecar later writes the FAISS store on EKS/AKS.

charts/backstage/values.schema.json

values.yaml +4/-2

Make RAG bootstrap copy writable and create notebooks directory

• Adjusts the default init container script to use cp --no-preserve=mode,ownership, create the notebooks subdirectory, and chmod vector_db/embeddings_model for write access. Addresses cross-platform volume ownership differences that cause runtime writes to fail.

charts/backstage/values.yaml
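
The three steps described for the init container script (copy without preserving mode/ownership, create the notebooks directory, chmod the writable paths) can be sketched as follows. This is an illustration, not the script from `values.yaml`: the helper name `prepare_rag_content` and the source directory argument are hypothetical, and `--no-preserve` assumes GNU coreutils `cp` is available in the init container image.

```shell
#!/bin/sh
# Sketch of the init-container steps described above.
# prepare_rag_content is a hypothetical helper; $1 is an assumed location
# of the RAG assets inside the image, $2 is the shared volume mount.
prepare_rag_content() {
  src="$1"
  dest="$2"

  # Copy without preserving mode/ownership, so the copies get default
  # permissions and the copying user's ownership rather than the image's.
  cp -r --no-preserve=mode,ownership "$src"/. "$dest"/ || return 1

  # Pre-create the directory the sidecar later writes faiss_store.db into.
  mkdir -p "$dest/vector_db/notebooks" || return 1

  # Widen permissions. Capital X adds execute only to directories and to
  # files that already have an execute bit set.
  chmod -R a+rwX "$dest/vector_db" || return 1
  if [ -d "$dest/embeddings_model" ]; then
    chmod -R a+rwX "$dest/embeddings_model" || return 1
  fi
}
```

In the init container this would be called with the image's asset directory and `/rag-content` as the destination. Because `a+rwX` grants write access to every user, the sidecar can create `faiss_store.db` regardless of which UID the platform assigns it.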

Documentation (1) +2 / -2
README.md +2/-2

Update README version references to 6.1.2

• Updates the version badge and install command examples to match the new chart version. Keeps documentation consistent with the released chart metadata.

charts/backstage/README.md

Other (2) +1 / -3
action.yml +0/-2

Stop forcing runAsUser/runAsGroup in chart test action

• Removes CI Helm overrides for runAsUser and runAsGroup while retaining fsGroup. This aligns test deployments with the expected security context behavior and reduces assumptions about fixed UIDs.

.github/actions/test-charts/action.yml

Chart.yaml +1/-1

Bump Backstage chart version to 6.1.2

• Increments the Helm chart version to reflect the included fix. No functional template changes are made in this file beyond versioning.

charts/backstage/Chart.yaml

@openshift-merge-bot openshift-merge-bot Bot merged commit 837bccc into redhat-developer:main Jul 1, 2026
10 checks passed