Skip to content

Add xorl-client OPD example#6

Open
kiddyboots216 wants to merge 2 commits into
examples/filler-rl-pipelinefrom
opd-client-pipeline
Open

Add xorl-client OPD example#6
kiddyboots216 wants to merge 2 commits into
examples/filler-rl-pipelinefrom
opd-client-pipeline

Conversation

@kiddyboots216
Copy link
Copy Markdown
Contributor

@kiddyboots216 kiddyboots216 commented May 15, 2026

Summary

  • Adds examples/on_policy_distillation.py, a client-side OPD loop that samples from student SGLang/Dispatch endpoints, asks a XORL teacher service to prefill hidden-state caches, trains with opd_loss, optionally runs optimizer steps, and syncs weights back to inference with P2P.
  • Extends the sampling client for the chat-completions path used by Dispatch, including real chat messages, payload construction coverage, and batch/concurrency behavior used by the OPD Kubernetes runs.
  • Adds TrainingClient support for passing a weight-sync master address and timeout so the OPD example can use the P2P sync path from Kubernetes.
  • Parses scalar TensorData loss wire formats in ForwardBackwardOutput and covers OPD payload alignment/profile helpers with tests.

Latest Validation

  • python -m pytest tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_sampling_client_generate_payload.py tests/test_batch_sampling.py -q -> 50 passed, 2 warnings.
  • python -m ruff check examples/on_policy_distillation.py tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_batch_sampling.py tests/test_sampling_client_generate_payload.py xorl_client/client/sampling_client.py xorl_client/client/service_client.py xorl_client/client/training_client.py xorl_client/types/sampled_sequence.py xorl_client/types/sampling_params.py -> passed.
  • python -m ruff format --check examples/on_policy_distillation.py tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_batch_sampling.py tests/test_sampling_client_generate_payload.py xorl_client/client/sampling_client.py xorl_client/client/service_client.py xorl_client/client/training_client.py xorl_client/types/sampled_sequence.py xorl_client/types/sampling_params.py -> passed.
  • K8s OPD packed TP2 validation: opd-q35-c4-t36-1792p-pack4 completed optimizer + P2P weight-sync iterations using the xorl-client OPD path. Latest steady row: valid_tokens_per_step_s=1888.8, teacher_prefill_s=64.60, forward_backward_s=169.32, sync_transfer_time_s=7.99 for 69.32GB to 16 endpoints.
  • Sampler saturation validation: opd-q35-tp2-pack4-s1-20260516-dispatch-bench completed 2048/2048 requests with 0 errors and 7,870.5 completion tok/s through Dispatch.

Notes

  • Companion XORL server PR: togethercomputer/xorl-internal#209.
  • Generated K8s manifests, result ledgers, and ad-hoc analysis artifacts are intentionally not included in this PR.

)
from transformers import AutoTokenizer

return AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Static Code Analysis Risk: Together python huggingface trust remote code

trust_remote_code=True downloads and executes arbitrary Python code from the model repository without sandboxing (OWASP LLM03:2025 Supply Chain). A malicious or compromised model repo can achieve RCE on every host that loads the model (CWE-94). Pin to a verified commit hash and audit remote code before use, or use models that don't require trust_remote_code.

Severity: High 🚨
Status: Open 🔴

References:

  1. https://cwe.mitre.org/data/definitions/94
  2. https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained
  3. https://genai.owasp.org/llmrisk/llm032025-supply-chain/
  4. https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/

Suggested reviewers 🧐: @kiddyboots216

More details:

🌻 View in Arnica

If you see an issue, please contact Shasheen in the #security-engineering Slack channel.


Take action by replying with an [arnica] command 💬

Actions

Use [arnica] or [a] to interact with the Arnica bot to acknowledge or dismiss code risks.

To acknowledge the finding as a valid code risk: [arnica] ack <acknowledge additional details>

To dismiss the risk with a reason: [arnica] dismiss <fp|accept|capacity> <dismissal reason>

Examples

  • [arnica] ack This is a valid risk and I'm looking into it

  • [arnica] dismiss fp Dismissed - Risk Not Accurate: (i.e. False Positive)

  • [arnica] dismiss accept Dismiss - Risk Accepted: Allow the risk to exist in the system

  • [arnica] dismiss capacity Dismiss - No Capacity: This will need to wait for a future sprint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant