Add xorl-client OPD example by kiddyboots216 · Pull Request #6 · togethercomputer/xorl-client

kiddyboots216 · 2026-05-15T01:05:11Z

Summary

Adds examples/on_policy_distillation.py, a client-side OPD loop that samples from student SGLang/Dispatch endpoints, asks a XORL teacher service to prefill hidden-state caches, trains with opd_loss, optionally runs optimizer steps, and syncs weights back to inference with P2P.
Extends the sampling client for the chat-completions path used by Dispatch, including real chat messages, payload construction coverage, and batch/concurrency behavior used by the OPD Kubernetes runs.
Adds TrainingClient support for passing a weight-sync master address and timeout so the OPD example can use the P2P sync path from Kubernetes.
Parses scalar TensorData loss wire formats in ForwardBackwardOutput and covers OPD payload alignment/profile helpers with tests.
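
The loop described in the first summary item can be sketched roughly as follows. This is only an illustration of the phase ordering, not the code in examples/on_policy_distillation.py: the callables (`sample`, `teacher_prefill`, `forward_backward`, `optim_step`, `sync_weights`) are hypothetical stand-ins for the real client calls, and skipping the weight sync when no optimizer step ran is an assumption.

```python
from typing import Any, Callable, List, Tuple


def run_opd_iteration(
    sample: Callable[[], List[Any]],
    teacher_prefill: Callable[[List[Any]], Any],
    forward_backward: Callable[[List[Any], Any], float],
    optim_step: Callable[[], None],
    sync_weights: Callable[[], None],
    run_optimizer: bool = True,
) -> Tuple[float, List[str]]:
    """Run one OPD iteration and return (loss, ordered phase names).

    Every callable is a hypothetical placeholder for a real client call.
    """
    phases: List[str] = []

    # 1. Sample rollouts from the student inference endpoints.
    rollouts = sample()
    phases.append("sample")

    # 2. Ask the teacher service to prefill caches for those rollouts.
    teacher_cache = teacher_prefill(rollouts)
    phases.append("teacher_prefill")

    # 3. Compute the distillation loss and gradients.
    loss = forward_backward(rollouts, teacher_cache)
    phases.append("forward_backward")

    # 4. Optionally update the weights, then push them back to inference.
    #    Assumption: with no optimizer step there is nothing new to sync.
    if run_optimizer:
        optim_step()
        phases.append("optim_step")
        sync_weights()
        phases.append("sync_weights")

    return loss, phases


if __name__ == "__main__":
    loss, phases = run_opd_iteration(
        sample=lambda: ["rollout-0", "rollout-1"],
        teacher_prefill=lambda rollouts: {"n": len(rollouts)},
        forward_backward=lambda rollouts, cache: 0.5,
        optim_step=lambda: None,
        sync_weights=lambda: None,
    )
    print(loss, phases)
```

Passing the phases in as callables keeps the ordering testable without any running SGLang, Dispatch, or XORL service.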

Latest Validation

python -m pytest tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_sampling_client_generate_payload.py tests/test_batch_sampling.py -q -> 50 passed, 2 warnings.
python -m ruff check examples/on_policy_distillation.py tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_batch_sampling.py tests/test_sampling_client_generate_payload.py xorl_client/client/sampling_client.py xorl_client/client/service_client.py xorl_client/client/training_client.py xorl_client/types/sampled_sequence.py xorl_client/types/sampling_params.py -> passed.
python -m ruff format --check examples/on_policy_distillation.py tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_batch_sampling.py tests/test_sampling_client_generate_payload.py xorl_client/client/sampling_client.py xorl_client/client/service_client.py xorl_client/client/training_client.py xorl_client/types/sampled_sequence.py xorl_client/types/sampling_params.py -> passed.
K8s OPD packed TP2 validation: opd-q35-c4-t36-1792p-pack4 completed optimizer + P2P weight-sync iterations using the xorl-client OPD path. Latest steady row: valid_tokens_per_step_s=1888.8, teacher_prefill_s=64.60, forward_backward_s=169.32, sync_transfer_time_s=7.99 for 69.32GB to 16 endpoints.
Sampler saturation validation: opd-q35-tp2-pack4-s1-20260516-dispatch-bench completed 2048/2048 requests with 0 errors and 7,870.5 completion tok/s through Dispatch.
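
For a rough reading of the K8s OPD row above, the listed timings can be turned into a sync throughput and per-phase shares. This is a back-of-the-envelope sketch: it assumes 69.32GB is the total payload moved during the 7.99 s transfer (the row does not say whether that figure is total or per endpoint), and the shares cover only the three phases listed, not sampling or other overhead.

```python
# Figures copied from the "Latest steady row" above.
teacher_prefill_s = 64.60
forward_backward_s = 169.32
sync_transfer_time_s = 7.99
sync_payload_gb = 69.32  # assumed to be the total payload, not per endpoint

# Payload divided by transfer time, under the assumption stated above.
sync_gb_per_s = sync_payload_gb / sync_transfer_time_s

# Share of each phase within the three phases the row reports.
listed = {
    "teacher_prefill_s": teacher_prefill_s,
    "forward_backward_s": forward_backward_s,
    "sync_transfer_time_s": sync_transfer_time_s,
}
listed_total_s = sum(listed.values())
shares = {name: seconds / listed_total_s for name, seconds in listed.items()}

if __name__ == "__main__":
    print(f"sync throughput: {sync_gb_per_s:.2f} GB/s")
    for name, share in shares.items():
        print(f"{name}: {share:.1%} of listed phase time")
```

On these numbers the weight sync is a small fraction of the listed time, and forward_backward accounts for most of it.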

Notes

Companion XORL server PR: togethercomputer/xorl-internal#209.
Generated K8s manifests, result ledgers, and ad-hoc analysis artifacts are intentionally not included in this PR.

arnica-github-connector · 2026-05-17T00:14:12Z

+        )
+    from transformers import AutoTokenizer
+
+    return AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)


Static Code Analysis Risk: Together python huggingface trust remote code

trust_remote_code=True downloads and executes arbitrary Python code from the model repository without sandboxing (OWASP LLM03:2025 Supply Chain). A malicious or compromised model repo can achieve RCE on every host that loads the model (CWE-94). Pin to a verified commit hash and audit remote code before use, or use models that don't require trust_remote_code.
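
One way to apply the pinning advice above is to require a full commit hash and leave remote code off by default. This is a minimal sketch, not the change made in this PR: `pinned_tokenizer_kwargs` and `load_tokenizer` are hypothetical helper names, while `revision` and `trust_remote_code` are existing `from_pretrained` keyword arguments. Note that `revision` only applies to Hub repositories, not to local paths.

```python
import re

# A full 40-character hex commit SHA; branch names and tags are rejected
# because they can be moved to point at different code later.
_COMMIT_RE = re.compile(r"[0-9a-f]{40}")


def pinned_tokenizer_kwargs(revision: str, trust_remote_code: bool = False) -> dict:
    """Build from_pretrained kwargs that pin the repo to an audited commit."""
    if not _COMMIT_RE.fullmatch(revision):
        raise ValueError(f"revision must be a full commit hash, got {revision!r}")
    return {"revision": revision, "trust_remote_code": trust_remote_code}


def load_tokenizer(tokenizer_path: str, revision: str, trust_remote_code: bool = False):
    """Hypothetical loader: remote code runs only if explicitly requested."""
    # Imported lazily, as in the diff above, so the helper module can be
    # imported without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(
        tokenizer_path, **pinned_tokenizer_kwargs(revision, trust_remote_code)
    )
```

Callers that really need remote code would pass `trust_remote_code=True` together with the hash of a commit they have reviewed.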

Severity: High 🚨
Status: Open 🔴

References:

https://cwe.mitre.org/data/definitions/94

https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained

https://genai.owasp.org/llmrisk/llm032025-supply-chain/

https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/

Suggested reviewers 🧐: @kiddyboots216

If you see an issue, please contact Shasheen in the #security-engineering Slack channel.

Take action by replying with an [arnica] command 💬

Actions

Use [arnica] or [a] to interact with the Arnica bot to acknowledge or dismiss code risks.

To acknowledge the finding as a valid code risk: [arnica] ack <acknowledge additional details>

To dismiss the risk with a reason: [arnica] dismiss <fp|accept|capacity> <dismissal reason>

Examples

[arnica] ack This is a valid risk and I'm looking into it

[arnica] dismiss fp Dismissed - Risk Not Accurate: (i.e. False Positive)

[arnica] dismiss accept Dismiss - Risk Accepted: Allow the risk to exist in the system

[arnica] dismiss capacity Dismiss - No Capacity: This will need to wait for a future sprint

kiddyboots216 added 2 commits May 15, 2026 01:04

Add xorl-client OPD example

a8390a0

feat(opd): support chat sampling pipeline

fa522da

arnica-github-connector Bot reviewed May 17, 2026

kiddyboots216 wants to merge 2 commits into examples/filler-rl-pipeline from opd-client-pipeline
