Add xorl-client OPD example #6
    )
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(tokenizer_path, trust_remote_code=True)
Static Code Analysis Risk: Together python huggingface trust remote code
trust_remote_code=True downloads and executes arbitrary Python code from the model repository without sandboxing (OWASP LLM03:2025 Supply Chain). A malicious or compromised model repo can achieve RCE on every host that loads the model (CWE-94). Pin to a verified commit hash and audit remote code before use, or use models that don't require trust_remote_code.
Severity: High 🚨
Status: Open 🔴
References:
- https://cwe.mitre.org/data/definitions/94
- https://huggingface.co/docs/transformers/main/en/main_classes/model#transformers.PreTrainedModel.from_pretrained
- https://genai.owasp.org/llmrisk/llm032025-supply-chain/
- https://hiddenlayer.com/research/weaponizing-machine-learning-models-with-ransomware/
Suggested reviewers 🧐: @kiddyboots216
If you see an issue, please contact Shasheen in the #security-engineering Slack channel.
Take action by replying with an [arnica] command 💬
Actions
Use [arnica] or [a] to interact with the Arnica bot to acknowledge or dismiss code risks.
To acknowledge the finding as a valid code risk: [arnica] ack <acknowledge additional details>
To dismiss the risk with a reason: [arnica] dismiss <fp|accept|capacity> <dismissal reason>
Examples
- [arnica] ack This is a valid risk and I'm looking into it
- [arnica] dismiss fp Dismissed - Risk Not Accurate: (i.e. False Positive)
- [arnica] dismiss accept Dismiss - Risk Accepted: Allow the risk to exist in the system
- [arnica] dismiss capacity Dismiss - No Capacity: This will need to wait for a future sprint
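One way to act on the trust_remote_code finding is to pin the tokenizer repo to an audited commit and leave remote code off by default. The sketch below is not code from this PR. `tokenizer_kwargs` and `load_tokenizer` are hypothetical helper names, and it assumes `tokenizer_path` names a Hub repo (a local directory ignores `revision`). The only Transformers parameters used are `revision` and `trust_remote_code` on `AutoTokenizer.from_pretrained`.

```python
import re
from typing import Any

# A full Git commit SHA; branch names and tags are rejected because they can move.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def tokenizer_kwargs(revision: str, allow_remote_code: bool = False) -> dict[str, Any]:
    """Build from_pretrained kwargs that pin the repo to one audited commit.

    Hypothetical helper: not part of xorl-client or Transformers.
    """
    if not _FULL_SHA.fullmatch(revision):
        raise ValueError("revision must be a full 40-character commit SHA")
    return {"revision": revision, "trust_remote_code": allow_remote_code}


def load_tokenizer(tokenizer_path: str, revision: str, allow_remote_code: bool = False):
    """Load a tokenizer from a pinned commit (hypothetical wrapper)."""
    # Imported lazily so tokenizer_kwargs can be used without transformers installed.
    from transformers import AutoTokenizer

    return AutoTokenizer.from_pretrained(
        tokenizer_path, **tokenizer_kwargs(revision, allow_remote_code)
    )
```

If the model really does need remote code, `allow_remote_code=True` can still be passed explicitly, but only together with a commit that has been reviewed.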
Summary
Adds examples/on_policy_distillation.py, a client-side OPD loop that samples from student SGLang/Dispatch endpoints, asks a XORL teacher service to prefill hidden-state caches, trains with opd_loss, optionally runs optimizer steps, and syncs weights back to inference with P2P.
Latest Validation
- python -m pytest tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_sampling_client_generate_payload.py tests/test_batch_sampling.py -q -> 50 passed, 2 warnings.
- python -m ruff check examples/on_policy_distillation.py tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_batch_sampling.py tests/test_sampling_client_generate_payload.py xorl_client/client/sampling_client.py xorl_client/client/service_client.py xorl_client/client/training_client.py xorl_client/types/sampled_sequence.py xorl_client/types/sampling_params.py -> passed.
- python -m ruff format --check examples/on_policy_distillation.py tests/test_on_policy_distillation_example.py tests/test_training_client.py tests/test_batch_sampling.py tests/test_sampling_client_generate_payload.py xorl_client/client/sampling_client.py xorl_client/client/service_client.py xorl_client/client/training_client.py xorl_client/types/sampled_sequence.py xorl_client/types/sampling_params.py -> passed.
- opd-q35-c4-t36-1792p-pack4 completed optimizer + P2P weight-sync iterations using the xorl-client OPD path. Latest steady row: valid_tokens_per_step_s=1888.8, teacher_prefill_s=64.60, forward_backward_s=169.32, sync_transfer_time_s=7.99 for 69.32GB to 16 endpoints.
- opd-q35-tp2-pack4-s1-20260516-dispatch-bench completed 2048/2048 requests with 0 errors and 7,870.5 completion tok/s through Dispatch.
Notes
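For orientation, the loop described in the Summary can be sketched roughly as below. This is an illustrative outline only, not the code in examples/on_policy_distillation.py. The `Student`, `Teacher`, and `Trainer` protocols and all of their method names are hypothetical stand-ins, not the xorl-client API. It also assumes that weights are synced only after an optimizer step has run.

```python
from typing import Protocol, Sequence


class Student(Protocol):
    """Hypothetical stand-in for a student sampling endpoint."""

    def sample(self, prompts: Sequence[str]) -> list[list[int]]: ...


class Teacher(Protocol):
    """Hypothetical stand-in for the teacher prefill service."""

    def prefill(self, sequences: list[list[int]]) -> str: ...


class Trainer(Protocol):
    """Hypothetical stand-in for the training client."""

    def forward_backward(self, sequences: list[list[int]], cache_id: str) -> float: ...
    def optim_step(self) -> None: ...
    def sync_weights(self) -> None: ...


def opd_iteration(
    student: Student,
    teacher: Teacher,
    trainer: Trainer,
    prompts: Sequence[str],
    run_optimizer: bool = True,
) -> float:
    """Run one on-policy distillation iteration and return the loss."""
    sequences = student.sample(prompts)  # 1. on-policy rollouts from the student
    cache_id = teacher.prefill(sequences)  # 2. teacher caches hidden states
    loss = trainer.forward_backward(sequences, cache_id)  # 3. distillation loss
    if run_optimizer:
        trainer.optim_step()  # 4. optional optimizer step
        trainer.sync_weights()  # 5. push updated weights to inference (assumed order)
    return loss
```

Keeping the three roles behind small interfaces makes the ordering of the steps easy to test with fakes, without any live endpoints.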