Bump tokenizers submodule to fix sentencepiece GCC 15 build by rascani · Pull Request #20135 · pytorch/executorch

rascani · 2026-06-08T22:24:11Z

Summary

Updates `extension/llm/tokenizers` to include meta-pytorch/tokenizers#193, which bumps the sentencepiece submodule to pick up a missing `#include <cstdint>` (google/sentencepiece#1109).

Without this, `pytorch_tokenizers` fails to compile inside the `executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V baremetal CI (#19917).
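
For readers unfamiliar with this class of failure, here is a minimal sketch of what a missing `#include <cstdint>` looks like. It is not the sentencepiece code (the affected file is not named in this PR); `fnv1a` is a hypothetical function chosen only because it uses a fixed-width integer type. The point is that `std::uint32_t` is only guaranteed to be declared when `<cstdint>` is included directly; code that relies on another standard header pulling it in transitively can stop compiling when the standard library changes its internal includes.

```cpp
#include <cstdint>  // declares std::uint32_t; included directly, not via <string>
#include <string>

// Hypothetical example function (not from sentencepiece or ExecuTorch):
// a 32-bit FNV-1a hash over the bytes of a string.
inline std::uint32_t fnv1a(const std::string& s) {
  std::uint32_t hash = 2166136261u;  // FNV-1a 32-bit offset basis
  for (unsigned char byte : s) {
    hash ^= byte;
    hash *= 16777619u;  // FNV-1a 32-bit prime
  }
  return hash;
}
```

If the first `#include` line is dropped, whether this still compiles depends on what `<string>` happens to include in a given standard library version, which is the kind of breakage a compiler upgrade can expose.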

Test plan

CI

Updates extension/llm/tokenizers to include meta-pytorch/tokenizers#193, which bumps the sentencepiece submodule to pick up a missing `#include <cstdint>` (google/sentencepiece#1109). Without this, `pytorch_tokenizers` fails to compile inside the `executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V baremetal CI (pytorch#19917).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

pytorch-bot · 2026-06-08T22:24:15Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20135

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unclassified Failure

As of commit bc3e7cd with merge base ac3003e:

NEW FAILURES - The following jobs have failed:

pull / test-lora-linux / linux-job (gh)
RuntimeError: Command docker exec -t 7f06869c119439f259e02815a7412a485f435b2f120f9bdd8fb1b39fb2148a10 /exec failed with exit code 1
pull / test-qnn-python-imports-linux / linux-job (gh)
RuntimeError: Command docker exec -t 0dc3b11ff96c56dd20ea6537639a6a089fd639b1ed015a16e0a13eb62e21ffcd /exec failed with exit code 92
Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, models) / linux-job (gh)
RuntimeError: Command docker exec -t 0210257beedf2610836b7a42975ba52f8de23dc83f991f87b82cd8292e182f03 /exec failed with exit code 1
Test WebGPU Backend / test-webgpu / test-backend-linux (webgpu, operators) / linux-job (gh)
RuntimeError: Command docker exec -t 41d09b579d2d741e8a8d2d2464321f8e43a0de66a1d934c6026dc953ed3dacc8 /exec failed with exit code 1

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

MLX / test-mlx / test-mlx (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 4

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-06-08T22:25:00Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

The tokenizers submodule bump (meta-pytorch/tokenizers#193) changed `CMAKE_CXX_STANDARD` from 17 to 20. Under C++20 the `u8"▁"` literal is `const char8_t[]`, which has no implicit conversion to `const char*` and breaks `std::string::rfind`. Spell the SentencePiece word-boundary marker as raw UTF-8 bytes, matching the fix already on the 1.3 release branch (pytorch#19824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
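
The `char8_t` issue described in this commit can be sketched as follows. This is an illustration under assumptions, not the actual ExecuTorch change: `kSpaceMarker` and `last_marker_pos` are hypothetical names. The marker `▁` is U+2581, whose UTF-8 encoding is the three bytes `E2 96 81`; writing those bytes as hex escapes in an ordinary string literal yields a `const char[]` under both C++17 and C++20.

```cpp
#include <cstddef>
#include <string>

// Hypothetical constant (not the ExecuTorch identifier): the SentencePiece
// word-boundary marker U+2581 spelled as its raw UTF-8 bytes. An ordinary
// string literal is const char[] in both C++17 and C++20.
inline constexpr char kSpaceMarker[] = "\xE2\x96\x81";

// Hypothetical helper: byte offset of the last marker in `piece`,
// or std::string::npos if the marker does not occur.
inline std::size_t last_marker_pos(const std::string& piece) {
  // Under C++20, piece.rfind(u8"▁") does not compile, because u8"▁" has
  // type const char8_t[] and std::string::rfind takes const char*.
  return piece.rfind(kSpaceMarker);
}
```

One detail to watch when using hex escapes: a hex escape consumes every following hex digit, so a marker followed directly by a letter such as `b` or `c` needs the literal to be split (for example `"\xE2\x96\x81" "bar"`).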

rascani · 2026-06-09T22:08:48Z

Failures unrelated.

rascani requested a review from kirklandsign June 8, 2026 22:24

meta-cla Bot added the CLA Signed label Jun 8, 2026

rascani marked this pull request as ready for review June 8, 2026 22:24

rascani requested review from larryliu0820 and mergennachin as code owners June 8, 2026 22:24

kirklandsign approved these changes Jun 9, 2026

rascani merged commit 8e4fe08 into pytorch:main Jun 9, 2026
196 of 201 checks passed

rascani deleted the bump-tokenizers-sentencepiece branch June 9, 2026 22:09

rascani merged 2 commits into pytorch:main from rascani:bump-tokenizers-sentencepiece