Bump tokenizers submodule to fix sentencepiece GCC 15 build #20135

Merged: rascani merged 2 commits into pytorch:main from rascani:bump-tokenizers-sentencepiece on Jun 9, 2026

rascani (Contributor) commented on Jun 8, 2026

Summary

Updates `extension/llm/tokenizers` to include meta-pytorch/tokenizers#193, which bumps the sentencepiece submodule to pick up a missing `#include <cstdint>` (google/sentencepiece#1109).

Without this, `pytorch_tokenizers` fails to compile inside the `executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V baremetal CI (#19917).
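
For context, a minimal sketch of the kind of code affected by a missing `<cstdint>` include. The fixed-width integer types such as `std::uint32_t` are declared in `<cstdint>`, and a source file that uses them without including that header compiles only if some other standard header happens to pull it in. `pack_prefix` and `pack_prefix_example` are hypothetical names made up for illustration; this is not the sentencepiece code touched by the upstream fix.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>  // Declares std::uint32_t; include it directly rather than
                    // relying on another standard header to provide it.
#include <string>

// Hypothetical helper for illustration only (not sentencepiece code).
// Packs up to the first four bytes of `s` into a 32-bit value,
// least significant byte first.
inline std::uint32_t pack_prefix(const std::string& s) {
  std::uint32_t out = 0;
  for (std::size_t i = 0; i < s.size() && i < 4; ++i) {
    // Go through unsigned char so bytes >= 0x80 are not sign-extended.
    const auto byte = static_cast<unsigned char>(s[i]);
    out |= static_cast<std::uint32_t>(byte) << (8 * i);
  }
  return out;
}

// Usage sketch: 'a' is 0x61 and 'b' is 0x62, so "ab" packs to 0x6261.
inline void pack_prefix_example() {
  assert(pack_prefix("ab") == 0x6261u);
}
```

Removing the `#include <cstdint>` line from a file like this is what turns a build that used to pass into one that depends on the standard library's internal include graph.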

Test plan

CI

Updates extension/llm/tokenizers to include
meta-pytorch/tokenizers#193, which bumps the sentencepiece
submodule to pick up a missing `#include <cstdint>`
(google/sentencepiece#1109).

Without this, `pytorch_tokenizers` fails to compile inside the
`executorch-ubuntu-26.04-gcc15` docker image, blocking the RISC-V
baremetal CI (pytorch#19917).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rascani requested a review from kirklandsign June 8, 2026 22:24
pytorch-bot (Bot) commented on Jun 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20135

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 New Failures, 1 Unclassified Failure

As of commit bc3e7cd with merge base ac3003e (image):

NEW FAILURES - The following jobs have failed:

UNCLASSIFIED FAILURE - DrCI could not classify the following job because the workflow did not run on the merge base. The failure may be pre-existing on trunk or introduced by this PR:

  • MLX / test-mlx / test-mlx (gh) (this job did not run on the merge base, so DrCI cannot tell whether the failure is pre-existing)
    RuntimeError: Command bash /Users/runner/work/_temp/exec_script failed with exit code 4

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-cla (Bot) added the CLA Signed label Jun 8, 2026
rascani marked this pull request as ready for review June 8, 2026 22:24
github-actions (Bot) commented on Jun 8, 2026

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

The tokenizers submodule bump (meta-pytorch/tokenizers#193) changed
`CMAKE_CXX_STANDARD` from 17 to 20. Under C++20 the `u8"▁"` literal has
type `const char8_t[]`, which has no implicit conversion to
`const char*`, so the `std::string::rfind` call no longer compiles.

Spell the SentencePiece word-boundary marker as raw UTF-8 bytes,
matching the fix already on the 1.3 release branch (pytorch#19824).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
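
A minimal sketch of the raw-bytes spelling described in the commit message above, assuming the marker is U+2581, whose UTF-8 encoding is the three bytes `E2 96 81`. The names `kWordBoundary`, `last_word_boundary`, and `last_word_boundary_example` are hypothetical and chosen for illustration; this is not the ExecuTorch source.

```cpp
#include <cassert>
#include <string>

// U+2581 spelled as raw UTF-8 bytes in an ordinary string literal.
// The type is const char[4] under both C++17 and C++20.
// (Hypothetical name, for illustration only.)
inline constexpr char kWordBoundary[] = "\xE2\x96\x81";

// Returns the byte offset of the last word-boundary marker in `piece`,
// or std::string::npos if the marker does not occur.
inline std::string::size_type last_word_boundary(const std::string& piece) {
  // Under C++20, piece.rfind(u8"▁") would not compile: that literal is
  // const char8_t[4], which does not convert to const char*.
  return piece.rfind(kWordBoundary);
}

// Usage sketch: the marker is 3 bytes and "hello" is 5 bytes, so the
// second marker starts at byte offset 8.
inline void last_word_boundary_example() {
  const std::string piece =
      std::string(kWordBoundary) + "hello" + kWordBoundary + "world";
  assert(last_word_boundary(piece) == 8);
}
```

Spelling the bytes out keeps the literal's type independent of the language standard, so the same source builds whether `CMAKE_CXX_STANDARD` is 17 or 20.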
rascani (Contributor, Author) commented on Jun 9, 2026

Failures unrelated.

rascani merged commit 8e4fe08 into pytorch:main Jun 9, 2026
196 of 201 checks passed
rascani deleted the bump-tokenizers-sentencepiece branch June 9, 2026 22:09

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

2 participants