5% BERT benchmark improvement: Remove unnecessary to_vec() from slice() by jberg5 · Pull Request #1964 · huggingface/tokenizers

jberg5 · 2026-03-16T08:45:03Z

NormalizedString::slice was doing an unnecessary to_vec(), which meant allocating and copying N alignment tuples and then doing it again on collect(). Removing to_vec() theoretically cuts the work that slice() has to do in half, which shows up as a consistent 5% performance improvement in the end-to-end runtime of the BERT benchmark.
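To make the double copy concrete, here is a minimal Rust sketch. It is not the actual `NormalizedString::slice` code. It assumes the alignments are a slice of `(usize, usize)` byte offsets and that slicing rebases each offset by a `shift` value. The names `Alignment`, `slice_with_to_vec` and `slice_without_to_vec` are hypothetical. The first version copies the sub-slice into a temporary `Vec` before mapping and collecting, so N tuples are allocated and copied twice. The second maps over the borrowed sub-slice, so only `collect()` allocates.

```rust
use std::ops::Range;

/// Hypothetical alignment entry: (start, end) byte offsets into the original string.
type Alignment = (usize, usize);

/// Before (hypothetical): `to_vec()` allocates and copies the sub-slice into a
/// temporary Vec, then `collect()` allocates and fills a second Vec.
/// Assumes `shift` is <= every offset in `range`, otherwise the subtraction underflows.
fn slice_with_to_vec(alignments: &[Alignment], range: Range<usize>, shift: usize) -> Vec<Alignment> {
    alignments[range]
        .to_vec() // first allocation + copy of N tuples (the redundant step)
        .iter()
        .map(|&(start, end)| (start - shift, end - shift)) // rebase offsets to the slice
        .collect() // second allocation + copy of N tuples
}

/// After (hypothetical): iterate the borrowed sub-slice directly, so the only
/// allocation is the one made by `collect()`.
fn slice_without_to_vec(alignments: &[Alignment], range: Range<usize>, shift: usize) -> Vec<Alignment> {
    alignments[range]
        .iter() // borrow in place; no intermediate Vec
        .map(|&(start, end)| (start - shift, end - shift)) // rebase offsets to the slice
        .collect() // the only allocation
}
```

Both functions return the same offsets. The second one just skips the intermediate buffer, which is where the saved allocation and copy come from.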

Full AI disclosure: Claude (with Opus 4.6) spotted this one when I asked it to do a large-scale audit of this repo looking for bugs and performance improvements. This is a small change in terms of code, and it seems legit to me :) I ran benchmarks both locally on my MacBook and on a GCloud c2-standard-4. Here's Claude's summary of what that looked like:

  We profiled the BERT encode pipeline on a GCloud c2-standard-4 (dedicated Intel Xeon cores) using perf record with
  Chinese text (红楼梦, 80% CJK). The perf profile showed NormalizedString::slice at 3.6% of total encode time, plus
  its allocation overhead spread across malloc (5.8%), cfree (3.0%), and Vec::from_iter (2.0%). The .to_vec() was
  creating a redundant intermediate Vec on every call — one per text segment, per split pass.

  Benchmark methodology

  Built baseline and optimized binaries from the same source, differing only in the .to_vec() line. Ran them in an
  alternating ABBAAB pattern (6 runs each) to control for thermal drift and cache effects.

  BERT WordPiece, ASCII English text (big.txt, 6.2MB, ARM Apple M-series):
  Baseline:  mean=9.506s  stdev=0.110s  range=[9.413, 9.713]
  Optimized: mean=8.951s  stdev=0.082s  range=[8.866, 9.071]
  Improvement: 5.8%, ranges do not overlap

  BERT WordPiece, CJK Chinese text (红楼梦, 2.5MB, GCloud c2-standard-4 Intel Xeon):
  Baseline:  mean=3.178s  stdev=0.008s  range=[3.17, 3.19]
  Optimized: mean=3.055s  stdev=0.024s  range=[3.04, 3.10]
  Improvement: 3.9%, ranges do not overlap
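The reported percentages and the "ranges do not overlap" checks can be re-derived with a little arithmetic. The following Rust sketch is illustrative only. The names `Variant`, `schedule`, `summarize`, `improvement_pct` and `ranges_overlap` are hypothetical. It also makes two assumptions the PR does not state: that "6 runs each" means the six-slot ABBAAB block repeated twice, and that the reported stdev is the sample (n - 1) standard deviation.

```rust
/// Which binary a run uses (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Variant {
    Baseline,  // "A"
    Optimized, // "B"
}

/// One ABBAAB interleaving block: three runs of each variant.
pub const BLOCK: [Variant; 6] = [
    Variant::Baseline,
    Variant::Optimized,
    Variant::Optimized,
    Variant::Baseline,
    Variant::Baseline,
    Variant::Optimized,
];

/// Run order for `repeats` back-to-back ABBAAB blocks (3 * repeats runs per variant).
/// Assumption: "6 runs each" corresponds to repeats = 2.
pub fn schedule(repeats: usize) -> Vec<Variant> {
    BLOCK.iter().copied().cycle().take(BLOCK.len() * repeats).collect()
}

/// Summary statistics for wall-clock timings in seconds.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Summary {
    pub mean: f64,
    pub stdev: f64, // sample standard deviation (n - 1); an assumption
    pub min: f64,
    pub max: f64,
}

/// Mean, sample stdev, and min/max of a set of run times.
pub fn summarize(runs: &[f64]) -> Summary {
    assert!(runs.len() >= 2, "need at least two runs for a sample stdev");
    let n = runs.len() as f64;
    let mean = runs.iter().sum::<f64>() / n;
    let var = runs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let min = runs.iter().copied().fold(f64::INFINITY, f64::min);
    let max = runs.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    Summary { mean, stdev: var.sqrt(), min, max }
}

/// Time saved by `optimized`, as a percentage of the baseline mean.
pub fn improvement_pct(baseline_mean: f64, optimized_mean: f64) -> f64 {
    (baseline_mean - optimized_mean) / baseline_mean * 100.0
}

/// True if the closed intervals [a.0, a.1] and [b.0, b.1] share any point.
pub fn ranges_overlap(a: (f64, f64), b: (f64, f64)) -> bool {
    a.0 <= b.1 && b.0 <= a.1
}
```

Plugging in the reported means, `improvement_pct(9.506, 8.951)` is about 5.84% for the ASCII run and `improvement_pct(3.178, 3.055)` is about 3.87% for the CJK run, consistent with the 5.8% and 3.9% above. In both cases `ranges_overlap` on the reported ranges returns `false`.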

jberg5 · 2026-03-16T08:45:51Z

I wasn't sure if it was appropriate to create a GitHub issue for this; let me know if I should, happy to do so.

ArthurZucker

Sounds good; if all the tests pass, I'll merge!

HuggingFaceDocBuilderDev · 2026-03-23T16:15:10Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Remove unnecessary to_vec() from slice() (37bd283)

ArthurZucker approved these changes Mar 23, 2026

ArthurZucker merged commit cbd8cf2 into huggingface:main Mar 24, 2026
34 checks passed

ArthurZucker merged 1 commit into huggingface:main from jberg5:bert-perf