Fix type_ids not applied to overflow encodings #1965

Open
joaquinhuigomez wants to merge 1 commit into huggingface:main from joaquinhuigomez:fix/overflow-type-ids

@joaquinhuigomez joaquinhuigomez commented Mar 17, 2026

When using TemplateProcessing with type_id mappings, type_ids were applied only to the main encoding and not to the overflow encodings produced by truncation. This fix iterates over the overflow encodings in apply_template and sets both type_ids and sequence_ids on them, so they match the main encoding.

…and RobertaProcessing

Fixes huggingface#1908. When a Piece is a Sequence with overflows, set_type_ids and
set_sequence_id were only called on the main encoding, leaving overflow
encodings with incorrect (default) type_ids. This applies the same
type_id and sequence_id to all overflow encodings in:
- TemplateProcessing::apply_template
- RobertaProcessing::process_encodings
- PostProcessor::process (default trait impl)

Adds a regression test that explicitly validates type_ids on overflow
and nested overflow encodings.
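
The propagation described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the `Encoding` struct here is a simplified stand-in for the library's type, and `apply_ids` is a hypothetical helper name. It assumes overflow encodings can themselves carry overflows, which is why the walk is recursive.

```rust
// Simplified stand-in for the library's Encoding type (assumption:
// only the fields needed to illustrate the fix are modelled here).
#[derive(Debug, Clone, Default)]
pub struct Encoding {
    pub ids: Vec<u32>,
    pub type_ids: Vec<u32>,
    pub sequence_id: Option<usize>,
    pub overflowing: Vec<Encoding>,
}

impl Encoding {
    // Build an encoding whose type_ids start at the default value 0.
    pub fn new(ids: Vec<u32>, overflowing: Vec<Encoding>) -> Self {
        let type_ids = vec![0; ids.len()];
        Encoding {
            ids,
            type_ids,
            sequence_id: None,
            overflowing,
        }
    }
}

// Hypothetical helper: set type_id and sequence_id on the main encoding
// and on every overflow encoding, including nested overflows.
pub fn apply_ids(encoding: &mut Encoding, type_id: u32, sequence_id: usize) {
    encoding.type_ids = vec![type_id; encoding.ids.len()];
    encoding.sequence_id = Some(sequence_id);
    for overflow in encoding.overflowing.iter_mut() {
        apply_ids(overflow, type_id, sequence_id);
    }
}
```

Without the loop over `overflowing`, only the top-level encoding would receive the new values and each overflow would keep the default type_id of 0, which is the behaviour reported in the linked issue.
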
@joaquinhuigomez
Author

Hi — just checking if this is still on the review queue. Happy to address any feedback.
