This repository was archived by the owner on Nov 26, 2025. It is now read-only.

Resolving conflicting values for padding_index #47

Merged: micedre merged 8 commits into main from tokenized_text_tokens on Apr 8, 2025


@meilame-tayebjee (Member) commented Mar 17, 2025

Resolves #43.

The padding index can be inferred from the tokenizer's own fields, so padding_index is now a tokenizer field. The model inherits its value from the tokenizer, but the user can override it; in that case, the tokenizer's padding index is updated as well.
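A minimal sketch of the inherit-or-override behaviour described above. The class names, constructor signatures, and the rule that the padding index is the first index past the vocabulary are all assumptions made for illustration; they are not the project's actual API.

```python
from typing import Optional


class PaddingAwareTokenizer:
    """Hypothetical tokenizer that owns the padding index."""

    def __init__(self, vocab_size: int):
        self.vocab_size = vocab_size
        # Assumption: padding uses the first index past the vocabulary.
        self.padding_index = vocab_size


class PaddingAwareModel:
    """Hypothetical model that takes its padding index from the tokenizer."""

    def __init__(
        self,
        tokenizer: PaddingAwareTokenizer,
        padding_index: Optional[int] = None,
    ):
        self.tokenizer = tokenizer
        if padding_index is not None:
            # A user override is written back so both objects stay in agreement.
            tokenizer.padding_index = padding_index
        # Single source of truth: always read the value from the tokenizer.
        self.padding_index = tokenizer.padding_index


# Default case: the model inherits the tokenizer's value.
default_tokenizer = PaddingAwareTokenizer(vocab_size=100)
default_model = PaddingAwareModel(default_tokenizer)

# Override case: the tokenizer is updated to match the model.
custom_tokenizer = PaddingAwareTokenizer(vocab_size=100)
custom_model = PaddingAwareModel(custom_tokenizer, padding_index=0)
```

Writing the override back to the tokenizer keeps a single value in play, so the two objects cannot end up holding conflicting padding indices.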

All default values for this field have been removed and replaced by the corresponding tokenizer field value.

The function utils.tokenized_text_in_tokens has been moved into the tokenizer class as a private instance method, tokenizer._tokenized_text_in_tokens, which is a more natural home for it. The method is used internally by tokenizer.tokenize and model.predict.
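The sketch below illustrates why a private instance method fits here: it can read the tokenizer's own padding_index instead of receiving it as an argument. Only the method name _tokenized_text_in_tokens comes from this PR; the class, its signatures, the whitespace splitting, and the choice to drop padding positions are assumptions for illustration.

```python
from typing import Dict, List, Tuple


class DecodingTokenizer:
    """Hypothetical whitespace tokenizer, used only to illustrate the refactor."""

    def __init__(self, vocab: List[str]):
        self.id_to_token: Dict[int, str] = dict(enumerate(vocab))
        self.token_to_id: Dict[str, int] = {t: i for i, t in self.id_to_token.items()}
        # Assumption: padding uses the first index past the vocabulary.
        self.padding_index = len(vocab)

    def _tokenized_text_in_tokens(self, ids: List[int]) -> List[str]:
        # Private helper: map ids back to tokens, skipping padding positions.
        # It reads self.padding_index, so callers never pass it explicitly.
        return [self.id_to_token[i] for i in ids if i != self.padding_index]

    def tokenize(self, text: str, max_len: int) -> Tuple[List[int], List[str]]:
        # Keep known words only, truncate, then pad to a fixed length.
        ids = [self.token_to_id[w] for w in text.split() if w in self.token_to_id]
        ids = ids[:max_len]
        ids += [self.padding_index] * (max_len - len(ids))
        return ids, self._tokenized_text_in_tokens(ids)


tokenizer = DecodingTokenizer(["hello", "world"])
ids, tokens = tokenizer.tokenize("hello world", max_len=4)
```

A predict-style method on a model could call the same helper through its tokenizer reference, which is the second internal use the description mentions.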

@micedre micedre merged commit d95c11f into main Apr 8, 2025
3 checks passed
@meilame-tayebjee meilame-tayebjee deleted the tokenized_text_tokens branch April 13, 2025 10:56

Development

Successfully merging this pull request may close these issues.

Use of padding index in tokenizer.tokenize + utils.tokenized_text_in_tokens function
