FIX Transformers weight conversion PEFT model prefix #3230

Open

BenjaminBossan wants to merge 1 commit into huggingface:main from BenjaminBossan:fix-transformers-weight-conversion-peft-model-prefix

Conversation

@BenjaminBossan (Member) commented May 12, 2026

In convert_peft_adapter_state_dict_for_transformers, we pass the PEFT model to get_model_conversion_mapping. After the changes in huggingface/transformers#45661, a prefix check was introduced that fails with PEFT models, since they carry 'base_model.model' as an extra prefix:

https://github.com/huggingface/transformers/blob/a25b8efa0a3220da89493a72f57081aa5720291f/src/transformers/conversion_mapping.py#L1110-L1115
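
To illustrate, a minimal sketch of where the extra prefix comes from (using one of the hf-internal-testing tiny models; any LoRA setup would do):

    from peft import LoraConfig, get_peft_model
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("hf-internal-testing/tiny-random-LlamaForCausalLM")
    peft_model = get_peft_model(base, LoraConfig())
    # the PEFT wrapper prepends 'base_model.model.' to every key
    print(next(iter(peft_model.state_dict())))
    # e.g. 'base_model.model.model.embed_tokens.weight'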

This bug was not immediately visible, however, because the prefixes to check (and the submodules) were determined once at Transformers model creation (i.e. before the model is wrapped by PEFT) and stored in _named_pretrained_submodules.

Then huggingface/transformers#45889 removed this attribute and now loops over the modules dynamically. At that point, the prefix error surfaced, leading to failures in the PEFT CI.

This PR changes the PEFT call to pass the Transformers base model to get_model_conversion_mapping instead of the PEFT model. I confirmed that the tests now pass with Transformers main and also with Transformers from before #45661.
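
A sketch of the change, continuing the snippet above (get_base_model() returns the wrapped Transformers model; this is not the literal diff):

    from transformers.conversion_mapping import get_model_conversion_mapping

    # before: the PEFT wrapper itself was passed, so the prefix check fails
    # mapping = get_model_conversion_mapping(peft_model)

    # after: pass the underlying Transformers model instead
    mapping = get_model_conversion_mapping(peft_model.get_base_model())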

Note that this error could also affect other models that are not PEFT but that wrap the Transformers module, if they use weight conversion. A more robust fix could thus be made on the Transformers side, by making the scope_prefix check account for such wrappers.


@BenjaminBossan (Member, Author)

@yonigozlan @Cyrilvallez could you please check if that makes sense?

@yonigozlan (Member)

Hey @BenjaminBossan! Do you have an example of a failure that I could reproduce? The prefix check is there only to scope the conversion, so it shouldn't matter whether the model is wrapped in another one or not.

@BenjaminBossan (Member, Author)

I had actually asked Claude to generate a reproducer that doesn't involve PEFT, but I know too little about the weight conversion machinery to judge whether it makes sense or not, so I didn't post it. But perhaps it's helpful, so here it is.

Reproducer
"""
Reproducer for scope_prefix mis-attribution in
`transformers.conversion_mapping.get_model_conversion_mapping`.

The function iterates `model.named_modules()` and sets
`transform.scope_prefix = module_name` on every non-root `PreTrainedModel` match.
This is correct when `model` *is* the topmost `PreTrainedModel`. But when `model`
is any wrapper that holds a `PreTrainedModel` at a non-root attribute path, the
wrapper path bleeds into `scope_prefix`. The resulting transforms then silently
refuse to match any state-dict key that doesn't carry that wrapper path, because
`_scoped_match` requires the key to start with `f"{scope_prefix}."`.

Nothing in this file imports peft. The wrapper is a plain `nn.Module` that stands
in for any external integration that nests a `PreTrainedModel` under some
attribute (PEFT, accelerate hooks, RLHF reference-model holders, custom
distillation setups, ...).

Proposed fix (transformers side, around the `scope_prefix` mechanism): compute
`scope_prefix` relative to the topmost `PreTrainedModel` encountered during the
walk, not relative to `model` itself. When the caller hands in a wrapper, the
topmost `PreTrainedModel` sits at some non-root path; re-basing makes its
`scope_prefix` stay unset (which is what we want), while still letting any
genuine nested sub-`PreTrainedModel`s get correctly scoped relative paths.
"""

import torch
from torch import nn

from transformers import ViTConfig, ViTModel
from transformers.conversion_mapping import get_model_conversion_mapping
from transformers.core_model_loading import WeightRenaming


class Wrapper(nn.Module):
    """Minimal external wrapper that holds a PreTrainedModel at a non-root path.

    Any integration that nests a PreTrainedModel like this triggers the bug.
    """

    def __init__(self, inner: ViTModel):
        super().__init__()
        self.inner = inner


# Tiny ViT; we never run a forward, only inspect named_modules() / call the
# conversion-mapping helper. ViTModel is convenient because its registered
# conversion mapping is a clean chain of `WeightRenaming`s (no MoE fusion noise).
config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
)
with torch.device("meta"):  # avoid real allocation; structure is all we need
    vit = ViTModel(config)
wrapped = Wrapper(vit)


def renamings(conversions):
    return [t for t in conversions if isinstance(t, WeightRenaming)]


# ---------------------------------------------------------------------------
# 1. Baseline: call the helper with the actual PreTrainedModel as argument.
#    scope_prefix stays None on every transform — they apply globally.
# ---------------------------------------------------------------------------
direct = get_model_conversion_mapping(vit)
print("=== get_model_conversion_mapping(vit)  (unwrapped) ===")
for t in renamings(direct):
    print(
        f"  {t.source_patterns} -> {t.target_patterns}  "
        f"scope_prefix={t.scope_prefix!r}"
    )

# ---------------------------------------------------------------------------
# 2. Buggy: call the helper with a wrapper. The walk reports the ViTModel at
#    path 'inner', so every transform gets scope_prefix='inner'.
# ---------------------------------------------------------------------------
via_wrapper = get_model_conversion_mapping(wrapped)
print("\n=== get_model_conversion_mapping(Wrapper(vit)) ===")
for t in renamings(via_wrapper):
    print(
        f"  {t.source_patterns} -> {t.target_patterns}  "
        f"scope_prefix={t.scope_prefix!r}"
    )

# ---------------------------------------------------------------------------
# 3. Concrete consequence: a real v4-style ViT checkpoint key goes through the
#    rename chain cleanly via the unwrapped model but is silently left
#    unchanged via the wrapper (because `_scoped_match` rejects it for not
#    starting with 'inner.').
# ---------------------------------------------------------------------------
example_key = "encoder.layer.0.attention.attention.query.weight"


def apply_renamings(key, conversions):
    """Mimic the per-key loop transformers/core_model_loading uses."""
    for t in renamings(conversions):
        key, _ = t.rename_source_key(key)
    return key


renamed_direct = apply_renamings(example_key, direct)
renamed_wrapper = apply_renamings(example_key, via_wrapper)

print(f"\nkey: {example_key}")
print(f"  via unwrapped model -> {renamed_direct}")
print(f"  via wrapper          -> {renamed_wrapper}")

# The unwrapped path normalises encoder.layer. -> layers. and attention.query
# -> q_proj, as registered for ViTModel in conversion_mapping.py:120-128.
assert renamed_direct == "layers.0.attention.q_proj.weight", (
    f"sanity check failed; unwrapped rename produced {renamed_direct!r}"
)

# The wrapper path leaves the key untouched — that's the bug. scope_prefix
# 'inner' makes every renaming a no-op for keys that don't carry that prefix.
assert renamed_wrapper == example_key, (
    "expected the bug to manifest as a silent skip; "
    f"wrapper produced {renamed_wrapper!r}"
)

print(
    "\nBug reproduced: every transform returned via the wrapper has "
    "scope_prefix='inner', so the rename chain silently skips ordinary "
    "checkpoint keys.\n"
    "\nProposed fix sketch (in get_model_conversion_mapping):\n"
    "  1. Find the topmost PreTrainedModel once:\n"
    "       root_path = next(name for name, m in model.named_modules()\n"
    "                       if isinstance(m, PreTrainedModel))\n"
    "  2. Re-base each subsequent `module_name` against `root_path` before\n"
    "     deciding scope:\n"
    "       if module_name == root_path: module_name = ''\n"
    "       elif module_name.startswith(root_path + '.'):\n"
    "           module_name = module_name[len(root_path) + 1:]\n"
    "     The rest of the loop is unchanged. When `model` is itself a\n"
    "     PreTrainedModel, `root_path == ''` and the diff is a no-op."
)

I can't say for sure whether other packages may encounter issues similar to PEFT's, as PEFT goes relatively low level here. I'm also not sure how public get_model_conversion_mapping should be considered.

@yonigozlan (Member)

Diving a bit more into the code, I found these lines in both PEFT and Transformers:

    new_weight_conversions = [
        WeightRenaming("base_model.model.model.", "model."),
        WeightRenaming("base_model.model.", ""),
    ]

If I understand correctly, this is because a PEFT adapter can be loaded either with PeftModel.from_pretrained(model, lora_id), as in PEFT's test_mixtral_v4_lora_weight_conversion_peft_model_from_pretrained, or with model.load_adapter(adapter_name), as in Transformers' test_mixtral_lora_conversion?
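
For reference, a sketch of the two loading paths (hypothetical model and adapter ids; the two paths are alternatives, shown together only for comparison):

    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained("some/base-model")  # hypothetical id

    # PEFT path: the model is wrapped, so its keys gain 'base_model.model.'
    peft_model = PeftModel.from_pretrained(base, "some/lora-adapter")  # hypothetical id

    # Transformers path (PeftAdapterMixin): the model is never wrapped, so the
    # prefix only exists in the serialized adapter keys and must be stripped
    base.load_adapter("some/lora-adapter")  # hypothetical id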

We didn't have a failure in Transformers because there the model never has the base_model.model prefix.

In the PEFT test case, since these prefix renamings are placed before any other conversions, they indeed break the scope_prefix. A solution would be to remove these patterns at the beginning of build_peft_weight_mapping:

    new_weight_conversions = [
        WeightRenaming("base_model.model.model.", "model."),
        WeightRenaming("base_model.model.", ""),
    ]

and to put this pattern at the very end of build_peft_weight_mapping (a more robust pattern imo), so that there's no conflict with the patterns defined previously:

    new_weight_conversions += [WeightRenaming(r"^base_model\.model\.", "")]

This seems to work on both the Transformers and the PEFT side when making the changes in both repos.
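
A minimal illustration with plain re of why the anchored pattern is more robust (assuming WeightRenaming treats its source pattern as a regex, as the raw string suggests): the single anchored renaming subsumes both unanchored ones above and can't fire in the middle of a key.

    import re

    keys = [
        "base_model.model.model.embed_tokens.weight",  # stripped at the start
        "wrapper.base_model.model.foo.weight",         # hypothetical; left untouched
    ]
    for key in keys:
        print(re.sub(r"^base_model\.model\.", "", key))
    # -> model.embed_tokens.weight
    # -> wrapper.base_model.model.foo.weight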

However, I don't think we should have the pattern WeightRenaming(r"^base_model\.model\.", "") in build_peft_weight_mapping at all, because in the wrapper case we basically remove the prefix (even though the target weights do have it), just to add it again in convert_peft_adapter_state_dict_for_transformers with _convert_to_peft_serialized_keys. I think the prefix removal pattern belongs only in PeftAdapterMixin on the Transformers side, unless I have missed something.

@BenjaminBossan (Member, Author)

If I understand correctly, this is because a peft adapter can either be loaded with PeftModel.from_pretrained(model, lora_id) like in peft's test_mixtral_v4_lora_weight_conversion_peft_model_from_pretrained or with model.load_adapter(adapter_name) like in transformers' test_mixtral_lora_conversion?

Yes, correct.

We didn't have a failure in transformers because we never have base_model.model.

Yes, that part is only tested in PEFT. Moreover, the PEFT integration tests in Transformers are currently all slow tests, so regressions often go uncaught (I opened a PR to change that).

A solution would be to remove these patterns at the beginning of build_peft_weight_mapping
And put this pattern at the very end of build_peft_weight_mapping (more robust pattern imo), so that there's no conflict with the patterns defined previously

However I don't think we should have the pattern WeightRenaming(r"^base_model\.model\.", "") in build_peft_weight_mapping at all, because in the wrapper case, we basically remove the prefix (even if the target weights do have the prefix) just to add it again in convert_peft_adapter_state_dict_for_transformers with _convert_to_peft_serialized_keys

Just to be clear, I'm fine with the solution proposed in this PR and not asking for a different one. My point was rather to give a heads-up that other packages could be affected in a similar way to PEFT, and that maybe there could be a Transformers-side fix to prevent that. This would need to be independent of the specific prefix. Maybe something like: if the model is not a PreTrainedModel, the scope_prefix is not checked against the root of the model but against the first submodule that is an instance of PreTrainedModel (so for PEFT it would be model.base_model.model, but for others it could be different). But again, I'm not knowledgeable enough about how this is handled in Transformers to know whether this could work.
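
A rough sketch of what that could look like (hypothetical helper, not actual Transformers code):

    from transformers import PreTrainedModel

    def find_scope_root(model):
        # Hypothetical: path of the first PreTrainedModel in named_modules()
        # order. Returns '' when model is itself a PreTrainedModel; for a
        # PeftModel it would return 'base_model.model'. scope_prefix could
        # then be computed relative to this path instead of the root.
        for name, module in model.named_modules():
            if isinstance(module, PreTrainedModel):
                return name
        return ""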

