FIX Transformers weight conversion PEFT model prefix #3230

BenjaminBossan wants to merge 1 commit into
Conversation
In `convert_peft_adapter_state_dict_for_transformers`, we pass the PEFT model to `get_model_conversion_mapping`. After the changes in huggingface/transformers#45661, a prefix check was introduced, which fails with PEFT models, since they have `'base_model.model'` as an extra prefix. https://github.com/huggingface/transformers/blob/a25b8efa0a3220da89493a72f57081aa5720291f/src/transformers/conversion_mapping.py#L1110-L1115

However, this bug was not immediately visible because the prefixes to check (and submodules) were determined once at Transformers model creation (i.e. before the model is wrapped by PEFT) and stored inside of `_named_pretrained_submodules`. Then huggingface/transformers#45889 removed this attribute and now dynamically loops over the modules. At that point, the prefix error surfaces, which leads to failures in the PEFT CI.

This PR now changes the PEFT call to pass the Transformers base model to `get_model_conversion_mapping` instead of the PEFT model. I could confirm that the tests now pass on main and also with Transformers from before 45661.

Note that this error could also affect other, non-PEFT models that wrap the Transformers module, if they use weight conversion. A more robust fix could thus be on the Transformers side, making the `scope_prefix` check account for this possibility.
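The change described above can be sketched in isolation. This is a minimal, hypothetical illustration — `FakeBaseModel`, `FakePeftModel`, and `conversion_mapping_input` are made-up stand-ins for a Transformers `PreTrainedModel`, a PEFT wrapper (PEFT's `PeftModel` does expose `get_base_model()`), and the call site that feeds `get_model_conversion_mapping`:

```python
# Hypothetical sketch of the fix: unwrap the PEFT model and hand the
# underlying Transformers model to the mapping helper, so the prefix/scope
# checks see the module tree they expect.

class FakeBaseModel:  # stands in for a transformers.PreTrainedModel
    pass

class FakePeftModel:  # stands in for peft.PeftModel ('base_model.model' wrapping)
    def __init__(self, base):
        self._base = base

    def get_base_model(self):
        return self._base

def conversion_mapping_input(model):
    # Pass the Transformers base model, not the PEFT wrapper (the PR's change).
    if hasattr(model, "get_base_model"):
        return model.get_base_model()
    return model

base = FakeBaseModel()
assert conversion_mapping_input(FakePeftModel(base)) is base  # unwrapped
assert conversion_mapping_input(base) is base                 # no-op for plain models
```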
@yonigozlan @Cyrilvallez could you please check if that makes sense? |
Hey @BenjaminBossan! Do you have an example of a failure that I could reproduce? The prefix check is there only to scope the conversion, so it shouldn't matter whether the model is wrapped in another or not.
I had actually asked Claude to generate a reproducer that doesn't involve PEFT, but I know too little about the weight conversion machinery to judge whether it makes sense or not, so I didn't post it. But perhaps it's helpful, so here it is.

Reproducer:

```python
"""
Reproducer for scope_prefix mis-attribution in
`transformers.conversion_mapping.get_model_conversion_mapping`.

The function iterates `model.named_modules()` and sets
`transform.scope_prefix = module_name` on every non-root `PreTrainedModel`
match. This is correct when `model` *is* the topmost `PreTrainedModel`. But
when `model` is any wrapper that holds a `PreTrainedModel` at a non-root
attribute path, the wrapper path bleeds into `scope_prefix`. The resulting
transforms then silently refuse to match any state-dict key that doesn't carry
that wrapper path, because `_scoped_match` requires the key to start with
`f"{scope_prefix}."`.

Nothing in this file imports peft. The wrapper is a plain `nn.Module` that
stands in for any external integration that nests a `PreTrainedModel` under
some attribute (PEFT, accelerate hooks, RLHF reference-model holders, custom
distillation setups, ...).

Proposed fix (transformers side, around the `scope_prefix` mechanism): compute
`scope_prefix` relative to the topmost `PreTrainedModel` encountered during the
walk, not relative to `model` itself. When the caller hands in a wrapper, the
topmost `PreTrainedModel` sits at some non-root path; re-basing makes its
`scope_prefix` stay unset (which is what we want), while still letting any
genuine nested sub-`PreTrainedModel`s get correctly scoped relative paths.
"""

import torch
from torch import nn

from transformers import ViTConfig, ViTModel
from transformers.conversion_mapping import get_model_conversion_mapping
from transformers.core_model_loading import WeightRenaming


class Wrapper(nn.Module):
    """Minimal external wrapper that holds a PreTrainedModel at a non-root path.

    Any integration that nests a PreTrainedModel like this triggers the bug.
    """

    def __init__(self, inner: ViTModel):
        super().__init__()
        self.inner = inner


# Tiny ViT; we never run a forward, only inspect named_modules() / call the
# conversion-mapping helper. ViTModel is convenient because its registered
# conversion mapping is a clean chain of `WeightRenaming`s (no MoE fusion noise).
config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=4,
    intermediate_size=64,
)
with torch.device("meta"):  # avoid real allocation; structure is all we need
    vit = ViTModel(config)
wrapped = Wrapper(vit)


def renamings(conversions):
    return [t for t in conversions if isinstance(t, WeightRenaming)]


# ---------------------------------------------------------------------------
# 1. Baseline: call the helper with the actual PreTrainedModel as argument.
#    scope_prefix stays None on every transform — they apply globally.
# ---------------------------------------------------------------------------
direct = get_model_conversion_mapping(vit)
print("=== get_model_conversion_mapping(vit) (unwrapped) ===")
for t in renamings(direct):
    print(
        f"  {t.source_patterns} -> {t.target_patterns} "
        f"scope_prefix={t.scope_prefix!r}"
    )

# ---------------------------------------------------------------------------
# 2. Buggy: call the helper with a wrapper. The walk reports the ViTModel at
#    path 'inner', so every transform gets scope_prefix='inner'.
# ---------------------------------------------------------------------------
via_wrapper = get_model_conversion_mapping(wrapped)
print("\n=== get_model_conversion_mapping(Wrapper(vit)) ===")
for t in renamings(via_wrapper):
    print(
        f"  {t.source_patterns} -> {t.target_patterns} "
        f"scope_prefix={t.scope_prefix!r}"
    )

# ---------------------------------------------------------------------------
# 3. Concrete consequence: a real v4-style ViT checkpoint key goes through the
#    rename chain cleanly via the unwrapped model but is silently left
#    unchanged via the wrapper (because `_scoped_match` rejects it for not
#    starting with 'inner.').
# ---------------------------------------------------------------------------
example_key = "encoder.layer.0.attention.attention.query.weight"


def apply_renamings(key, conversions):
    """Mimic the per-key loop transformers/core_model_loading uses."""
    for t in renamings(conversions):
        key, _ = t.rename_source_key(key)
    return key


renamed_direct = apply_renamings(example_key, direct)
renamed_wrapper = apply_renamings(example_key, via_wrapper)
print(f"\nkey: {example_key}")
print(f"  via unwrapped model -> {renamed_direct}")
print(f"  via wrapper         -> {renamed_wrapper}")

# The unwrapped path normalises encoder.layer. -> layers. and attention.query
# -> q_proj, as registered for ViTModel in conversion_mapping.py:120-128.
assert renamed_direct == "layers.0.attention.q_proj.weight", (
    f"sanity check failed; unwrapped rename produced {renamed_direct!r}"
)
# The wrapper path leaves the key untouched — that's the bug. scope_prefix
# 'inner' makes every renaming a no-op for keys that don't carry that prefix.
assert renamed_wrapper == example_key, (
    "expected the bug to manifest as a silent skip; "
    f"wrapper produced {renamed_wrapper!r}"
)

print(
    "\nBug reproduced: every transform returned via the wrapper has "
    "scope_prefix='inner', so the rename chain silently skips ordinary "
    "checkpoint keys.\n"
    "\nProposed fix sketch (in get_model_conversion_mapping):\n"
    "  1. Find the topmost PreTrainedModel once:\n"
    "     root_path = next(name for name, m in model.named_modules()\n"
    "                      if isinstance(m, PreTrainedModel))\n"
    "  2. Re-base each subsequent `module_name` against `root_path` before\n"
    "     deciding scope:\n"
    "         if module_name == root_path: module_name = ''\n"
    "         elif module_name.startswith(root_path + '.'):\n"
    "             module_name = module_name[len(root_path) + 1:]\n"
    "     The rest of the loop is unchanged. When `model` is itself a\n"
    "     PreTrainedModel, `root_path == ''` and the diff is a no-op."
)
```

I can't say for sure if other packages may encounter issues similar to PEFT, as PEFT goes relatively low level on this. Not sure how public
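The scoping rule the reproducer exercises boils down to a simple prefix test. A stripped-down stand-in, assuming only the behavior described above (the real helper is Transformers' `_scoped_match`; `scoped_match` here is a made-up name for the demo):

```python
def scoped_match(key, scope_prefix):
    # A transform with no scope applies globally; a scoped transform only
    # matches keys living under that module path.
    if scope_prefix is None:
        return True
    return key.startswith(scope_prefix + ".")

key = "encoder.layer.0.attention.attention.query.weight"
assert scoped_match(key, None)                 # unwrapped model: transform applies
assert not scoped_match(key, "inner")          # wrapper path leaked into the scope
assert scoped_match("inner." + key, "inner")   # only wrapper-prefixed keys pass
```

This is why the wrapper case silently skips every ordinary checkpoint key: the keys never carry the wrapper's attribute path.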
Diving a bit more into the code, I found these lines in both peft and transformers (Peft, transformers):

```python
new_weight_conversions = [
    WeightRenaming("base_model.model.model.", "model."),
    WeightRenaming("base_model.model.", ""),
]
```

If I understand correctly, this is because a peft adapter can either be loaded with We didn't have a failure in transformers because we never have In the peft test case, since these prefix changes are placed before any other conversions, they indeed break the scope_prefix.

A solution would be to remove these patterns at the beginning of

```python
new_weight_conversions = [
    WeightRenaming("base_model.model.model.", "model."),
    WeightRenaming("base_model.model.", ""),
]
```

And put this pattern at the very end of

```python
new_weight_conversions += [WeightRenaming(r"^base_model\.model\.", "")]
```

This seems to work both on transformers and peft sides when making the changes on both sides. However I don't think we should have the pattern
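The effect of anchoring the stripping pattern can be illustrated with plain `re`. This is a sketch of the idea only, not the actual `WeightRenaming` implementation (the helper name `strip_peft_prefix` is made up):

```python
import re

PEFT_PREFIX = r"^base_model\.model\."

def strip_peft_prefix(key):
    # Anchored at the start of the key, so it only fires on keys that actually
    # carry the PEFT wrapper prefix and leaves ordinary keys untouched.
    return re.sub(PEFT_PREFIX, "", key)

assert strip_peft_prefix("base_model.model.encoder.weight") == "encoder.weight"
assert strip_peft_prefix("encoder.weight") == "encoder.weight"  # no-op
```

Running this stripping step after all model-specific conversions (rather than before) means the earlier renamings, and the scope_prefix check, never see the PEFT prefix at all.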
Yes, correct.
Yes, that part is only tested in PEFT. Moreover, PEFT integration tests in Transformers are currently all
Just in general, I'm fine with the solution proposed in the PR here and am not asking for a different solution. My point was rather to give a heads-up that other packages could be affected in a similar way to PEFT, and that maybe there can be a Transformers-side fix to prevent that. This would need to be independent of the specific prefix. Maybe something like: if the model is not a
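A prefix-independent check along those lines could look roughly like this. Pure-Python sketch under two assumptions: `is_pretrained` stands in for `isinstance(m, PreTrainedModel)`, and the walk order matches `nn.Module.named_modules()`, which yields parents before children; the function names are illustrative:

```python
def topmost_pretrained_path(named_modules, is_pretrained):
    # Path of the first (topmost) PreTrainedModel in the walk;
    # '' means the root model itself is a PreTrainedModel.
    for name, module in named_modules:
        if is_pretrained(module):
            return name
    return None

def rebase(module_name, root_path):
    # Express module_name relative to the topmost PreTrainedModel, so any
    # wrapper prefix (whatever it is called) drops out of scope_prefix.
    if module_name == root_path:
        return ""
    if root_path and module_name.startswith(root_path + "."):
        return module_name[len(root_path) + 1:]
    return module_name

walk = [("", "Wrapper"), ("inner", "PTM"), ("inner.encoder", "Sub")]
root = topmost_pretrained_path(walk, lambda m: m == "PTM")
assert root == "inner"
assert rebase("inner", root) == ""              # topmost model: no scope prefix
assert rebase("inner.encoder", root) == "encoder"
```

When the caller passes a plain `PreTrainedModel`, the root path is `''` and the rebasing is a no-op, so existing behavior is unchanged.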