[WIP] Full fine tune support #3
Code Review
This pull request integrates KTransformers (KT) support into Accelerate, introducing the KTransformersPlugin and logic to handle KT expert modules within FSDP2. It also implements optimized communication primitives for Neuron devices that use power-of-2 padding to stabilize tensor shapes and reduce recompilations. Feedback highlights several issues: setup.py incorrectly pins a non-existent Torch version and renames the package for a fork, while the code contains an invalid Torch version check (2.7.0). Furthermore, the FSDP2 module-ignoring logic fails if ignored_modules is a string, and the implementation uses the deprecated torch.ByteStorage instead of torch.frombuffer.
      "psutil",
      "pyyaml",
    - "torch>=2.0.0",
    + "torch==2.9.1",
      name="accelerate-kt",
      version="1.14.0.post1",
      description="Accelerate",
      long_description=open("README.md", encoding="utf-8").read(),
      long_description_content_type="text/markdown",
      keywords="deep learning",
      license="Apache",
      author="The Hugging Face team",
      author_email="transformers@huggingface.co",
    - url="https://github.com/huggingface/accelerate",
    + url="https://github.com/kvcache-ai/accelerate",
    if self.state.fsdp_plugin.ignored_modules is None:
        self.state.fsdp_plugin.ignored_modules = []
    for wrapper in kt_wrappers:
        experts_attr = getattr(wrapper, "_experts_attr", "experts")
        experts = getattr(wrapper, experts_attr, None)
        if experts is not None and experts not in self.state.fsdp_plugin.ignored_modules:
            self.state.fsdp_plugin.ignored_modules.append(experts)
The logic for updating ignored_modules assumes it is a list. However, FullyShardedDataParallelPlugin.ignored_modules can also be a string (a regex pattern). If it is a string, the `experts not in ...` membership test on line 1733 will raise a TypeError, and the `.append(experts)` call on line 1734 will raise an AttributeError. Please ensure ignored_modules is handled correctly when it is a string (e.g., by resolving the regex to a list of modules first).
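One way to handle both forms is to normalize ignored_modules to a list before appending to it. The following is a minimal sketch, not the PR's or Accelerate's implementation. It assumes the string form is a regex matched against the names yielded by `model.named_modules()` using `re.fullmatch`; the helper names `resolve_ignored_modules` and `add_ignored_module` are hypothetical.

```python
import re
from typing import Any, Iterable, List, Optional, Union


def resolve_ignored_modules(
    model: Any,
    ignored: Optional[Union[str, Iterable[Any]]],
) -> List[Any]:
    """Return ignored modules as a list, whatever form they were given in.

    `model` is assumed to expose `named_modules()` yielding (name, module)
    pairs, as torch.nn.Module does.
    """
    if ignored is None:
        return []
    if isinstance(ignored, str):
        # Assumption: the string is a regex over fully qualified module names.
        pattern = re.compile(ignored)
        return [m for name, m in model.named_modules() if pattern.fullmatch(name)]
    # Already an iterable of modules: copy it so the caller's object is untouched.
    return list(ignored)


def add_ignored_module(ignored: List[Any], module: Any) -> None:
    """Append `module` unless the same object is already in the list."""
    if not any(existing is module for existing in ignored):
        ignored.append(module)
```

With this shape, the loop over the KT wrappers would first call `resolve_ignored_modules(model, plugin.ignored_modules)`, then use `add_ignored_module` for each experts module, and finally assign the resulting list back to the plugin.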
        if (not param.requires_grad) and (not param.is_floating_point()) and (not param.is_complex()):
            incompatible_params4bit.add(param)

    if incompatible_params4bit and is_torch_version(">=", "2.7.0"):

    byte_storage = torch.ByteStorage._from_buffer(raw_bytes)
    input_tensor = torch.ByteTensor(byte_storage).to(device)
torch.ByteStorage is deprecated. Consider using torch.frombuffer, which is more idiomatic and avoids internal methods such as _from_buffer.
    - byte_storage = torch.ByteStorage._from_buffer(raw_bytes)
    - input_tensor = torch.ByteTensor(byte_storage).to(device)
    + input_tensor = torch.frombuffer(raw_bytes, dtype=torch.uint8).to(device)
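The `torch.frombuffer` suggestion can be wrapped in a small helper to cover two edge cases. This is a sketch under stated assumptions, not the PR's code: the helper name `bytes_to_uint8_tensor` is hypothetical, and `raw_bytes` is assumed to be an immutable `bytes` object (for example, a pickled payload). `torch.frombuffer` warns when handed a read-only buffer and rejects an empty one, so the sketch copies into a `bytearray` and special-cases zero length.

```python
import torch


def bytes_to_uint8_tensor(raw_bytes: bytes, device: str = "cpu") -> torch.Tensor:
    """Convert a bytes payload into a 1-D uint8 tensor on `device`."""
    if len(raw_bytes) == 0:
        # torch.frombuffer raises on an empty buffer, so build the tensor directly.
        return torch.empty(0, dtype=torch.uint8, device=device)
    # Copy into a writable buffer: `bytes` is read-only, and torch.frombuffer
    # shares memory with whatever buffer it is given.
    writable = bytearray(raw_bytes)
    return torch.frombuffer(writable, dtype=torch.uint8).to(device)
```

The extra copy costs one pass over the payload; if the caller can guarantee the tensor is never written to, passing `raw_bytes` directly as in the suggestion avoids it at the cost of a warning.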
What does this PR do?
Adds support for full fine-tuning with KTransformers (KT).
Fixes # (issue)
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.