[WIP]Full fine tune support by poryfly · Pull Request #3 · kvcache-ai/accelerate

poryfly · 2026-05-22T09:02:18Z

What does this PR do?

support full finetune for kt

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

gemini-code-assist

Code Review

This pull request integrates KTransformers (KT) support into Accelerate, introducing the KTransformersPlugin and logic to handle KT expert modules within FSDP2. It also implements optimized communication primitives for Neuron devices that use power-of-2 padding to stabilize tensor shapes and reduce recompilations. Feedback highlights several issues: setup.py incorrectly pins a non-existent Torch version and renames the package for a fork, while the code contains an invalid Torch version check (2.7.0). Furthermore, the FSDP2 module-ignoring logic fails if ignored_modules is a string, and the implementation uses the deprecated torch.ByteStorage instead of torch.frombuffer.

gemini-code-assist · 2026-05-22T09:05:01Z

        "psutil",
        "pyyaml",
-        "torch>=2.0.0",
+        "torch==2.9.1",


Pinning torch to version 2.9.1 is problematic as this version does not exist yet (current stable is 2.5.x). This will prevent the package from being installed in standard environments. Consider using a valid version range.

Suggested change

"torch==2.9.1",

"torch>=2.0.0",

gemini-code-assist · 2026-05-22T09:05:02Z

+    name="accelerate-kt",
+    version="1.14.0.post1",
    description="Accelerate",
    long_description=open("README.md", encoding="utf-8").read(),
    long_description_content_type="text/markdown",
    keywords="deep learning",
    license="Apache",
    author="The Hugging Face team",
    author_email="transformers@huggingface.co",
-    url="https://github.com/huggingface/accelerate",
+    url="https://github.com/kvcache-ai/accelerate",


Renaming the package to accelerate-kt and changing the URL to a fork repository appears to be intended for a private fork. If this pull request is intended for the main accelerate repository, these changes should be reverted to maintain consistency with the upstream project.

gemini-code-assist · 2026-05-22T09:05:02Z

+                if self.state.fsdp_plugin.ignored_modules is None:
+                    self.state.fsdp_plugin.ignored_modules = []
+                for wrapper in kt_wrappers:
+                    experts_attr = getattr(wrapper, "_experts_attr", "experts")
+                    experts = getattr(wrapper, experts_attr, None)
+                    if experts is not None and experts not in self.state.fsdp_plugin.ignored_modules:
+                        self.state.fsdp_plugin.ignored_modules.append(experts)


The logic for updating ignored_modules assumes it is a list. However, FullyShardedDataParallelPlugin.ignored_modules can also be a string (regex pattern). If it is a string, line 1733 will raise a TypeError and line 1734 will raise an AttributeError. Please ensure ignored_modules is handled correctly when it is a string (e.g., by converting the regex to a list of modules first).

gemini-code-assist · 2026-05-22T09:05:02Z

+            if (not param.requires_grad) and (not param.is_floating_point()) and (not param.is_complex()):
+                incompatible_params4bit.add(param)
+
+    if incompatible_params4bit and is_torch_version(">=", "2.7.0"):


Checking for torch version >= 2.7.0 seems incorrect as this version is not yet released. Please verify if this should be a lower version (e.g., 2.4.0 or 2.5.0) where FSDP2 features were stabilized.

gemini-code-assist · 2026-05-22T09:05:02Z

+    byte_storage = torch.ByteStorage._from_buffer(raw_bytes)
+    input_tensor = torch.ByteTensor(byte_storage).to(device)


torch.ByteStorage is deprecated. Consider using torch.frombuffer which is more idiomatic and avoids using internal methods like _from_buffer.

Suggested change

byte_storage = torch.ByteStorage._from_buffer(raw_bytes)

input_tensor = torch.ByteTensor(byte_storage).to(device)

input_tensor = torch.frombuffer(raw_bytes, dtype=torch.uint8).to(device)

full-fine-tune-support for kt

5981edd

gemini-code-assist Bot reviewed May 22, 2026

View reviewed changes

poryfly changed the base branch from main to sft-v5 May 22, 2026 09:05

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP]Full fine tune support#3

[WIP]Full fine tune support#3
poryfly wants to merge 1 commit into
kvcache-ai:sft-v5from
poryfly:full-fine-tune-support

poryfly commented May 22, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 22, 2026

Uh oh!

gemini-code-assist Bot May 22, 2026

Uh oh!

gemini-code-assist Bot May 22, 2026

Uh oh!

gemini-code-assist Bot May 22, 2026

Uh oh!

gemini-code-assist Bot May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		byte_storage = torch.ByteStorage._from_buffer(raw_bytes)
		input_tensor = torch.ByteTensor(byte_storage).to(device)

	byte_storage = torch.ByteStorage._from_buffer(raw_bytes)
	input_tensor = torch.ByteTensor(byte_storage).to(device)
	input_tensor = torch.frombuffer(raw_bytes, dtype=torch.uint8).to(device)

Conversation

poryfly commented May 22, 2026

What does this PR do?

Before submitting

Who can review?

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant