refactor Flux transformer to use scanned blocks, dynamic checkpointing, and decoupled projections by prishajain1 · Pull Request #417 · AI-Hypercomputer/maxdiffusion

prishajain1 · 2026-06-12T06:20:03Z

Overview

This PR refactors the Flux model architecture in MaxDiffusion to support scanned blocks (nn.scan) for double and single blocks, implements configurable gradient checkpointing (rematerialization) policies, and updates the weights loader to support loading pretrained checkpoints under the scanned format.
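As a rough illustration of the scanned-block idea with a configurable rematerialization policy, here is a minimal sketch in plain JAX. It uses `jax.lax.scan` and `jax.checkpoint` directly rather than Flax's `nn.scan`, and it is not the code in this PR: `toy_block`, `REMAT_POLICIES`, the policy names, and the sizes are all hypothetical stand-ins.

```python
import jax
import jax.numpy as jnp

NUM_LAYERS, DIM = 4, 8  # hypothetical sizes, chosen only for the example

# Hypothetical name -> policy table, e.g. filled from a config string.
REMAT_POLICIES = {
    "full": jax.checkpoint_policies.nothing_saveable,
    "dots": jax.checkpoint_policies.dots_with_no_batch_dims_saveable,
}


def toy_block(x, layer_params):
    # Stand-in for a transformer block: a single residual dense layer.
    return x + jnp.tanh(x @ layer_params["w"] + layer_params["b"])


def init_stacked_params(key):
    # Every leaf carries a leading axis of size NUM_LAYERS (one slice per layer).
    w = jax.random.normal(key, (NUM_LAYERS, DIM, DIM)) * 0.1
    b = jnp.zeros((NUM_LAYERS, DIM))
    return {"w": w, "b": b}


def forward_scanned(params, x, remat_policy=None):
    block = toy_block
    if remat_policy is not None:
        # Recompute block internals in the backward pass, keeping only the
        # residuals that the selected policy marks as saveable.
        block = jax.checkpoint(toy_block, policy=REMAT_POLICIES[remat_policy])

    def step(carry, layer_params):
        # scan passes one layer's slice of the stacked params per iteration.
        return block(carry, layer_params), None

    out, _ = jax.lax.scan(step, x, params)
    return out


def forward_unrolled(params, x):
    # Reference: a Python loop that traces every layer separately.
    for i in range(NUM_LAYERS):
        x = toy_block(x, jax.tree_util.tree_map(lambda p: p[i], params))
    return x
```

With the scanned form the block body is traced once regardless of depth, whereas the unrolled loop traces it once per layer. The rematerialization policy changes what is stored for the backward pass, not the computed values.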

Key Changes

Decoupled Fused Projections: Decoupled the projection layers (implementing the MlpAndOutputBlock wrapper) to eliminate redundant recomputation of attention and projection outputs.
QKV Slicing Refactoring: Refactored the QKV projection slicing logic to use jnp.split across Flux transformer blocks for cleaner layout constraints.
Scanned Block Architecture: Migrated Flux Double and Single Transformer Blocks to use nn.scan to optimize compiler tracing and step execution speed on TPUs.
Dynamic Gradient Checkpointing: Added FLUX_OPTIMIZED to GradientCheckpointType so that block-specific rematerialization policies can be selected through configuration files rather than hardcoded.
Stacked Weights Loading: Updated the weights loader (util.py) to slice, group, and stack PyTorch checkpoint weights along axis 0 to match the expected format of nn.scan layers.
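Two of the items above, stacking per-layer weights along axis 0 and slicing a fused QKV tensor with jnp.split, can be sketched as follows. This is an illustrative sketch only and not the loader in util.py: the key layout `blocks.<i>.<name>`, the helper names `stack_layer_weights` and `split_qkv`, and the choice to split along the last axis are all assumptions made for the example.

```python
import re

import jax.numpy as jnp
import numpy as np


def stack_layer_weights(flat_ckpt, prefix):
    """Group keys shaped like '<prefix>.<i>.<name>' by name and stack over i.

    The key layout is hypothetical; a real checkpoint may name things differently.
    """
    pattern = re.compile(rf"^{re.escape(prefix)}\.(\d+)\.(.+)$")
    grouped = {}
    for key, value in flat_ckpt.items():
        match = pattern.match(key)
        if match:
            layer_idx, name = int(match.group(1)), match.group(2)
            grouped.setdefault(name, {})[layer_idx] = np.asarray(value)

    stacked = {}
    for name, by_index in grouped.items():
        # Sort numerically so that layer 10 comes after layer 9, not after layer 1.
        ordered = [by_index[i] for i in sorted(by_index)]
        # The new leading axis (axis 0) indexes the layer, as a scan expects.
        stacked[name] = jnp.stack(ordered, axis=0)
    return stacked


def split_qkv(fused, axis=-1):
    # Split a fused tensor into three equal parts along `axis`.
    q, k, v = jnp.split(fused, 3, axis=axis)
    return q, k, v
```

For example, three per-layer arrays of shape (6, 4) stored under `blocks.0.qkv.weight` through `blocks.2.qkv.weight` become one array of shape (3, 6, 4) under the key `qkv.weight`, while keys that do not match the prefix are ignored.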

github-actions · 2026-06-12T06:20:12Z

e2e testgrid: https://8bcf50593faf4ea38060e236169827e5-dot-us-central1.composer.googleusercontent.com/dags/maxdiffusion_tpu_e2e/grid

github-actions · 2026-06-12T06:32:23Z

🤖 Hi @prishajain1, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-12T06:35:20Z

🤖 I'm sorry @prishajain1, but I was unable to process your request. Please see the logs for more details.

github-actions · 2026-06-13T15:07:18Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-13T17:02:39Z

🤖 Hi @Perseus14, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

github-actions · 2026-06-13T17:09:00Z

🤖 I'm sorry @Perseus14, but I was unable to process your request. Please see the logs for more details.

entrpn

@prishajain1 did you update the readme with the best values based on remat?

prishajain1 · 2026-06-19T16:32:17Z

> @prishajain1 did you update the readme with the best values based on remat?

Updated

prishajain1 requested a review from entrpn as a code owner June 12, 2026 06:20

prishajain1 marked this pull request as draft June 12, 2026 06:20

prishajain1 force-pushed the prisha/flux_training branch 2 times, most recently from 4696256 to 11ddfef Compare June 12, 2026 06:29

prishajain1 marked this pull request as ready for review June 12, 2026 06:31

prishajain1 added the gemini-review label Jun 12, 2026

prishajain1 removed the gemini-review label Jun 12, 2026

prishajain1 requested a review from Perseus14 June 12, 2026 08:59

Perseus14 requested changes Jun 12, 2026

prishajain1 force-pushed the prisha/flux_training branch 5 times, most recently from f58fb9e to b53d7d2 Compare June 13, 2026 13:40

prishajain1 requested a review from Perseus14 June 13, 2026 13:42

Perseus14 assigned prishajain1 Jun 13, 2026

Perseus14 added the gemini-review label Jun 13, 2026

Perseus14 added gemini-review and removed gemini-review labels Jun 13, 2026

Perseus14 previously approved these changes Jun 17, 2026

entrpn reviewed Jun 17, 2026

Comment thread src/maxdiffusion/configs/base_flux_dev.yml Outdated

github-actions Bot added the pull ready label Jun 17, 2026

Flux training: Implement scanned blocks, dynamic gradient checkpointing, and weight loading improvements (c7e492a)

prishajain1 dismissed Perseus14’s stale review via c7e492a June 19, 2026 16:31

prishajain1 force-pushed the prisha/flux_training branch from b53d7d2 to c7e492a Compare June 19, 2026 16:31

prishajain1 requested a review from entrpn June 20, 2026 05:36


3 participants