[Prototype] Masked Diffusion Training with Shift by nitsanluke · Pull Request #294 · ServiceNow/Fast-LLM

nitsanluke · 2025-06-10T19:42:23Z

✨ Description

This PR adds functionality to train a masked diffusion model. It sets up an initial diffusion loss based on LLaDA, with a shift of 1.

Resolves #208 (Prototype masked diffusion modeling support).
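As a rough illustration of what a LLaDA-style loss with a shift of 1 could look like, here is a minimal pure-Python sketch. It is not the code in this PR: `MASK_ID`, `mask_tokens`, `shifted_diffusion_loss`, and the `eps` floor are hypothetical names and values chosen for the example. The sketch assumes that each sequence draws one masking ratio, that the logits at position `i - 1` predict token `i`, and that the weighted loss is averaged over all tokens.

```python
import math
import random

MASK_ID = -1  # hypothetical mask token id, chosen so it cannot clash with real ids


def mask_tokens(tokens, rng, eps=1e-3):
    """Draw t ~ U(0, 1) and mask each token independently with probability p_mask."""
    t = rng.random()
    p_mask = (1.0 - eps) * t + eps  # keep p_mask away from zero
    # Position 0 stays visible: with a shift of 1, no logit row predicts it.
    masked = [i > 0 and rng.random() < p_mask for i in range(len(tokens))]
    noisy = [MASK_ID if m else tok for tok, m in zip(tokens, masked)]
    return noisy, masked, p_mask


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    peak = max(row)
    lse = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - lse for x in row]


def shifted_diffusion_loss(logits, tokens, masked, p_mask):
    """Cross-entropy on masked positions, weighted by 1 / p_mask.

    logits[i - 1] scores token i (shift of 1). The sum is divided by the
    total number of tokens, not by the number of masked tokens.
    """
    total = 0.0
    for i in range(1, len(tokens)):
        if masked[i]:
            total += -log_softmax(logits[i - 1])[tokens[i]] / p_mask
    return total / len(tokens)


# Usage: uniform logits over a vocabulary of 8 stand in for a model's output.
rng = random.Random(0)
tokens = [3, 1, 4, 1, 5]
noisy, masked, p_mask = mask_tokens(tokens, rng)
logits = [[0.0] * 8 for _ in tokens]
loss = shifted_diffusion_loss(logits, tokens, masked, p_mask)
```

Dividing by the total token count instead of the masked count is one of two normalizations the commit history appears to go back and forth on; the sketch picks the all-token mean only as an assumption.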

🔍 Type of change

Select all that apply:

🐛 Bug fix (non-breaking change that addresses a specific issue)
🚀 New feature (non-breaking change that adds functionality)
⚠️ Breaking change (a change that could affect existing functionality)
📈 Performance improvement/optimization (improves speed, memory usage, or efficiency)
🛠️ Code refactor (non-functional changes that improve code readability, structure, etc.)
📦 Dependency bump (updates dependencies, including Dockerfile or package changes)
📝 Documentation change (updates documentation, including new content or typo fixes)
🔧 Infrastructure/Build change (affects build process, CI/CD, or dependencies)

📝 Changes

Added an additional LM head for masked diffusion
Set up diffusion parameters for data sampling

✅ Checklist

Make sure the following tasks are completed before submitting the PR:

General

📜 I have read and followed the contributing guidelines.
🏷️ I am using a clear and descriptive PR title that summarizes the key change or feature introduced.
🎉 The functionality is complete, and I have tested the changes.
📝 I have updated the documentation if needed.
⚠️ The change does not introduce any new issues (e.g., runtime warnings, type checker errors, linting problems, unhandled edge cases).
🧩 I have commented my code, especially in hard-to-understand areas.

Dependencies and Configuration

🐋 I have updated the Docker configuration or dependencies, if applicable.
🔄 I have ensured compatibility with the existing setup after dependency changes.

Testing

🧪 I have added or updated tests to cover my changes.
✔️ New and existing tests pass locally with my changes.
🚦 I have tested these changes on GPUs and verified training stability.
🏋️ I have tested the changes on realistic training workloads, if applicable.

Performance Impact

📊 I have run benchmarks where applicable to evaluate the performance impact.
✅ The benchmarks show no performance regression.
🚀 The benchmarks indicate a potential performance improvement.
⚠️ The benchmarks indicate a potential performance degradation.
📈 I have provided benchmark results and detailed any performance impact below, if applicable.

📊 Performance Impact Details

If there is any impact on performance, describe it and provide benchmark results, if applicable:

🗒️ Additional Notes

This will be followed up with initial block-diffusion-style training (#312).

gopeshh and others added 18 commits April 21, 2025 12:01

changes for basic LLaDA style diffusion masking support

db28a11

tests for masking and MLM loss

3d44671

temp fixes

46dd535

tmp fix

aa8ab4d

including masked diffusion training setup

9f348e7

adding weighted loss

cdc9c96

clean up

d71e693

add loss weight

072e6c4

adding updates to p_mask

6127544

update error mgs

1cf15a8

add comments and clean up

f7a46d7

Merge branch 'main' into luke/gopeshh/masked_diffusion

b80024e

fx merge errors

01a683b

fix merge issues

ba913e1

register mask config

6c0c72d

Merge branch 'main' into luke/gopeshh/masked_diffusion

26aa13a

fx merge issues

3245496

Merge branch 'main' into luke/gopeshh/masked_diffusion

5198310

nitsanluke changed the title ~~WIP: Masked diffusion~~ Masked diffusion Jun 23, 2025

nitsanluke changed the title ~~Masked diffusion~~ Masked Diffusion Training Jun 23, 2025

nitsanluke added 6 commits June 23, 2025 19:06

fix labels

4ad0bc1

drop old tests

acacfe3

tmp fix

2a06ed4

fx tests

dd68d28

update missing rotery export

e0a7c80

reset attention_factor to old behaviour

0306e36

nitsanluke force-pushed the luke/gopeshh/masked_diffusion branch from 3fdb901 to 0306e36 Compare June 25, 2025 22:32

nitsanluke added 3 commits June 27, 2025 14:58

setting attention to _flash_attn_func

6bcb38d

debug

093aa33

avg only non-zero loss

141ed88

nitsanluke added 3 commits June 28, 2025 14:24

debug remove

8bb00ed

remove non-zero weight

38737d4

revert to mean loss on all tokens

b043efe

nitsanluke changed the title ~~Masked Diffusion Training~~ Masked Diffusion Training with Shift Jul 4, 2025

nitsanluke added 8 commits July 4, 2025 16:27

tmp

0c221fd

adding fused attn

d29af35

include ar+masking

aa0d08c

Merge branch 'main' into luke/gopeshh/masked_diffusion

014b92e

main update cr loss

632dc7c

include ar+diff option as a seperate style

0b469fb

minor

068138f

attn verificiation checks

a573cd7

nitsanluke force-pushed the luke/gopeshh/masked_diffusion branch from 29057ae to a573cd7 Compare July 8, 2025 13:48

nitsanluke added 11 commits July 8, 2025 17:02

tmp updates

fa952fe

temp

1d687ea

adding updates loss avg on none-zero

2eb5e84

avg across all tokens

f11c07f

update loss

2afd95b

Merge branch 'main' into luke/gopeshh/masked_diffusion

e9af787

temp fixes

4fd1ba5

clean up

e92f567

adding tests and cleanup

a312607

adding masking test-case and data cleanup

e1fcc9f

cleaup and move attention to preprocessing

0e11f8b

jlamypoirier changed the title ~~Masked Diffusion Training with Shift~~ [Prototype] Masked Diffusion Training with Shift Nov 27, 2025

nitsanluke wants to merge 49 commits into main from luke/gopeshh/masked_diffusion