Qualcomm AI Engine Direct - [LLM Quantization] Support dataloader-based prefill#20273

Open
DannyYuyang-quic wants to merge 1 commit into
pytorch:mainfrom
CodeLinaro:dev1/danny/remove_token_gen_from_calib

@DannyYuyang-quic

Contributor

Summary

Calibration dataset:

  • Replace HF AutoModel token generation with direct tokenization of a curated corpus (LLM eval tasks or JSON samples)
  • Add default calibration samples: assets/samples/{text,vision,audio}.json
  • Support Dataloader-based calibration
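
To make the calibration-dataset change concrete, here is a minimal sketch of tokenizing a small JSON corpus directly and batching it for calibration, with no model-driven token generation. It rests on assumptions: the `{"text": ...}` sample format, `toy_tokenize`, and `build_calibration_batches` are hypothetical and are not taken from this PR, and the real schema of `assets/samples/text.json` is not shown here.

```python
import json
from typing import Dict, Iterator, List

# Hypothetical sample format; the real schema of assets/samples/*.json
# is not shown in this PR.
SAMPLES_JSON = (
    '[{"text": "the quick brown fox"},'
    ' {"text": "jumps over the lazy dog today"}]'
)


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Stand-in for a real tokenizer: one id per whitespace-separated word."""
    return [vocab.setdefault(word, len(vocab) + 1) for word in text.split()]


def build_calibration_batches(
    samples_json: str, seq_len: int, batch_size: int, pad_id: int = 0
) -> Iterator[Dict[str, List[List[int]]]]:
    """Tokenize every sample, truncate/pad to seq_len, and yield batches."""
    vocab: Dict[str, int] = {}
    rows = []
    for sample in json.loads(samples_json):
        ids = toy_tokenize(sample["text"], vocab)[:seq_len]
        n_pad = seq_len - len(ids)
        rows.append(
            {
                "input_ids": ids + [pad_id] * n_pad,
                "attention_mask": [1] * len(ids) + [0] * n_pad,
            }
        )
    for start in range(0, len(rows), batch_size):
        chunk = rows[start : start + batch_size]
        yield {
            "input_ids": [row["input_ids"] for row in chunk],
            "attention_mask": [row["attention_mask"] for row in chunk],
        }


for batch in build_calibration_batches(SAMPLES_JSON, seq_len=5, batch_size=2):
    print(batch["input_ids"], batch["attention_mask"])
```

In a real flow the toy tokenizer would be replaced by the model's own tokenizer, and the generator by whatever loader the `dataset/` package provides.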

Architecture:

  • Introduce PTQStrategy + DecoderInference as unified calibration forward-pass primitives; remove decoder_utils.graph_module_inference
  • Refactor dataset.py into dataset/ package: builders, collators, config, datasets, loaders, preprocessors, schema
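
As a rough illustration of what "unified calibration forward-pass primitives" could look like, the sketch below separates how a batch is fed to the model (a strategy) from the loop that drives calibration (a runner). The actual `PTQStrategy` and `DecoderInference` interfaces are not shown in this conversation, so every class and method name here (`FeedStrategy`, `WholeSequence`, `TokenByToken`, `CalibrationRunner`) is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List

Batch = Dict[str, List[List[int]]]
Model = Callable[[List[int]], None]


class FeedStrategy(ABC):
    """Hypothetical strategy: decides how one batch is fed to the model."""

    @abstractmethod
    def feed(self, model: Model, batch: Batch) -> int:
        """Run the batch through the model; return the forward-pass count."""


class WholeSequence(FeedStrategy):
    """Feed each sequence in a single forward pass."""

    def feed(self, model: Model, batch: Batch) -> int:
        for ids in batch["input_ids"]:
            model(ids)
        return len(batch["input_ids"])


class TokenByToken(FeedStrategy):
    """Feed growing prefixes, one forward pass per token position."""

    def feed(self, model: Model, batch: Batch) -> int:
        calls = 0
        for ids in batch["input_ids"]:
            for end in range(1, len(ids) + 1):
                model(ids[:end])
                calls += 1
        return calls


class CalibrationRunner:
    """Hypothetical single entry point that drives calibration."""

    def __init__(self, model: Model, strategy: FeedStrategy) -> None:
        self.model = model
        self.strategy = strategy

    def calibrate(self, batches: Iterable[Batch]) -> int:
        return sum(self.strategy.feed(self.model, batch) for batch in batches)


batches = [{"input_ids": [[1, 2, 3], [4, 5, 6]]}]
print(CalibrationRunner(lambda ids: None, WholeSequence()).calibrate(batches))
print(CalibrationRunner(lambda ids: None, TokenByToken()).calibrate(batches))
```

The point of the split is that callers depend on one runner, and the feeding policy can change without touching them.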

Test plan

Test CI:

  • ExampleLLMScript
  • TestExampleMultimodalityScript

@pytorch-bot

pytorch-bot Bot commented Jun 15, 2026


🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20273

Note: Links to docs will display an error until the docs builds have been completed.

⚠️ 16 Awaiting Approval, 1 Pending

As of commit 01574e1 with merge base e88fd04:

AWAITING APPROVAL - The following workflows need approval before CI can run:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla Bot added the CLA Signed label Jun 15, 2026
@DannyYuyang-quic

DannyYuyang-quic commented Jun 15, 2026

Contributor Author

@psiddh Hi, this PR adds support for dataloader-based calibration in MLLMs. With this PR, LLMs can be calibrated on the full input sequence at once, which removes the need for iterative autoregressive (AR) processing over long sequences. For example, a sequence length of 1024 previously required hundreds of iterations; calibration can now be completed in a single forward pass.
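
To illustrate why a single forward pass can expose the same activation ranges as an AR loop, here is a toy sketch. It assumes a causal layer, where the activation at position i depends only on tokens 0..i, and a simple min/max observer. `toy_causal_activations`, `MinMaxObserver`, and both `calibrate_*` functions are hypothetical and not part of this PR; a real model also has floating-point differences between the two paths, and the previous flow fed model-generated tokens, so real results will not match exactly.

```python
from typing import List


def toy_causal_activations(token_ids: List[int]) -> List[float]:
    """Toy causal layer: activation at position i is the mean of tokens 0..i."""
    out: List[float] = []
    running = 0
    for i, tok in enumerate(token_ids):
        running += tok
        out.append(running / (i + 1))
    return out


class MinMaxObserver:
    """Tracks the activation range seen so far and how often it was called."""

    def __init__(self) -> None:
        self.lo = float("inf")
        self.hi = float("-inf")
        self.calls = 0

    def observe(self, values: List[float]) -> None:
        self.calls += 1
        self.lo = min(self.lo, min(values))
        self.hi = max(self.hi, max(values))


def calibrate_autoregressive(token_ids: List[int]) -> MinMaxObserver:
    """One forward pass per position; only the newest activation is new."""
    obs = MinMaxObserver()
    for end in range(1, len(token_ids) + 1):
        obs.observe(toy_causal_activations(token_ids[:end])[-1:])
    return obs


def calibrate_full_sequence(token_ids: List[int]) -> MinMaxObserver:
    """One forward pass over the whole sequence."""
    obs = MinMaxObserver()
    obs.observe(toy_causal_activations(token_ids))
    return obs


tokens = [4, 8, 0, 12]
ar = calibrate_autoregressive(tokens)
full = calibrate_full_sequence(tokens)
print(ar.calls, (ar.lo, ar.hi))
print(full.calls, (full.lo, full.hi))
```

In this toy setting both paths record the same range, while the AR path needs one call per token and the full-sequence path needs one call in total.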

Below is a comparison between AR iterative calibration and dataloader-based calibration across different models:

MLLMs metrics

| Model | AR iterative calibration: time (s) / PPL | Dataloader-based calibration: time (s) / PPL | Speedup |
| --- | --- | --- | --- |
| gemma-2b | 1216 / 16.588 | 100 / 16.609 | 12.16x |
| gemma2-2b | 1827 / 11.504 | 123 / 11.517 | 14.85x |
| gemma3-1b | 907 / 23.052 | 81 / 22.722 | 10.67x |
| glm-1_5b | 963 / 20.180 | 85 / 20.041 | 11.32x |
| llama3_2-3b | 2286 / 10.745 | 138 / 10.498 | 16.56x |
| phi_4_mini | 2824 / 13.437 | 180 / 13.605 | 15.68x |
| qwen2_5-0_5b | 486 / 13.951 | 77 / 13.813 | 6.31x |
| qwen2_5-1_5b | 1068 / 9.714 | 116 / 9.669 | 9.2x |
| qwen3-1_7b | 1478 / 14.756 | 111 / 14.913 | 13.31x |
| smollm2_135m | 399 / 19.797 | 80 / 19.706 | 4.98x |
| smollm3-3b | 2065 / 8.345 | 132 / 8.989 | 15.64x |
| smolvlm_500m_instruct | 170 / - | 86 / - | 1.97x |
| internvl3_1b | 170 / - | 75 / - | 2.26x |
| granite_speech_3_3-2b | 447 / - | 179 / - | 2.49x |
| llama3_2-1b | 1237 / 14.973 | 883 / 15.647 | 1.4x |
| qwen3-0_6b | 1013 / 19.740 | 408 / 19.912 | 2.48x |

cc: @shewu-quic, @haowhsu-quic

@DannyYuyang-quic DannyYuyang-quic changed the title from "Qualcomm AI Engine Direct - Support dataloader-based prefill quantize" to "Qualcomm AI Engine Direct - [LLM Quantization] Support dataloader-based prefill" Jun 15, 2026
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/remove_token_gen_from_calib branch from 28dd82b to d80b723 Compare June 15, 2026 07:46
Calibration dataset:
- Replace HF AutoModel token generation with direct tokenization of
  curated corpus (llm eval tasks or JSON samples)
- Add default calibration samples: assets/samples/{text,vision,audio}.json

Architecture:
- Introduce PTQStrategy + DecoderInference as unified calibration
  forward-pass primitives; remove decoder_utils.graph_module_inference
- Refactor dataset.py into dataset/ package:
  builders, collators, config, datasets, loaders, preprocessors, schema
@DannyYuyang-quic DannyYuyang-quic force-pushed the dev1/danny/remove_token_gen_from_calib branch from d80b723 to 01574e1 Compare June 15, 2026 07:54
@DannyYuyang-quic

Contributor Author

@pytorchbot label "release notes: qualcomm"

@pytorch-bot pytorch-bot Bot added the release notes: qualcomm label Jun 15, 2026