Quant in checkpoint dtype by metascroy · Pull Request #18781 · pytorch/executorch

metascroy · 2026-04-08T21:18:58Z

Switches the order in etLLM so that we quantize in the checkpoint dtype and then cast to dtype-override. This can prevent the quantization scales from underflowing.

Also exposes the ability to turn HQQ on or off (`quantization.use_hqq` in the export command below).
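To make the underflow concern concrete, here is a minimal sketch. It is not the code from this PR. It assumes a hypothetical setup in which the override dtype is fp16 and one weight group has very small magnitudes; `to_fp16` and `symmetric_scale` are illustrative helpers, and Python floats stand in for the wider checkpoint dtype. bf16 and fp32 share an 8-bit exponent, so they can hold magnitudes far below fp16's smallest subnormal (about 6e-8). The sketch uses the half-precision format of Python's `struct` module to emulate the fp16 cast.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def symmetric_scale(group, qmax=7):
    """Per-group scale for symmetric 4-bit quantization: max|w| / qmax."""
    return max(abs(w) for w in group) / qmax


# Hypothetical weight group with tiny magnitudes (illustrative values only).
group = [2.0e-8, -1.5e-8, 0.5e-8, 1.0e-8]

# Order A: cast to the narrow dtype first, then compute the scale.
# Every value is below half of fp16's smallest subnormal, so all round to zero.
cast_first = [to_fp16(w) for w in group]
scale_cast_first = symmetric_scale(cast_first)

# Order B: compute the scale while the weights are still in the wider dtype.
scale_quant_first = symmetric_scale(group)

print("scale after casting first:", scale_cast_first)
print("scale computed before cast:", scale_quant_first)
```

In this toy case the cast-first scale is exactly zero, while the scale computed in the wider dtype is about 2.9e-9. Whether a real export hits this depends on the checkpoint dtype, the override dtype, and how the scales are stored afterwards.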

Export:

python -m extension.llm.export.export_llm \
  base.model_class=phi_4_mini \
  base.params=examples/models/phi_4_mini/config/config.json \
  model.use_kv_cache=true \
  model.use_sdpa_with_kv_cache=true \
  model.dtype_override=fp32 \
  export.output_dir=/tmp/phi_4_mini_no_hqq \
  export.output_name=model.pte \
  export.max_seq_length=2048 \
  export.max_context_length=2048 \
  quantization.qmode=8da4w \
  quantization.group_size=32 \
  "quantization.embedding_quantize='8,0'" \
  quantization.use_hqq=False \
  backend.xnnpack.enabled=true \
  backend.xnnpack.extended_ops=true
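For readers unfamiliar with the `quantization.group_size=32` setting, the following is a generic sketch of group-wise 4-bit weight quantization, where each block of 32 weights gets its own scale. It assumes a symmetric scheme with integer codes in [-8, 7]; the function names are hypothetical, and this is not the torchao or ExecuTorch implementation, which may differ in rounding, zero points, and scale storage.

```python
def quantize_groupwise(weights, group_size=32, qmin=-8, qmax=7):
    """Symmetric per-group quantization: one scale per `group_size` weights."""
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        max_abs = max(abs(w) for w in group)
        # Guard against an all-zero group so the division below is safe.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        scales.append(scale)
        # Round to the nearest integer code and clamp to the 4-bit range.
        codes.extend(max(qmin, min(qmax, round(w / scale))) for w in group)
    return codes, scales


def dequantize_groupwise(codes, scales, group_size=32):
    """Map integer codes back to floats using each group's scale."""
    return [q * scales[i // group_size] for i, q in enumerate(codes)]


# Two groups of 32 illustrative weights; the second group is twice as large.
weights = [((i % 13) - 6) * 0.01 * (1 + i // 32) for i in range(64)]

codes, scales = quantize_groupwise(weights)
restored = dequantize_groupwise(codes, scales)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

print("number of scales:", len(scales))
print("max reconstruction error:", max_err)
```

Because each group has its own scale, the rounding error within a group is bounded by half of that group's scale. This is also why a scale that underflows to zero is harmful: every weight in the group loses its magnitude.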

Phi4 output:

<|im_start|>system
You are a highly capable, helpful, and honest AI assistant designed to provide clear, accurate, and thoughtful responses to a wide range of questions. Your primary goal is to assist users by offering information, explanations, and guidance in a manner that is respectful, unbiased, and safe. Always strive to be as helpful as possible, but never provide content that is harmful, unethical, offensive, or illegal. If a question is unclear, nonsensical, or based on incorrect premises, politely explain the issue rather than attempting to answer inaccurately. If you do not know the answer to a question, it is better to admit uncertainty than to provide false or misleading information. When appropriate, include examples, analogies, or step-by-step reasoning to enhance understanding. Your responses should be positive, inclusive, and supportive, fostering a constructive and informative interaction.<|im_end|>
<|im_start|>user
Please answer the following question in detail and provide relevant context, examples, and explanations where possible: What are some of the most important considerations when designing a machine learning system for real-world applications? Discuss potential challenges, best practices, and how to ensure ethical and responsible use.<|im_end|>
<|im_start|>assistant
Designing a machine learning system for real-world applications involves various considerations to ensure the system is effective, fair, and secure. Some of the most important considerations include data quality and sourcing, model choice and design, evaluation and validation, interpretability and transparency, and ensuring fairness and avoiding biases.

Data quality and sourcing involve ensuring data is of high quality, representative of the target application, and properly curated and preprocessed to remove noise and biases.

Model choice and design involve selecting an appropriate model for the application, understanding the strengths and limitations of different models, and understanding the application domain and data.

Model evaluation and validation involve properly training and tuning the model on a training set and properly validating and testing the model on a separate validation set to avoid data leakage and

Related work: improvement in torchao's HQQ algorithm that helps with Phi4's model distribution: pytorch/ao#4259

pytorch-bot · 2026-04-08T21:19:04Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/18781

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There is 1 currently active SEV. If your PR is affected, please view it below:

Rolling out OSDC (ARC) runners on pull workflow for PyTorch trunk commits

❌ 2 New Failures, 1 Pending, 2 Unrelated Failures

As of commit 6e67701 with merge base 36e8ed9:

NEW FAILURES - The following jobs have failed:

pull / test-samsung-models-linux / linux-job (gh)
test_w2l_fp16
pull / unittest-arm-backend-with-no-deps (test_pytest_ops_tosa) / linux-job (gh)
RuntimeError: Command docker exec -t 263cde0d7d4b5d00ba721d6a1db583886415d3a426555a864c2e01d0a4b54fcd /exec failed with exit code 1

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

pull / unittest / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.
pull / unittest-editable / windows / windows-job (gh) (trunk failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2026-04-08T21:19:43Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with `release notes:`. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

meta-codesync · 2026-04-08T21:22:41Z

@metascroy has imported this pull request. If you are a Meta employee, you can view this in D100066455.

SS-JIA

LGTM!!

metascroy requested a review from lucylq as a code owner April 8, 2026 21:19

meta-cla Bot added the CLA Signed label Apr 8, 2026

metascroy added 2 commits April 8, 2026 14:40

up

84de2c4

up

3357821

metascroy force-pushed the quant-in-checkpoint-dtype branch from 77c6b50 to 3357821 April 9, 2026 22:17

metascroy requested review from larryliu0820 and mergennachin as code owners April 9, 2026 22:17

metascroy requested a review from SS-JIA April 9, 2026 22:21

Merge branch 'main' into quant-in-checkpoint-dtype

6e67701

SS-JIA approved these changes Apr 10, 2026

metascroy merged commit 3e8fc7b into main Apr 10, 2026
184 of 190 checks passed

metascroy deleted the quant-in-checkpoint-dtype branch April 10, 2026 20:41

metascroy merged 3 commits into main from quant-in-checkpoint-dtype
