@metascroy has imported this pull request. If you are a Meta employee, you can view this in D100066455.
Switches order in etLLM so we quantize in checkpoint dtype and then cast to dtype-override. This can prevent underflowing on scales.

Also exposes ability to turn HQQ on/off.

Export:
```
python -m extension.llm.export.export_llm \
  base.model_class=phi_4_mini \
  base.params=examples/models/phi_4_mini/config/config.json \
  model.use_kv_cache=true \
  model.use_sdpa_with_kv_cache=true \
  model.dtype_override=fp32 \
  export.output_dir=/tmp/phi_4_mini_no_hqq \
  export.output_name=model.pte \
  export.max_seq_length=2048 \
  export.max_context_length=2048 \
  quantization.qmode=8da4w \
  quantization.group_size=32 "quantization.embedding_quantize='8,0'" quantization.use_hqq=False \
  backend.xnnpack.enabled=true \
  backend.xnnpack.extended_ops=true
```

Phi4 output:
```
<|im_start|>system
You are a highly capable, helpful, and honest AI assistant designed to provide clear, accurate, and thoughtful responses to a wide range of questions. Your primary goal is to assist users by offering information, explanations, and guidance in a manner that is respectful, unbiased, and safe. Always strive to be as helpful as possible, but never provide content that is harmful, unethical, offensive, or illegal. If a question is unclear, nonsensical, or based on incorrect premises, politely explain the issue rather than attempting to answer inaccurately. If you do not know the answer to a question, it is better to admit uncertainty than to provide false or misleading information. When appropriate, include examples, analogies, or step-by-step reasoning to enhance understanding. Your responses should be positive, inclusive, and supportive, fostering a constructive and informative interaction.<|im_end|>
<|im_start|>user
Please answer the following question in detail and provide relevant context, examples, and explanations where possible: What are some of the most important considerations when designing a machine learning system for real-world applications?
Discuss potential challenges, best practices, and how to ensure ethical and responsible use.<|im_end|>
<|im_start|>assistant
Designing a machine learning system for real-world applications involves various considerations to ensure the system is effective, fair, and secure. Some of the most important considerations include data quality and sourcing, model choice and design, evaluation and validation, interpretability and transparency, and ensuring fairness and avoiding biases. Data quality and sourcing involve ensuring data is of high quality, representative of the target application, and properly curated and preprocessed to remove noise and biases. Model choice and design involve selecting an appropriate model for the application, understanding the strengths and limitations of different models, and understanding the application domain and data. Model evaluation and validation involve properly training and tuning the model on a training set and properly validating and testing the model on a separate validation set to avoid data leakage and
```

Related work: improvement in torchao's HQQ algorithm that helps with Phi4's model distribution: pytorch/ao#4259
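On the ordering change described above, here is a rough, standalone illustration of why the order can matter. It is not the etLLM or torchao code: the helper names (`to_fp16`, `int4_scale`, `int4_codes`), the symmetric 4-bit scheme, the sample weights, and the choice of fp16 as the narrow dtype are all assumptions made for the sketch. Python floats stand in for the wider checkpoint dtype.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def int4_scale(group):
    """Symmetric 4-bit scale (assumed scheme): the largest magnitude maps to code 7."""
    return max(abs(w) for w in group) / 7.0


def int4_codes(group, scale):
    """Round each weight to a signed 4-bit code in [-8, 7]."""
    if scale == 0.0:
        # A zero scale cannot distinguish any weights in the group.
        return [0] * len(group)
    return [max(-8, min(7, round(w / scale))) for w in group]


# Hypothetical group of very small weights, held in the wide dtype.
group = [2e-8, -1.2e-8, 6e-9, -2e-8]

# Order A: cast to the narrow dtype first, then quantize.
# Every weight is below the smallest fp16 subnormal step, so all flush to zero.
narrow_group = [to_fp16(w) for w in group]
cast_first_scale = to_fp16(int4_scale(narrow_group))
cast_first_codes = int4_codes(narrow_group, cast_first_scale)

# Order B: quantize while the weights are still in the wide dtype.
quant_first_scale = int4_scale(group)
quant_first_codes = int4_codes(group, quant_first_scale)

print("cast first:    ", cast_first_scale, cast_first_codes)
print("quantize first:", quant_first_scale, quant_first_codes)
```

In this sketch the cast-first path hands the quantizer an all-zero group, so the scale is zero and the codes carry no information, while the quantize-first path still produces distinct codes. Whether the scale itself survives a later cast depends on the range of the target dtype, which the sketch does not model.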