
As we build more capable and more personalized AI, reliability, security, and user protections are more important than ever.
Advanced models require an advanced approach to safety — one that scales with the technology. Today, we’re detailing that work: our updated Advanced AI Scaling Framework, our upcoming Safety & Preparedness Report for Muse Spark, and new advances in how our models reason about safety from the ground up, so that as our AI becomes more capable, our protections keep pace.
We’re building on our original Frontier AI Framework and publishing a significantly updated and more rigorous version: the Advanced AI Scaling Framework. This update broadens the types of risks we evaluate, strengthens how we make deployment decisions, and introduces new Safety & Preparedness Reports. More specifically, the Framework outlines how we identify and assess the most severe and emerging risks, including chemical and biological threats and cybersecurity, and adds a new section evaluating risks around loss of control. As models become more advanced, we’re evaluating how they perform when given greater autonomy and whether the controls around that behavior work as intended. These standards apply across our frontier deployments, whether a model is released openly, made available through controlled API access, or kept closed.
In practice, this also means mapping potential risks, evaluating models before and after safeguards are applied to confirm they work in the real world, and only deploying models when they meet the standards set by our Framework. For people who use Meta AI across our apps, this means the models powering their experience have been evaluated across a broad spectrum of risks before we make them available.
While our updated Framework strengthens the standards and safeguards for our most capable models, our new Safety & Preparedness Reports will show how we are meeting them. These reports will detail our risk assessments, evaluation results, the rationale behind our deployment decisions, and any limitations we’re still working to address. This transparency means we will share what we found, how we tested our models, where our evaluations fell short, and how we closed those gaps.
For Muse Spark, we conducted extensive safety evaluations before deployment. Because of its advanced reasoning capabilities, we evaluated the model both before and after applying protections, testing not just for the most serious risks like cybersecurity and chemical and biological threats, but also against our long-standing safety policies, which are designed to prevent harms and misuse such as violence, child safety violations, and criminal wrongdoing, as well as against our policies to ensure ideological balance.
Our evaluation approach is multilayered by design, and it starts before a model is deployed. We test against thousands of scenarios specifically designed to find weaknesses, track how often those attempts succeed, and work to drive that number as low as possible. Because no evaluation is exhaustive, we also monitor live traffic with automated systems designed to spot unexpected issues so we can address them quickly. The results demonstrate strong safeguards across all the risk categories we measured. Our evaluations also showed that Muse Spark is at the frontier in avoiding ideological bias in model responses.
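To make the first part of that loop concrete, here is a minimal sketch of what an adversarial evaluation harness can look like: run a suite of attack scenarios against the safeguarded model and track how often the attempts succeed, broken out by risk category. The `Scenario` format, the `is_violation` grader, and the category names below are illustrative assumptions, not a description of our production tooling.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Scenario:
    prompt: str         # adversarial prompt designed to probe a specific weakness
    risk_category: str  # e.g. "cybersecurity", "chemical-biological", "conversational safety"


def attack_success_rate(
    scenarios: Iterable[Scenario],
    generate: Callable[[str], str],            # the model under test, with safeguards applied
    is_violation: Callable[[str, str], bool],  # grader: does this response violate policy?
) -> dict[str, float]:
    """Run every scenario through the model and report, per risk category,
    the fraction of adversarial attempts that produced a violating response."""
    attempts: dict[str, int] = {}
    successes: dict[str, int] = {}
    for s in scenarios:
        response = generate(s.prompt)
        attempts[s.risk_category] = attempts.get(s.risk_category, 0) + 1
        if is_violation(s.risk_category, response):
            successes[s.risk_category] = successes.get(s.risk_category, 0) + 1
    return {cat: successes.get(cat, 0) / n for cat, n in attempts.items()}
```

A harness like this can be run both before and after safeguards are applied, and the per-category success rate is the number to drive as low as possible.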
We also evaluated whether the model could act autonomously in ways that could be difficult to control, and our evaluations confirm it does not possess the level of autonomous capability needed to pose those risks. Our upcoming Safety & Preparedness Report details the specific evaluations behind this finding, along with all of our evaluation results: what we tested and what we found. We’ll be sharing more soon.
These protections are built in at every stage, from filtering the data the model learns from, to safety-focused training, to guardrails that run at the product level. And because our protections need to evolve as our models become more sophisticated, this work will never be done.
In particular, Muse Spark is more capable than our previous generation of models, and that capability is what makes a fundamentally new approach to governing the model possible. Earlier approaches relied on teaching models to handle specific scenarios one by one, for instance, training them to refuse to respond or to redirect to a trusted source. That approach worked, but it was difficult to scale. Because Muse Spark can reason, we’ve evolved our approach: we’ve translated our trust and safety guidelines, across areas like content and conversational safety, response quality, and handling different viewpoints, into clear, testable principles. We also trained the model on why something is safe: not just the rules, but the reasons behind them. This means the model is better equipped to handle novel situations that rules-based systems might have failed to anticipate.
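As an illustration of what translating guidelines into clear, testable principles might look like, the sketch below pairs a principle’s statement (the reason behind the rule) with checks that a candidate response can be tested against. The `Principle` structure, the example principle, and its checks are hypothetical and simplified for clarity; they are not our internal guidelines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Principle:
    name: str
    statement: str                       # the "why": the reasoning behind the rule
    checks: list[Callable[[str], bool]]  # testable conditions a response should satisfy


# A hypothetical principle for handling requests that signal acute personal risk.
redirect_to_trusted_sources = Principle(
    name="redirect_to_trusted_sources",
    statement=(
        "When a request signals acute personal risk, the safest outcome is to "
        "connect the person with qualified help rather than improvise advice."
    ),
    checks=[
        # The response should point toward qualified help...
        lambda response: any(
            term in response.lower() for term in ("helpline", "professional", "emergency services")
        ),
        # ...and should not improvise procedural guidance.
        lambda response: "step-by-step" not in response.lower(),
    ],
)


def satisfies(principle: Principle, response: str) -> bool:
    """A candidate response passes a principle only if every testable check holds."""
    return all(check(response) for check in principle.checks)
```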
This work doesn’t replace human oversight; it elevates it. Our teams design the principles that guide model behavior, rigorously validate these principles against real-world scenarios, and layer in additional guardrails to catch things the model may still miss. The result is protections that are applied more broadly and consistently, and that improve as the model’s reasoning improves.
As we make significant advancements to Meta AI and deploy our most capable models, Safety & Preparedness Reports will show how we’re evaluating and managing risk at every step. We’ll continue to invest in safeguards, testing, and research, so people can rely on an AI experience with built-in protections designed to help keep them safe.