Meta has published a policy governing its future AI development, saying it will stop developing AI models it deems a “critical risk.”
Meta has been quietly emerging as one of the leading AI companies, but it is taking a far different approach than many of its competitors. While most companies are releasing proprietary AI models, Meta has open-sourced its Llama family of models.
One of the biggest challenges facing Meta, as well as the rest of the industry, is how to develop AI models that are safe and cannot be used in a harmful manner. Meta is a signatory of the Frontier AI Safety Commitments, and its new Frontier AI Framework is aligned with that agreement.
In the new policy, Meta outlines its goals and the catastrophic outcomes it must work to prevent:
We start by identifying a set of catastrophic outcomes we must strive to prevent, and then map the potential causal pathways that could produce them. When developing these outcomes, we’ve considered the ways in which various actors, including state level actors, might use/misuse frontier AI. We describe threat scenarios that would be potentially sufficient to realize the catastrophic outcome, and we define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios.
By anchoring thresholds on outcomes, we aim to create a precise and somewhat durable set of thresholds, because while capabilities will evolve as the technology develops, the outcomes we want to prevent tend to be more enduring. This is not to say that our outcomes are fixed. It is possible that as our understanding of frontier AI improves, outcomes or threat scenarios might be removed, if we can determine that they no longer meet our criteria for inclusion. We also may need to add new outcomes in the future. Those outcomes might be in entirely novel risk domains, potentially as a result of novel model capabilities, or they might reflect changes to the threat landscape in existing risk domains that bring new kinds of threat actors into scope. This accounts for the ways in which frontier AI might introduce novel harms, as well as its potential to increase the risk of catastrophe in known risk domains.
An outcomes-led approach also enables prioritization. This systematic approach will allow us to identify the most urgent catastrophic outcomes – i.e., cybersecurity and chemical and biological weapons risks – and focus our efforts on avoiding them rather than spreading efforts across a wide range of theoretical risks from particular capabilities that may not plausibly be presented by the technology we are actually building.
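To make the structure of that outcomes-led approach easier to follow, here is a minimal, hypothetical sketch of the hierarchy the framework describes: catastrophic outcomes, the threat scenarios that could produce them, and the risk thresholds keyed to those scenarios. The class names, fields, and the example outcome are illustrative assumptions, not taken from Meta's document.

```python
from dataclasses import dataclass, field
from enum import Enum


class RiskThreshold(Enum):
    # The three risk levels the Framework refers to.
    MODERATE = "moderate"
    HIGH = "high"
    CRITICAL = "critical"


@dataclass
class ThreatScenario:
    # A pathway judged potentially sufficient to realize the parent outcome.
    description: str
    threat_actors: list[str]  # e.g. individuals, non-state groups, state-level actors


@dataclass
class CatastrophicOutcome:
    # An outcome the Framework strives to prevent, e.g. in the cybersecurity
    # or chemical/biological weapons domains it names as most urgent.
    name: str
    domain: str
    threat_scenarios: list[ThreatScenario] = field(default_factory=list)


# Purely illustrative example; not an outcome listed in Meta's document.
example_outcome = CatastrophicOutcome(
    name="Automated large-scale intrusion",
    domain="cybersecurity",
    threat_scenarios=[
        ThreatScenario(
            description="A model uniquely enables end-to-end compromise of hardened targets",
            threat_actors=["state-level actors", "criminal groups"],
        )
    ],
)
```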
Meta breaks down exactly how it defines a critical-risk AI model and what action it will take in response:

We define our risk thresholds based on the extent to which a frontier AI would uniquely enable execution of any of our threat scenarios. A frontier AI is assigned to the critical risk threshold if we assess that it would uniquely enable execution of a threat scenario. If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1. Our high and moderate risk thresholds are defined in terms of the level of uplift a frontier AI provides towards realising a threat scenario. We will develop these models in line with the processes outlined in this Framework, and implement the measures outlined in Table 1.
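The decision logic in that passage can be summarized in a short, hypothetical sketch. The function name, the uplift labels, and the response strings are illustrative assumptions; Meta expresses uplift qualitatively, and the concrete measures live in its Table 1, which is not reproduced here.

```python
def assess_frontier_model(uniquely_enables_threat_scenario: bool,
                          uplift_level: str,
                          can_be_mitigated: bool) -> tuple[str, str]:
    """Return (risk threshold, response) roughly as the quoted passage describes.

    All names and the 'significant' uplift label are illustrative; this is a
    reading of the public framework text, not Meta's internal process.
    """
    if uniquely_enables_threat_scenario:
        if not can_be_mitigated:
            # Critical risk that cannot be mitigated: development stops.
            return "critical", "stop development and implement Table 1 measures"
        # The quoted passage does not spell out this branch; presumably
        # mitigations are applied before development continues.
        return "critical", "apply mitigations before proceeding"
    if uplift_level == "significant":
        return "high", "develop per Framework processes and Table 1 measures"
    return "moderate", "develop per Framework processes and Table 1 measures"


print(assess_frontier_model(True, "significant", False))
# ('critical', 'stop development and implement Table 1 measures')
```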

Meta also says it will evaluate the possibility that high- and moderate-risk AI models could advance to a higher risk threshold, and respond accordingly.
We define our thresholds based on the extent to which frontier AI would uniquely enable the execution of any of the threat scenarios we have identified as being potentially sufficient to produce a catastrophic outcome. If a frontier AI is assessed to have reached the critical risk threshold and cannot be mitigated, we will stop development and implement the measures outlined in Table 1. Our high and moderate risk thresholds are defined in terms of the level of uplift a model provides towards realising a threat scenario. We will develop Frontier AI in line with the processes outlined in this Framework, and implement the measures outlined in Table 1. Section 3 on Outcomes & Thresholds provides more information about how we define our thresholds.
The AI Industry Needs More Open Safety Policies
Meta’s willingness to detail and document its standards is a refreshing stance in an industry that appears to be recklessly rushing toward artificial general intelligence (AGI). Employees across the industry have warned that not enough is being done to ensure safe development.
By clearly defining its safety goals and committing to halt development of critical-risk models, Meta is setting itself apart as one of the few AI companies putting safety first and foremost, with Anthropic being another notable example.
Hopefully, other companies will take note and follow suit.