## A Concept Born From Fear

There is a specific type of intellectual vertigo that arrives when you realize serious people are seriously proposing frameworks for preventing nation-states from blowing up each other's data centers to stop the emergence of superintelligent AI.

Welcome to MAIM. Mutually Assured AI Malfunction.

If that sounds like a concept ripped from a dystopian thriller, that is because it essentially is. Except it was authored by Dan Hendrycks, Eric Schmidt, and Alexandr Wang in a paper titled [*Superintelligence Strategy*](https://www.nationalsecurity.ai/). Schmidt ran Google for a decade. Wang built Scale AI. Hendrycks runs the Center for AI Safety. These are not fringe thinkers. They are the people the U.S. government actually listens to.

And yet the concept they propose is radical enough to have split the AI safety and AI governance communities down the middle. Let me explain what MAIM is, where it came from, and why it matters.

## The Problem It Is Trying to Solve

Before you can understand MAIM, you need to accept the premise it is built on.

The premise: superintelligence, meaning AI that exceeds human cognitive capacity across virtually all domains, is likely coming. When it arrives, whichever nation controls it first will possess an advantage so overwhelming that it effectively renders all other forms of power obsolete. That nation could dominate the global economy, neutralize existing military capabilities, and reshape the international order unilaterally.

For every other nation-state, this is an existential threat. Not metaphorically. Literally.

[As the *Superintelligence Strategy* paper puts it](https://www.nationalsecurity.ai/chapter/deterrence-with-mutual-assured-ai-malfunction-maim): if a rival state races toward a strategic monopoly, other states will not sit by quietly. If that rival loses control of its AI, their survival is threatened. If it retains control, their survival is equally threatened. Either way, you are in danger. The rational response, then, is to disable the threat before it materializes.

## What MAIM Actually Proposes

MAIM is modeled explicitly on nuclear Mutual Assured Destruction. The logic runs as follows. During the Cold War, no nuclear power launched a first strike because every major power knew that doing so would trigger a retaliatory strike that would annihilate them. The threat of mutual destruction was so credible and so catastrophic that it produced an uneasy but durable equilibrium. Nobody wanted to strike first because striking first meant extinction.

MAIM attempts to replicate that equilibrium for the AI race. [The original paper describes the deterrence dynamic](https://www.nationalsecurity.ai/chapter/deterrence-with-mutual-assured-ai-malfunction-maim) as follows: any state's aggressive bid for unilateral AI dominance would be met with preventive sabotage by rivals. Interventions could range from covert cyberattacks degrading training runs to physical damage disabling AI infrastructure. The result is a standoff: no state dares sprint for the finish line, because doing so paints a target on its data centers.

The paper also argues MAIM is not merely aspirational. It is already, in embryonic form, the default strategic reality. The question is whether we formalize and stabilize it before it degenerates into chaos.
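To see the equilibrium the authors are claiming, here is a minimal game-theory sketch. The payoff numbers are stylized values of my own, not figures from the paper; the only structural assumption is that preventive sabotage is credible and hurts the sprinter more than restraint would.

```python
# A stylized 2x2 game of the standoff the paper describes. Payoffs are
# illustrative numbers of my own, not values from the paper.
# Each state chooses "restrain" (no sprint for superintelligence) or
# "sprint" (an aggressive bid for dominance, inviting preventive sabotage).

PAYOFFS = {
    # (state_a_move, state_b_move): (payoff_a, payoff_b)
    ("restrain", "restrain"): (0, 0),     # uneasy but stable status quo
    ("sprint", "restrain"): (-50, -10),   # sprinter gets sabotaged; rival pays a cost to strike
    ("restrain", "sprint"): (-10, -50),
    ("sprint", "sprint"): (-50, -50),     # mutual sabotage, everyone's data centers burn
}

def best_response(opponent_move: str) -> str:
    """Return state A's payoff-maximizing move against a fixed rival move."""
    return max(("restrain", "sprint"),
               key=lambda move: PAYOFFS[(move, opponent_move)][0])

# If sabotage is credible and costly enough, restraint is the best response
# to anything the rival does, which is the standoff the authors want to formalize.
for rival in ("restrain", "sprint"):
    print(f"Best response to a rival that chooses {rival!r}: {best_response(rival)}")
```

The equilibrium lives or dies on those payoffs: make striking first cheap and decisive for the attacker and the best responses flip toward aggression, which is precisely the instability RAND flags below.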
To stabilize this standoff, the authors propose clearly communicated escalation ladders, transparent placement of AI infrastructure far from population centers, chip-level geolocation controls, and multilateral verification mechanisms analogous to nuclear inspections.

## The Three Pillars

The broader *Superintelligence Strategy* framework rests on three pillars:

**Competitiveness.** The United States must remain at the technological frontier. This means domestic chip manufacturing, AI investment, and retaining the global talent advantage. This is the least controversial pillar and the most bipartisan.

**Nonproliferation.** AI capabilities, particularly frontier model weights and high-end chips, must be kept out of the hands of non-state actors and rogue states. The authors propose treating advanced AI chips essentially as WMD inputs, subject to export controls, geofencing, and remote attestation. This is more contentious but enjoys broad support among national security professionals.

**Deterrence through MAIM.** This is where the document becomes genuinely radical, and where serious disagreement begins.

## What the AI Safety Community Makes of It

I have spent time with the key voices engaging with this framework, and the reaction is best described as: fascinated, alarmed, and deeply divided.

**LessWrong's Zvi Mowshowitz**, one of the most rigorous independent analysts of AI risk, gave [a characteristically thorough breakdown](https://www.lesswrong.com/posts/kYeHbXmW4Kppfkg5j/on-maim-and-superintelligence-strategy). His core critique: MAIM depends on a chain of assumptions that all have to hold simultaneously. Every major player must recognize that superintelligence is imminent. Every player must believe rivals will actually escalate. Every player must believe that escalation would either succeed or trigger total war. Miss any one of those, and the deterrence logic collapses. He notes that North Korea's nuclear proliferation happened precisely because it gambled, correctly, that no one would follow through. Our track record of credible deterrence in practice is, in his words, "highly spotty."
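A toy way to quantify that fragility, with probabilities I made up purely for illustration (Zvi's argument is qualitative, not numeric) and an independence assumption that is itself a simplification: even if each link in the chain is individually likely, the conjunction is not.

```python
# Toy arithmetic for the conjunctive-assumptions point. The 0.8s are
# hypothetical values of my own, chosen only to show how fast a chain
# of individually plausible assumptions degrades when all must hold.
assumptions = {
    "every player recognizes superintelligence is imminent": 0.8,
    "every player believes rivals will actually escalate":   0.8,
    "every player believes escalation would work":           0.8,
}

p_all_hold = 1.0
for claim, p in assumptions.items():
    p_all_hold *= p  # assumes independence, a simplification

# Three individually plausible assumptions, jointly a coin flip.
print(f"P(deterrence logic holds) = {p_all_hold:.2f}")  # 0.51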
Zvi also raises a point I find deeply underappreciated: human sabotage will become less effective over time as AI systems themselves take on more of the targeted work. Cyberattacks get harder as both offense and defense become increasingly AI-assisted. The very technology you are trying to stop is eroding your ability to stop it.

**The Machine Intelligence Research Institute (MIRI)** published a [detailed technical critique](https://intelligence.org/2025/04/11/refining-maim-identifying-changes-required-to-meet-conditions-for-deterrence/) arguing that MAIM struggles with unclear and unmonitorable red lines, questionable threat credibility (sabotage delays; it rarely denies), and a highly volatile deterrence calculus. Their assessment: MAIM as currently conceived is unlikely to provide stable deterrence, and it needs significant refinement before it could serve as doctrine.

**AI Frontiers** published a compelling piece arguing that [MAIM has a fundamental observability problem](https://ai-frontiers.org/articles/why-maim-falls-short-for-superintelligence-deterrence). To know when to intervene, states must effectively monitor each other's AI development. But frontier AI development is increasingly opaque, decentralized, and algorithmically complex. The result is two equally dangerous failure modes: missing critical signs of advancement until it is too late, or misinterpreting normal R&D activity as an existential threat and triggering unnecessary sabotage.

**The RAND Corporation**, in a [March 2025 commentary](https://www.rand.org/pubs/commentary/2025/03/seeking-stability-in-the-competition-for-ai-advantage.html), identified what I consider the sharpest structural flaw in the MAIM framework: it inverts the logic of MAD entirely. MAD worked because *neither* side could strike first and survive. MAIM is premised on the ability to strike first, which creates first-strike incentives rather than eliminating them. Instead of preventing a race, it could accelerate one.

## The Uncomfortable Strategic Geometry

Here is the part of this debate that I think gets insufficient attention.

MAIM assumes that "intelligence recursion," meaning fully autonomous AI doing AI research and compounding its own capabilities at machine speed, constitutes the obvious red line that triggers a response. One state reaches that threshold, and rivals act. But as the LessWrong analysis makes vivid: who decides when that threshold has been crossed? How confident does a nation need to be before it launches a cyberattack, or a missile strike, on another nation's data centers? What is the acceptable false positive rate when the cost of being wrong is a major international incident, potentially a war?

These are not rhetorical questions. They are operational planning questions with no clean answers.
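That false-positive question has a back-of-envelope Bayesian shape worth staring at. All of the numbers below are hypothetical inputs of my own, and deliberately generous to the monitoring side:

```python
# Back-of-envelope Bayes for the monitoring problem. Every input here is
# a hypothetical value chosen only to show the shape of the issue.
p_recursion   = 0.01  # prior: a given large training run is a true recursion attempt
p_alarm_true  = 0.95  # detector fires given a real attempt (sensitivity)
p_alarm_false = 0.05  # detector fires on ordinary R&D (false positive rate)

# P(real attempt | alarm) via Bayes' rule
p_alarm = p_alarm_true * p_recursion + p_alarm_false * (1 - p_recursion)
p_real_given_alarm = (p_alarm_true * p_recursion) / p_alarm

print(f"P(alarm is real) = {p_real_given_alarm:.2f}")  # ~0.16
```

Even with a detector far better than anything plausible against opaque, decentralized frontier development, roughly five of every six alarms would be false, and every alarm is a prompt to consider attacking another nation's infrastructure.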
Questions like these point toward a hair-trigger dynamic that RAND analysts describe as a potential "balance of AI terror," one less stable, not more, than the Cold War balance ever was.

There is also the private sector problem, which nobody in the deterrence community seems to fully reckon with. The Cold War nuclear balance was managed between states. AI in 2026 is being built by OpenAI, Anthropic, Google DeepMind, Mistral, Meta, and dozens of well-funded startups. These organizations operate across jurisdictions, release open-weight models, and answer to investors rather than defense ministries. A MAIM regime taken seriously would logically require nationalizing AI research. I doubt that is an outcome anyone in this conversation has genuinely modeled.

## My Take

I do not dismiss MAIM the way some critics do. I think Hendrycks, Schmidt, and Wang have done something genuinely important: they have forced the AI policy world to confront the geopolitical endgame of the current race, and they have proposed a concrete framework rather than vague calls for international cooperation. That takes intellectual courage. Most of the AI governance discourse is long on urgency and short on specifics.

But I share RAND's and MIRI's concern that the framework creates more instability than it resolves. Deterrence that depends on first-strike credibility is not deterrence. It is a countdown clock. A countdown clock is tolerable when the underlying technology is static. It is not tolerable when the technology is recursively improving itself and every week of delay potentially changes the strategic picture.

What I find most honest in the LessWrong analysis is this admission: even if you cannot make MAIM work as a stable doctrine, developing the *capability* to implement it gives you options that would otherwise not exist. Having the ability to credibly threaten sabotage, even if you never exercise it, changes adversarial calculus. That is worth something.

The question is whether "worth something" is a high enough bar for a framework that, if misapplied, could trigger the very conflict it was designed to prevent.

I keep returning to Oppenheimer after Trinity. The relief. The horror. The immediate realization that the thing you built was not the last word in the story, but the first sentence of a much longer and more dangerous one.

We are writing that first sentence again, right now. Are we paying enough attention to what comes next?