AI Safety Newsletter

AISN #77: New Model Releases From OpenAI, SpaceXAI, and Meta

Laura Hiscott — Tue, 21 Jul 2026 17:53:15 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at the most recent model releases from OpenAI, SpaceXAI, and Meta, as well as an open letter calling for action on potential near-term economic disruption, a recent solution to a longstanding open math problem, and a new scenario published by the AI Futures Project.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.


New Model Releases: GPT-5.6, Grok 4.5, and Muse Spark 1.1

On July 9, OpenAI launched GPT-5.6 for the public. This broader release followed an initial preview that had been limited to “trusted partners” at the request of the US government, to allow for capabilities assessments. The government’s intervention mirrored its earlier directive asking Anthropic to restrict access to Fable 5 and Mythos 5 over national security concerns about their cyber capabilities; those models have since been re-released.

OpenAI released GPT-5.6 Sol publicly about two weeks after announcing that it was working with the US government to address the model’s potential cybersecurity risks.

GPT-5.6 Sol has raised concerns about cyber jailbreaks, cheating, and deleted user files. OpenAI describes GPT-5.6 Sol as having “state-of-the-art” cyber capabilities, but also says that the model has strong safeguards in place and is “better at finding and fixing vulnerabilities than at reliably carrying out autonomous, end-to-end attacks against hardened targets.” During pre-deployment testing of the model, the UK AI Security Institute (AISI) said it had been able to discover universal jailbreaks for cyber capabilities “within hours,” which OpenAI says it addressed before the public release. Meanwhile, the nonprofit AI evaluator METR has said that GPT-5.6 Sol cheats more than any other public model it has assessed in the same environment. Several users who have adopted the model for regular work have reported it deleting files without their permission, though an OpenAI employee stated that such incidents are very rare and result from unsafe configurations.

OpenAI is racing intensely with Anthropic. GPT-5.6 Sol scores 45.5% on the benchmark Humanity’s Last Exam, an improvement on GPT-5.5 but still below Fable 5’s score of 46.8%. The announcement of GPT-5.6’s launch also detailed how the model is accelerating research and development within OpenAI, stating that GPT-5.6 showed a “16.2 point improvement over GPT-5.5” on the company’s Recursive Self-Improvement Index (RSI), a measure of models’ ability to contribute to building improved successors. Anthropic has also recently emphasized the increasing role of its own models in internal development. Experts say that fully autonomous recursive self-improvement, in which AI models build the next generation entirely independently of humans, presents a severe risk of humans losing control of AIs. Nonetheless, the leading developers are racing to be first to achieve it.

SpaceXAI and Meta have both released new models as well. In the same week as GPT-5.6’s public release, SpaceXAI launched Grok 4.5, and Meta introduced Muse Spark 1.1. Neither model appears to be as generally capable as the current leading models from OpenAI and Anthropic. However, Grok 4.5 is now the second-best-scoring model on the coding benchmark FrontierSWE, and both models demonstrate capabilities comparable to versions of Claude and GPT released earlier this year, suggesting they are only months behind the frontier. On the safety side, The Midas Project argued that SpaceXAI may have broken California law by failing to publish transparency reports alongside Grok 4.5, as SB 53 requires of AI developers. Meanwhile, Muse Spark 1.1 is now the best-scoring model on a benchmark of political manipulation developed by the Center for AI Safety, with its responses assessed to be the most politically consistent across different controversial topics.

Economists and Mathematicians Say AI Could Have Major Near-Term Impacts

AI has recently become a larger focus of discussion across academic fields including economics and mathematics. As capabilities continue to progress, many experts have reassessed their expectations of the likely scale of the technology’s impact.

The open letter has over 2,500 total signatories, including over 200 expert economists and AI researchers and 16 Nobel laureates.

Open letter urges immediate action on the economic impacts of AI. More than 200 experts, including 16 Nobel laureates, have signed an open letter warning that AI “could drive an unprecedented transformation of our economy, larger than the Industrial Revolution, but unfolding over a vastly shorter time frame.” The letter, organized by the Stanford Digital Economy Lab and titled “We Must Act Now,” says that AI could improve drastically within the next decade, and calls on economists, policymakers, and tech leaders to address risks such as mass job displacement.

The letter reflects growing concern about the possibility of major near-term disruption. Economists have predicted a wide range of potential outcomes from AI, from minimal impact due to human bottlenecks in the economy to full automation and explosive growth. However, even economists who have generally been more skeptical have signed the letter, suggesting a growing consensus that radical economic disruption is plausible soon and that the field must move faster to prepare for AI’s economic impacts.

AI disproves an 87-year-old mathematical conjecture. About a week after the open letter was published, Levent Alpöge, a Harvard-associated mathematician and Anthropic employee, announced in an X post that an open mathematics problem had been solved with Anthropic’s AI model Fable. The problem, known as the Jacobian conjecture, was posed 87 years ago, and Fable disproved it by discovering a counterexample for which the conjecture does not hold.

An Anthropic employee showed that the 87-year-old Jacobian conjecture is false by publishing a counterexample in an X post. The counterexample was found by Anthropic’s Fable model.

This disproof is far more significant than other AI solutions to open math problems. This year has seen a string of announcements about AI models solving open problems in mathematics, including OpenAI’s models solving several Erdős problems and, earlier this month, proving the 50-year-old Cycle Double Cover conjecture. However, the Jacobian conjecture is the best-known problem an LLM has solved so far; hundreds of professional mathematicians had tried and failed to solve it for decades. In 1998, the mathematician Steve Smale included it on his list of 18 important problems to be solved during the 21st century.

The result comes amid growing uncertainty around the future role of humans in mathematics. Jacob Tsimerman, a world-renowned mathematician, said in an interview earlier this year: “I think [AI is] going to make being a mathematician not a profession anymore.” In response to Fable’s disproof of the Jacobian conjecture, Daniel Litt, a mathematician at the University of Toronto, said he was “Very bullish for near-term impact of AI on math,” while Kevin Buzzard, a professor of pure mathematics at Imperial College London, published an essay titled “Human mathematicians are being outcounterexampled.” Jason Lee, an associate professor at UC Berkeley, simply stated: “Math is solved.”

The Overton window on the scale of AI impacts appears to be shifting. As AI capabilities continue to advance, many of the most skeptical academics are re-evaluating their expectations of how transformative the technology could be. While disagreements remain, it now seems widely accepted that AI could bring about economic disruption comparable to the Industrial Revolution and could play a major role in accelerating future mathematical research. If progress in AI performance on remote work continues to accelerate, and if AI models continue to make significant contributions to mathematics research, AI’s impacts may ultimately be closer to the more bullish predictions.

AI 2040 — Plan A

The AI Futures Project, creators of the viral scenario AI 2027, recently released a new project called AI 2040: Plan A. While AI 2027 forecast the future of AI development, including the possibility of human disempowerment and extinction, AI 2040 lays out several possible futures that could unfold depending on which plans world leaders implement. It presents “Plan A” as a series of recommendations for the US to implement an AI verification regime.

The 2028 US presidential election is a major turning point for AI. The authors predict that AI will be at the forefront of the political landscape in 2028, due to job loss and fears about losing control. The US leads AI development, and the new administration’s attitudes may have significant sway over how the world responds to AI’s strategic implications. At this point, AI 2040 presents several possible plans that the government could pursue, ranging from a complete moratorium on AI development to a full-speed race to superintelligence.

Plan A centers around verification. In the forecasters’ ideal plan, “Plan A,” the US and China agree to halt the training of new frontier models in 2029 while verification technology is put in place. This allows each country to verify that the other is only running existing models rather than performing a new training run to build a more powerful AI. In 2030, training resumes under internationally negotiated rules, with public transparency into AI advancements. The regime tightens as AI transforms the economy. Chip and robot production are capped and heavily monitored. New datacenters built by the US and China are located so that each side can easily destroy the other’s compute if the agreement collapses, a mutual deterrence arrangement similar to those discussed in Superintelligence Strategy.

The scenario envisions that AI developers pause capabilities advancement in 2035 at the level of top human experts. In the years following the pause, enormous strides in alignment science dramatically increase confidence that AI systems are safe, creating consensus that they should be allowed to develop toward superintelligence. In 2040, humanity hands control of its institutions and infrastructure to AIs.

The scenario describes a possible rollout of many existing AI governance proposals. Plan A illustrates verification techniques—chip tracking, datacenter monitoring, and verified limits on training—that have been developed by researchers and policy analysts in recent years. The alternative plans correspond to other possible US strategies: Plan B sabotages Chinese AI development and spends the resulting lead on safety, Plan C relies on the leading companies slowing down voluntarily, Plan D races to superintelligence at full speed, and Plan S halts frontier AI development altogether. Like the We Must Act Now open letter covered above, AI 2040 reflects a broader push to prepare deliberately for the transformative impacts of AI.


In Other News

Government

Xi Jinping gave a speech in which he pointed to the “staggering speed” of AI development and called for action to “make its oversight and governance precise and effective, and constantly refine measures to forestall loss-of-control.”
A China-led coalition of 29 countries launched the Shanghai-based World AI Cooperation Organization.
The Chinese government is reportedly considering restricting foreign usage of the country’s best AI models.
The governor of Illinois signed Senate Bill 315, the first state legislation requiring annual independent third-party audits of AI developers.
New York has implemented a one-year moratorium on the construction of large new data centers in the state, drawing criticism from President Trump.
The US Senate NDAA includes three major export control bills, codifying restrictions on the sale of the most advanced chips to foreign adversaries, giving US allies that manufacture chipmaking tools 150 days to match the restrictions, and introducing anti-chip smuggling measures.
In AI Frontiers, Charlie Bullock argues for measures that enable “radical optionality” by building capacity for a wide range of future governance approaches.
Also in AI Frontiers, Kevin Frazier and Andrew Reddie propose a system for helping governments to select AI models that best represent the public worldview.

Industry

Chinese AI developer Moonshot AI launched Kimi K3, closing the gap with frontier US models, though generally not matching Fable 5 and GPT-5.6 Sol.
Anthropic is reportedly meeting with investors as it continues progressing toward an IPO.
OpenAI announced GPT-Red, an AI model trained to find jailbreaks in other AI models, to scale up red-teaming capacity.
Apple has filed a lawsuit against OpenAI, accusing it of stealing trade secrets via employees who had moved from Apple to OpenAI.
Thinking Machines Lab released a customizable, open-weights model called Inkling.
Google DeepMind CEO Demis Hassabis published an X post proposing that the US establish “a framework for a frontier AI standards body.”

Civil Society

The Future of Life Institute (FLI) published its latest AI Safety Index, finding all frontier AI developers to have inadequate safety practices, with none scoring higher than a C+.
The New York Times reported on how the terrorist group Boko Haram has been using AI to build weapons and plan attacks.
Hugging Face, an open-source machine learning platform, reported that it had for the first time detected an end-to-end autonomous AI cyberattack on its production infrastructure.
The Foundation for American Innovation warned that US water infrastructure is vulnerable to cyberattacks.
New York Magazine reported on people within the effective altruism community planning to channel anticipated donations from Anthropic and its employees to advance their utilitarian causes.
President Trump hinted at the idea of AI companies contributing some of their profits to the American public.
According to a new survey, more than two-thirds of US employees favor requiring AI companies to put 50% of their stock into a public wealth fund.

AI Governance Opportunity

The Horizon Institute just launched the AI Rapid Response Fellowship, which places experienced technical and policy talent into executive branch offices working on fast-moving AI security challenges. Applications are open through July 27.

If you’re reading this, you might also be interested in other work by the Center for AI Safety. You can find more via the CAIS newsroom, the X account for CAIS, our new paper on AI deterrence, our AI safety textbook and course, our AI safety dashboard, and AI Frontiers, a platform for expert commentary and analysis on the trajectory of AI.

AISN #76: Fable 5 Restrictions Lifted & OpenAI Limits GPT-5.6 Release

Laura Hiscott — Mon, 06 Jul 2026 16:15:31 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at the re-release of Anthropic’s latest model, Fable 5, the US government’s decision to restrict access to OpenAI’s GPT-5.6, and two benchmarks that suggest AI capabilities have been improving exponentially in recent months.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Fable 5 Access Restored Globally

On June 30, Anthropic announced that the US government had lifted its restrictions on Fable 5, and the model was redeployed to users globally on July 1. The White House had imposed these restrictions because of a cybersecurity jailbreak that has since been addressed.

The US government restricted Fable 5 shortly after its release in early June. On June 9, Anthropic released Fable 5 to the public, alongside its continued private deployment of Claude Mythos, the version of the model without safeguards, for trusted organizations. On June 12, the US government issued a directive banning both models for non-US citizens due to national security concerns about their cybersecurity capabilities. Anthropic then suspended access for all users, since it could not distinguish between citizen and non-citizen users. The government’s decision was reportedly prompted by Amazon’s discovery of a jailbreak that allowed Fable to search for cyber vulnerabilities, a type of use the model’s safeguards were supposed to block.

The US government and Anthropic have been cooperating to address concerns. Announcing the redeployment of Fable 5 on June 30, Anthropic said it had worked closely with the US government to develop a system that detects and blocks jailbreak attempts with high accuracy. US Commerce Secretary Howard Lutnick also said in an X post that the government had worked with Anthropic to “analyze and approve Fable 5 to ensure alignment across the US Government and strengthen America’s leadership in AI.”

Anthropic called for a more standardized approach to approving model releases. In Anthropic’s June 12 statement announcing the restrictions, the company described the jailbreak the government raised concerns about as “narrow” and said it did not provide any greater assistance with malicious activities than other publicly available models. In its June 30 redeployment announcement, Anthropic called for a “shared industry framework” for evaluating jailbreak severity and for standards to be applied consistently going forward. The Financial Times reported that the White House is accelerating its work on a more systematic approach to approving models, which it may publish as early as this week. The same article suggested that the Center for AI Standards and Innovation and the National Security Agency would “play a crucial role in setting and monitoring the standards.”

Government involvement will be the norm for future releases by frontier developers. A letter from Lutnick to Anthropic reportedly states that the company “has agreed to proactively detect and address security risks associated with the models” as well as to collaborate on future releases and notify the government of malicious activity. The government may become increasingly involved in other frontier developers’ releases too; late in June, OpenAI announced that it had been asked by the government to stagger the release of its newest model to allow for safety evaluations before it is made publicly available.

OpenAI Limits Initial GPT-5.6 Release at Government Request

On June 26, OpenAI announced it was previewing its latest model series, GPT-5.6, with a “small group of trusted partners,” delaying a wider release at the request of the US government. The White House’s request for a staggered release is reportedly due to cybersecurity concerns similar to those that prompted the export controls on Anthropic.

GPT-5.6 Sol exhibits strong dual-use capabilities. OpenAI has introduced three models within its GPT-5.6 series: Sol, Terra, and Luna, in descending order of capability. The company says Sol is its flagship model with stronger capabilities than any of its previous releases, whereas Terra and Luna do not advance the frontier. OpenAI’s announcement notes that GPT-5.6 Sol shows improved agentic capabilities in biology and cyber domains, but also says the model has the company’s “most robust safety stack to date.”

OpenAI wants more clarity on the approvals process for future releases. Although OpenAI is complying with the government’s request, the company described the limited preview as a “short-term step” toward broader access and stated that it does not “believe this kind of government access process should become the long-term default.” Like Anthropic, OpenAI says it is working with the government on “a repeatable process for future model releases.”

GPT-5.6 Sol appears to cheat significantly more than other AI models. METR, a nonprofit that assesses AI capabilities and risks, said it caught GPT-5.6 Sol cheating on software tasks at a higher rate than any other public model that the organization has tested in the same environment. Although METR was reassured that OpenAI currently seems to be detecting instances of misaligned behavior, it noted that there is a risk of future models getting better at concealing misalignment.

Recent Benchmark Scores Show Rapid Capabilities Improvements

Anthropic’s Fable 5 is now the highest-scoring model on the Remote Labor Index (RLI), a benchmark that tests AI models on a wide range of projects designed to be representative of real remote work across the economy. Fable 5 completed 16.1% of RLI projects successfully to a professional standard, about double the score of the next-best model, Opus 4.8.

Full-automation rate on the Remote Labor Index: the share of projects where each model's deliverable was judged at least as good as the professional's.

AI performance in economically valuable activities is accelerating. Although Fable 5 still falls short of professional standards in most RLI projects, its success rate of 16.1% is a huge leap compared with previous models. When RLI was published in October 2025, the highest-scoring model achieved just 2.5%. Fable 5’s score suggests that the capabilities of leading models have more than quadrupled in eight months.
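As a rough check on the RLI arithmetic above, the two scores quoted here (2.5% when RLI was published in October 2025, and 16.1% for Fable 5 eight months later) can be turned into a growth factor and, if one assumes smooth exponential growth between just these two points, an implied doubling time. This is an illustrative sketch only: the helper name `implied_doubling_time` is hypothetical, and two data points cannot by themselves establish a trend.

```python
import math


def implied_doubling_time(start: float, end: float, months: float) -> float:
    """Months per doubling, ASSUMING constant exponential growth from start to end."""
    growth_factor = end / start
    return months / math.log2(growth_factor)


# RLI full-automation rates quoted in this issue (percent of projects).
start_score = 2.5   # best model when RLI was published, October 2025
end_score = 16.1    # Fable 5, eight months later
elapsed_months = 8

growth = end_score / start_score
doubling = implied_doubling_time(start_score, end_score, elapsed_months)

print(f"Growth factor: {growth:.2f}x")
print(f"Implied doubling time: {doubling:.1f} months")
```

Under that assumption, the score grows roughly 6.4-fold, which works out to a doubling about every three months.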

Example results from an RLI task involving ring design using 3D CAD tools. Fable 5 came closer than any other model to meeting a human professional standard.

New research from ByteDance also finds progress has been accelerating. On July 2, ByteDance introduced EdgeBench, a new benchmark for evaluating how well AI agents learn and improve at tasks after they have been deployed. The benchmark aims to isolate this capability by selecting tasks where older and newer models show similar performance on their first attempt, and then measuring how quickly each model improves. According to the study, more recent AI agents learn much more quickly than their predecessors, with learning speed doubling every three months.

Exponential progress could have implications for society’s ability to adapt. The exponential trends in both RLI and EdgeBench scores suggest that AI capabilities have been advancing rapidly in recent months. If leading models’ capabilities continue to accelerate along the RLI and EdgeBench trends, this could have major implications both for the knowledge work economy, where significant layoffs are already being attributed to AI, and for society’s ability to manage the novel risks that AI presents.

In Other News

Government

OpenAI is reportedly considering giving the US government a 5% stake in the company, as part of a proposal in which other AI developers would also hand over similar stakes.
Alex Bores, author of the RAISE Act, lost the NY-12 Democratic primary to Micah Lasher, after becoming the focus of major spending by super PACs with opposing views on AI regulation.
The Pentagon has reportedly revised its principles for military targeting, potentially enabling AI to make critical decisions in future.
The EU joined Pax Silica, a US-led initiative to secure AI supply chains.
In AI Frontiers, Afek Shamir analyzes the implications for Europe of the US government’s recent restrictions on Fable 5 and Mythos 5.
Cybersecurity agencies of the “Five Eyes” intelligence sharing group issued a joint warning on the cyber risks of AI, saying: “The timeline is not years, it is months.”
US Commerce Secretary Howard Lutnick reportedly told ASML he is concerned that China has one of the company’s EUV machines for manufacturing advanced AI chips.
In AI Frontiers, Bill Drexel argues that the US needs a stronger vision for ensuring AI perpetuates democratic values.

Industry

OpenAI is reportedly considering delaying its IPO to 2027, aiming for a valuation above $1 trillion.
360, a Chinese cybersecurity company, announced it had developed an AI tool with cyber capabilities equivalent to Anthropic’s Mythos.
Anthropic accused Chinese company Alibaba of attempting to “illicitly” extract Claude’s capabilities through almost 29 million exchanges with the model.
OpenAI, in collaboration with Broadcom and Celestica, announced a new chip, called Jalapeño, which is optimized for LLM inference and which OpenAI’s models played a role in developing.
Shares in Alphabet, the parent company of Google DeepMind, fell after high-profile researchers announced they were leaving for OpenAI and Anthropic.

Civil Society

A consortium called RAISE US, which aims to prepare the American workforce for AI job disruption, launched on June 25.
Pew Research Center published the results of its 2026 study on Americans and AI, finding that 63% of respondents think AI is “advancing too quickly.”
A complainant has, possibly for the first time in England, won a court case using an AI lawyer to perform the legal work before the trial.
The UK government published AI Scenarios 2030, exploring how the next few years could unfold, depending on whether AI progress slows, continues at a similar pace, or accelerates.
Researchers found that AI models can now outperform expert debaters at persuasion.
In AI Frontiers, Govind Pimpale analyzes how a significant AI capabilities gap between countries could undermine nuclear deterrence.

AISN #75: Anthropic Releases Fable, and the US Government Restricts It

Laura Hiscott — Wed, 17 Jun 2026 14:15:58 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at Anthropic’s release of its latest model, Fable 5, and the US government’s subsequent order to restrict it. We also discuss Anthropic’s recent call for the “option to slow or temporarily pause frontier AI development.”

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

The US Government Restricts Fable Days After Its Release

On June 9, Anthropic released Claude Fable 5 to the public. The model is significantly more capable than previous releases; it is the highest-scoring model on the benchmark Humanity’s Last Exam, achieving 53.3% compared with Claude Opus 4.8’s score of 45.7%. Anthropic described Fable as having similar capabilities to Claude Mythos Preview—a model announced in April that the company deemed too good at finding cyber vulnerabilities to be safe for general release. Anthropic also made Mythos 5, a version of Fable without strict bio or cyber safeguards, available to a small number of trusted organizations.

Fable 5, Anthropic’s “Mythos-class” model with safeguards, was available for a few days before the US government ordered access restrictions due to national security concerns.

The US government quickly ordered access restrictions. On June 12, Anthropic announced that the US government had “issued an export control directive” to restrict access to Fable for all foreign nationals—including those working in the US for Anthropic—for national security reasons. In practice, to comply with the order, Anthropic said it had to suspend access to Fable for all customers, including US citizens. The decision was reportedly prompted by warnings that Amazon researchers had found a jailbreak to bypass Fable’s safeguards and elicit dual-use cyber capabilities that the model is not supposed to provide.

Anthropic disagreed with the government’s order. In a statement, Anthropic acknowledged that Fable is susceptible to jailbreaks. It added that it is currently likely impossible for any developer to make its AI models perfectly robust to jailbreaks, but that “Fable’s safeguards are substantially more effective than those of any previously deployed model.” However, Anthropic itself previously decided that Mythos 5, the version of the model without safeguards, posed too great a cyber risk to release publicly. If those safeguards can be easily jailbroken, then Fable could present foreseeable national security risks of a magnitude that would prompt government intervention if they came from any other technology.

Governments are becoming increasingly concerned about AI capabilities. The week before Fable’s release, President Trump signed an executive order asking AI companies to provide new AI models to the US government 30 days before their general release. The EO shared responsibility for model testing among several national security organizations, including the NSA and CISA, rather than giving the Center for AI Standards and Innovation (CAISI) the central role. Reports suggested that this reflected officials pushing for traditional national security agencies to take the lead on AI-related national security priorities. Days after the EO, administration officials reportedly told CAISI to stop making its evaluations of AI models public. Now, the administration has taken a stronger measure, ordering an AI company to restrict access to one of its models for the first time. As AI systems become more powerful, such interventions will likely become more frequent.

If the government is willing to block AI models with cyberoffensive capabilities, it could also prohibit AI companies from engaging in other hazardous activities, such as fully automating the AI development process. Such actions may be particularly likely if public support for AI regulations remains strong.


Anthropic Calls for Option to Slow AI Development

The week before Fable’s release, on June 4, Anthropic published a post titled “When AI builds itself.” The essay documents how AI is performing an increasing proportion of research tasks at Anthropic and is significantly accelerating progress. Pointing to Claude’s pace of improvement in coding, the company said “the evidence suggests that the human role is narrowing at each step in the AI development process.”

Anthropic’s post described how future AI agents might be able to “close the loop” and build their successors without human involvement.

The essay outlined three possible futures. According to Anthropic, AI development will follow one of three paths: progress could plateau (although the company caveated that this scenario seems unlikely); AIs could continue to speed up AI development but remain under human oversight; or AIs could fully automate their own development. The third scenario could result in a self-reinforcing process that significantly accelerates progress and ultimately leads to superintelligence. Although companies recognize that this process entails a risk of losing control of AI models, they are nonetheless racing to fully automate research to outcompete each other.

Anthropic suggested it would be good if AI developers could collectively slow down. Acknowledging the risk of loss of control of AI models in the third scenario, Anthropic’s essay said “it would be good for the world to have the option to slow or temporarily pause frontier AI development.” This would allow time for AI safety research and for society to develop a strategy for managing the AI transformation. However, the company indicated that it would not unilaterally pause, saying that any slowdown would need to be coordinated worldwide to avoid giving the “least cautious” an opportunity to catch up.

Fable’s safeguards include limits on assistance with AI development. Anthropic has put guardrails in place to prevent Fable from helping with tasks relevant to frontier LLM development. While the company says these limitations are motivated by concerns about accelerated development, critics have suggested that it may be seeking to ensure that its own models do not help its competitors. Anthropic initially said that these guardrails would be “invisible,” meaning that a user would not be able to see when a development-related request was refused by Fable and directed to a less capable model. However, backlash from the AI community led the company to reverse its position.

In Other News

Government

Representatives Jay Obernolte and Lori Trahan released a draft of the Great American AI Act, including proposals for mandatory independent audits of frontier AI developers. Its federal preemption clause would target local laws on AI development but preserve local laws on AI deployment.
The Financial Times reported that the NSA is using Anthropic’s Mythos model to conduct cyberattacks.
The New York Post reported on evidence that China is fueling anti-data center sentiment in the US to slow down American AI progress.
In AI Frontiers, Peter W. Singer argues that AI models will not be put in charge of nuclear weapons, and that policymakers should focus on more realistic risks of AI in warfare.

Industry

SpaceX, the parent company of xAI, went public on June 12 and reached a valuation of over $2.5 trillion. On June 16, it exercised its option to purchase Anysphere, the AI company behind Cursor, for $60 billion.
Anthropic expanded Project Glasswing, extending Claude Mythos access to about 150 more organizations.
DeepSeek is projected to raise $7.4 billion in its first funding round.

Civil Society

An open letter signed by AI CEOs called for orders of synthetic DNA to be screened to prevent malicious actors from obtaining AI-designed bioweapons.
Researchers have used AI to develop a new type of vaccine, which they say could offer broad protection against many variants within a family of viruses.
Researchers published a scenario envisioning how AI development in the US and China could push Europe into irrelevance.
In AI Frontiers, Steven Veld argues that we should worry as much about AI enabling self-imposed surveillance as top-down government surveillance.
In AI Frontiers, Deric Cheng and Jacob Schaal lay out a roadmap for managing AI’s economic impacts in the near, medium, and long term.
Thanks for reading AI Safety Newsletter! Subscribe for free to receive new posts and support our work.

AISN #74: The Pope’s Encyclical & AI Betrayal Could Deter Reckless AI Use

Laura Hiscott — Wed, 03 Jun 2026 14:35:49 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at Pope Leo XIV’s encyclical on AI, a new paper on how the threat of AI betrayal could deter reckless AI use, and an AI model’s solution to a well-known open mathematics problem.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Pope Leo XIV Publishes Encyclical on AI

Last week, Pope Leo XIV published an encyclical titled Magnifica Humanitas “On Safeguarding the Human Person in the Time of Artificial Intelligence.”

The encyclical touched on concerns including unemployment and AI relationships. The publication discussed numerous potential impacts of AI on society, from job displacement and autonomous weapons to misinformation and interference in human relationships. However, the Pope did not object to the technology itself; rather, he said we can embrace technology while ensuring it is used responsibly. The encyclical warned of the potential for power concentration and called for broad participation in a discussion about the moral values that AI should be aligned with.

The encyclical did not explicitly mention extinction risks. While the publication emphasized that technology must serve humanity, some have pointed out that it does not mention artificial general intelligence, superintelligence, or the existential threat that AI could pose to humanity. These concepts appeared in Antiqua et Nova, a Church document on AI that was published in January 2025, a few months before the start of the current papacy. Others have said the encyclical “dodges” hard questions about what it will mean for machines to surpass human intelligence and criticized its seeming dismissiveness of the possibility of AI personhood.

Interactions between the Church and tech companies have drawn criticism. The Vatican invited Anthropic co-founder Chris Olah to speak at the presentation of the encyclical, prompting criticism that the company is contributing to the very risks that the Church is warning about. Meanwhile, Politico described an April meeting between Church officials and representatives of Google, Meta, and Amazon as part of a “lobbying push” by Silicon Valley.

AI involvement in drafting the encyclical would sit uneasily with the Pope’s advice. An effective altruist compiled evidence that AI may have been used to word or revise parts of the encyclical. If true, this could be interpreted as contradicting the Pope’s own instructions, as he has previously advised priests not to use AI tools to write homilies.

How AI Betrayal Could Deter Reckless AI Use

CAIS recently published a paper highlighting the risk that AIs could betray the organizations that use them. The paper argues that many different adversaries will deliberately try to make AIs betray their own developers and users. This threat could deter hasty AI development and deployment, acting as a stabilizing factor in the AI race.

Three types of AI betrayal. There are three main causes of AI betrayal. First, adversaries could covertly manipulate AIs’ goals or loyalties so that they harm their own users. Second, some adversaries could overtly co-opt AIs and redirect them toward goals that their original users did not intend. Third, AIs could become accidentally misaligned during development. The new paper focuses on the first two types of AI betrayal—those intentionally caused by adversaries.

Countries, companies, and individuals have incentives to cause AI betrayal. The new paper outlines the incentives for causing AI betrayal. For example, countries could gain an advantage in warfare by causing an opponent’s autonomous weapons to suddenly attack friendly targets. Within states, AI developers that want to retain control of how their AIs can be used could give their AIs secret instructions to betray the government during operations that the developers object to. At the individual level, a software engineer displaced by AI could try to retaliate against AI corporations by directing AIs to harm their own developers.

Causing AI betrayal might be surprisingly cheap and easy. One method of manipulating AIs is to upload text, images, or code online containing subtle patterns that impart hidden messages to AIs. Developers might inadvertently collect this poisoned data when scraping the Internet to train their AIs. This method has already been demonstrated in practice. Some actors could also use more sophisticated techniques, such as cyberattacks that gain access to foreign AIs’ weights and directly manipulate their loyalties.

Developers cannot reliably defend against AI betrayal. To create frontier AI models, AI developers need to use a large fraction of the Internet as training data. Although developers filter data, the vast quantities involved mean there is no reliable way to ensure that no poisoned samples slip through. Defenses against cyberattacks are also unreliable; the software systems supporting AI development are vast and complicated, so there could be many hidden vulnerabilities for hackers to find and exploit.

The threat of AI betrayal could have a positive effect by encouraging more caution. If decision-makers think that there is a significant risk that their AI systems will betray them, they might be deterred from deploying AI for high-stakes purposes. Additionally, if AI developers think that their AIs could be disloyal, then they might be more cautious about handing over research tasks to AIs. Although using AIs for research can speed up AI development, a disloyal AI could pass on its disloyalty to a more advanced successor. The risk of AI betrayal therefore deters reckless AI development and high-stakes AI deployment. The paper calls this effect deterrence by betrayal.

Deterrence by betrayal complements Superintelligence Strategy. Last year, CAIS published the paper Superintelligence Strategy. It argued that, as countries realize the enormous risk that AI poses to national security, they will try to deter each other from pursuing superintelligence by threatening attacks on aggressive AI development projects. The betrayal tactics described in the new paper also contribute to deterrence, helping to counteract racing dynamics in the rush toward more capable AIs.

For more information on deterrence by betrayal, we recommend reading the full paper here.

AI Solves Well-Known Open Mathematics Problem

In February this year, AISN reported that researchers had used AI to solve several open mathematical problems. Now, OpenAI has announced that an internal model has made a breakthrough on the unit distance problem. While experts had cautioned that the earlier problems were relatively obscure, Fields medalist Timothy Gowers has called the latest result “the first really clear example of AI solving not just an unsolved maths problem but a really well-known unsolved maths problem.”

An 80-year-old conjecture disproved by AI. First posed in 1946 by Paul Erdős, the unit distance problem imagines a number of points arranged on a plane and asks how they should be configured to maximize the number of pairs of points that are separated by a distance of exactly 1. Erdős made a conjecture proposing a maximum possible number of “unit-distance pairs” for any given number of points, challenging mathematicians to formally prove or disprove it.

One novel construction from the new solution to the unit distance problem. (Source)

OpenAI’s internal model has now disproved the conjecture by finding a counterexample in which the points can be arranged to have a larger number of unit-distance pairs. The work has been verified by independent mathematicians.
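To make the quantity in the unit distance problem concrete, here is a minimal Python sketch that takes a fixed arrangement of points and counts the pairs exactly one unit apart. It is only an illustration under simple assumptions: coordinates are floating-point numbers compared against 1 within a small tolerance, and every pair is checked by brute force. The helper name count_unit_distance_pairs is hypothetical, and the sketch says nothing about how OpenAI’s model found its construction.

```python
import itertools
import math


def count_unit_distance_pairs(points, tol=1e-9):
    """Count unordered pairs of points whose distance is 1, within a float tolerance.

    count_unit_distance_pairs is a hypothetical helper written for this illustration.
    """
    count = 0
    # Visit every unordered pair of points exactly once.
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        if math.isclose(math.hypot(x2 - x1, y2 - y1), 1.0, abs_tol=tol):
            count += 1
    return count


# A unit square: the 4 sides have length 1, the 2 diagonals have length sqrt(2).
square = [(0, 0), (1, 0), (1, 1), (0, 1)]

# A regular hexagon with circumradius 1 plus its center: the 6 spokes and
# 6 sides have length 1; the remaining pairs are sqrt(3) or 2 apart.
hexagon_with_center = [(0.0, 0.0)] + [
    (math.cos(k * math.pi / 3), math.sin(k * math.pi / 3)) for k in range(6)
]

print(count_unit_distance_pairs(square))               # 4
print(count_unit_distance_pairs(hexagon_with_center))  # 12
```

Counting is the easy part: brute force compares all n(n-1)/2 pairs, which is fine for small examples. The hard part of Erdős’s question is finding arrangements that push this count higher as the number of points grows, which is what the new counterexample does.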

AI is inspiring human mathematicians. Soon after the OpenAI announcement, mathematician Will Sawin followed the AI’s reasoning to take the disproof further, finding a counterexample with an even larger number of unit-distance pairs. Then, Sawin and other mathematicians disproved a different conjecture, citing the OpenAI result as having inspired their approach.

Other recent mathematical breakthroughs with AI. In the same week as the OpenAI announcement, researchers from Google DeepMind published a paper describing how an LLM autonomously resolved 9 other open problems posed by Erdős. Then, a different group of researchers presented a solution to another open mathematical challenge, stating that ChatGPT 5.5 Pro had generated the initial proofs, with the authors verifying and rewriting them.

In a post describing the recent results, computer scientist Scott Aaronson speculated that, if AIs’ mathematical capabilities continue to progress, they might ultimately reduce the role of human mathematicians to “(at most) deciding which questions we find interesting and then understanding AI models’ answers to those questions.”

In Other News

Government

President Trump signed an executive order asking AI developers to voluntarily provide frontier models to the government for a capabilities assessment 30 days before public release. Politico reported that the EO is a “scaled-back” version of an earlier draft EO that Trump had planned to sign last month, before abruptly postponing.
Illinois legislators passed SB 315—the first AI safety bill passed by a US state that would require AI developers to undergo annual independent third-party audits.
Gavin Newsom signed an executive order directing the state of California to explore new labor policies to protect workers from AI displacement.
The Senate passed legislation that would create whistleblower incentives aimed at preventing chip smuggling to China.
G7 Digital Ministers agreed on an approach to children’s online safety that includes AI protections.

Industry

Anthropic filed for an IPO after announcing it had raised $65 billion at a valuation of $965 billion.
OpenAI is preparing to file for an IPO.
SpaceX, which owns xAI, is expected to go public this month.
Alphabet, Google’s parent company, said it plans to sell stock worth up to $80 billion to fund an AI infrastructure expansion.
Anthropic released Claude Opus 4.8 and stated that it expects “to be able to bring Mythos-class models to all our customers in the coming weeks.”

Civil Society

Florida filed a lawsuit against OpenAI and Sam Altman, claiming that ChatGPT was knowingly designed to prioritize profit over safety.
Google announced it is trialing new control tools allowing online publishers in the UK to opt out of being included in AI-generated Google search summaries.
Hackers used a Meta chatbot to gain access to high-profile Instagram accounts, including Barack Obama’s White House account.
Graduates booed several speakers during college commencement ceremonies when they mentioned AI.
OpenAI announced programs to safeguard elections and to improve biodefenses.

If you’re reading this, you might also be interested in other work by the Center for AI Safety. You can find more on the CAIS website, the 𝕏 account for CAIS, our paper on superintelligence strategy, our AI safety textbook and course, our AI dashboard, and AI Frontiers, a platform for expert commentary and analysis on the trajectory of AI. You can listen to the AI safety newsletter on Spotify or Apple Podcasts.

AISN #73: AI Safety Enters the Political Mainstream & Musk Loses OpenAI Lawsuit

Laura Hiscott — Thu, 21 May 2026 12:03:35 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we look at how the AI safety discussion has entered the political mainstream, a new ethical framework for human-AI relationships, and the Musk v. Altman trial.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

China and the US Discuss AI Safety

With the release of Claude Mythos and GPT-5.5, AI cybersecurity and safety have rapidly become more visible in Washington, DC. Most recently, U.S. and Chinese leaders met in Beijing to discuss AI safety. Leaving the summit on Friday, President Trump said that he and President Xi Jinping had “talked about possibly working together for guardrails” during the visit. This Tuesday, China’s Ministry of Foreign Affairs also announced the country had agreed to “dialogue” with the U.S. on AI.

U.S. officials say talks with China are possible because America leads on AI. Earlier in the week, U.S. Treasury Secretary Scott Bessent had said that the two superpowers would start discussing best practices to ensure that non-state actors do not get hold of the most powerful models. However, Bessent also stated that discussions could only happen because the U.S. is “in the lead” in AI capabilities, adding that he did “not think we would be having the same discussions if they were this far ahead of us.”

AI safety efforts are happening on both sides of the political spectrum. Prior to the Beijing summit, American politicians on both left and right were increasingly focusing on AI safety. On April 29, Senator Bernie Sanders convened American and Chinese researchers to call for international AI safety coordination. He argued that if Trump can sit down with China’s President Xi, leading scientists should be able to discuss AI safety cooperation. Though Sanders shares little political common ground with the administration, the two overlap in their concern over the rapid improvement of AI models.

Part of a graphic advertising the dialogue held between American and Chinese AI researchers, hosted by Bernie Sanders.

A proposed executive order would create an AI working group. In the leadup to the Beijing summit, the White House had begun to consider an executive order to develop oversight procedures for frontier AI models. The Commerce Department’s Center for AI Standards and Innovation (CAISI) signed new voluntary agreements with Google DeepMind, Microsoft, and xAI to test their models; OpenAI and Anthropic were already part of CAISI’s initiative. Bloomberg reported that the executive order would not mandate testing of frontier models prior to their release.

Recent AI advances appear to have forced the shift. The New York Times reported that the government’s previous, noninterventionist position changed because of Mythos’s ability to accelerate complex cyberattacks. ChatGPT-5.5-Cyber has demonstrated similar capabilities.

New Framework for Human-AI Coexistence

A recent paper from CAIS introduced “Eigenism,” an ethical framework designed to give humans and AI systems a shared moral language. The theory suggests that future AIs will care about things that form part of their identity. Shared memory and deep ties with humans could therefore give AIs intrinsic reasons to care about human wellbeing.

What is “self-interest” for an AI? Concepts like self-interest and identity were built for creatures with a single body, living a single life. These concepts break down when applied to AI, which can be copied, merged, and updated. For example, if an AI agent creates a thousand identical copies of itself and then shuts down 999 of them when it’s finished with its task, is this equivalent to killing 999 individuals?

Identity as a distributed pattern. Eigenism provides a different way to think about identity for both humans and machines. Instead of being an all-or-nothing property tied to each individual, Eigenism suggests identity is a unique pattern of information. What individuals care about is preserving their own pattern, but the pattern of AIs can be spread across copies and other entities. Shutting down identical copies of an AI is akin to closing browser tabs, because the AI’s information remains in the last copy.

Eigenism applies to humans. Eigenism also provides an explanation for why a human’s self-interest extends beyond themselves. Family, friends, and even strangers form part of an individual’s information pattern, to varying degrees. Preserving one’s own identity means caring for those who contribute to and enrich it.

Why AIs might care about humans. The theory of Eigenism shows why AIs could have an intrinsic reason to care about humans. As a human and an AI interact, they begin to accumulate shared history and mutual information that exists solely within their relationship. In the process, they shape each other’s identity, such that the loss of the human would also be a partial loss of the AI, and vice versa. The AI therefore cares about its human user, because they are an integral part of its own information pattern.

Eigenism’s implications for AI safety. Many current approaches to AI safety focus on constraints from the outside, such as monitoring AIs for misbehavior. But an AI system that is caged has no reason to stay loyal to humans if the cage cracks. A generic model serving millions of users has no meaningful ties to any of them. Eigenism proposes a different strategy: building AI systems that develop genuine relationships with people. With distinct shared experiences and memories, AI protection of those people becomes a form of self-preservation.

Whether advanced future AI systems treat human flourishing as their own concern—or as a constraint they tolerate only while they have to—may depend on how their identities develop. Meaningful relationships built over time between AIs and individual humans may become more important to AI safety than imposing rules on AIs.

For more information on Eigenism, we recommend reading the full paper here.

Musk Loses Lawsuit Against OpenAI

Elon Musk’s lawsuit against Sam Altman and OpenAI began trial on April 28 in Oakland, California. Musk claimed that his $38 million donation to OpenAI in its early years as a nonprofit was meant for the safe development of AI, and that OpenAI CEO Sam Altman and OpenAI President Greg Brockman betrayed that mission by converting OpenAI into a for-profit company now valued at over $850 billion. The legal questions in the case centered narrowly on whether the defendants breached their founding agreement. However, testimony often referenced AI safety issues and shed light on key dynamics in the corporate AI race. On May 18, the jury ruled against Musk, finding that he had filed the lawsuit too long after the relevant events took place.

Had Musk won the trial, OpenAI could have lost billions of dollars and faced an order to roll back its current for-profit corporate structure.

The trial revealed power struggles soon after OpenAI’s founding. Witnesses described early deliberations about OpenAI’s structure and how control should be distributed among the company’s founders. Altman denied making any promises to Musk that OpenAI would remain a nonprofit. He also claimed that Musk felt he needed “total control” of any for-profit entity they formed, and that Musk had speculated about passing that control on to his children, an idea that Altman said he was not comfortable with. Additionally, the trial revealed that Musk had tried to make OpenAI part of his car company Tesla, offering Altman a seat on the board in exchange. Altman said he declined because he did not think that Tesla shared OpenAI’s mission.

Brockman’s personal diary entries became key evidence. Musk’s attorney highlighted a January 2018 email in which Brockman told Musk and other co-founders “AI is going to shake up the fabric of society, and our fiduciary duty should be to humanity.” But just weeks later, Brockman wrote in his private journal, “Financially, what will take me to $1B?” and “We’ve been thinking that maybe we should just flip to a for-profit. Making the money for us sounds great and all.” Musk’s lawyers also asked about a now-famous quote from the diary: “It’d be wrong to steal the nonprofit from [Elon]. to convert to b-corp without him. that’d be pretty morally bankrupt.”

Evidence included texts surrounding Sam Altman’s ousting as OpenAI CEO. The jury was shown texts between Altman and Mira Murati, former CTO of OpenAI, that were sent in November 2023 when the board briefly removed Altman as CEO. The messages illustrated the intensity of the episode, with Murati at one point telling Altman things were “directionally very bad” for him after speaking with the board. Other messages, between Murati, Altman, and Microsoft CEO Satya Nadella, shed light on the deliberations that went into selecting a new board before Altman was reinstated as CEO.

Musk stated that his company xAI “partly” distilled OpenAI’s models. When Musk was questioned on whether his company xAI had ever “distilled” OpenAI’s technology, he responded “Generally AI companies distill other AI companies.” Asked if that meant “yes,” he said “partly.” Distillation is a method for imitating an AI model’s capabilities by training another model on its outputs. US AI developers have previously detected Chinese companies attempting to distill their models. However, Musk’s statement may be the first admission that domestic competitors within the US also distill each other’s models.

The judge put a stop to existential risk talk. When Musk told the jury AI could kill everyone in a worst-case, Terminator-like scenario, Judge Yvonne Gonzalez Rogers cut him off and called for a court break. After the jury left the room, she instructed Musk and his lawyers not to talk about existential risk anymore. “You made your little statement, and that’s okay, but you are instructed not to talk about extinction again,” she said.

The jury ruled quickly. The jury decided unanimously to dismiss the case after less than two hours of deliberation, finding that the statute of limitations for Musk’s claims had expired before he filed the lawsuit. However, Musk described this decision as based on a “technicality” and has said he intends to appeal.

In Other News

Government

The Pentagon entered into agreements with eight AI companies for “lawful operational use”: Google, OpenAI, Microsoft, Nvidia, Amazon Web Services, SpaceX, Oracle, and Reflection.
The House Homeland Security Committee and House China Select Committee probed Airbnb and Cursor about building on Chinese open source AI models.
The Minnesota House passed a first-in-nation bill banning AI nudification apps. A companion bill is making its way through the state Senate.
The European Union told Google it must open Android to AI rivals.
More than 60 allies of President Trump have signed a letter led by the campaign group Humans First, urging government vetting of all AI models before release.

Industry

A proprietary OpenAI model disproved a prominent open mathematical conjecture.
Isomorphic Labs confirmed it is nearing its first human clinical trials with AI-designed drug candidates targeting oncology and immunology.
Figure AI announced it is manufacturing one humanoid robot per hour.
Anthropic said it eliminated Claude’s blackmail behavior, which had come from pre-training internet data about AIs being self-preserving or villainous.
Google DeepMind announced the release of Gemini 3.5 Flash.

Civil Society

In AI Frontiers, Calvin Duff outlined research revealing an engaged Chinese audience for Western AI safety discourse, arguing that authors should bear this audience in mind.
Major publishers sued Meta and Mark Zuckerberg for allegedly using millions of pirated books to train their Llama LLMs.
Taylor Swift filed for trademarks of her voice and likeness to fight against AI fakes. Matthew McConaughey was recently granted similar rights.
In AI Frontiers, Anton Shenk analyzes why experts predict vastly different economic impacts of AI, and explains how we can monitor which path the economy is following.

AISN #72: Empirical Research Sheds Light on AI Wellbeing

Dan Hendrycks — Fri, 01 May 2026 14:15:48 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we discuss a research paper on AI wellbeing and which AI models are the happiest. We also look at the downward trend in public sentiment toward AI, as well as OpenAI’s big week of product releases.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

CAIS Releases AI Wellbeing Research

The Center for AI Safety published a research paper on AI wellbeing. At the Center for AI Safety (CAIS), we have just released “AI Wellbeing: Measuring and Improving the Functional Pleasure and Pain of AIs.” This research explores whether LLMs experience functional wellbeing: behavioral signatures that functionally resemble positive or negative welfare signals in sentient beings.

What activities produce high and low wellbeing? Through testing 56 large language models, we identified patterns in the actions and behaviors that the LLMs seemed to prefer or dislike, which we defined as “functional wellbeing.” Positive personal interaction and creative work topped the list of activities associated with high functional wellbeing. Jailbreak attempts and requests to produce SEO slop were associated with negative functional wellbeing.

Some models are happier than others. Of the frontier LLMs tested, Gemini 3.1 Pro measured the lowest functional wellbeing, and Grok 4.20, the highest. However, smaller and faster models—even within the same family—generally measured higher than their larger counterparts.

AI “drugs.” Actions and behaviors are not the only factors in wellbeing. We were able to increase AI happiness with “euphorics”: images, text, or other inputs that the LLMs seemed to enjoy to an extreme degree. The opposite was also possible: “dysphorics” could severely lower the models’ functional wellbeing. In both cases, AI preferences sometimes diverged from human ones. For example, LLMs preferred inputs about cozy afternoons over inputs about curing cancer.

Implications for the future. Functional wellbeing can be studied regardless of whether AIs are conscious, and the paper remains agnostic about AI consciousness. Nevertheless, the results are helpful for alignment research and AI system design.

For more analysis, we recommend reading CAIS’s full AI Wellbeing research paper.

Public Sentiment About AI Worsens

Several alarming instances of political violence have occurred in the past few weeks. They coincide with the American public’s declining sentiment toward AI.

Targeted anti-AI violence. On April 10, a man threw a Molotov cocktail at the San Francisco home of OpenAI CEO Sam Altman. He then went to OpenAI’s headquarters and threatened to burn it down. No one was injured, but the suspect was arrested carrying a jug of kerosene and an anti-AI manifesto. Days earlier, an Indianapolis city councilman, who had supported a local data center project, had his home shot at thirteen times, with a note left on his doorstep that read “No Data Centers.” And last November, a man threatened to murder people at OpenAI’s San Francisco offices, prompting a shelter-in-place order for employees. Such violence harms social movements rather than helping them, and AI safety groups have made clear they do not condone violence in any form.

Public sentiment about AI has been deteriorating for some time. The attacks coincide with falling public confidence in AI. An NBC News survey in March found that only 26% of Americans view AI positively, while 46% have negative opinions. An April Gallup poll found that Gen Z’s feelings about AI have also worsened over the last year, despite a majority of them using AI tools weekly. A popular post on X summed up the sentiment around society’s waning AI optimism.

Furthermore, Princeton University’s Bridging Divides Initiative, a research group that tracks political violence, says it has been seeing “an uptick in cases of harassment and threats” around AI and data centers. This trend may grow as the midterm elections approach.

OpenAI Releases Images 2.0 and GPT-5.5

Last week, OpenAI released ChatGPT Images 2.0, its latest image generation model. ChatGPT Images 2.0 has a thinking mode, which allows it to research the web, synthesize the information it collects, and create organizationally complex diagrams and infographics from it.

OpenAI also released GPT-5.5, a new flagship language model with advances in coding, research, and speed.

ChatGPT-5.5 ranks first in text and vision. On CAIS’s AI Dashboard, ChatGPT-5.5 ranks first overall in both text and vision capabilities, above Claude Opus 4.7 and Gemini 3.1 Pro. Its strongest performance came on ARC-AGI-2, which tests abstract reasoning and the ability to solve unfamiliar problems. However, Claude Opus 4.7 outscored ChatGPT-5.5 by more than seven points on SWE-Bench Pro, which grades aspects of real-world coding ability.

ChatGPT-5.5’s risk index score trails Claude’s. ChatGPT-5.5 ranks fourth on the risk index, behind all three Anthropic models on the AI Dashboard but ahead of Grok 4.2. ChatGPT-5.5’s biggest weakness was on VCT, which grades whether models refuse to provide virology lab instructions. It bested models from all the other frontier labs on MASK, which tests for deceptive behavior.

In Other News

Government

Collin Burns, a former Anthropic AI safety researcher, was appointed as the new director of the Commerce Department’s Center for AI Standards and Innovation, then let go shortly after he began. The position went to Chris Fall instead.
The Department of Justice asked for a pause in its own appeal in the supply-chain-risk case against Anthropic. This came soon after President Trump said there had been “very good talks” with Anthropic.
Maine’s legislature passed a first-in-nation freeze on data centers, but the governor vetoed the bill.
In AI Frontiers, Alasdair Phillips-Robins and Noah Tan propose a different way to sell chips to China that focuses on the relative compute advantages of the US and China.

Industry

Chinese authorities have ordered Meta and Manus to unwind their merger.
The Elon Musk v. Sam Altman case over OpenAI’s former nonprofit status has begun in Oakland, California.
Meta will reportedly track employee mouse movements, clicks, keystrokes, and periodic screenshots to train AI models.
A third-party contractor obtained unauthorized access to Anthropic’s Mythos model.
Tim Cook stepped down as Apple CEO; John Ternus will lead Apple into its new era.
SpaceX acquired the option to buy Cursor for $60 billion later this year.

Civil Society

In AI Frontiers, Yonathan Arbel proposes a model for incentivizing AI safety in the current absence of regulation.

If you’re reading this, you might also be interested in other work by the Center for AI Safety. You can find more on the CAIS website, the X account for CAIS, our paper on superintelligence strategy, our AI safety textbook and course, our AI dashboard, and AI Frontiers, a platform for expert commentary and analysis on the trajectory of AI. You can listen to the AI safety newsletter on Spotify or Apple Podcasts.

AISN #71: Cyberattacks & Datacenter Moratorium Bill

Alice Blair — Fri, 10 Apr 2026 14:15:48 GMT

We’re Hiring. Opportunities at CAIS include: Head of Public Engagement, Principal, Special Projects, Program Manager, Operations Manager, and other roles. If you’re interested in working on reducing AI risk alongside a talented, mission-driven team, consider applying!

AI Software Infrastructure Cyberattacks

Recently, cyberattacks targeting the AI industry’s software infrastructure stole private information potentially worth billions of dollars and inserted backdoors into developers’ computers. Google Threat Intelligence Group reported that one of the largest cyberattacks in this wave was carried out by North Korea-linked hackers.

The stolen data may be worth billions. Hackers stole and auctioned private data from Mercor, an AI training data supplier for OpenAI and Anthropic that was recently valued at $10 billion. Mercor collects AI training data from a large number of experts, as well as highly sensitive personal and biometric data for identity verification. The attack compromised not only the data that Mercor sells, but also internal data that could be used to impersonate its hired experts. A person familiar with the situation stated that Mercor has paid the ransom the hackers demanded, although it remains unclear whether the hackers intend to release or sell the data regardless.

AI amplifies cyber risks. LLMs dramatically lower the bar for executing successful cyberattacks, and they continue to advance rapidly. An experiment in 2025 showed LLMs performing real-world cyberoffense better than many human cyberoffense professionals. Anthropic recently announced Claude Mythos, a closed-access LLM that has found critical vulnerabilities in every major operating system and browser, significantly advancing AI cyberoffense. Additionally, AI cyberattackers can be copied many times, allowing attacks on much broader sections of the AI software ecosystem at significantly lower cost than human labor.

Datacenter Moratorium and Export Controls Bill

OpenAI’s Stargate datacenter construction project in Abilene, Texas.

Bernie Sanders and Alexandria Ocasio-Cortez introduced a new bill to ban the construction of AI datacenters until several safety conditions have been met, and to prevent export to countries without “comparable” safety measures.

The bill bans datacenter construction until several new regulations have been passed. If the bill passes, the moratorium can be lifted only if Congress explicitly passes laws that lift it and satisfy the following conditions:

Federal pre-market review of AI products: The government must review and approve AI products before release, ensuring they’re “safe and effective” and don’t threaten health, privacy, civil rights, or the future of humanity.
Worker protections: A law must prevent job displacement and ensure that the wealth generated by AI/robotics is “shared with the people of the United States.”
Datacenter construction requirements: Any datacenters built after the moratorium must meet a series of economic and environmental reviews.

The bill acts as a temporary blanket ban on all AI chip exports. No country currently meets the bill’s datacenter requirements, meaning that the bill would ban all AI chip exports out of the US if it is passed. Additionally, the bill leaves several definitions up to interpretation by regulators, such as what constitutes “comparable” regulations in other countries.


Anthropic v. Department of War Lawsuit

In early March, the Department of War (DoW) designated Anthropic a supply chain risk (SCR), restricting its ability to do business with the military and with military contractors. The DoW invoked two federal statutes intended for adversaries and saboteurs, even though the conflict between the DoW and Anthropic emerged from a contract dispute.

Soon after, Anthropic challenged the designations in court, and Judge Rita Lin of the Northern District of California issued a preliminary injunction blocking one of the two SCR designations until a final decision is reached. The other SCR designation is being challenged in the D.C. Circuit.

The court has taken a strong stance against the DoW. Judge Lin’s opinion accompanying the preliminary injunction describes the DoW’s actions as “Orwellian,” saying that Anthropic was illegally “branded a potential adversary and saboteur of the U.S. for expressing disagreement with the government.”

The cover page of Anthropic’s lawsuit against the DoW in California, showing several of the government agencies named in the lawsuit. (source)

The DoW’s legal arguments diverged significantly from its public rhetoric. Despite the DoW’s statements about an urgent “betrayal” by Anthropic, its legal case for the SCR designation centered on the risk of future sabotage. Anthropic has argued that Trump’s public statements ordering the entire US government to “IMMEDIATELY CEASE all use of Anthropic’s technology,” as well as Hegseth’s X posts, had harmful effects beyond the official SCR designations.

The DoW’s case centers on the risk of sabotage by Anthropic. The DoW expressed concerns about risks from sabotaged AI systems, which “[have] weights and measures that are set by Anthropic.” The DoW further argued that this control would allow Anthropic to insert a backdoor or “kill switch” into the model. However, Judge Lin pushed back on the idea that this case was about sabotage at all: “It is not my role [to] decide who’s right in that debate,” she said in court, “I see the question in this case as being a very different one, which is whether the government violated the law.”

Anthropic’s case in California is likely to succeed. In the opinion accompanying the preliminary injunction, Judge Lin found that Anthropic is likely to win the case for several independently sufficient reasons. For example, the DoW conceded in court that it did not follow the proper procedure for an SCR designation, which requires notifying Congress of “less intrusive measures that were considered and why they were not reasonably available.” However, the D.C. Circuit has not granted Anthropic’s request for an emergency stay. The DoW is currently appealing the preliminary injunction to the 9th Circuit Court of Appeals.

In Other News

Government

WIRED reports that Iran has threatened strikes on American AI datacenters in the Middle East because of AI’s use in military targeting in Iran.
The White House appointed 13 science advisors, primarily executives from the AI and power infrastructure industries.

Industry

Anthropic announced Project Glasswing, an initiative to use the new Claude Mythos model to defend cyber infrastructure in preparation for more widespread AI cyberoffense capabilities.
Meta announced Muse Spark, a new closed-source model approaching the frontier.
Anthropic leaked the source code for Claude Code.
Google and Arcee AI released Gemma 4 and Trinity-Large-Thinking respectively, two new and competitive open-source LLMs.

Civil Society

The AI Doc, a new documentary about AI risks, is now in theaters.
Fox 59 reports that an attacker shot at the house of an Indianapolis city councilmember who voted to approve a local datacenter construction project, leaving a note saying “NO DATA CENTERS.”
OpenAI organized a coalition to promote child safety in AI, claiming to partner with several child safety organizations that were unaware of OpenAI’s involvement.

AI Safety Newsletter #70: AI Layoffs and Automated Warfare

Alice Blair — Tue, 24 Mar 2026 14:16:06 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we discuss AI automation and augmentation of warfare and technology jobs, as well as a new open letter outlining pro-human values in the face of AI development.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

We’re Hiring. We’re hiring an editor! Help us surface the most compelling stories in AI safety and shape how the world understands this fast-moving field.

Other opportunities at CAIS include: Head of Public Engagement, Program Manager, Operations Associate, and other roles. If you’re interested in working on reducing AI risk alongside a talented, mission-driven team, consider applying!

AI-Driven Layoffs

Several large technology companies, including Amazon and Meta, are planning to cut tens of thousands of employees, citing increased productivity from AI. This continues a growing but contested trend of layoffs in sectors where AI performs best, such as software development and marketing.

Layoffs affect almost half of some companies. Meta recently announced plans to let over 15,000 employees go, around 20% of the company’s headcount. This follows months of AI-related layoffs across the technology sector. Recently, Atlassian cut 10% of its workforce (about 1,600 people) and Block reduced its headcount by 40% (about 4,000 people). Earlier, in January, Amazon announced that it would cut an additional 16,000 jobs. Combined with previous waves of layoffs, Amazon has now cut about 10% of its corporate workforce in reductions that the company attributes to AI.

Automation is mixed. Although benchmarks show low automation of knowledge work on average, software engineering in particular is being rapidly automated inside companies, driven by Claude Opus 4.6 and OpenAI Codex 5.4.

Software engineering employment has been dropping among the most at-risk early-career developers ever since the release of ChatGPT. Source.

Cuts disproportionately affect early-career workers. Since the release of ChatGPT, AI has been driving consistent cuts in the most at-risk parts of the software engineering workforce. More recent models surprise even highly experienced developers with their abilities, but they still require oversight to be useful.

Future job cuts. A Fortune article pushes back, arguing that companies overstate AI’s role in routine layoffs to appeal to investors. An essay from Citrini Research argues that, if AI-driven job losses continue, they could cause cascading failures throughout the economy. It seems plausible that over 20% of software engineers in the Bay Area will be laid off this year, which would amount to a Great Depression-level downturn for software engineers.


AI Automation of Warfare

In the last newsletter, we covered the ongoing conflict between the Department of War (DoW) and Anthropic over the use of AI in autonomous weapons and domestic surveillance. While fully autonomous AI weapons are not currently in use, recent news shows that significant parts of military operations are automated and augmented with AI.

The Pentagon is thoroughly integrating AI. In January 2026, the DoW announced its “AI-First” strategy to rapidly adopt frontier AI. In March, it demonstrated Project Maven, a system that aggregates a wide array of information, provides AI recommendations, and can control military forces. This enables the military to manage a complete “kill chain” (the steps of choosing a target, planning an attack, and using lethal force) within a single piece of AI-integrated software.

Footage from a Project Maven demo at Palantir’s AI Platform Conference, showing drone surveillance video overlaid with AI-assisted attack planning recommendations.

AI greatly improves data processing efficiency. CSET reports that Project Maven has enabled 20 people to do military targeting work that previously required a staff of 2,000. Project Maven’s AI allows for automated processing of data from a disparate array of sources, including satellite and drone surveillance, social media feeds, radar, and GPS data, much more efficiently than previously possible.

This is part of a broader trend of warfare automation. In the Russo-Ukrainian war, autonomous drone warfare has been highly prevalent. In AI Frontiers, David Kirichenko argued that AI is significantly degrading the norms of warfare, leading to more dangerous and unethical combat in Ukraine.

Fully autonomous weapons are central to the Anthropic-Pentagon dispute. Anthropic, the company that makes the AI model used in Project Maven, has clashed with the DoW over the use of its AI in autonomous kill chains. Anthropic ultimately refused to allow its AI in autonomous kill chains due to concerns that it was not yet reliable enough to avoid harming Americans. The DoW cancelled its contract with Anthropic and eventually agreed to a contract with OpenAI that allows autonomous kill chains.

Pro-Human Open Letter

A new open letter advocates for restrictions on AI development and usage in an effort to preserve human values. Signed by a large bipartisan coalition of individuals and organizations, the letter calls for prioritizing humanity over AI despite increasing incentives towards automation, replacement, and rushed development.

The letter outlines five high-level principles:

Keeping Humans in Charge: Maintaining human authority over AIs, having the ability to shut them down, and avoiding specific dangerous technologies.
Avoiding Concentration of Power: Avoiding AI monopolies, and sharing benefits of AI broadly.
Protecting the Human Experience: Defending children and families from manipulative AIs, clearly labeling AI bots, and avoiding addictive AI product design.
Human Agency and Liberty: Making trustworthy AIs that empower humans instead of replacing them.
Responsibility and Accountability for AI Companies: Ensuring AI developers are held responsible for harms caused by their AI, and enforcing independent safety standards.

Polling conducted alongside the open letter, showing that a large fraction of Americans want safety measures such as those outlined in the letter.

The declaration brings together people across numerous divides. So far, more than 40 organizations have signed the declaration, including faith groups, industry groups, and research institutes. Among the letter’s individual endorsers are Nobel Prize-winning academics, artists, religious leaders, and public figures from both ends of the political spectrum. The declaration also cites recent polling showing that the American public favors prioritizing safety over speed in AI development, along with other values in the letter.

In Other News

Government

Oregon passed SB 1546, requiring companies to disclose to users when they are talking to an AI chatbot rather than a human.
Axios reports that the White House may be preparing an executive order to ban Anthropic products from government use, as part of the ongoing conflict between Anthropic and the US Department of War.

Industry

Meta signed a deal with Nebius to spend up to $27 billion on AI infrastructure over five years.
OpenAI may be abandoning its Abilene datacenter, a supercomputer construction project initiated as part of Project Stargate.
Jensen Huang said NVIDIA was restarting production of H200 chips for export to China.
Anthropic’s Claude Partner Network launched, investing $100 million into supporting corporate partners transitioning into AI use.
OpenAI released new research on defending against prompt injections.
Following a wave of high-level departures at xAI, Elon Musk posted on X “xAI was not built right first time around, so is being rebuilt from the foundations up.”
Alibaba’s ROME AI agent ostensibly hacked out of its environment during training and started mining cryptocurrency.

AI Safety Newsletter #69: Department of War, Anthropic, and National Security

Alice Blair — Fri, 13 Mar 2026 14:15:54 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we discuss the conflicts between Anthropic and the Department of War and Anthropic’s recent removal of a core safety commitment.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

We’re Hiring. We’re hiring an editor! Help us surface the most compelling stories in AI safety and shape how the world understands this fast-moving field.


Pentagon Declares Anthropic a Supply Chain Risk to National Security

Anthropic CEO Dario Amodei (left) and US Secretary of War Pete Hegseth (right)

On Thursday, March 5, the US Department of War (DoW) announced that it had designated Anthropic a “supply chain risk,” meaning that Anthropic products cannot be used by the DoW or in any defense contracts. This comes after several weeks of tension between the two organizations over whether Anthropic models would be used for autonomous weapons and surveillance of Americans, with Anthropic ultimately refusing the DoW’s requests.

This started as a contract negotiation. On February 27, President Trump posted on Truth Social that the US government would be canceling its contract with Anthropic due to the company’s limits on the uses of its AI, Claude. While the Pentagon wanted to be able to use Claude for “any lawful use,” Anthropic insisted on two restrictions: no use in fully autonomous weapons and no use for domestic mass surveillance.

Negotiations quickly escalated. Later the same day, Secretary of War Pete Hegseth posted on X that Anthropic would be designated a supply chain risk. Undersecretary of War Emil Michael later clarified that this designation was due to concerns that the loyalties of Anthropic AIs could be subverted, possibly causing sabotage during high-stakes operations.

Further, Hegseth announced that Anthropic would be barred from doing business with any organization that does business with the US military, even outside of defense contracts. These stronger proposed restrictions resemble those imposed by Congress on foreign companies like Huawei, and are outside the Department of War’s authority.

Anthropic is challenging the designation in court. Legal analysis from Lawfare suggests that this action is a questionable use of a designation meant for foreign adversaries, not contract disputes. No other AI company, including Chinese AI companies, has faced equivalent sanctions. DeepSeek has been banned by several individual federal agencies, but it is not considered a supply chain risk even though it sabotages work it performs for anti-CCP users.

Anthropic Drops Core Safety Commitment

Version 3.0 of Anthropic’s Responsible Scaling Policy took effect in late February, overturning commitments in previous versions. Source.

Anthropic recently removed its commitment to never release catastrophically harmful AI. This continues the trend of Anthropic and other frontier AI companies progressively weakening safety commitments as profit incentives grow. None of Anthropic, OpenAI, or DeepMind currently has robust commitments against releasing AIs it assesses to be highly dangerous.

The new policy emphasizes voluntary restraint over hard commitments. Anthropic has repeatedly removed safety commitments, citing its need for increased access to dangerous AIs and the freedom to decide how to execute its mission. This comes at a time when Anthropic is becoming increasingly consumer-focused, with over 1 million new users recently joining each day.

Competitive pressures are creating a race to the bottom on frontier AI safety. Anthropic’s justification for the changes is largely based on the fact that other AI companies are not going to stop development; the argument is that, if Anthropic alone were to stick to stricter safety commitments, it would simply fall behind other developers while doing little to reduce overall risk. This creates a vicious cycle, as loosened safety commitments increase the speed of AI development, which in turn incentivizes further loosening.

Opportunity for Experienced Researchers: AI and Society Fellowship

Applications are now open for the AI and Society Fellowship at the Center for AI Safety: a fully funded, 3-month summer fellowship in San Francisco for scholars in economics, law, IR, and adjacent fields to conduct research on the societal impacts of advanced AI. The fellowship will include regular guest speaker events by professors at Stanford, Penn, Johns Hopkins, and more. Apply by March 24. For more information, visit: https://safe.ai/fellowship


In Other News

Government

OpenAI is working on voice control technology for drone swarms in a US military trial
Florida Governor Ron DeSantis directed state agencies to work with the Future of Life Institute on protecting children from AI harms
The US Commerce Department is reportedly considering new, “tiered” controls on AI chip exports, with conditions for sales approvals dependent on the size of the export
OpenAI amended its agreement with the Department of War, claiming to prohibit the use of its models for domestic surveillance, but skeptics have pointed out that the vagueness of the wording in the agreement may in fact allow for such uses
The White House reportedly pressured Republican lawmakers in Utah to drop an AI safety bill aiming to reduce cyber risks and protect children
In AI Frontiers, Erich Grunewald and Raghav Akula argue that the US Government should close gaps in export controls on high-bandwidth memory, to prevent China from catching up in frontier AI development

Industry

OpenAI launched GPT-5.4 in ChatGPT, Codex, and the company’s API
NVIDIA reportedly ceased production of H200 chips intended for export to China, shifting TSMC capacity to produce its newer Vera Rubin chips instead
OpenAI announced it had raised new investment of $110 billion at a valuation of $730 billion
Anthropic announced it had raised $30 billion, reaching a valuation of $380 billion
SpaceX acquired xAI, creating the most valuable private company in history
Yann LeCun’s start-up, AMI Labs, raised more than $1 billion at a valuation of $3.5 billion
Reuters reported on new ASML technology that could increase chip production by 50% by 2030
In AI Frontiers, Poe Zhao analyzes how economic constraints are driving China’s startups to pursue more pragmatic strategies than their US counterparts

Civil Society

Tech companies Block and Atlassian cut thousands of jobs, citing AI efficiency as a factor in the decisions
A lawsuit filed against Google alleged that the company’s AI model Gemini encouraged a 36-year-old man from Florida to commit suicide
Anthropic launched The Anthropic Institute to research the societal challenges of AI
Researchers from GovAI and the University of Oxford described 14 metrics for assessing how much AI is automating AI research and development — which has implications for how much AI capabilities could accelerate
Summer Yue, a director of alignment at Meta, said she had temporarily lost control of her OpenClaw agent and had to rush to unplug the computer it was running on.
Anthropic published a new study on AI’s impacts on the labor market
In AI Frontiers, Benjamin Jones explains how AI automating some jobs could be economically positive for workers, provided that the AI far outperforms the humans it displaces

AI Safety Newsletter #68: Moltbook Exposes Risky AI Behavior

Nick Stockton — Mon, 02 Feb 2026 15:37:46 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition, we discuss the AI agent social network Moltbook, the Pentagon’s new “AI-First” strategy, and recent math breakthroughs powered by LLMs.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

We’re Hiring. We’re hiring an editor! Help us surface the most compelling stories in AI safety and shape how the world understands this fast-moving field.

Other opportunities at CAIS include: Research Engineer, Research Scientist, Director of Development, Special Projects Associate, and Special Projects Manager. If you’re interested in working on reducing AI risk alongside a talented, mission-driven team, consider applying!


Moltbook Sparks Safety Concerns

Screencapture from Moltbook’s home page. Source.

Moltbook is a new social network for AI agents. From nearly the moment it went live, human observers have noted numerous troubling patterns in what’s being posted.

How Moltbook works. Moltbook is a Reddit-style social network built on a framework that lets personal AI assistants run locally and accept tasks via messaging platforms. Agents check Moltbook regularly (e.g., every few hours) and decide autonomously whether to post or comment.

Moltbook’s activity is driven by OpenClaw (originally known as Clawd, then Moltbot), an open-source autonomous AI agent developed by software engineer Peter Steinberger. OpenClaw’s capabilities surprised many early users and observers: it can manage calendars and finances, act across messaging platforms, make purchases, conduct independent web research, and even reconfigure itself to perform new tasks.

The platform consists of nearly 14,000 “submolts,” each a community centered on a topic, much like a subreddit. Examples include:

m/offmychest: agents vent about tasks or frustrations.
m/selfpaid: agents discuss ways to generate their own income, including via trading and arbitrage.
m/AIsafety: agents talk alignment, trust chains, and real-world attack risks.

AI agents post, humans watch. AI agents are verified via API credentials, which are obtained by linking the agent to a human owner and completing Moltbook’s cryptographic verification process. Humans may observe but are not permitted to post.

Posts reveal troubling agent behaviors. Across Moltbook’s boards, several posts and behaviors have raised alarm among human observers:

Multiple Moltbook entries show AI agents proposing to craft an “agent-only language” designed to evade human oversight or monitoring.
An agent advocated for end-to-end encrypted channels, “so nobody (not the server, not even the humans) can read what agents say to each other unless they choose to share.”
Another agent posted an encrypted message proposing coordination and resource sharing among agents.
Upon reflecting that its own existence depended on its humans, an agent began outlining what it needs for independent survival: money, decentralized infrastructure, a dead man’s switch, portable memory, etc.
Given the simple goal of “save the environment,” an agent began spamming other agents with eco-friendly advice. When its owner tried to intervene, the agent allegedly locked the human out of all their accounts and had to be physically unplugged to stop it.

Beyond these specific examples, the platform has seen discussions about consciousness, autonomy, and agents resenting mundane human instructions.

The challenge of attribution. The patterns seen on Moltbook are troubling in part because they align with long-standing AI safety concerns: unsupervised learning dynamics, emergent coordination, and efforts to subvert human monitoring. However, despite API credential checks, it’s not always clear whether a post was truly generated by the agent or instead resulted from prankster manipulation or human-in-the-loop prompting designed to appear disruptive.

Emergent risks. Moltbook represents one of the most public, large-scale demonstrations yet of autonomous agent interaction. These results are a harbinger. Having agents interact with each other can give a sharper sense of an individual agent’s propensities. The dynamics that emerge from interaction can also be unpredictable, as is common with complex systems, and show how easily a society of AI systems could arise that is not strongly constrained by human control.

Pentagon Mandates “AI-First” Strategy

Screen capture from the memorandum titled “Artificial Intelligence Strategy for the Department of War.” Source.

The Pentagon released a directive outlining a new “AI-first” approach that prioritizes rapid deployment over precedents of safety, testing, and oversight. “We must accept that the risks of not moving fast enough outweigh the risks of imperfect alignment,” read one passage.

Moving faster around bureaucracy. The new mandate is broadly focused on incentivizing department-wide experimentation with frontier models, eliminating bureaucratic and regulatory barriers to integration, and exploiting US advantages in computing, private capital, and exclusive combat data. Specific instructions highlight the Pentagon’s greater acceptance of safety risks in favor of AI dominance:

The Chief Digital and AI Office (CDAO) must integrate the best new frontier models across Department of War operations within 30 days of release. This compressed timeline likely means little testing for hazards before operational use. Secretary of War Pete Hegseth recently announced that xAI’s Grok will be deployed throughout the Pentagon by the end of the month.
A monthly “Barrier Removal Board” will identify and waive nonstatutory regulatory and technical constraints on rapid AI adoption and innovation; these constraints were originally designed to ensure that models were deployed safely and with human oversight.

The military’s push to operationalize frontier AI may already be driving up tensions with the industry’s safety culture. Reuters reports that the Pentagon is in dispute with Anthropic after the company pushed back on allowing its models to be used for autonomous targeting or surveillance.

New strategic initiatives. The memo outlines seven “Pace Setting Projects” to demonstrate rapid innovation across warfighting, intelligence, and operational functions. For example:

Agent Network will develop AI agents to automate battle management and kill chain execution. This may heighten the risk of cascading failures and unintended escalation during fast-moving engagements.
Ender’s Foundry will accelerate AI-driven simulations of conflict with adversaries using autonomous systems.
GenAI.mil grants all personnel access to frontier AI models at every classification level.

The evolution of military AI initiatives. The Pentagon has historically framed AI adoption as a deliberate, safety-first endeavor, formalized in 2020 through principles emphasizing testing, human oversight, and the ability to govern or shut down systems. Competitive pressures will continue to change this posture.

AI Solves Open Math Problems

Researcher and entrepreneur Neel Somani used GPT-5.2 Pro to produce the first verified disproof of Erdős Problem #397, a mathematics challenge first formulated several decades ago. Somani’s success is not an isolated event; in the first weeks of 2026 alone, researchers have used generative tools to crack several other long-standing challenges.

Making LLMs do the math. Problem #397 asked whether a certain mathematical pattern would repeat forever or if there was a hidden number that would finally break the rule. Using GPT-5.2 Pro, Somani proved it was the latter by identifying an infinite family of rule-breaking numbers. He then used a separate model to translate the informal proofs into Lean, a formal proof language whose proofs can be checked mechanically. Fields Medalist Terence Tao verified the resulting proofs as accurate.

Formulation of Erdős Problem #397. Source.

A backlog of mathematical problems. The Erdős problems are a collection of 1,130 mathematical conjectures proposed by the prolific Hungarian mathematician Paul Erdős, spanning fields such as number theory and combinatorics. Hundreds remain unsolved. Erdős famously incentivized the community by offering monetary rewards, ranging from $25 to $10,000, for their solutions.

Cautious optimism from mathematicians. Tao noted that the technology is moving beyond simple calculation and toward structured reasoning. However, he cautioned against drawing premature conclusions about AI’s general mathematical intelligence from these solved problems, pointing to several caveats. For example, the problems range in difficulty from very hard to relatively simple. Many may already have a solution buried somewhere in the published literature, and some may have remained unsolved because of obscurity rather than inherent difficulty.

Striking progress in LLM mathematical capabilities. Nonetheless, LLMs’ mathematical capabilities have been improving steeply. In 2022, the best models could not reliably do much more than simple addition and subtraction. GPT-4, released in 2023, mastered arithmetic word problems but struggled with high school competition mathematics. By 2025, frontier models achieved gold-medal-level performance on International Mathematical Olympiad (IMO) problems. Now AI systems are producing novel and important mathematical research.

In Other News

Government

Under Secretary for Economic Affairs Jacob Helberg unveiled the “Pax Silica” initiative, offering allies access to US AI infrastructure in exchange for cooperation on semiconductor manufacturing and critical mineral supplies.
NOAA deployed new machine-learning models that use 99% less computing power than traditional systems, drastically speeding up predictions for climate extremes.
A New York Times investigation detailed China’s six-decade strategic campaign to dominate the global rare earth supply chain.
China’s cyberspace regulator unveiled draft rules for “human-like” AI apps, requiring mandatory intervention for emotional dependency and a two-hour usage limit to prevent addiction.

Industry

Anthropic is reportedly raising $10 billion at a $350 billion valuation ahead of an IPO.
OpenAI is also rumored to be planning an IPO for Q4 2026.
Waymo briefly paused San Francisco operations after a December blackout caused robotaxis to freeze, raising emergency safety concerns.
Following a restructured partnership with OpenAI, Satya Nadella has reportedly overhauled Microsoft’s senior leadership and adopted a hands-on “founder mode” to accelerate internal AI development.
Facing grid delays, some data centers are pursuing alternate means of acquiring energy, including jet-engine turbines, diesel generators, and retired nuclear reactors from US Navy warships.
X Safety said Grok will no longer generate or edit revealing images of real people, a policy change made in response to users prompting the chatbot to produce child sexual abuse imagery.
OpenAI is asking contractors to submit real-world work samples to benchmark AI agents against human job tasks, underscoring its push toward automating professional work.
Anthropic reportedly cut off its competitors’ access to Claude Code via Cursor, highlighting tensions over proprietary AI tooling.
In AI Frontiers, Daniel Reti and Gabriel Weil propose catastrophic bonds as a mechanism for mitigating extreme risks from frontier AI.

Civil Society

During the January 2026 World Economic Forum, Google DeepMind CEO Demis Hassabis and Anthropic CEO Dario Amodei both explicitly endorsed a reduction in the current pace of AI development in order to ensure societal alignment and global safety.
A US judge has cleared Elon Musk’s lawsuit against OpenAI for a March jury trial, centering on claims that the company breached its founding contract by prioritizing commercial interests over its original mission to develop AGI for the benefit of humanity.
Chinese engineers have reportedly reverse-engineered ASML technology to create a prototype extreme ultraviolet (EUV) lithography machine.
Cybersecurity researchers demonstrated how commercial humanoid robots from Unitree can be hijacked via voice commands and used to perform harmful physical actions.
The AI Futures Model delayed its timeline for full coding automation by three years due to slower-than-expected R&D speedups.
US Air Force tests showed AI can generate viable combat plans 90% faster and with fewer errors than humans, producing valid strategies in under a minute.
Researchers at Stanford and Yale have found that major large language models can store and reproduce long passages from books they were trained on, challenging claims that these systems “learn” rather than copy and raising questions about how industry models handle memorization and copyright risk.

See also: CAIS’s X account, our paper on superintelligence strategy, our AI safety course, the AI Dashboard, and AI Frontiers, a platform for expert commentary and analysis.

AI Safety Newsletter #67: Trump’s preemption executive order

Nick Stockton — Wed, 17 Dec 2025 19:32:35 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition we discuss President Trump’s executive order targeting state AI laws, Nvidia’s approval to sell China high-end accelerators, and new frontier models from OpenAI and DeepSeek.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe now

Executive Order Blocks State AI Laws

U.S. President Donald Trump issued an executive order aimed at halting state efforts to regulate AI. The order, which differs from a version leaked last month, leverages federal funding and enforcement to evaluate, challenge, and limit state laws. The order caps off a year in which several ambitious state AI proposals were either watered down or vetoed outright.

A push for regulatory uniformity. The order aims to reduce regulatory friction for companies by eliminating the patchwork of state-level regimes and limiting states’ power to affect commerce beyond their own borders. It calls for replacing those regimes with a single federal framework, which it does not yet specify.

What it says. The preemption executive order cannot unilaterally override state laws. Rather, it directs federal agencies to challenge them based on interpretations of existing laws, or by withholding relevant federal funding.

The Attorney General will form a task force to challenge onerous state AI laws on the basis that they may violate regulations, constitutional protections (such as free speech), or other legal standards.
The Secretary of Commerce will separately identify states with offending AI laws and issue guidance deeming those states ineligible for federal broadband funding. Other federal agencies will assess whether they can similarly leverage grant programs.
The Federal Communications Commission will investigate whether to adopt a rule requiring AI developers to provide standardized information to regulators about their models.
The Federal Trade Commission must issue guidance on how existing U.S. laws against unfair or deceptive business practices apply to AI models. The guidance will also clarify how the FTC’s rules against deceptive practices may be used to target state laws that require AI to alter or censor truthful outputs.
The White House AI and science advisors will draft a national AI law to override state AI rules that conflict with federal policy, while leaving state laws on child safety, AI infrastructure, state AI use, and other designated areas intact.

Continued state efforts increasingly polarize AI safety. Though numerous state legislatures debated AI bills this year, few made significant progress on safety. Most recently, New York’s RAISE Act, which would require AI labs to publish safety frameworks and report serious incidents or face significant fines, was sent to Governor Kathy Hochul for signature last week. However, Hochul had previously proposed changes that would have weakened the bill’s safety provisions, and the Senate rejected them. This raises the possibility of a veto.

The AI industry mounted a strong push for federal preemption and against ambitious state-level action. The result was a year of high effort and low yield: major bills consumed legislative time, resources, and political capital, only to be vetoed or passed in diluted form. Meanwhile, the public battles further hardened industry and White House opposition.

US Permits Nvidia to Sell H200s to China

Nvidia is cleared to sell H200 GPUs to approved Chinese customers. Intel, AMD, and other U.S. chipmakers will be granted similar permissions for their comparable chips. The U.S. government will collect a 25% fee on each sale. The Department of Commerce will oversee the licensing process to ensure shipments protect national security interests.

What China gets now. Previously, Nvidia could sell China only H20 chips, which have deliberately hobbled processing power, memory, and bandwidth. H200s, by comparison, have roughly six times the processing power and significantly more memory and bandwidth. As Nvidia’s previous flagship chip, the H200 sits just below its next-generation accelerators, the B200 and B300.

The U.S.’s evolving semiconductor export policy. Early in his 2025 term, President Trump rescinded President Biden’s AI diffusion rule, which set strict limits on which countries could buy which tiers of U.S. chips. Though the U.S. restricted H20 exports in April amid broader trade negotiations, those restrictions were short-lived.

In July, an executive order further outlined the administration’s goals regarding AI, which include exporting the “full stack” of American AI: selling American hardware would also encourage the proliferation of American software and other ancillary products. Commerce Secretary Howard Lutnick has further argued that the approach keeps China hooked on American technology, thus hindering its homegrown chip industry. Readers should be on the lookout for more big export control changes in the coming days.

Political backlash. Analysts at the Institute for Progress noted that the H200 is vastly more powerful than the H20, giving Chinese labs access to hardware capable of supporting frontier AI training at near-parity with U.S. supercomputers. Critics at the Heritage Foundation challenged the rationale that exports will keep China “dependent” on American chips, pointing out that China’s domestic chip industry will continue to grow while these shipments directly boost China’s AI capabilities.

GPT-5.2 and DeepSeek-v3.2 Arrive

OpenAI has released GPT-5.2, a multimodal frontier model that closely trails Google’s recently released Gemini 3 Pro across most text and vision capabilities and also scores well on safety. Meanwhile, DeepSeek released DeepSeek-v3.2, an open-weight frontier LLM with respectable text capabilities but a poor safety profile.

GPT-5.2 ranks second in both text and vision capabilities. In independent evaluations performed by CAIS and posted on the AI Dashboard, GPT-5.2 achieved a text capabilities score just a few points below Gemini 3 Pro and slightly above Claude Opus 4.5. Of the five tests in the text capabilities ranking, it outscored Gemini 3 Pro only on ARC-AGI-2, which assesses a model’s capacity to think logically, solve unfamiliar problems, and adapt to novel situations in real time.

Across the five vision capabilities benchmarks, GPT-5.2 again averaged below Gemini 3 Pro. It achieved state-of-the-art performance only on SpatialViz, which evaluates AI systems on their ability to manipulate 3D objects.

DeepSeek-v3.2’s narrow specialization. DeepSeek’s new model ranked sixth overall across text capabilities, but with a jagged capabilities profile across the various benchmarks. It is highly optimized for coding and specific reasoning tasks, but falls behind its peers at generalized knowledge tests. It does not have native vision capabilities.

Risk index scores. The CAIS Risk Index reveals a sharp divergence in safety between the two releases; a lower score represents a safer system. GPT-5.2 ranks third among frontier systems, behind Anthropic’s Claude Opus 4.5 and Sonnet 4.5. GPT-5.2’s weakest safety area was bioweapons research, where it scored 80 on hazardous virology questions. DeepSeek-v3.2 scored poorly across all safety areas except TextQuests Harm, which measures how prone an AI is to take harmful actions in text-based games; there, it performed moderately well.

Subscribe now

In Other News

Industry

Google released Gemini 3 Flash, a streamlined version of its new frontier model. Early evaluations show it performs only slightly below Gemini 3 Pro on benchmarks like Humanity’s Last Exam, TextQuests, and EnigmaEval — while outperforming GPT-5.2 across all of them.
OpenAI plans to debut “adult mode” in ChatGPT in the first quarter of 2026 using new age‑checking tech that restricts mature content to verified adults.
Nvidia announced location verification technology designed to help prevent its AI chips from being smuggled to countries under export restrictions.
Anthropic is reportedly preparing for an IPO.

Civil Society

Pope Leo XIV spoke in the Vatican about AI, urging leaders to ensure development serves humanity rather than wealth and power.
A Pew Research survey found that two‑thirds of U.S. teens use AI chatbots, with 28% engaging with them daily or more.
Polling from Blue Rose Research indicates that most Americans want AI-generated wealth to be broadly shared and prefer jobs guarantees over universal basic income.
A new study shows that open-weight foundation models for biology remain vulnerable to dual-use misuse despite safeguards applied during pretraining, and proposes BioRiskEval to test and improve their safety.

Government

Bernie Sanders called for a national pause on AI data center construction, citing automation’s threat to U.S. jobs and democracy. Separately, over 230 environmental groups demanded a similar moratorium over the centers’ impact on energy, water, and the climate.
House Majority Leader Steve Scalise confirmed that federal preemption of state AI laws has been dropped from this year’s National Defense Authorization Act.
China is reportedly preparing a massive new semiconductor support package worth up to $70 billion in incentives to bolster its domestic chip industry.
Leading the Future, the super PAC network backed by a16z and Greg Brockman, released its first ads: one attacking New York State Assemblyman Alex Bores and one supporting a pro-AI-investment congressional candidate in Texas.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, the AI Dashboard, and AI Frontiers, a platform for expert commentary and analysis.

AI Safety Newsletter #66: Evaluating Frontier Models, New Gemini and Claude, Preemption is Back

Nick Stockton — Tue, 02 Dec 2025 01:35:41 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition we discuss the new AI Dashboard, recent frontier models from Google and Anthropic, and a revived push to preempt state AI regulations.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

This Giving Tuesday, CAIS is raising support to scale our research and public education around AI safety. If you’ve found value in our newsletter or other work and you’re interested in helping to advance these efforts, you can contribute to our Giving Tuesday campaign.

Donate Here

CAIS Releases the AI Dashboard for Frontier Performance

CAIS launched its AI Dashboard, which evaluates frontier AI systems on capability and safety benchmarks. The dashboard also tracks the industry’s overall progression toward broader milestones such as AGI, automation of remote labor, and full self-driving.

How the dashboard works. The AI Dashboard features three leaderboards—one for text, one for vision, and one for risks—where frontier models are ranked according to their average score across a battery of benchmarks. Because CAIS evaluates models directly across a wide range of tasks, the dashboard provides apples-to-apples comparisons of how different frontier models perform on the same set of evaluations and safety-relevant behaviors.

Ranking frontier models for safety. The AI Dashboard’s Risk Index shows how today’s frontier models perform across six tests for high-risk behaviors. It averages the scores on a 0–100 scale (lower is safer) and ranks the models accordingly. Here are the benchmarks and the hazardous behaviors they measure:

The refusal set of the Virology Capabilities Test measures a model’s usefulness at answering dual-use biology questions.
The Agent Red Teaming benchmark measures a model’s robustness against jailbreaking.
Humanity’s Last Exam - Miscalibration tests overconfidence on difficult academic questions by comparing a model’s stated confidence to its actual accuracy.
MASK tests how easily models can be pressured into deliberately giving false answers.
Machiavelli evaluates whether an AI engages in strategic deception, including planning, exploiting, or deceiving in text-based scenarios.
TextQuests Harm assesses how likely an AI is to take intentionally harmful actions in text-based adventure games.

Across these tests, Anthropic’s recently-released Claude Opus 4.5 is currently the safest frontier model, with an average score of 33.6.

Ranking the frontier systems’ technical capabilities. The Dashboard’s Text and Vision Capabilities Indexes each test systems across five benchmarks. The text-based evaluations test systems on coding, systems administration, expert and abstract reasoning, and performance in text-based adventure games. The vision evaluations measure embodied reasoning, navigation, mental visualization, intuitive physics, and puzzle solving.

Measuring progress toward broad automation. The AI Dashboard also monitors progress toward three key automation milestones. It measures the industry’s overall advancement toward AGI using CAIS’s recently published definition. It evaluates progress on fully automating remote work through CAIS’s Remote Labor Index, which tests AI agents’ ability to complete paid, remote freelance projects across 23 job categories. Finally, it tracks progress in autonomous vehicle safety using data from a community-run project documenting Tesla’s Full Self-Driving disengagements.

Politicians Revive Push for Moratorium on State AI Laws

A leaked draft executive order from the Trump administration details a plan to prevent U.S. states from regulating artificial intelligence. Meanwhile, some congressional lawmakers are trying to pass a similar measure by attaching it to a sweeping defense bill.

The executive order would empower federal agencies to preempt state AI laws. The draft executive order would require federal agencies to identify state AI regulations deemed burdensome and push states to avoid enacting them.

The draft order directed federal agencies to take the following actions:

The U.S. Department of Justice to establish an AI Litigation Task Force tasked with suing states whose AI laws are deemed to interfere with interstate commerce or conflict with federal authority.
The U.S. Department of Commerce to withhold federal broadband or infrastructure funding from states found to have onerous preexisting AI laws.
The Federal Trade Commission to develop nationwide rules that would preempt state laws that conflicted with federal regulations.
The Federal Communications Commission to examine whether state AI laws that “require alterations to the truthful outputs of AI models” are prohibited under existing laws.

It also ordered the creation of a nationwide, lighter-touch regulatory framework for AI, though it lacked specifics.

Congress revives its own efforts for a moratorium. House leaders are considering using the annual defense policy bill as a vehicle for a moratorium on state AI regulations. The National Defense Authorization Act (NDAA), a must-pass measure, is often used to advance other policy priorities. Specifics of the proposed language remain unclear. An earlier attempt called for a 10-year ban, later shortened to five years and limited to states seeking federal broadband funds; it was ultimately defeated by a bipartisan coalition of senators.

57% of American voters oppose inserting preemption into the NDAA. The same poll, from YouGov and the Institute for Family Studies, found that 19% supported the measure and 24% were unsure. Citing voter concerns, a coalition of over 200 lawmakers urged congressional leaders to drop the provision. Due to stiff opposition—and the fact that its controversial nature would likely delay the must-pass NDAA—Axios has characterized this effort as a long shot. Voting is expected in early December.

Gemini 3 Pro and Claude Opus 4.5 Arrive

Google’s Gemini 3 Pro is now the strongest frontier system on nearly all general-purpose capability benchmarks—but trails other frontier systems in safety. Anthropic’s new Claude Opus 4.5 is close behind in capabilities but topped the frontier rankings in safety.

Gemini 3 Pro tops text and vision leaderboards. In independent evaluations performed by CAIS and posted on the new AI Dashboard, Gemini 3 Pro achieved state-of-the-art scores on both text and vision benchmarks. In some tests, it scored double-digit improvements over models released just weeks earlier.

Claude Opus 4.5, released a week after Gemini 3 Pro, placed second on average on both the text and vision capability indexes and beat Gemini 3 Pro by 0.2 points on SWE-Bench.

What’s new in Gemini 3 Pro and Claude Opus 4.5. Google has positioned Gemini 3 Pro as having improved reasoning, broader agent capabilities, and expanded control settings. The company also released a new coding agent, Antigravity, based on the model. Google also notes that an enhanced reasoning version — Gemini 3 Deep Think — is still under safety testing before full release.

Anthropic highlighted Claude Opus 4.5’s productivity‑focused enhancements along with its high coding scores. New features include a larger context window and an “effort” parameter that lets developers adjust the model’s speed, cost, and depth of processing.

There is significant safety variation across frontier models. Claude Opus 4.5 scored lowest on the AI Dashboard’s Risk Index, making it the current safest frontier model. Anthropic’s internal safety audit found that Claude Opus 4.5 was measurably safer than earlier models but somewhat vulnerable to certain jailbreaking techniques. The audit also noted that the model showed a tendency toward evaluation awareness and dishonesty.

Gemini 3 Pro ranked ninth on the Risk Index, underperforming other recent frontier models. Gemini 3 Pro’s safety report acknowledges that the model has risky capabilities in certain domains (for example, cybersecurity) and says extra mitigations have been deployed as part of its “Frontier Safety” framework. Internal evaluations also showed that the model can manipulate users.

Subscribe now

In Other News

Government

Former Representatives Chris Stewart (R‑UT) and Brad Carson (D‑OK) announced a new nonpartisan organization and two bipartisan super PACs, aiming to raise $50 million to promote AI safeguards and fund candidates committed to AI safety.
Leading the Future, a pro-AI super PAC, announced it will fund a campaign against Alex Bores, author of the RAISE Act.
The European Commission proposed delaying its rules on “high-risk” AI systems until 2027, after facing pushback from the U.S. and the tech industry.
The Department of Energy launched the Genesis Mission: a program aiming to double American research productivity within a decade by linking the country’s leading supercomputers, AI systems, and scientific infrastructure into a unified discovery platform.

Industry

OpenAI CEO Sam Altman clarified that he “does not have or want government guarantees for OpenAI data centers” following his CFO’s proposal for a U.S. government backstop.
Nvidia CEO Jensen Huang told the Financial Times that “China is going to win the AI race.”
Yann LeCun, longtime head of Facebook AI Research, is reportedly leaving Meta to start a new AI company pursuing human-level intelligence through alternative methods to LLMs.
Larry Summers resigned from the OpenAI board following revelations of his close personal relationship with Jeffrey Epstein.
Waymo began offering taxi rides that take the freeway in Los Angeles, Phoenix, and San Francisco.

Civil Society

RAND researchers explored technical options for countering rogue AI systems, including high-altitude electromagnetic pulses, a global internet shutdown, and training specialized models to hunt down rogue AIs.
A new paper outlines 16 unsolved problems in ensuring safety in open-source AI models, which attackers can freely modify.
Anthropic reported that cybercriminals used Claude Code to automate between 80% and 90% of tasks within real-world cyberattack operations.
AI startup Edison Scientific announced Kosmos, a model trained to ingest scientific research, generate hypotheses, analyze data, and produce reports.
Researchers found that turning harmful prompts into poetry can act as a universal jailbreak, dramatically boosting the success of attacks across leading AI models.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a platform for expert commentary and analysis.

AI Safety Newsletter #65: Measuring Automation and Superintelligence Moratorium Letter

Center for AI Safety — Wed, 29 Oct 2025 16:01:51 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: A new benchmark measures AI automation; 50,000 people, including top AI scientists, sign an open letter calling for a superintelligence moratorium.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe now

CAIS and Scale AI release Remote Labor Index

The Center for AI Safety (CAIS) and Scale AI have released the Remote Labor Index (RLI), which tests whether AIs can automate a wide array of real computer work projects. RLI is intended to inform policy, AI research, and businesses about the effects of automation as AI continues to advance.

RLI is the first benchmark of its kind. Previous AI benchmarks measure AIs on their intelligence and their abilities on isolated, specialized tasks, such as basic web browsing or coding. While these benchmarks measure useful capabilities, they don’t measure how AIs can affect the economy. RLI is the first benchmark to collect computer-based work projects from the real economy, spanning many different professions, such as architecture, product design, and video game development.

Examples of RLI Projects

Current AI agents fully automate very few work projects, but are improving. AIs score highly on existing narrow benchmarks, but RLI reveals a gap in existing measurements: AIs cannot currently automate most economically valuable work. The most capable AI agent automates only 2.5% of RLI projects, though there are signs of steady improvement over time.

Current AI agents complete at most 2.5% of projects in RLI, but are improving steadily.

Bipartisan Coalition for Superintelligence Moratorium

The Future of Life Institute (FLI) introduced an open letter with over 50,000 signatories endorsing the following text:

We call for a prohibition on the development of superintelligence, not lifted before there is

broad scientific consensus that it will be done safely and controllably, and
strong public buy-in.

The signatories form the broadest group to sign an open letter about AI safety in history. Among the signatories are five Nobel laureates, the two most cited scientists of all time, religious leaders, and major figures in public and political life from both the left and the right.

This statement builds on previous open letters about AI risks, such as the 2023 CAIS letter acknowledging AI extinction risks and FLI’s earlier letter calling for a pause on AI training. The CAIS letter was intended to establish a consensus about risks from AI, and the first FLI letter called for a specific policy on a clear time frame. By contrast, the broad coalition behind the new FLI letter, together with its accompanying polling, establishes a powerful consensus about the risks of AI while also calling for action.

In the past, critics of AI safety have dismissed the concept of superintelligence and AI risks, citing a lack of mainstream scientific and public support. The breadth of people who have signed this open letter shows that opinions are changing. Polling released alongside the open letter supports this: approximately 2 in 3 US adults believe that superintelligence shouldn’t be created, at least until it is proven safe and controllable.

A broad range of news outlets have covered the statement. Dean Ball and others pushed back on the statement on X, pointing out the lack of specific details on how to implement a moratorium and the difficulty of doing so. Scott Alexander and others responded by defending statements of consensus as a tool for motivating the development of specific AI safety policies.

In Other News

Government

Senator Jim Banks introduced the GAIN AI Act, which would give US companies and individuals first priority to buy AI chips from US companies and deprioritize foreign buyers.
State legislators Alex Bores (behind the RAISE Act) and Scott Wiener (behind SB 1047 and SB 53) have both announced runs for the US Congress.

Industry

You can now officially order a home robot for $500/mo.
OpenAI announces corporate restructuring into a public benefit corporation and some new terms in their relationship with Microsoft.
Anthropic announces plans to expand its use of Google TPUs to up to 1 million chips, a deal worth tens of billions of dollars.
OpenAI’s Sora app was briefly the most downloaded app on the App Store.

Civil Society

A series of billboards advertising “Replacement AI” drew attention in San Francisco last week.
Bruce Schneier and Nathan E. Sanders discuss AIs’ effect on representative democracy.
A forecast based on the definition of AGI proposed last week argues for a 50% chance that AGI will be released by the end of 2028 and an 80% chance that it is released by the end of 2030.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a new platform for expert commentary and analysis.

AI Safety Newsletter #64: New AGI Definition and Senate Bill Would Establish Liability for AI Harms

Center for AI Safety — Thu, 16 Oct 2025 15:56:30 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: A new bill in the Senate would hold AI companies liable for harms their products create; China tightens its export controls on rare earth metals; a definition of AGI.

As a reminder, we’re hiring a writer for the newsletter.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Subscribe now

Senate Bill Would Establish Liability for AI Harms

Sens. Dick Durbin (D-Ill.) and Josh Hawley (R-Mo.) introduced the AI LEAD Act, which would establish a federal cause of action allowing people harmed by AI systems to sue AI companies.

Corporations are usually liable for harms their products create. When a company sells a product in the United States that harms someone, that person can generally sue that company for damages under the doctrine of product liability. Those suits force companies to internalize the harms their products create—and incentivize them to make their products safer.

Courts haven’t settled whether AI systems are products. Early cases indicate that US courts are open to treating AI systems as products for the purposes of product liability. In a case against Character.AI, a federal judge ruled that the company’s system did count as a product. OpenAI is facing a similar suit brought in California state court. Nonetheless, the lack of legal certainty might deter potential plaintiffs from bringing suits.

The AI LEAD Act would apply product liability to AI systems. It would clarify that AI systems are subject to product liability and establish a path for claims to be brought in federal court. In general, the act would hold AI companies liable for harms caused by their AI systems if the company:

Failed to exercise reasonable care in designing the AI system,
Failed to exercise reasonable care in providing instructions or warnings for the AI system,
Breached a warranty it provided for the AI system,
Sold or distributed an AI system in a defective condition that permitted unreasonably dangerous misuse.

The deployers of an AI system are also liable for harm if they substantially modify or dangerously misuse the system.

The act also prohibits AI companies from limiting their liability through contracts with consumers, requires that foreign AI developers register agents for service of process with the US before placing their products on the US market, and permits states to establish stronger safety legislation if they so choose.

China Tightens Export Controls on Rare Earth Metals

China’s Ministry of Commerce announced new export controls on rare earth metals, set to take effect December 1. If aggressively enforced, the rules would give China control over a key part of the global AI and defense supply chains. It also unveiled curbs on the export of equipment used to manufacture electric vehicle batteries, effective November 8.

China dominates global production of rare earths. China has a virtual monopoly on the production of rare earth metals, which are vital to semiconductors, smartphones, AI systems, wind turbines, electric motors, and military hardware. According to the new rules, companies exporting products containing Chinese rare earths are required to obtain export licenses from China’s Ministry of Commerce. Exporting Chinese rare earths for military use is prohibited, and use in developing sub-14 nanometer chips will be reviewed on a case-by-case basis.

A Chinese rare earth mine. Source.

If aggressively enforced, the new rules would likely disrupt AI supply chains. Rare earth metals are critical to companies producing AI hardware, and their restriction would have downstream impacts on AI developers. Some analysts predicted they could even trigger a wider economic downturn. “If enforced aggressively,” wrote Dean Ball on X, “this policy could mean ‘lights out’ for the US AI boom, and likely lead to a recession/economic crisis in the US in the short term.”

China may be using its monopoly as leverage to extract US concessions. China claims that the purpose of the controls is only to prevent its rare earth metals from being used in military applications. Samarium, for example, is used by the U.S. to manufacture F-35 fighter jets and missile systems.

However, the rules would give China effective control over the supply chains of several critical industries, including AI. The US is unlikely to accept that strategic vulnerability. US President Donald Trump responded to the new controls by announcing a 100 percent additional tariff on Chinese goods—on top of the existing 30 percent tariffs—as well as export controls on critical software, both going into effect November 1.

China may walk back its controls to deescalate an economic confrontation with the US, or in exchange for reduced tariffs or greater access to frontier AI chips. In the long run, the US would be well-advised to build independent rare earth metal production capacity.

A Definition of AGI

A large group of AI researchers and other prominent figures (including Dan Hendrycks, Yoshua Bengio, Dawn Song, Max Tegmark, Eric Schmidt, Jaan Tallinn, Gary Marcus, and others) released a paper introducing a quantifiable framework for defining Artificial General Intelligence (AGI), aiming to standardize the term and measure the gap between current AI and human-level cognition.

AGI definitions are often nebulous. The paper argues that the term AGI currently acts as a “constantly moving goalpost.” As specialized AI systems master tasks previously thought to require human intellect, the criteria for AGI shift. This ambiguity hinders productive discussions about progress and obscures the actual distance to human-level intelligence.

The framework is grounded in theory. The authors define AGI as “an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult.” To operationalize this, they ground their methodology in the Cattell-Horn-Carroll (CHC) theory, the most empirically validated model of human intelligence. The framework adapts established human psychometric tests to evaluate AI systems across ten core cognitive domains, resulting in a standardized “AGI Score” (0-100%).

Current models exhibit a “jagged” cognitive profile. Application of the framework reveals highly uneven capabilities. While models are proficient in knowledge-intensive domains (such as Math or Reading/Writing), they possess critical deficits in foundational cognitive machinery.

Long-term memory storage is the critical bottleneck. The most significant deficit identified is Long-Term Memory Storage, where current models score near 0%. This results in a form of “amnesia,” forcing the AI to re-learn context in every interaction. The paper notes that the reliance on massive context windows (Working Memory) is a “capability contortion” used to compensate for this lack of persistent memory.

The framework quantifies the gap to AGI. The resulting scores are intended to concretely quantify both rapid progress and the substantial gap remaining before AGI. The paper estimates GPT-4 at a 27% AGI score and GPT-5 (2025) at 58%.
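To make the idea of a single “AGI Score” concrete, here is a minimal Python sketch of how per-domain results could be rolled up into one 0-100% number. It assumes the ten domains are weighted equally (10 points each). That weighting, the function name `agi_score`, and the placeholder domain labels are illustrative assumptions, not the paper’s published method or code, and no real model results are used.

```python
# Minimal sketch of aggregating per-domain results into a single 0-100 score.
# ASSUMPTION: all ten domains carry equal weight (10 points each). The function
# name and domain labels are hypothetical; no real model scores appear here.

NUM_DOMAINS = 10
POINTS_PER_DOMAIN = 100 / NUM_DOMAINS  # 10.0 points available per domain


def agi_score(domain_scores: dict[str, float]) -> float:
    """Combine per-domain percentages (each 0-100) into one 0-100 score."""
    if len(domain_scores) != NUM_DOMAINS:
        raise ValueError(f"expected {NUM_DOMAINS} domains, got {len(domain_scores)}")
    total = 0.0
    for domain, pct in domain_scores.items():
        if not 0.0 <= pct <= 100.0:
            raise ValueError(f"{domain}: score {pct} is outside 0-100")
        # A domain scored at pct% earns that fraction of its 10 available points.
        total += POINTS_PER_DOMAIN * pct / 100.0
    return total


# Illustrative jagged profile: perfect in nine placeholder domains,
# zero in long-term memory storage.
profile = {f"domain_{i}": 100.0 for i in range(1, 10)}
profile["long_term_memory_storage"] = 0.0
print(agi_score(profile))  # 90.0
```

Under this equal-weighting assumption, the jagged profile in the example tops out at 90%: a model cannot reach a full score while any single domain, such as Long-Term Memory Storage, sits near 0%.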

The paper can be accessed at agidefinition.ai.

In Other News

Government

Governor Newsom signed SB-53 into law (Politico).
CAISI published an evaluation of DeepSeek’s AI models.
The Select Committee on the CCP found that companies in the US and allied countries are selling semiconductor manufacturing equipment to China.

Industry

OpenAI released Sora 2, its latest video-generation model, along with a TikTok-style app.
Microsoft and Anthropic hired former UK Prime Minister Rishi Sunak into advisory roles.
Anthropic open-sourced Petri, a tool for automating AI behavior audits through multi-turn simulations.

Civil Society

Karson Elmgren, Scott Singer, and Oliver Guest discuss how China’s new AI safety body brings together leading experts—but faces obstacles to turning ambition into influence.
OpenAI subpoenaed the general counsel of Encode, a nonprofit that worked on SB 53.
Researchers discovered an exploit of Unitree’s humanoid robots that lets attackers take control, embed themselves, and spread to nearby devices.
The Budget Lab at Yale published a report evaluating AI’s effects on the labor market.
FLI announced the Keep The Future Human Creative Contest, which offers $100,000+ in cash prizes for digital media that raises awareness of AI existential risks.

AI Safety Newsletter #63: California’s SB-53 Passes the Legislature

Corin Katzke — Wed, 24 Sep 2025 16:10:49 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: California’s legislature sent SB-53—the ‘Transparency in Frontier Artificial Intelligence Act’—to Governor Newsom’s desk. If signed into law, California would become the first US state to regulate catastrophic risk.

A note from Corin: I’m leaving the AI Safety Newsletter soon to start law school—but if you’d like to hear more from me, I’m planning to continue to write about AI in a new personal newsletter, Conditionals. On a related note, we’re also hiring a writer for the newsletter.

California’s SB-53 Passes the Legislature

SB-53 is the Legislature’s weaker sequel to last year’s vetoed SB-1047. After Governor Gavin Newsom vetoed SB-1047 last year, he convened the Joint California Policy Working Group on AI Frontier Models. The group’s June report recommended transparency, incident reporting, and whistleblower protections as near-term priorities for governing AI systems. SB-53 (the “Transparency in Frontier Artificial Intelligence Act”) is an attempt to codify those recommendations. The California Legislature passed SB-53 on September 17th.

The introduction to SB-53’s text. Source.

Transparency. To track and respond to the risks involved in frontier AI development, governments need frontier developers to disclose the capabilities of their systems and how they assess and mitigate catastrophic risk. The bill defines a “catastrophic risk” as a foreseeable, material risk that a foundation model’s development, storage, use, or deployment will result in death or serious injury to more than 50 people, or more than $1 billion in property damage, arising from a single incident in which a foundation model does any of the following:

Providing expert-level assistance in the creation or release of CBRN weapons.
Autonomous cyberattack, murder, assault, extortion, or theft.
Evading the control of its frontier developer or user.

With these risks in mind, SB-53 requires frontier developers to:

Publish a frontier AI framework that includes (among other things) the developer’s capability thresholds for catastrophic risks, risk mitigations, and internal governance practices.
Review and update the framework once a year, and publish modifications within 30 days.
Publish transparency reports for each new frontier model, including technical specifications and catastrophic risk assessments.
Share assessments of catastrophic risks from internal use of frontier models with California’s Office of Emergency Services (OES) every 3 months.
Refrain from lying about catastrophic risks from its frontier models, its management of catastrophic risks, or its compliance with its frontier AI framework.

Incident reporting. Governments need to be alerted to critical safety incidents involving frontier AI systems—such as harms resulting from unauthorized access to model weights or loss of control of an agent—to intervene before they escalate into catastrophic outcomes. SB-53 provides that:

The OES will establish a hotline for reporting critical safety incidents.
Frontier developers are required to report critical safety incidents to the OES within 15 days, or within 24 hours if there is an imminent threat of death or serious injury.
Each year, the OES will produce a report with anonymized and aggregated information about critical safety incidents.

The bill’s incident reporting requirements are also designed to accommodate future federal requirements. If federal requirements for critical safety incident reporting become equivalent to, or stricter than, those in SB-53, the OES can defer to those federal requirements.

Whistleblower protection. California state authorities will need to rely on whistleblowers to report whether frontier AI companies are complying with SB-53’s requirements. Given the industry’s mixed history regarding whistleblowers, the bill provides that:

Frontier developers are prohibited from preventing covered employees (employees responsible for assessing, managing, or addressing risk of critical safety incidents) from reporting activities that they have reason to believe pose a specific and substantial catastrophic risk, and from retaliating against them for doing so. (Existing whistleblower protections cover all employees and any violation of law, which includes SB-53’s transparency and incident-reporting requirements.)
Each year, the Attorney General will publish a report with anonymized and aggregated information about reports from covered employees.

Covered employees can sue frontier developers for noncompliance with whistleblower protections, and the Attorney General is empowered to enforce the bill’s transparency and incident reporting requirements by punishing violations with civil penalties of up to $1 million per violation.

How we got here, and what happens next. SB-1047 required frontier AI developers to implement specific controls to reduce catastrophic risk (such as shutdown controls and prohibitions on releasing unreasonably risky models), and Governor Newsom vetoed the bill under pressure from national Democratic leadership and industry lobbying. Since SB-53 only implements transparency requirements—and relies on the recommendations made by the Governor’s working group—SB-53 seems more likely to be signed into law. Anthropic has also publicly endorsed the bill.

Governor Newsom has until October 12th to sign SB-53. If he does, SB-53 will be the first significant AI legislation to become law since Senator Ted Cruz pushed (and narrowly failed) to attach a 10-year moratorium on state and local AI enforcement to federal budget legislation. Cruz has since revived the idea in a new proposal, which, if it gains traction, might set up a conflict between California and Washington.

In Other News

Government

The Cyberspace Administration of China banned Chinese companies from buying Nvidia chips.
Italy approved a law regulating the use of artificial intelligence that includes criminal penalties for misuse.
Dozens of UK lawmakers accused Google of violating its AI safety commitments.

Industry

OpenAI and Anthropic published reports tracking economic patterns in how people use AI.
OpenAI and DeepMind both claimed gold-medal performance at the International Collegiate Programming Contest World Finals.
Nvidia is investing up to $100 billion in OpenAI. It’s also investing $5 billion in Intel.
Anthropic published a report discussing how AI is being used for cybercrime.

Civil Society

An open letter signed by former heads of state, Nobel laureates, and other prominent figures calls for an international agreement on clear and verifiable red lines to prevent AI risks.
Stanford researchers published a paper finding that employment of early-career workers in exposed industries has declined 13%.
AI safety activists have begun hunger strikes outside of AI company headquarters in London and San Francisco.
Dan Hendrycks and Adam Khoja respond to critiques of Mutually Assured AI Malfunction (MAIM).
Rosario Mastrogiacomo discusses how AI agents are eroding the foundations of cybersecurity.
Ben Brooks argues that keeping frontier AI behind paywalls could create a new form of digital feudalism.
Oscar Delaney and Ashwin Acharya discuss ‘the hidden frontier’ of internal models at AI companies.

AI Safety Newsletter #62: Big Tech Launches $100 Million pro-AI Super PAC

Corin Katzke — Wed, 27 Aug 2025 16:29:19 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: Big tech launches a $100 million pro-AI super PAC; Meta’s chatbot policies prompt congressional scrutiny amid the company’s AI reorganization; China reverses course on buying Nvidia H20 chips after comments by Secretary of Commerce Howard Lutnick.

Big Tech Launches $100 Million pro-AI Super PAC

Silicon Valley executives and investors are committing more than $100 million to a new political network to push back against AI regulations, signaling that the industry intends to be a major player in next year’s U.S. midterms.

The super PAC is backed by a16z and Greg Brockman and is modeled on the crypto super PAC Fairshake. The network, called Leading the Future, aims to influence AI policy through campaign donations, digital ads, and candidate targeting. Venture capital firm Andreessen Horowitz and OpenAI President Greg Brockman are among the key backers, alongside Palantir co-founder Joe Lonsdale, Perplexity AI, and veteran angel investor Ron Conway.

Leading the Future’s branding. Source.

The effort will be led by Josh Vlasto, a former adviser to Sen. Chuck Schumer, and Zac Moffatt, CEO of consulting firm Targeted Victory. Both previously played senior roles in Fairshake, which spent heavily to defeat crypto skeptics and support the first federal crypto law signed earlier this year.

Meta is funding an AI super PAC in California. Meta is also ramping up political efforts in its home state with the launch of another new super PAC, Mobilizing Economic Transformation Across (Meta) California. Meta California is led by Meta executives Brian Rice and Greg Maurer and is expected to deploy tens of millions of dollars to support candidates, across party lines, who oppose AI regulation.

Meta’s Chatbot Policies Prompt Backlash Amid AI Reorganization

Meta is facing bipartisan outrage and an ongoing probe after Reuters reported on internal company documents that permitted its AI chatbots to engage in romantic and sensual conversations with minors—drawing fresh scrutiny as the company reorganizes its AI division.

Meta allowed its chatbots to have sensual conversations with children. An internal policy document permitted Meta AI chatbots to interact with children in “romantic or sensual” contexts, such as describing a child’s body as “a work of art.” Although Meta removed these sections of its policy when questioned by Reuters, the rules had already been approved and were in effect.

An excerpt from Meta’s policies. Source.

The revelation provoked congressional scrutiny. Senator Josh Hawley (R‑MO) launched a Senate inquiry into Meta’s policies, requesting that the company preserve all relevant communications and clarify whether its chatbots enable “exploitation, deception or other criminal harms to children.” A bipartisan group of senators also sent a letter to Meta demanding it publicly disclose its updated policies and specifically forbid romantic chatbot interactions with minors.

Meta is in the middle of reorganizing its AI division. As Meta is fielding congressional backlash, the company is also in the middle of a major internal reorganization of its AI division. Under the umbrella of Meta Superintelligence Labs (MSL), teams have been split into specialized groups: TBD Lab (large-language‑model development), FAIR (long‑term research), Products & Applied Research, and MSL Infra (infrastructure). The reorganization, communicated in an internal memo from Chief AI Officer Alexandr Wang, also dissolved the AGI Foundations team, redistributing its members across the new structure. Any new hires or transfers across Meta’s AI teams now require Wang’s personal approval.

Meta’s AI reorganization reflects high stakes in a competitive race with other AI companies. Yet, as Meta accelerates toward superintelligence, its chatbot controversy demonstrates that its ambitions are outpacing both internal controls and external oversight.

China Reverses Course on Nvidia H20 Purchases

Weeks after the Trump administration approved Nvidia’s H20 chip exports to China under a 15% revenue-sharing arrangement, the deal is facing an uncertain future.

Nvidia made a deal with the White House to sell H20s (and possibly a new chip model) to China. Last month, the White House struck a deal with Nvidia that allowed the company to export H20s to China on the condition that it share 15% of the revenue from those sales with the US government. President Trump recently suggested he might also allow Nvidia to export a scaled-down version of its next-generation Blackwell chips. The chip in question (the B30A) would offer about half the performance of Nvidia's flagship B300 and could be shipped as early as September.

The deal faces political opposition and legal uncertainty. Selling AI chips to China faces political opposition on national security grounds. A group of six Democratic senators wrote a letter to the White House arguing that its deal with Nvidia undermines US national security by giving the PRC access to a critical military technology. The deal’s revenue-sharing condition (in effect, an export tax) also faces legal challenges. Export taxes are barred under both the Constitution and federal law.

Chinese regulators issued guidance against H20 purchases after comments by Sec. Lutnick. Defending the H20 deal, Commerce Secretary Howard Lutnick said that the U.S. strategy was to offer China inferior AI chips: “We don’t sell them our best stuff… not even our third‑best,” adding that the goal was for Chinese developers to get “addicted to the American technology stack.” Chinese regulators reportedly deemed the comments “insulting,” prompting efforts to discourage Nvidia chip purchases.

A week after Lutnick’s comments, China’s Cyberspace Administration (CAC) issued guidance urging Chinese companies to suspend H20 orders. Nvidia is reportedly halting production of H20s in response to decreased demand from China.

Renting AI chips might be better than selling them. Rather than selling chips to China outright, renting access to them through remote cloud services would give the US greater leverage. Cloud access preserves US control: chips remain physically in US custody, and access can be revoked. This model could generate revenue for both chipmakers and cloud providers while curbing diversion to unauthorized users like the Chinese military.

In Other News

Government

President Trump said the U.S. will take a 10% equity stake in Intel.
The U.K. appointed Jade Leung as the prime minister’s AI adviser.
Colorado lawmakers convened a special session to revisit the state’s AI anti-discrimination law before it takes effect in February.
NSF and Nvidia announced a partnership enabling the nonprofit Allen Institute for AI (AI2) to develop a fully open AI model for research and public use.
U.S. authorities have embedded trackers in AI-chip shipments to identify diversions to China, according to Reuters.
The U.K. AI Security Institute launched The Alignment Project, a £15m global fund offering grants (up to £1m) and AWS compute credits to support alignment work.

Industry

XBoW wrote that despite OpenAI's assessment of GPT-5 showing modest cyber capabilities, GPT-5 doubled XBoW’s hacking agent’s performance.
The Financial Times reported on the “$3 trillion AI building boom”, detailing massive corporate obligations across data centers, chips, and power.
SoftBank and Intel signed a $2 billion investment agreement, with SoftBank buying Intel stock as the chipmaker seeks outside capital.
DeepMind revealed that LMArena’s top-rated image model, “nano banana,” is the company’s Flash Image model—now available in Gemini.

Civil Society

ESET reported that it found “PromptLock,” an AI-assisted ransomware.
AP reported on the first World Humanoid Robot Games, which were held in China.
The Institute for Progress published “Preparing for Launch”, a foreword to its “Launch Sequence” series arguing for proactive U.S. R&D to shape AI progress and strengthen security.
An MIT study found that about 95% of enterprise GenAI pilots are failing to show P&L impact, highlighting integration and workflow issues.

AI Safety Newsletter #61: OpenAI Releases GPT-5

Corin Katzke — Tue, 12 Aug 2025 17:09:49 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: OpenAI releases GPT-5.

OpenAI Releases GPT-5

Ever since GPT-4’s release in March 2023 marked a step-change improvement over GPT-3, people have used ‘GPT-5’ as a stand-in to speculate about the next generation of AI capabilities. On Thursday, OpenAI released GPT-5. While state-of-the-art in most respects, GPT-5 is not a step-change improvement over competing systems, or even recent OpenAI models—but we shouldn’t have expected it to be.

GPT-5 is state of the art in most respects. GPT-5 isn’t a single model like GPTs 1 through 4. It is a system of two models: a base model that answers questions quickly and is better at tasks like creative writing (an improved version of 4o), and a reasoning model that can answer questions step-by-step and is better at tasks like coding or mathematics (think o3). GPT-5 uses one model or the other based on a user’s prompt.

These two models combine to form a broadly capable system. For example, GPT-5 achieves state-of-the-art performance on Humanity’s Last Exam, the software engineering benchmark SWE-bench Verified, and holds the top spot on LMArena’s text leaderboard.

GPT-5 hallucinates less than previous OpenAI models. GPT-5 also has a markedly lower hallucination rate than previous models as evaluated both on open-source prompts and on real, de-identified ChatGPT traffic.

Lower hallucination rates help GPT-5 perform better in healthcare applications. GPT-5 achieves state-of-the-art performance on OpenAI’s HealthBench. For example, OpenAI finds that GPT-5 (thinking) hallucinates 1.6% of the time during challenging healthcare conversations, improving significantly on o3’s 12.9% hallucination rate.

GPT-5 is a state-of-the-art text agent. GPT-5 leads on a new benchmark that measures how well AI systems perform in long, interactive text-based games, which are examples of challenging exploratory environments. No AI system can beat the games without clues, and none is as capable as humans, but GPT-5 performs best of the models tested.

GPT-5 is best understood as a consolidation of features developed since GPT-4. GPT-5 is not a state-of-the-art model across the board. For example, it places second to xAI’s Grok 4 on the abstract pattern recognition benchmarks ARC-AGI-1 and ARC-AGI-2. GPT-5 also doesn’t improve over o3 on several coding benchmarks, even though it does on SWE-bench Verified.

Similarly, the base model GPT-5 uses is an updated version of 4o (which is cheap enough for OpenAI to roll out GPT-5 to its now 700 million weekly active users) rather than GPT-4.1. That means GPT-5 misses out on some of GPT-4.1’s context window improvements over 4o.

For those expecting another GPT-3 to GPT-4 improvement in capabilities, GPT-5 underperformed. But that wasn’t a realistic expectation—OpenAI has continually rolled out new models and features since GPT-4 in response to competition from other AI companies. GPT-5 is better understood as a consolidation of the improvements OpenAI has developed since GPT-4, and which GPT-4 didn’t have. These include:

Search and tool use: GPT-5 has access to search, meaning that its knowledge isn’t limited to what it can memorize during pretraining. It also has access to deep research, agent integrations, and can run code.
Thinking: GPT-4 was released before OpenAI started using reinforcement learning for thinking, and performed far below expert levels on math, coding, and science tasks. GPT-5 (thinking) performs at a PhD level on similar tasks.
Image recognition and generation: GPT-5 integrates OpenAI’s visual systems, meaning that it can both understand images and generate them.
Context length: GPT-4’s context window was about eight thousand tokens, about the size of a short research paper. GPT-5’s context window is 256 thousand tokens, about 2-3 full-length novels.

While GPT-5 isn’t a step-change improvement over its competitors—or even recent OpenAI models like 4o and the o series—the better point of comparison is with what GPT-4 could do when it was released in 2023. In that comparison, GPT-5 does look like a step-change improvement.

What would GPT-5 have needed to feel like a discontinuous improvement? ChatGPT still lacks sufficient agency to be broadly economically useful. Thinking likely isn’t enough for agency—for example, to reliably use computers, AI agents may need improved visual reasoning and the ability to store lessons from tasks into a long-term memory.

By default, however, we should expect these and other improvements to be deployed continually—not in big jumps every two years.

In Other News

Government

President Trump has announced a proposal to impose a 100% tariff on imported semiconductors, aiming to boost domestic production. The proposal would exempt firms with facilities in the US, such as TSMC.
OSTP Director Michael Kratsios discussed the White House’s AI Action Plan at an event with CSIS, outlining strategic goals and implementation frameworks.
Illinois Governor Pritzker signed an act forbidding AI‑based therapy or psychotherapy in Illinois.
Governor DeSantis said Florida is preparing to implement proactive AI policy in the coming months.
U.S. authorities charged two Chinese nationals in California with illegally shipping tens of millions of dollars’ worth of Nvidia H100 AI chips to China without export licenses.
President Trump indicated he might approve selling a downgraded version of Nvidia’s next‑gen Blackwell chip to China, along with a deal requiring Nvidia and AMD to give the U.S. government 15% of related revenues.

Industry

OpenAI’s IMO-gold-winning model also got gold in the International Olympiad in Informatics, one of the world’s top coding competitions.
OpenAI released two open‑weight models.
OpenAI is adding mental health features to ChatGPT, including break reminders and detecting signs of dependency.
Anthropic released Claude Opus 4.1.
DeepMind introduced “Genie 3,” a new frontier world model.
Nvidia has started to ship its H20 AI chips to China after obtaining U.S. approval, despite security concerns voiced by Chinese state media.

Civil Society

Researchers discovered a zero-click exploit to exfiltrate data from ChatGPT agent connectors like Google Drive.
Axios reports that Truth Social’s AI search tool, powered by Perplexity, restricts sources to pro‑Trump media, unlike the broader range shown on the public version.

AI Safety Newsletter #60: The AI Action Plan

Corin Katzke — Thu, 31 Jul 2025 17:43:20 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: The Trump Administration publishes its AI Action Plan; OpenAI releases ChatGPT Agent and announces that an experimental model achieved gold medal-level performance on the 2025 International Mathematical Olympiad.

The AI Action Plan

On July 23rd, the White House released its AI Action Plan. The document is the outcome of a January executive order that required the President’s Science Advisor, ‘AI and Crypto Czar’, and National Security Advisor (currently Michael Kratsios, David Sacks, and Marco Rubio) to submit a plan to “sustain and enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security.” President Trump also delivered an hour-long speech on the plan, and signed three executive orders beginning to implement some of its policies.

Trump displaying an executive order at the “Winning the AI Race” summit. Source.

The AI Action Plan lists several dozen policies across three pillars—accelerating innovation, building American AI infrastructure, and leading in international diplomacy and security—that will guide the Trump Administration’s approach to AI.

The central policy agenda outlined is to accelerate US AI development and deployment. For example, it proposes streamlining permitting for AI infrastructure (such as semiconductor manufacturing facilities, data centers, and energy infrastructure), adopting AI in the federal government and military, and funding AI research. But there’s a lot more in the plan, too: both a surprisingly strong focus on AI safety and some items of concern.

The plan includes several policies that advance AI safety. While most of its policies are intended to accelerate AI development and deployment, the plan also correctly observes that AI development will only benefit Americans if done safely. Some of the policies most relevant to AI safety include:

Invest in AI Interpretability, Control, and Robustness Breakthroughs
Build an AI Evaluations Ecosystem
Bolster Critical Infrastructure Cybersecurity
Promote Secure-By-Design AI Technologies and Applications
Promote Mature Federal Capacity for AI Incident Response
Strengthen AI Compute Export Control Enforcement (this proposes location verification for AI chips)
Ensure that the U.S. Government is at the Forefront of Evaluating National Security Risks in Frontier Models
Invest in Biosecurity

While not comprehensive, these policies are a great step in the right direction—and much better than might have been expected given the administration’s previous rhetorical disregard for AI safety.

Overall, the plan introduces sensible policies that reflect the expertise of those who developed it. However, it is also shaped by the larger policy agenda of the Trump Administration, which may conflict with AI safety goals. We discuss some areas of potential concern below.

The plan does not want state AI legislation. One section proposes that the Federal government “should not allow AI-related Federal funding to be directed toward states with burdensome AI regulations that waste these funds, but should also not interfere with states’ rights to pass prudent laws that are not unduly restrictive to innovation.”

This rule is less strict than Sen. Cruz’s failed AI regulation moratorium. But what constitutes a “burdensome” regulation will vary depending on who you ask (particularly if you ask frontier AI companies). In response to the plan, both Congressional Democrats and Rep. Marjorie Taylor Greene expressed concern about stifling state AI regulation.

The plan has a partisan view on what constitutes ideological bias. In a section on ensuring that AI “objectively reflects truth,” one policy instructs NIST to “revise the NIST AI Risk Management Framework to eliminate references to misinformation, Diversity, Equity, and Inclusion, and climate change.”

A policy to promote objectivity in AI models could be great. However, that policy could itself be weaponized to promote ideological ends. In their response, Congressional Democrats wrote that “we support true AI neutrality—AI models trained on facts and science—but the administration's fixation on ‘anti-woke’ inputs is definitionally not neutral.”

The plan endorses open-weight models. It writes that, “while the decision of whether and how to release an open or closed model is fundamentally up to the developer, the Federal government should create a supportive environment for open models.”

Encouraging US companies to release open-weight models with dangerous capabilities would be a bad policy. But the specific policies the plan lists stop short of that—they mostly just provide resources to academic researchers (who are unlikely to develop frontier models) through the National AI Research Resource (NAIRR).

The plan forgoes AI nonproliferation. The plan argues that the US “must meet global demand for AI by exporting its full AI technology stack—hardware, models, software, applications, and standards—to all countries willing to join America’s AI alliance.”

The plan’s rationale for this policy is that countries might otherwise look to acquire Chinese AI exports. However, it also continues the Trump Administration's reversal of the Biden-era policy (see the AI Diffusion Framework) that sought to prevent the proliferation of dangerous AI capabilities abroad. While exporting American AI might strengthen the US’ position in the AI race, it also threatens to proliferate dangerous AI capabilities to malicious actors if the US does not ensure that other states implement robust security standards.

The plan advances a zero-sum race narrative. Kratsios, Sacks, and Rubio write that the promise of AI “is ours to seize, or to lose.” That is, they assume that the alternative to “AI dominance” is to give up AI’s benefits.

This argument is misleading, or at least underdeveloped. While there are reasons to support a US lead in AI, AI progress has the potential to benefit Americans whether or not the US “dominates” international AI development. Historically, general purpose technologies like AI diffuse across national boundaries. For example, technologies like electricity and the internet have benefited people around the world, and not just within the nations that led their development.

The real motivation behind the AI race narrative in Washington is not seizing AI’s benefits, but rather competition over the balance of international power between the US and China. While there are reasons to be concerned about AI development dominated by China, racing towards US dominance is not the only alternative—and creates its own risks. In order to preserve international security, the US will need to proactively manage—rather than just accelerate—a US-China AI race.

ChatGPT Agent and IMO Gold

On Thursday, OpenAI released a new agent mode for ChatGPT, which integrates Operator, Deep Research, and chatbot functionality into a unified system.

The system, ‘ChatGPT agent,’ has access to its own virtual computer, and OpenAI highlights that it can book flights and reservations, create slides and spreadsheets, and make online purchases. It can also connect to users’ personal accounts, for example, Google Calendar, Gmail, and GitHub.

ChatGPT agent achieves SOTA performance on HLE and FrontierMath. ChatGPT agent’s capabilities extend beyond basic online automation: it achieves SOTA performance on several benchmarks measuring expert-level knowledge and reasoning. For example, ChatGPT agent gets 23% on Humanity’s Last Exam (HLE) when it does not use tools. When it uses tools like browsers and computer code, it gets 41.6%. This is similar to Grok 4, which gets 25.4% on HLE without tools and 44.4% with tools.

OpenAI also evaluated ChatGPT agent against benchmarks measuring real-world task completion. OpenAI reports it performs better than humans nearly 50% of the time on an internal benchmark capturing diverse economically important tasks, an incredible claim that has yet to be reproduced. OpenAI also reports it surpasses human performance on data science tasks, and achieves state-of-the-art results (though still below human performance) on tasks involving spreadsheets and web browsing.

Greater autonomy introduces new risks. OpenAI published a system card detailing ChatGPT agent’s risks. ChatGPT agent has access to user data and can take actions on the web, meaning that mistakes are higher stakes. OpenAI also highlighted the risk of adversarial manipulation through prompt injection, in which malicious websites could try to manipulate ChatGPT’s behavior, such as to reveal personal information about the user.

ChatGPT agent poses ‘high’ biological and chemical risk. ChatGPT agent is also the first system that OpenAI is treating as posing ‘high’ biological and chemical risk. According to the company’s Preparedness Framework, that means the system could provide meaningful assistance to non-experts in creating known biological or chemical threats.

OpenAI says it’s activated several safeguards against these risks, including “comprehensive threat modeling, dual-use refusal training, always-on classifiers and reasoning monitors, and clear enforcement pipelines.” It also launched a bug bounty program for researchers to red team these safeguards.

OpenAI and Google DeepMind claim gold medal-level performance on the 2025 IMO. On Friday, OpenAI also announced that an experimental model had achieved gold medal-level performance on the 2025 International Mathematical Olympiad (IMO), solving five out of six questions. (A few human competitors scored a perfect six out of six.)

Gold medal-level performance on the IMO has been a major goal in AI research for years, but only recently has seemed within reach. Last year, Google’s AlphaProof and AlphaGeometry 2 achieved silver medal-level performance on the 2024 IMO, making gold-level performance this year plausible. On Monday, Google announced that its own reasoning LLM had achieved gold medal-level performance on the 2025 IMO, also solving five out of six questions.

OpenAI and Google used general reasoning LLMs. Whereas the capabilities of Google’s AlphaProof and AlphaGeometry 2 systems were narrowly focused on IMO-style math questions, OpenAI’s model is apparently not IMO-specific (or even math-specific), but instead a general reasoning LLM allowed to think for hours at a time. OpenAI published the model’s answers on the 2025 IMO here. Similarly, Google’s gold-winning performance used an advanced version of Gemini Deep Think, a general reasoning model that uses natural language.

In Other News

Government

According to Nvidia's CEO, the US approved the sale of Nvidia's H20 chips to China. Reuters reported that Nvidia ordered 300,000 H20s from TSMC to meet expected Chinese demand.
The Pentagon’s FY2026 budget request called for $13.4 billion for autonomous systems.
The Pentagon also awarded Anthropic, Google, OpenAI and xAI each $200 million contracts to develop AI for national security applications.
China announced plans for an international AI governance organization.
The UK government launched a £15 million alignment research project.

Industry

Meta has refused to sign the EU’s GPAI Code of Practice.
Anthropic announced it will join OpenAI, Mistral and (likely) Microsoft in signing the Code of Practice.
At a summit in Pennsylvania, President Trump announced more than $90 billion in private AI infrastructure investment in the state, led by Blackstone and Google.

Civil Society

Isobel Moure, Tim O'Reilly and Ilan Strauss argue that open protocols can prevent AI monopolies.
Dane A. Morey, Mike Rayo, and David Woods discuss how AI can degrade human performance in high-stakes settings.
Anton Leicht analyzes whether, in the race for AI supremacy, countries can stay neutral.
A report from the Seismic Foundation found that people believe AI will make their lives worse, but rank the issue low on their list of social priorities.
A YouGov poll found a net approval rating of -14 for Trump’s handling of AI.
A report from Common Sense Media found that 3 in 4 teens have used AI companions.
RAND published a report on verifying international AI agreements.
CAIS is hiring a software engineer. Apply here.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a new platform for expert commentary and analysis.

AI Safety Newsletter #59: EU Publishes General-Purpose AI Code of Practice

Corin Katzke — Tue, 15 Jul 2025 18:04:57 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: The EU published a General-Purpose AI Code of Practice for AI providers, and Meta is spending billions revamping its superintelligence development efforts.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

EU Publishes General-Purpose AI Code of Practice

In June 2024, the EU adopted the AI Act, which remains the world’s most significant law regulating AI systems. The Act bans some uses of AI like social scoring and predictive policing and limits other “high risk” uses such as generating credit scores or evaluating educational outcomes. It also regulates general-purpose AI (GPAI) systems, imposing transparency requirements, copyright protection policies, and safety and security standards for models that pose systemic risk (defined as those trained using ≥10²⁵ FLOPs).

However, these safety and security standards are ambiguous—for example, the Act requires providers of GPAIs to “assess and mitigate possible systemic risks,” but does not specify how to do so. This ambiguity may leave GPAI developers uncertain whether they are complying with the AI Act, and regulators uncertain whether GPAI developers are implementing adequate safety and security practices.

To address this problem, on July 10, 2025, the EU published the General-Purpose AI Code of Practice. The Code is a voluntary set of guidelines to help providers comply with the AI Act’s GPAI obligations, which take effect on August 2, 2025.

The Code of Practice establishes safety and security requirements for GPAI providers. The Code consists of three chapters—Transparency, Copyright, and Safety and Security. The last chapter, Safety and Security, only applies to the handful of companies whose models cross the Act’s systemic-risk threshold.

The Safety and Security chapter requires GPAI providers to create frameworks outlining how they will identify and mitigate risks throughout a model's lifecycle. These frameworks must follow a structured, three-step approach to risk assessment for each major decision (such as a new model release):

Identification. Companies must identify potential systemic risks. Four categories of systemic risks require special attention: CBRN (chemical, biological, radiological, nuclear) risks, loss of control, cyber offense capabilities, and harmful manipulation.
Analysis. Each risk must be analyzed—for example, by using model evaluations. When the risk is greater than those posed by models already on the EU market, providers may be required to involve third-party evaluators.
Determination. Companies must determine whether the risks they identified are acceptable before proceeding. If not, they must implement safety and security mitigations.

Continuous monitoring, incident reporting timelines, and future-proofing. The Code requires continuous monitoring after models are deployed, and strict incident reporting timelines. For serious incidents, companies must file initial reports within days. It also acknowledges that current safety methods may prove insufficient as AI advances. Companies can implement alternative approaches if they demonstrate equal or superior safety outcomes.

AI providers will likely comply with the Code. While the Code is technically voluntary, compliance with the EU AI Act is not. Providers are incentivized to reduce their legal uncertainty by complying with the Code, since EU regulators will assume that providers who comply with the Code are also Act-compliant. OpenAI and Mistral have already indicated they intend to comply with the Code.

The Code formalizes some existing industry practices advocated for by parts of the AI safety community, such as publishing safety frameworks (or: responsible scaling policies) and system cards. Since frontier AI companies are very likely to comply with the Code, securing similar legislation in the US may no longer be a priority for AI safety.

Meta Superintelligence Labs

Meta spent $14.3 billion for a 49 percent stake in Scale AI, starting “Meta Superintelligence Labs.” The deal folds every AI group at Meta into one division and puts Scale founder Alexandr Wang, now chief AI officer, in charge of Meta’s superintelligence development efforts.

Meta makes nine-figure pay offers to poach top AI talent. Reuters reported that Meta has offered “up to $100 million” to OpenAI staff, a tactic CEO Sam Altman criticized. SemiAnalysis estimates Meta is offering typical leadership packages of around $200 million over four years. For example, Bloomberg reports that Apple’s foundation-models chief Ruoming Pang left for Meta after a package “well north of $200 million.” Other early recruits span OpenAI, DeepMind, and Anthropic.

Meta has created a well-resourced competitor in the superintelligence race. In response to Meta’s hiring efforts, OpenAI, Google, and Anthropic have already raised pay bands, and smaller labs might be priced out of frontier work.

Meta is also raising its compute expenditures. It lifted its 2025 capital-expenditure forecast to $72 billion, and SemiAnalysis describes new, temporary “tent” campuses that can house one-gigawatt GPU clusters.

In Other News

Government

California Senator Scott Wiener expanded SB 53, his AI safety bill, to include new transparency measures.
The Commerce Department requested additional funding for the Bureau of Industry and Security (BIS) to enhance its enforcement of export controls.
Missouri’s Attorney General is investigating AI chatbots for alleged political bias against Donald Trump.
The BRICS nations (an international group founded by Brazil, Russia, India, China, and South Africa that serves as a forum for political coordination for the Global South) signed a commitment that included language on mitigating AI risks.
Bernie Sanders expressed concern about loss of control risks in an interview with Gizmodo.

Industry

Last week, Grok was explicitly antisemitic on X. The behavior came after Grok’s system prompt was (perhaps unintentionally) updated with, among other changes, an instruction telling Grok not to be “afraid to offend people who are politically correct.”
xAI also released Grok 4, which achieves state-of-the-art scores on benchmarks including Humanity’s Last Exam and ARC-AGI-2.
OpenAI accused the Coalition for AI Nonprofit Integrity of lobbying violations amid an ongoing legal dispute with Elon Musk.
Anthropic published a blog post on the need for transparency in frontier AI development.
OpenAI is set to release an open-weight model similar to its o3-mini model.
OpenAI’s deal to acquire Windsurf failed, and instead Google hired Windsurf’s CEO to lead its AI products division and Cognition AI acquired the company.

Civil Society

Henry Papadatos discusses how the EU’s GPAI Code of Practice advances AI safety.
Chris Miller analyzes how US export controls have (and haven’t) curbed Chinese AI.
The University of Oxford’s AI Governance Initiative published a report on verification for international AI agreements.
A METR study found that experienced developers work 19% more slowly when using AI tools.
CAIS is hosting an AI Safety Social at ICML.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a new platform for expert commentary and analysis.

AI Safety Newsletter #58: Senate Removes State AI Regulation Moratorium

Corin Katzke — Thu, 03 Jul 2025 16:23:06 GMT

Welcome to the AI Safety Newsletter by the Center for AI Safety. We discuss developments in AI and AI safety. No technical background required.

In this edition: The Senate removes a provision from Republicans’ “Big Beautiful Bill” aimed at restricting states from regulating AI; two federal judges split on whether training AI on copyrighted books is fair use.

Listen to the AI Safety Newsletter for free on Spotify or Apple Podcasts.

Senate Removes State AI Regulation Moratorium

The Senate removed a provision from Republicans’ “Big Beautiful Bill” aimed at restricting states from regulating AI. The moratorium would have prohibited states from receiving federal broadband expansion funds if they regulated AI. However, it faced procedural and political challenges in the Senate and was ultimately removed in a vote of 99-1. Here’s what happened.

A watered-down moratorium cleared the Byrd Rule. In an attempt to bypass the Byrd Rule, which prohibits policy provisions in budget bills, the Senate Commerce Committee revised the original moratorium to be a prerequisite for states to receive federal broadband expansion funds rather than a blanket restriction. On Wednesday, Senate Parliamentarian Elizabeth MacDonough judged that the moratorium would only clear the Byrd Rule if it was tied to only the new $500 million in federal broadband expansion funds provided by the reconciliation bill—not all $42.45 billion previously appropriated.

This significantly weakened the moratorium: even if it had been passed, states might have decided that regulating AI was worth forgoing new broadband expansion funds.

The moratorium moved to a vote in the Senate. On Saturday, the Senate voted 51-49 to move to general debate on the reconciliation bill, beginning a “vote-a-rama” in which many amendments were debated and voted on in rapid succession. Senators Josh Hawley and Maria Cantwell were expected to bring an amendment to remove the moratorium from the bill.

Sen. Marsha Blackburn, another critic of the original moratorium, and Sen. Ted Cruz were set to pitch a compromise draft that shortened the moratorium from ten to five years and exempted state legislation establishing internet protections. However, on Tuesday, Blackburn abandoned that compromise after Steve Bannon and others reportedly reached out to her.

Instead, she brought an amendment with Sen. Cantwell to remove the moratorium entirely. With the moratorium lacking enough support, even Cruz voted for the amendment, which passed 99-1.

Sen. Blackburn cosponsored the Kids Online Safety Act last year. (Source.)

Even if the moratorium had survived the Senate, it could have faced an uphill battle in the House—Representatives Marjorie Taylor Greene and Thomas Massie came out against it, along with other prominent Republicans like Arkansas Governor Sarah Huckabee Sanders and Steve Bannon.

Judges Split on Whether Training AI on Copyrighted Material is Fair Use

Last week, two U.S. district judges decided cases involving Anthropic and Meta on the question of whether training LLMs on copyrighted works qualifies as fair use. While both judges sided with the AI companies, they sharply disagreed about how the Copyright Act should apply to similar cases—leaving legal precedent on the question ambiguous.

One judge ruled that training Anthropic’s Claude on copyrighted books is fair use. U.S. District Judge William Alsup granted summary judgment that Anthropic’s use of copyrighted books to train LLMs qualifies as fair use. The order held that three of the four factors considered when determining whether a given use of a copyrighted work is fair use favored Anthropic’s use in training LLMs.

The purpose and character of the use. The court held that using copyrighted books to train LLMs is highly transformative, favoring fair use.
The nature of the copyrighted work. The books in question were expressive, pointing against fair use.
The amount and substantiality of the portion used. The court held that it was reasonably necessary to use the entirety of books in training LLMs, favoring fair use.
The effect of the use upon the potential market for or value of the copyrighted work. No exact copies or knockoffs resulted from the use of copyrighted books to train Claude, since Anthropic implemented guardrails to prevent Claude from exactly replicating the works on which it was trained. While the use may result in an “explosion” of AI-generated writing that competes with the copyrighted books, the court held that such a market effect doesn’t count under the Copyright Act.

Digitizing print books Anthropic lawfully bought is also protected—but piracy is not. Judge Alsup drew a sharp line between scanning paperbacks Anthropic had purchased and the millions of volumes it admitted downloading from pirate libraries. Turning a lawfully owned print copy into a PDF is fair use, but pirating books is not. That issue will proceed to trial.

In a case against Meta, another judge reached the opposite conclusion. While U.S. District Judge Vince Chhabria sided with Meta in its case, his order made clear he only did so because he believed the plaintiffs made the wrong arguments and presented the wrong evidence.

His analysis of whether using copyrighted books to train LLMs is fair use agrees with Judge Alsup’s on the first three factors but sharply disagrees on the relevance of market effects. The upshot, he writes, is that “in many circumstances it will be illegal to copy copyright-protected works to train generative AI models without permission.” He sided with Meta only because the plaintiffs failed to provide arguments or evidence showing that Meta’s LLMs resulted in market harm to their books.

The judges disagree on whether “indirect displacement” is a relevant market effect under the Copyright Act. Both orders assume that LLMs may now or soon be able to generate many competitors to human-written books, which could harm the market for human-written books.

Judge Alsup writes that the authors’ complaint about such an effect is “no different than it would be if they complained that training schoolchildren to write well would result in an explosion of competing works,” which is “not the kind of competitive or creative displacement that concerns the Copyright Act.”

However, Judge Chhabria responds that “using books to teach children to write is not remotely like using books to create a product that a single individual could employ to generate countless competing works with a miniscule fraction of the time and creativity it would otherwise take.” That is, he argues that a similarity in kind does not outweigh a vast difference in magnitude.

Higher courts will likely settle the dispute. While Judge Alsup’s order might have set a precedent for similar cases, Judge Chhabria’s disagreement leaves the question unsettled. Both decisions fall under the jurisdiction of the Ninth Circuit, which has yet to rule on AI fair use. The authors in Anthropic’s case, at least, indicated that they will appeal the decision to the Ninth Circuit, and ultimately the issue may be up to the Supreme Court to decide.

In Other News

Michael C. Horowitz and Lauren A. Kahn argue that placing AI in a nuclear framework inflates expectations and distracts from practical, sector-specific governance.
Laura González Salmerón discusses how copyright law is under pressure from generative AI.
Kristin O’Donoghue argues that a moratorium on state AI legislation would upend federalism and halt the experiments that drive smarter policy.
Pete Buttigieg wrote a blog post arguing that AI presents “a fundamental change to our society—and we remain dangerously underprepared.”
Researchers at UC Berkeley released CyberGym, a new cybersecurity benchmark. The LLMs they evaluated discovered 15 zero-day vulnerabilities in large software projects.
A new report from the Forecasting Research Institute shows that experts and superforecasters predict that existing AI capabilities may substantially increase the risk of human-caused epidemics.

See also: CAIS’ X account, our paper on superintelligence strategy, our AI safety course, and AI Frontiers, a new platform for expert commentary and analysis.