How big could an “AI Manhattan Project” get?
An AI Manhattan Project could accelerate compute scaling by two years
Over the last year, the possibility of an AI national project has steadily grown.
In November, the US-China Economic and Security Review Commission made its top recommendation to Congress to “establish and fund a Manhattan Project-like program dedicated to racing to and acquiring an Artificial General Intelligence capability.” Over the last few months, the US Department of Energy has also repeatedly compared AI to the Manhattan Project and indicated that it would use its power to help such a project succeed, most recently in a tweet.
But what would a “Manhattan Project for AI” actually entail? It’s not entirely clear, but we think that three distinct features capture much of the essence of what people are referring to:
It’s a project initiated by the US government
Private sector AI resources (e.g. compute) are consolidated
Total compute investments reach a similar fraction of US GDP as the peak of the Manhattan Project or the Apollo program
In addition to these core properties, for the purposes of this analysis we focus primarily on the physical bottlenecks to this scaling, thus assuming that there’ll be ample investment and political support for an AI Manhattan Project.
In this post, we argue that such a project in the US could yield a 2e29 FLOP training run by the end of 2027, even after considering power constraints. This is around 500 times larger than the most compute-intensive model to date, and constitutes a 10,000-fold scale-up over GPT-4.1
Previous work has considered the bottlenecks to scaling individually, both under normal conditions and under a national project. This post aims to investigate what a national project would mean for scaling, drawing on more recent evidence about the feasibility of near-term supercomputer and energy scaling, such as xAI’s Colossus cluster, Stargate, and NVIDIA revenues.
To be clear, this post is not a policy recommendation, nor are we predicting that a national project will happen. We’re also not strongly specifying the level of secrecy, and the level of reliance on the private sector. Instead, our goal is to figure out what an “AI Manhattan project” would mean for AI scaling, given that such a possibility is on the cards.
How much compute could a national project muster?
At its peak, the Manhattan Project cost 0.4% of GDP, which amounts to ~$122B today. Similarly, at the peak of the Apollo program, around 0.8% of US GDP (~$244B today) was dedicated to NASA. But how much compute can you buy with this money?
One way to approach this is to look at the price-performance of current GPUs. For example, the H100 GPU currently delivers around 2.2e10 FLOP/s/$ at FP16 precision. If there is $244 billion of investment each year over the coming three years, then a 100-day training run should reach around 1.2e29 FLOP.2 This is roughly 6,000 times the size of GPT-4’s 2.1e25 FLOP, and around 2 years ahead of schedule.
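To make the arithmetic behind this estimate concrete, here is a minimal sketch that mirrors the calculation spelled out in the footnote. All parameter values (the 1.3x/year price-performance trend, the 80% hardware share, 40% utilization, and a 2x gain from FP8 over FP16) are the footnote’s stated assumptions rather than additional data.

```python
# Back-of-envelope estimate of training compute from Apollo-scale investment.
# Assumptions follow the footnote: 1.3x/year FLOP/s/$ improvement, 80% of
# spending going to hardware, 40% utilization, and a 2x gain from FP8 over FP16.

annual_investment_usd = 244e9          # ~0.8% of US GDP per year
hardware_fraction = 0.8                # share of spending going to hardware
flop_per_s_per_usd = 2.2e10            # H100 price-performance at FP16
price_perf_growth = 1.3                # annual improvement in FLOP/s/$
fp8_speedup = 2                        # FP8 throughput relative to FP16
utilization = 0.4
training_seconds = 100 * 24 * 3600     # 100-day training run

# Hardware bought in each of three years, valued in FLOP/s at that year's
# price-performance
effective_flop_per_s = sum(
    hardware_fraction * annual_investment_usd
    * flop_per_s_per_usd * price_perf_growth**year
    for year in range(3)
)

total_flop = effective_flop_per_s * fp8_speedup * utilization * training_seconds
print(f"{total_flop:.1e} FLOP")  # ~1.2e29 FLOP
```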
As it turns out, this amount of investment is in line with a simple trend extrapolation of NVIDIA’s revenue from selling AI hardware to US companies. So the total compute a project could buy in the US should also follow from existing compute trends.
We can see this by projecting existing trends for total NVIDIA compute and hardware costs, and considering what happens if this compute is largely consolidated into a single national AI project. First, we note that the cost of the hardware relevant for AI training in the US today is around $99B. This is obtained by multiplying NVIDIA revenue in each year by the US’ share of global AI compute, and accounting for depreciation.3
The cost for future hardware is estimated similarly, at around $230B annually on average. This comes from combining projections of future revenue with the US’ share of global AI compute, assuming that this share stays roughly the same as today, and that hardware makes up around 80% of the total cost of ownership.
Collectively, this would amount to around 27 million H100-equivalent GPUs. Used for a 100-day training run, these would deliver around 3e29 FLOP, in the same ballpark as our previous estimate of 1.2e29 FLOP.
Will there be enough power to support this?
The calculations in the previous section suggest that a national project could obtain a lot of compute, but that’s only part of the story. In particular, will there be enough power for the project to use all this compute? Here we focus only on the centralized training run possibility, because this is the worst-case scenario for the feasibility of a training run far larger than implied by existing compute trends.4
We first need to determine how much power such a project would actually need. By extrapolating trends in hardware energy efficiency, we find that the 27 million H100 equivalents would require around 7.4 GW.5 This is more than the average continuous power needed to serve New York City (5.7 GW), and about the capacity of the world’s largest nuclear plant (7.1 GW).6
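A minimal sketch of how a figure in this range follows from those assumptions is below. The split of the 27 million H100e across hardware shipment years is a hypothetical illustration chosen for the example, not a figure from the underlying data, so the output should be read as landing in the right single-digit-gigawatt range rather than as a precise reproduction.

```python
# Rough power estimate for ~27M H100-equivalents, following the footnote's
# assumptions: 10,200 W per 8-chip H100 server, 40% utilization (assumed here
# to scale power draw proportionally), 1.1 PUE, and 40%/year improvement in
# energy efficiency for chips shipped in later years.

h100_server_watts = 10_200
chips_per_server = 8
utilization = 0.4
pue = 1.1
efficiency_growth = 1.4                # 40%/year improvement in FLOP/J
h100_reference_year = 2023

# Hypothetical split of the 27M H100e by hardware ship year (not from the post)
h100e_by_ship_year = {2024: 8e6, 2025: 7e6, 2026: 7e6, 2027: 5e6}

total_watts = 0.0
for year, count in h100e_by_ship_year.items():
    watts_per_h100e = (
        h100_server_watts / chips_per_server * utilization
        / efficiency_growth ** (year - h100_reference_year)
    )
    total_watts += count * watts_per_h100e

print(f"{total_watts * pue / 1e9:.1f} GW")  # ~7.4 GW with this illustrative split
```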
Whereas many aspects of building a chip cluster can be very substantially accelerated compared even to aggressive past examples, this seems less likely to be true for building out power infrastructure.7 For example, a previous Epoch report found that power plants larger than 3 GW take around 5 years to build, based on historical precedent. But we expect the landscape would change very substantially with government backing.
A 7.4 GW cluster could be supported with just the 8.8 GW of new gas-fired generation capacity already planned to come online in 2027, none of which had been planned as of 2021 – and a large portion of this new capacity is destined for data centers anyway. These new power additions could likely be concentrated in one place, given historical examples of around 10 GW of power generation within a very small region.8 Using typical cost estimates, the installed power might come to about $8B, leaving plenty of headroom to pay a premium for speed, with orders prioritized by invoking the Defense Production Act (DPA).910
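As a quick order-of-magnitude check on that cost figure, combined-cycle gas capacity is commonly estimated to cost on the order of $1,000 per kW of installed capacity to build; that per-kW figure is our assumption here, not a number from the analysis above.

```python
# Order-of-magnitude capex check for new gas-fired capacity.
# The ~$1,000/kW build cost is an assumed typical figure for combined-cycle
# gas plants, not a number taken from the analysis above.

assumed_cost_per_kw = 1_000            # USD per kW of installed capacity

for capacity_gw in (7.4, 8.8):         # cluster requirement vs. planned 2027 additions
    capex_billion = capacity_gw * 1e6 * assumed_cost_per_kw / 1e9
    print(f"{capacity_gw} GW -> ~${capex_billion:.1f}B")
# 7.4 GW -> ~$7.4B, 8.8 GW -> ~$8.8B, i.e. around $8B either way
```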
Alternatively, projects already underway, like those in Abilene or Homer City, could be accelerated and supplemented with power requisitioned from nearby plants, again under the DPA.1112 Bringing the already planned capacity at each site up to 7.4 GW would take only around 2% and 6% of the respective states’ current generation capacities.13 Whichever route is taken, it seems clear that a national project could supply the energy for a centralized run of this size, and possibly one substantially larger, without power becoming a bottleneck relative to chips for a while.
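For reference, the roughly 2% and 6% figures follow from a simple ratio of the extra capacity needed at each site to the host state’s generation capacity, using the numbers given in the footnotes (3.4 GW planned at Abilene against about 155 GW in Texas, and 4.5 GW at Homer City against about 48 GW in Pennsylvania). A short sketch:

```python
# Extra generation needed to reach 7.4 GW at each site, as a share of the
# host state's current capacity (site and state figures from the footnotes).

target_gw = 7.4
sites = {
    # site: (already planned GW, state generation capacity GW)
    "Abilene, TX": (3.4, 155.0),
    "Homer City, PA": (4.5, 48.0),
}

for site, (planned_gw, state_gw) in sites.items():
    extra_gw = target_gw - planned_gw
    print(f"{site}: {extra_gw:.1f} GW extra, {extra_gw / state_gw:.1%} of state capacity")
# Abilene, TX: 4.0 GW extra, 2.6% of state capacity
# Homer City, PA: 2.9 GW extra, 6.0% of state capacity
```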
Discussion
So what does this actually tell us about the future? First, it seems pretty clear that a unified national project could very substantially accelerate AI progress. Whereas a previous Epoch report analyzed whether 4x/year compute growth could be sustained until reaching 3e29 FLOP by the end of the decade, we find that this could be pulled forward to the end of 2027. Moreover, this would be achievable at an annual cost, as a fraction of US GDP, of less than NASA’s during the Apollo program. Even without substantial help otherwise, merely consolidating compute resources under one organization would likely move scaling forward by about a year.14
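The “about a year” figure is just the consolidation factor converted into time at the prevailing growth rate: a one-off 4-5x jump in the compute behind the largest run, against a baseline of roughly 4x/year growth, works out to around a year of scaling pulled forward. A minimal version of that conversion:

```python
import math

# Years of scaling "pulled forward" by a one-off consolidation of compute,
# assuming the largest training run otherwise grows ~4x per year.
annual_growth = 4.0
for consolidation_factor in (4.0, 5.0):
    years_ahead = math.log(consolidation_factor) / math.log(annual_growth)
    print(f"{consolidation_factor:.0f}x consolidation ~ {years_ahead:.1f} years ahead")
# 4x ~ 1.0 years, 5x ~ 1.2 years
```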
Our estimates are also based on newer data than the analysis performed by Leopold Aschenbrenner in Situational Awareness. Using this updated data, we find that a data center of the scale expected for roughly 2028 could train a model about 1 OOM larger than estimated in that piece.
Of course, there are many reasons to be skeptical of this analysis. One way in which this analysis might be too aggressive is that the conditions for a national project might involve a war scenario, e.g. over Taiwan. This would for example make it less likely that NVIDIA would be able to supply all the relevant hardware, although it’s also possible that these wartime scenarios would increase the incentives to push hard on the national project.
Another factor that makes such a project challenging is serial time bottlenecks. Thus far we’ve framed our analysis around substantial compute scaling by 2027, increasing training compute by around three orders of magnitude. But in practice, scaling up this much tends to involve scaling experiments and “derisking runs” that take time to perform. For example, these considerations are perhaps part of the reason why the GPT-4.5 training run took longer and required a lot more human effort than expected.
It’s also possible that this analysis won’t apply in practice, since a national project might never even happen. And even if there are substantial attempts at one, we’ve set aside major difficulties in obtaining the necessary compute, such as geopolitical disruption from a wartime scenario or substantial price effects from increased compute demand.
Nevertheless, given the historical importance of AI scaling, precedent for major national projects and energy expansion, and stated enthusiasm for an AI Manhattan project, it would be unwise to write off such a possibility. And from what we’ve seen, it seems fairly plausible that an intense effort with substantial investment and political coordination could indeed result in a major acceleration in compute scaling.
We would like to thank Trevor Chow for the conversations that motivated this post and Yafah Edelman, Josh You, Jaime Sevilla, JS Denain, Jannik Schilling, Duncan McClements, and Jack Whitaker for their feedback.
Leave a comment on our website!
Do you have comments, questions, or an idea you think we should explore? Visit the commenting section on our website’s version of this issue.
This follows from the following calculation: 0.8 * (1 + 1.3 + 1.3^2) * $244B * 2.2e10 FLOP/s/$ * 100 days * 24 h/day * 3600 s/h * 0.4 utilization * 2 (FP8 vs FP16) ≈ 1.2e29 FLOP. The 1.3 factor is the annual improvement in FLOP/s/$, the factor of 0.8 accounts for non-hardware costs, and the factor of 2 reflects FP8 throughput being twice that of FP16.
We estimate this by multiplying the data center revenue of NVIDIA in 2023 and 2024 by the US’ share of global AI compute in each year, and assume straight-line depreciation over 4 years. We can also adjust this to account for Google TPUs, but it seems unlikely that this would substantially change our conclusions, since they only accounted for around 25% of AI computing capacity as of mid-2024. In particular, it still seems true that substantial accelerations by >1000x relative to today are possible.
In particular, centralized training runs are subject to power supply constraints in a way that decentralized runs are not – there are challenges to moving a lot of power to the same geographic location. We do not consider other bottlenecks here.
This assumes a power usage effectiveness of 1.1, utilization of 0.4, that hardware becomes 40% more energy efficient each year, that each chip is on-trend for the year in which it is shipped, and that a current 8-chip H100 server requires 10,200 watts.
New York City’s average annual energy consumption is around 50,000 GWh, and we divide this by 24 * 365 = 8760 hours to arrive at the power requirement. The peak draw is about double this annual average.
For example, xAI’s Colossus cluster was built in just over 4 months, while 16 months for a similar project is typically considered very impressive.
A previous proof of possibility that very large-scale gas-fired generation capacity can be built on a substantially accelerated timeline is the Egypt Megaproject.
The DPA was invoked over 100 times during the COVID pandemic.
The distance from the Comanche Peak Nuclear Power Plant (2.2 GW capacity) to Abilene is only 128 miles, from a Jewett coal plant (1.8 GW) only 251 miles, and from the Forney Energy Center (1.8 GW) only 213 miles. The distance from Homer City to the Beaver Valley Nuclear Generating Station (1.8 GW) is only 86 miles, and to the Conemaugh Generating Station (1.9 GW) only 32 miles. All distance estimates are along roads. We assume the currently planned capacities are 3.4 GW in Abilene across three phases and 4.5 GW in Homer City, though plans are somewhat unclear and changing. We assume current generation capacities of 155 GW for Texas and 48 GW for Pennsylvania. Requisitioned plants’ capacities would probably be replaced by capacity from adjoining regions and built back over time, but it seems the original Manhattan Project was also comfortable simply forcing homeowners to move, so that’s also a possibility.
4.5 GW for Homer City and 3.4 GW for Abilene are already planned. Locals would barely feel the impact, as requisitioned power capacity could be replaced by further plants in turn, spreading the effect across a large region.
Even if energy buildout can’t be accelerated and the project keeps the ratio of total computing resources to the largest run constant, consolidating resources would multiply the compute behind the largest training run by 4-5 times; at roughly 4x/year compute growth, this corresponds to about a year of scaling.