[Note: Originally published in the Stanford Review]
In the sitcom that is Silicon Valley, the debate surrounding artificial intelligence has reached a fever pitch, and nothing seems to divide people more strongly than the future of open-source AI. Depending on where in the valley you work, open-source is either a futile endeavor or God's greatest gift to mankind. The truth likely lies somewhere in the middle.
The argument for open source’s futility rests on the scaling hypothesis: the idea that better AI models will be larger and will require more data, more compute, and, ultimately, more money. For reference, the amount of compute used in frontier models has been increasing ~4x a year, and GPT-4 cost over $100 million to train. Major AI lab CEOs are predicting $1 billion and even $10 billion training runs. Only a select number of closed-source model providers (Google, Microsoft, etc.) will be able to underwrite these ever-increasing costs. Additionally, many closed-source providers can train models more efficiently because of their differentiated negotiating power, access to compute, and scaled cloud infrastructure. While costs are rapidly decreasing, that rate of decline should slow if fewer organizations can effectively commoditize the frontier by open-sourcing.
It’s also possible that only closed-source providers will be able to forge the data partnerships needed to train at the cutting edge in the first place. The data needed to create high-quality AI can’t simply (or legally!) be scraped from the web; it must be bought from third parties (e.g., Reddit data). It’s unclear if open-source consortiums can muster the coordination to compete here.
The open-source providers that can train at the frontier (Meta) would ultimately be forced to “close” or risk hemorrhaging their balance sheets as the commoditization of their models makes them unable to amortize the spend for their training runs. Closed frontier models will have more pricing power, and the cost of lagging-edge models will decrease primarily with the cost of compute/inference.
Ultimately, there is a significant chance that cutting-edge foundation models end up closed and exclusively in the hands of a few well-resourced enterprises. A large chunk of value, and even more of the capture, will come from closed-source frontier models that continuously eat their way up the value chain.
This sounds like a damning indictment of open source, but it needn’t be. Larger and more costly models necessitate higher variable (inference) costs, which are passed along to customers. If you spend $10 billion on a training run, it must be paid for. Current frontier models demonstrate this reality: GPT-4 costs 20x as much per token as GPT-3.5. Moreover, the additional performance doesn’t matter for many use cases, and developers will opt for the cheaper model.
As model performance improves and costs rise, this tradeoff will become increasingly acute and cost sensitivity will increase. Microsoft’s Copilot product, for example, already uses a mix of GPT-4 and smaller, cheaper Phi models.
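To make the tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It assumes only the 20x per-token price gap cited above; the function name `blended_cost` and the traffic splits are hypothetical illustrations, not figures from Microsoft or any provider.

```python
def blended_cost(frontier_share: float, price_ratio: float = 20.0) -> float:
    """Average per-token cost, in units of the cheap model's price.

    frontier_share: fraction of tokens sent to the expensive model (0 to 1).
    price_ratio: expensive-model price divided by cheap-model price.
                 20.0 mirrors the GPT-4 vs. GPT-3.5 ratio cited in the text;
                 real prices vary by provider and change over time.
    """
    if not 0.0 <= frontier_share <= 1.0:
        raise ValueError("frontier_share must be between 0 and 1")
    # Frontier tokens cost price_ratio units each; all other tokens cost 1 unit.
    return frontier_share * price_ratio + (1.0 - frontier_share)


if __name__ == "__main__":
    # Hypothetical traffic splits: all-frontier, half-and-half, mostly cheap.
    for share in (1.0, 0.5, 0.1):
        print(f"{share:.0%} of tokens on the frontier model: "
              f"{blended_cost(share):.1f}x the cheap model's price")
```

Under these assumptions, sending only one token in ten to the frontier model brings the average cost to roughly 2.9x the cheap model's price, compared with 20x when everything goes to the frontier.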
We are headed toward a world of multiple models. Satya Nadella, CEO of Microsoft, has commented on his excitement for cheaper, custom “models-as-a-service.” Ultimately, the frontier will be expensive and not uniquely useful for many high-volume, narrowly defined applications, and this is where open source can compete.
The massive amount of developer talent and optimization in open source will enable the creation of highly specialized and inference-optimized models. For any niche that doesn’t require maximally performant models, the open-source community will be able to create fine-tuned models that push the performance/cost frontier for a given task. Moreover, the ability of the open-source community to cover a wide variety of use cases, get close to the end user, and iterate fast to develop optimal models is unmatched. Major providers simply can’t cover every front or iterate at the necessary pace on all of them.
This kind of dynamic already exists in another major technology: semiconductors. Currently, expensive, cutting-edge chips (the two-nanometer ones that go in iPhones) are dominated by Taiwan Semiconductor Manufacturing Company (TSMC). Historically, as research costs and intensity in the industry increased, the number of cutting-edge fabrication plants decreased. Just ten years ago, we had three such manufacturers. Now, TSMC stands alone. This is logical: new plants cost billions of dollars, and machines must run 24/7/365 to make up costs. The cutting-edge fabs have captured the most value and profit; TSMC is a half-trillion-dollar company. That said, while TSMC has a sizable revenue share, it represents only a small percentage of total semiconductor volume.
Most chips, around 95%, are cheap and serve legacy needs like those for cars and industrial equipment. These chips are built on mature, deprecated nodes (18 nanometers and larger) and are all highly specialized. Texas Instruments alone has over 50,000 different semiconductor products. Despite the variety, lagging-edge chips remain cheap, widely manufactured commodities. The parallels to AI are striking. Already, Azure provides access to 1,700 distinct models, the vast majority of which are not at the frontier. High-performance, low-volume products capture the most value, attention, and investment. Yet the long tail of cheap, lagging-edge, and specialized products will ultimately account for much of the volume and core functionality.
Interestingly enough, US dependence on these mature Chinese chips appears to be an emerging national security issue in its own right. Similar concerns show up in the open-source debate. Many argue against open source, citing national security fears and US-China competition. There is merit to this—better models are useful in training better models. That said, for now, the quality of models coming out of strategic competitors like China is competitive with US open source. If performance gaps were to widen, the conversation would change. But, at the present moment, bans seem premature. This appears especially true given the scale of existing IP theft.
American labs can’t stop state actors from stealing model weights; our adversaries may already have access. Banning open source on national security grounds would do little to slow our rivals but would kneecap a nascent (and valuable) domestic industry. Let’s not make that mistake.