Why Big Tech Wants to Make AI Cost Nothing

Earlier this week, Meta both open sourced and released the model weights for Llama 3.1, an extraordinarily powerful large language model (LLM) that is competitive with the best of what OpenAI’s ChatGPT and Anthropic’s Claude can offer.1

Llama 3.1 Performance

The terms of release (as with previous Llama releases) are incredibly generous. Any organization with fewer than 700 million monthly active users can do with Llama 3.1 as they wish, meaning that virtually every company not named Google, Apple, Microsoft, or Netflix can start baking Llama directly into its products.

But why would Meta release this model for free? Is it out of pure altruism? Perhaps a ploy to improve the optics for a company that for the past several years has borne the brunt of bipartisan political anger over privacy concerns?

The apparent magnanimity of this release reminded me of a classic Silicon Valley business strategy - “commoditize your complement”. Best articulated in Joel Spolsky’s “Strategy Letter V” from 22 years ago, the idea is that as the price of a product’s complement falls toward the lowest sustainable “commodity price”, demand for the product itself rises.2

For example, in the 90s Microsoft made money by selling the Windows operating system, so by making PCs as cheap as possible (pushing for standardization and modularity of all PC components), it simultaneously drove down the cost of PC hardware and increased demand for its own software. Likewise, when Google released its Android operating system to smartphone vendors, it wasn’t necessarily acting in the interests of those manufacturers. In fact, smartphones - quite possibly the most technologically advanced consumer hardware of all time - became commoditized to the point where just about anyone in the developing world could buy one for under $20, reducing manufacturers’ profit margins to razor-thin levels. The spread of smartphones, however, expanded the market for Google’s search product and ad sales far beyond the size of the desktop computer market.3

LLMs are being commoditized

Right now, I believe we are at a similar crossroads with general-purpose large language models. According to a recent Sequoia article, the AI industry would need to generate upwards of $600 billion in revenue just to recoup its recent spending on NVIDIA GPU-based data centers alone.4 Yet OpenAI’s subscription revenue is reportedly just around $3.4 billion5, with the other players likely falling far behind. With Meta releasing an essentially free LLM that is both open source and open weight, and freely accessible to all at meta.ai, we should expect the price of LLM access to go down, not up, over the next several months.

Yet there are even bigger models on the horizon. According to Jensen Huang’s March 2024 GTC keynote, it takes roughly 8,000 H100 GPUs about 90 days to train a 1.8-trillion-parameter Mixture-of-Experts model at GPT-4 scale.6 And according to Meta’s Llama whitepaper,7 the Llama 3.1 405B model was pre-trained on 16,000 H100 GPUs over 54 days.

Failure rates over 54 days of pre-training: the challenges of training Llama 3.1 on 16,000 H100 GPUs
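
As a rough back-of-the-envelope comparison (treating both runs as if they had perfect utilization, which they certainly did not), the two figures above work out to the same order of magnitude of compute:

```python
# Back-of-the-envelope comparison of the two training runs described above.
# The raw figures come from the cited keynote and whitepaper; everything else
# (perfect utilization, no restarts or downtime) is a simplifying assumption.

gpt4_scale_gpu_days = 8_000 * 90    # Jensen's GTC estimate for a 1.8T-parameter MoE model
llama_405b_gpu_days = 16_000 * 54   # Meta's reported Llama 3.1 405B pre-training run

print(f"GPT-4-scale estimate: {gpt4_scale_gpu_days:,} H100-days")  # 720,000
print(f"Llama 3.1 405B:       {llama_405b_gpu_days:,} H100-days")  # 864,000
print(f"Ratio: {llama_405b_gpu_days / gpt4_scale_gpu_days:.2f}x")  # 1.20x
```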

Despite the enormous technical challenges of training such a large model, according to Meta’s engineering blog, by the end of 2024 Meta will have the equivalent of 600,000 H100s on tap.2 If all of those were somehow dedicated to pre-training LLMs (rather than, say, inference or building recommender systems for Instagram Reels), that is the equivalent of being able to spit out 75 GPT-4-scale models every 90 days, or roughly 300 such models every year!
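
The arithmetic behind that claim is simple, if heavily idealized - it assumes every H100-equivalent is devoted to pre-training and reuses the 8,000-GPU/90-day figure from the keynote:

```python
# Idealized throughput of a 600,000 H100-equivalent fleet devoted to pre-training.
# Assumes one GPT-4-scale model costs 8,000 H100s for 90 days and ignores
# inference, recommender workloads, and every real-world inefficiency.

fleet_h100s = 600_000
gpus_per_model = 8_000
days_per_model = 90

models_per_cycle = fleet_h100s // gpus_per_model   # 75 models trained in parallel
cycles_per_year = 365 // days_per_model            # ~4 training cycles per year

print(models_per_cycle)                     # 75
print(models_per_cycle * cycles_per_year)   # 300
```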

This means that (assuming scaling laws persist and more tokens can be sourced) the next generation of multi-modal hyperscale transformer models currently being trained will dwarf what came before… and it’s quite possible that newcomers like OpenAI and Anthropic won’t even be able to contend with whatever larger companies like Meta release next. Even nation-states like China could be cowed by the might of 600,000 H100s!

Meta has not disclosed how many next-gen Blackwell GPUs it intends to purchase
According to Jensen, Blackwell GPUs go Brrrrr

And Meta’s not the only big tech company open sourcing LLMs. NVIDIA has released Nemotron-4 340B, Microsoft has released its Phi and Florence models, Google has released Gemma, and even smaller companies like Cohere and Mistral have released their model weights.

What is the complement to LLMs?

So if multiple players are giving away LLMs for free, what is the natural complement to an LLM? For companies like Google, Microsoft, NVIDIA, and Amazon, the answer is simple - servers. Bigger models require more GPUs (or TPUs) to run, so if you rent out server space or sell GPUs, giving away “AI” for free is good business (safety concerns be damned!).
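
To make “bigger models need more servers” concrete, here is a rough estimate of the GPU memory needed just to hold the Llama 3.1 405B weights - a sketch that ignores the KV cache, activations, and serving overhead, all of which push the real requirement considerably higher:

```python
# Minimum GPU memory just to hold Llama 3.1 405B weights in bf16.
# Ignores KV cache, activations, batching, and serving overhead, which add
# substantially more memory in any real deployment.

params = 405e9
bytes_per_param = 2                  # bf16 = 2 bytes per parameter
weight_gb = params * bytes_per_param / 1e9

h100_memory_gb = 80                  # one H100 SXM has 80 GB of HBM
min_gpus = weight_gb / h100_memory_gb

print(f"~{weight_gb:.0f} GB of weights, i.e. more than {int(min_gpus)} H100s")
# ~810 GB of weights, i.e. more than 10 H100s - beyond a single 8-GPU node
```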

What’s interesting about the recent Llama 3.1 release is that Meta doesn’t rent out its servers. In fact, just about every major cloud provider - AWS, Google Cloud, Azure - stands to benefit monetarily from the Llama 3.1 release even more directly than Meta, since they can immediately start renting out their data centers to smaller companies running this large Llama model and its fine-tuned derivatives for inference.

Zuckerberg provides some possible answers to the paradox of Meta being the company to open source the biggest LLM. One is standardization.1 Meta has a long legacy of open sourcing (and commoditizing) internal tooling (such as Presto and React) which subsequently became standardized in the marketplace.

There is another, more compelling reason for open sourcing tools like Llama, however, which Zuckerberg gave in a talk with Bloomberg:8 user-generated content. By giving users the ability to create AI-generated content, along with the means to independently fine-tune pre-trained models (which would otherwise be prohibitively expensive to train from scratch), Meta may see the amount of unique user-generated content go up, and with it engagement on its platforms. That might be the end goal for a company like Meta, which makes most of its money selling ads against its user network.
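
As an illustration of how cheap that independent fine-tuning has become, the sketch below attaches LoRA adapters to one of the smaller Llama 3.1 checkpoints using Hugging Face transformers and peft. The model id, target modules, and hyperparameters are illustrative assumptions, not Meta’s recipe:

```python
# A minimal LoRA fine-tuning sketch (illustrative only; the model id, target
# modules, and hyperparameters below are assumptions, not an official recipe).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "meta-llama/Meta-Llama-3.1-8B"  # assumed (gated) Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Attach small low-rank adapters instead of updating all 8B base parameters.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],  # attention projections only
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights

# From here, train on your own data with transformers.Trainer or trl's SFTTrainer.
```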

Another possible reason is that there simply is no real value in having a second-place general purpose LLM, particularly for a company like Meta which users may not trust enough to rely on for subscription API-based access. This is quite possibly the same conclusion reached by companies like Mistral, Cohere, Google, and others. In fact, at the very moment I’m writing this, Mistral just released its Mistral Large 2 model under a research license.

What will happen to the AI Startups?

The big losers in the commoditization of LLMs may ultimately be today’s hot and disruptive AI startups - companies like OpenAI, Anthropic, Character.ai, Cohere, and Mistral. When the five largest companies in the S&P 500 start giving away your main product for free, a reckoning may be on its way.

The CEOs of the largest tech companies need not fear scale – they only need fear being out-innovated towards irrelevance.

There is still the question of whether the current path of scaling ever-larger multimodal transformer models will ultimately lead to artificial general intelligence (AGI) or even artificial superintelligence (ASI). If these smaller companies have some sort of modeling or R&D edge that doesn’t simply amount to having a massive number of GPUs, then perhaps there is still a chance they can outflank the megacorps. After all, OpenAI started out doing fundamental R&D - DOTA 2 bots, robotics, and research into reinforcement learning. The original GPT model began as little more than a side project. Perhaps today’s LLMs are even a distraction from the fundamental research that will lead to more capable models and new avenues of inquiry.

Regardless, the sheer scale of the current infrastructure build-out gives me hope. The end of the dot-com bubble in 2001 was likewise preceded by a massive infrastructure build-out. The laying of fiber-optic cable and broadband infrastructure paved the way for Web 2.0 companies like Facebook and Google, even after a massive stock market collapse. And just as that build-out enabled things like cloud computing and streaming video, the current AI infrastructure build-out may enable breakthroughs in other areas such as robotics, autonomous vehicles, and drug development.

According to Jensen, the next big thing is robots… definitely robots. Hopefully not terminators.

Citations:
