After ChatGPT’s explosive popularity, the AI battle between tech giants Google and Microsoft has spread to a new field—server chips.
Today, AI and cloud computing have become fiercely contested territories, and chips have emerged as the key to reducing costs and winning over business clients.
Originally, major companies like Amazon, Microsoft, and Google were best known for their software. But now, they are investing billions of dollars in the development and production of chips.
As ChatGPT takes the world by storm, major companies kick off a chip battle royale.
According to reports from The Information and other sources, these three companies have already launched or plan to release eight server and AI chips for internal product development, cloud server rentals, or both.
“If you can manufacture silicon optimized for AI, there’s a huge victory waiting for you,” says Glenn O’Donnell, a director at research firm Forrester.
Will these enormous efforts be rewarded?
The answer is not necessarily.
Intel, AMD, and Nvidia benefit from economies of scale; for the large tech companies building their own chips, the economics are far less favorable.
They also face many daunting challenges, such as hiring chip designers and convincing developers to build applications using their custom chips.
However, these major companies have already made notable progress in this field.
According to published performance data, Amazon’s Graviton server chip and the AI-specific chips released by Amazon and Google are already on par with traditional chip manufacturers in terms of performance.
The chips that Amazon, Microsoft, and Google develop for their data centers mainly come in two types: standard computing chips and dedicated chips for training and running machine learning models. It is the latter that powers large language models like ChatGPT.
Previously, Apple successfully developed chips for the iPhone, iPad, and Mac, improving the processing of some AI tasks. These major companies may be drawing inspiration from Apple’s success.
Among the three giants, Amazon is the only cloud service provider offering both types of chips in servers, thanks to its 2015 acquisition of Israeli chip designer Annapurna Labs.
Google launched a chip for AI workloads in 2015 and is developing a standard server chip to improve the performance of Google Cloud servers.
In contrast, Microsoft started its chip research and development later, in 2019, and has recently accelerated the timeline for the launch of an AI chip specifically designed for LLMs.
The explosion of ChatGPT has ignited global excitement for AI, further propelling the strategic transformation of these three major companies.
ChatGPT runs on Microsoft’s Azure cloud, using tens of thousands of Nvidia A100s. Both ChatGPT and other OpenAI software integrated into Bing and various programs require so much computing power that Microsoft has already allocated server hardware to the AI development team.
At Amazon, CFO Brian Olsavsky told investors on a conference call last week that the company plans to shift spending from its retail business to AWS, partly to invest in the infrastructure needed to support ChatGPT.
At Google, the engineering team responsible for manufacturing Tensor Processing Units (TPUs) has moved to Google Cloud. Reportedly, the cloud organization can now set roadmaps for TPUs and the software running on them, hoping to get cloud customers to rent more TPU-driven servers.
Google: AI-tailored TPU V4
As early as 2020, Google deployed the most powerful AI chip at the time, the TPU v4, in its data centers.
However, it was not until April 4th of this year that Google first revealed the technical details of this AI supercomputer.
Jeff Dean (@JeffDean) summarized the paper in a tweet on April 5, 2023: “TPUv4 system has an optically reconfigurable network to assemble groups of 4x4x4 chips like legos (4x4x12? 16x16x16?). SparseCores help w/ embeddings. TPUv4 outperforms TPUv3 by 2.1x & perf/W by 2.7x, & has 4096 chips so ~10x faster overall.” https://t.co/24gclhXpuQ
Compared to the TPU v3, the TPU v4’s performance is 2.1 times higher, and after integrating 4096 chips, the supercomputer’s performance has increased tenfold.
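The tenfold figure follows from simple scaling arithmetic: assuming the commonly cited TPU v3 pod size of 1,024 chips, the per-chip gain multiplied by the pod-size ratio gives 2.1 × (4096 / 1024) ≈ 8.4, which Google rounds to roughly 10x.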
At the same time, Google claims that its chips are faster and more energy-efficient than Nvidia’s A100. For systems of comparable scale, the TPU v4 can deliver 1.7 times the performance of the Nvidia A100 while improving energy efficiency by 1.9 times.
For similar-scale systems, the TPU v4 is 1.15 times faster than the A100 on BERT and about 4.3 times faster than the IPU; on ResNet, it is 1.67 times and about 4.5 times faster, respectively.
Additionally, Google has hinted that it is developing a new TPU to compete with Nvidia’s H100. Google researcher Norm Jouppi told Reuters in an interview that Google has a “production line for future chips.”
Microsoft: Secret Weapon Athena
Regardless, Microsoft is still eager to participate in the chip fray.
Previously, it was reported that a secret 300-person team at Microsoft had been developing a custom chip called “Athena” since 2019.
According to initial plans, “Athena” would be built using TSMC’s 5nm process, expected to reduce the cost of each chip by a third.
If rolled out broadly next year, “Athena” could handle both model training and inference for Microsoft’s internal teams and for OpenAI, which would greatly alleviate the shortage of dedicated AI hardware.
Bloomberg reported last week that Microsoft’s chip division has been working with AMD to develop the Athena chip, which led to a 6.5% increase in AMD’s stock price on Thursday.
However, a person familiar with the matter said AMD is not involved in Athena; rather, AMD is developing its own GPU to compete with Nvidia and has been discussing its design with Microsoft, which is expected to buy that chip.
Amazon: Already One Step Ahead
In the chip race against Microsoft and Google, Amazon seems to have already taken a lead.
Over the past decade, Amazon has maintained a competitive edge over Microsoft and Google in cloud computing services by offering more advanced technology and lower prices.
In the next ten years, Amazon is also expected to maintain its advantage in the competition through its internally developed server chip, Graviton.
As the latest generation of the line, AWS Graviton3 delivers up to 25% higher compute performance and double the floating-point performance of its predecessor. It also supports DDR5 memory, which offers 50% more bandwidth than DDR4.
For machine learning workloads, the AWS Graviton3 has up to 3 times the performance of its predecessor and supports bfloat16.
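For cloud customers, renting this hardware works like launching any other EC2 instance. Below is a minimal, hypothetical sketch using boto3, assuming the Graviton3-based c7g instance family and a placeholder Arm64 AMI ID:

# Launch a Graviton3-backed EC2 instance (c7g family) with boto3.
# Assumes AWS credentials are configured; the AMI ID below is a placeholder
# and must be replaced with an arm64 image available in your region.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

response = ec2.run_instances(
    ImageId="ami-xxxxxxxxxxxxxxxxx",   # placeholder arm64 Amazon Linux AMI
    InstanceType="c7g.xlarge",         # compute-optimized Graviton3 instance
    MinCount=1,
    MaxCount=1,
)
print("Launched:", response["Instances"][0]["InstanceId"])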
Cloud instances based on Graviton3 are in such high demand in some regions that supply has run short.
Another advantage of Amazon is that it is currently the only cloud provider to offer both standard computing chips (Graviton) and AI-specific chips (Inferentia and Trainium) in its servers.
As early as 2019, Amazon introduced its own AI inference chip, Inferentia.
It allows customers to run large-scale machine learning inference applications in the cloud at a low cost, such as image recognition, speech recognition, natural language processing, personalization, and fraud detection.
The latest Inferentia2 triples compute performance over the first generation, quadruples the accelerator’s total memory, quadruples throughput, and cuts latency to one-tenth.
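To run on these chips, customers compile their models with AWS’s Neuron SDK rather than CUDA. A minimal, hypothetical sketch in PyTorch, assuming the Neuron SDK’s torch_neuronx tracing interface is installed on an Inferentia2 (Inf2) instance:

# Compile a small PyTorch model for Inferentia with the AWS Neuron SDK.
import torch
import torch_neuronx  # Neuron SDK PyTorch extension (assumed installed)

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

example = torch.rand(1, 128)

# Ahead-of-time compilation for the NeuronCores; the traced module can be
# saved and later loaded like an ordinary TorchScript artifact.
neuron_model = torch_neuronx.trace(model, example)
torch.jit.save(neuron_model, "model_neuron.pt")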
Following the launch of the first-generation Inferentia, Amazon released its custom chip designed primarily for AI training, Trainium.
It is optimized for deep learning training workloads, including image classification, semantic search, translation, speech recognition, natural language processing, and recommendation engines.
In some cases, custom chips can cut costs by an order of magnitude, reduce energy consumption to a tenth, and serve customers with lower latency.
Disrupting Nvidia’s monopoly won’t be easy
However, so far, most AI workloads still run on GPUs, with the majority of chips produced by Nvidia.
According to previous reports, Nvidia has an 80% market share in the standalone GPU market and a 90% market share in the high-end GPU market.
Over the past two decades, Nvidia GPUs have come to power 80.6% of the world’s cloud computing and AI data centers. In 2021, Nvidia stated that about 70% of the world’s top 500 supercomputers are powered by its chips.
Now, even the Microsoft data centers running ChatGPT use tens of thousands of Nvidia A100 GPUs.
All along, the leading models, whether ChatGPT, Bard, or Stable Diffusion, have run on Nvidia’s A100, a chip that costs roughly $10,000 apiece. The A100 has become the “mainstay” for AI professionals, and the State of AI Report 2022 lists some of the companies running A100-based supercomputers.
It’s clear that Nvidia has monopolized global computing power, dominating the market with its chips.
According to industry insiders, the application-specific integrated circuits (ASICs) that Amazon, Google, and Microsoft have been developing execute machine learning tasks faster and with less power than general-purpose chips.
Forrester’s O’Donnell compared GPUs and ASICs this way: “For everyday driving, you can use a Prius, but if you need four-wheel drive in the mountains, a Jeep Wrangler is more suitable.”
Despite their efforts, Amazon, Google, and Microsoft all face challenges—how to persuade developers to use these AI chips?
Currently, Nvidia’s GPUs dominate the market, and developers are already fluent in CUDA, its proprietary programming platform for building GPU-driven applications. Switching to custom chips from Amazon, Google, or Microsoft would mean learning an entirely new software stack. Would developers be willing to do so?
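To make that lock-in concrete, here is a minimal, hypothetical PyTorch sketch: targeting an Nvidia GPU is a one-line device change built into the framework, whereas targeting a custom accelerator generally means adopting a vendor-specific toolchain, such as torch_xla for TPUs or the Neuron SDK sketched earlier for AWS chips.

# Targeting Nvidia hardware: CUDA support ships with the framework.
import torch

model = torch.nn.Linear(128, 10)
x = torch.rand(1, 128)

if torch.cuda.is_available():
    model = model.to("cuda")
    x = x.to("cuda")
y = model(x)

# Targeting a cloud provider's custom chip (TPU, Trainium, Inferentia)
# requires a separate compiler stack, install, and set of runtime
# constraints -- the new software ecosystem developers would have to learn.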