Not a day goes by without another business use case for generative AI popping up in the media. But very little of the growing body of coverage mentions the fact that this revolutionary new tech demands vast volumes of data and a hefty amount of computing power to process all that material.
When an emerging technology has so much potential, such practical concerns do tend to be overlooked amid all the hype. As a result, few organisations will have looked in detail at what cloud capacity they might require to do the heavy lifting and what that could cost them. Moreover, cloud and data centre providers could soon find it hard to scale up their resources quickly enough to meet all the demands of advanced AI.
To work properly, generative AI needs to be fed huge volumes of data. As these systems continue to evolve, their consumption of data will increase exponentially. Training the large language models (LLMs) that generative AI is based on, for instance, requires a tremendous amount of computing power: even running on advanced chips, a model can demand months of dedicated training time. And even where businesses use pre-trained models, fine-tuning them will still call for considerable computational clout. The demand for such processing power could easily outpace the supply.
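To put rough numbers on this, here is a minimal back-of-envelope sketch using the widely cited rule of thumb of about six floating-point operations (FLOPs) per model parameter per training token. The model sizes, token counts and GPU figures are illustrative assumptions, not measurements of any real system.

```python
# Back-of-envelope estimate of LLM training time, using the widely cited
# heuristic of ~6 FLOPs per model parameter per training token.
# All hardware figures are illustrative assumptions, not vendor specs.

def training_days(params: float, tokens: float,
                  gpu_flops: float = 300e12,  # assumed sustained FLOPs per GPU
                  num_gpus: int = 512) -> float:
    """Rough number of days to train (or fine-tune) a model."""
    total_flops = 6 * params * tokens
    seconds = total_flops / (gpu_flops * num_gpus)
    return seconds / 86_400

# A hypothetical 70bn-parameter model trained on 2tn tokens:
print(f"{training_days(70e9, 2e12):.0f} days")             # ~63 days on 512 GPUs

# Fine-tuning the same model on 1bn proprietary tokens, on 8 GPUs:
print(f"{training_days(70e9, 1e9, num_gpus=8):.1f} days")  # ~2 days
```

Even on these generous assumptions, full training ties up hundreds of premium chips for a couple of months, while fine-tuning, though far cheaper, is still no trivial workload.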
Why AI is a cloud computing challenge
Séamus Dunne is managing director for the UK and Ireland at Digital Realty, a data centre operator. He says that “no one knows how much demand there’s going to be. But, if generative AI ends up being anything like the cloud when that first hit the scene, you can bet that there’s going to be a huge demand for it. Capacity, or the lack of it, will make or break this technology.”
Dunne adds that, while cloud costs will undoubtedly increase, “it’s important to understand that generative AI is a new frontier. There aren’t many legacy AI companies, so competition in this field will be fierce. Startups will emerge daily, all of which will be working on exciting new applications for businesses. We will get to a point where costs stabilise, but that will take time.”
A lot of money is flowing into both AI and cloud computing. Global spending on public cloud services is set to increase by 22% to nearly $600bn (£458bn) this year, according to Gartner, which cites generative AI as a key factor behind that growth. Bloomberg Intelligence has predicted that the market for generative AI will be worth $1.3tn within 10 years.
There are other reasons why generative AI is pushing up corporate IT expenditure. The technology relies on high-grade graphics processing units (GPUs) and other specialist chips. These sell at a premium in any case, but supply shortages are putting further upward pressure on their prices.
Coming to terms with higher cloud costs
In the short term, many firms may need to reconsider their generative AI aspirations once the cloud costs and capacity issues become clear to them. They may have to become more selective about which workloads they run in the cloud, with a focus on truly disruptive applications and those offering clear returns on investment and significant profit margins.
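As a toy illustration of that kind of triage, the following sketch ranks hypothetical candidate workloads by expected return per pound of cloud spend. Every project name and figure here is invented purely for illustration.

```python
# Toy triage: filter candidate generative AI workloads by a minimum
# return-on-investment bar, then rank survivors by return per pound
# of cloud spend. All projects and figures are invented.

candidates = [
    # (name, expected annual return, estimated annual cloud cost), GBP
    ("customer-support assistant", 900_000, 250_000),
    ("internal document search",   300_000, 400_000),
    ("marketing copy generation",  450_000, 150_000),
]

MIN_ROI = 1.5  # assumed minimum acceptable return per pound spent

shortlist = [(name, ret / cost)
             for name, ret, cost in candidates
             if ret / cost >= MIN_ROI]

for name, roi in sorted(shortlist, key=lambda x: x[1], reverse=True):
    print(f"{name}: {roi:.1f}x return per pound of cloud spend")
```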
Expect tailored cost-optimisation methods to gain prominence. These will help firms to strike the appropriate balance between performance and cost. Dr Chris Royles, field chief technology officer for EMEA at Cloudera, notes that there are systems available to help them make the most efficient choices in this respect.
“Decisions about whether an AI workload is better suited to cloud-native deployment in a shared public cloud or an on-premise environment must be driven by good data. Workload analytics enable organisations to observe the performance of a workload before making a call one way or the other,” he says.
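Royles’ point can be made concrete with a simplified sketch, not Cloudera’s actual tooling, of the comparison that workload analytics can inform: projecting public cloud cost against on-premise cost from a workload’s observed usage. All rates and utilisation figures below are assumptions for illustration.

```python
# Sketch of a data-driven placement decision: cloud vs on-premise cost
# projected from observed workload behaviour. Prices are assumptions,
# not quotes from any provider.

from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    gpu_hours_per_month: float  # observed via workload analytics
    utilisation: float          # fraction of reserved capacity actually busy

CLOUD_RATE_PER_GPU_HOUR = 3.50     # assumed on-demand price, USD
ONPREM_COST_PER_GPU_MONTH = 1_400  # assumed amortised hardware + power, USD
HOURS_PER_MONTH = 730

def monthly_costs(w: Workload) -> tuple[float, float]:
    cloud = w.gpu_hours_per_month * CLOUD_RATE_PER_GPU_HOUR
    # On-premise capacity is paid for whether idle or busy,
    # so low utilisation inflates the effective cost.
    gpus_needed = w.gpu_hours_per_month / (HOURS_PER_MONTH * w.utilisation)
    return cloud, gpus_needed * ONPREM_COST_PER_GPU_MONTH

for w in (Workload("bursty experimentation", 200, 0.15),
          Workload("steady inference", 20_000, 0.85)):
    cloud, onprem = monthly_costs(w)
    winner = "cloud" if cloud < onprem else "on-premise"
    print(f"{w.name}: cloud ${cloud:,.0f} vs on-prem ${onprem:,.0f} -> {winner}")
```

On these assumed figures, the bursty workload favours the public cloud while the steady, highly utilised one favours on-premise kit: exactly the sort of call the observed data should drive.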
Options for mitigating the growth in cloud expenditure
The cloud and generative AI landscape is also developing quickly, especially as businesses and investors continue to show a willingness to spend on the technology. The so-called hyperscalers – Google, Amazon and Microsoft – are likely to offer more cost-efficient off-the-shelf models and cloud infrastructure to accommodate a growing AI ecosystem.
Rahul Pradhan is vice-president for product and strategy at Couchbase, a US firm specialising in database-as-a-service software. He predicts that the hyperscalers will “partner heavily with independent software vendors and emerging AI companies to provide a one-stop shop for AI infrastructure. This strategy will help organisations to cut costs by doubling down on a single vendor, although it could come at the price of vendor lock-in.”
Not all generative AI systems are created equal, of course: some are significantly more efficient than others. The largest LLMs, which contain the greatest number of parameters, are often not the most appropriate choice for a given business. A smaller LLM can instead be fine-tuned more quickly on the firm’s proprietary data, delivering meaningful insights sooner, at a lower cost and using fewer cloud resources.
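A rough sketch of the arithmetic behind that choice: a common approximation is about two FLOPs per model parameter per generated token at inference time, so serving cost scales broadly with model size. The throughput and price figures below are illustrative assumptions, and real deployments are typically far less efficient than peak chip throughput.

```python
# Why a smaller LLM can cut cloud bills: inference work scales roughly
# with parameter count (~2 FLOPs per parameter per generated token).
# All throughput and price figures are illustrative assumptions.

GPU_FLOPS = 300e12        # assumed sustained FLOPs per cloud GPU
GPU_RATE_PER_HOUR = 3.50  # assumed on-demand price, USD

def monthly_serving_cost(params: float, tokens_per_month: float) -> float:
    flops = 2 * params * tokens_per_month
    gpu_hours = flops / GPU_FLOPS / 3600
    return gpu_hours * GPU_RATE_PER_HOUR

demand = 5e9  # assumed five billion generated tokens per month
for params in (7e9, 70e9):
    cost = monthly_serving_cost(params, demand)
    print(f"{params / 1e9:.0f}bn-parameter model: ~${cost:,.0f} per month")
```

On these assumptions the 70bn-parameter model costs roughly ten times as much to serve as the 7bn one, before any difference in fine-tuning time is counted.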
“If not managed effectively, rapid scaling up to accommodate generative AI applications can lead to cost overruns, while being too conservative or slow may hinder model performance,” Pradhan notes. “It’s a fine balance.”
There are also companies trying to make generative AI more efficient. Firms such as Deci and d-Matrix are reorganising neural network architectures and redefining how memory is used in microchips. The aim is to help businesses consume less cloud computing power to achieve the same AI outputs.
Businesses will soon need to get smarter about how they manage their cloud resources if they want to optimise their investments in generative AI. Those that can do more with less stand to gain a crucial edge over their rivals.