3 reasons why nobody is talking about the cost of generative AI

David Reed dives into the financial side of generative AI, where costs can spiral rapidly if left unmonitored and tools need to be used with precision.
[Image: David Reed, DataIQ's Chief Knowledge Officer and Evangelist, hosting the 2024 DataIQ 100 and discussing generative AI.]

All of these were prime examples of the ability of generative AI (genAI) to categorise extensive data sets and provide insights in natural language, responding to queries in real time, improving customer experience along the way, and removing a great deal of manual effort from human staff.

Each of these represented exactly the type of potentially transformational solution that has seen organisations piling onto genAI platforms. Certainly, the early indications from these proofs of concept (PoCs) were positive, and the teams involved seemed rightly proud of their work.

But here’s the funny thing: nobody wanted to talk about how much it had cost to develop these pilots, let alone what expenditure would be required to roll them out and whether there would be a positive return on that investment. The most telling comment came from the media company’s in-house team, who replied to my enquiry: “we’re Data Scientists – we don’t worry about the cost.”

Well, somebody needs to, and there are growing rumbles of disquiet about the bills rolling in behind experiments, tests, and pilots. What is beginning to dawn on Chief Data Officers, Chief Information Officers, and Chief Technology Officers is that genAI represents a triple threat to their budgets. Here are three reasons why talking about costs is hard.


Tokens feel like micro-payments rather than IT budget-busters

[Image: David Reed, DataIQ's Chief Knowledge Officer and Evangelist, addressing a crowd of data leaders about generative AI and the DataIQ 100.]

Tokens are the topline expense which most business-side uses of large language models (LLMs) will quickly encounter. In text-based models, a token is roughly equivalent to a word (strictly, tokens are subword units, so a word can cost more than one token). That does not sound like much, and in the world of consumer use of genAI, few will reach the point where they need to buy extra tokens. If they do, it will be via a series of micro-payments, just like buying tracks on iTunes.

Once you delve into B2B use of genAI, for example when querying the call centre product library, you soon realise that these models process the entire product data corpus each time they receive a prompt (barring a level of classification and edge-querying on routine terms). Text scales fast and, when multiplied by a user base in the hundreds, will soon burn through budget.
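To make that scaling concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it is an illustrative assumption rather than real vendor pricing or measured usage; the shape of the multiplication is the point:

```python
# Back-of-envelope token cost estimator for a B2B genAI assistant.
# All figures are illustrative assumptions, not real vendor pricing.

PRICE_PER_1K_TOKENS = 0.01      # assumed blended $ rate per 1,000 tokens
TOKENS_PER_PROMPT = 50_000      # assumed corpus context + query + response
PROMPTS_PER_AGENT_PER_DAY = 40  # assumed call-centre usage
AGENTS = 300                    # a user base in the hundreds
WORKING_DAYS_PER_MONTH = 22

monthly_tokens = (TOKENS_PER_PROMPT * PROMPTS_PER_AGENT_PER_DAY
                  * AGENTS * WORKING_DAYS_PER_MONTH)
monthly_cost = monthly_tokens / 1_000 * PRICE_PER_1K_TOKENS

print(f"Tokens per month: {monthly_tokens:,}")    # 13,200,000,000
print(f"Cost per month:   ${monthly_cost:,.0f}")  # $132,000
```

Even if the real per-token price were an order of magnitude lower, the per-prompt volume and the user base do the damage.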

The video analysis example is even more eye-watering. Instead of one word equalling one token, each frame of video consumes a token. At a typical frame rate of 24 frames per second, one minute of video would burn 1,440 tokens. Imagine a single insurance claim involving the upload of an entire journey’s dashcam footage, during which an accident occurred, and you realise the scale of the challenge.
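Following the one-frame-one-token framing above (real multimodal pricing varies, and production systems often sample frames rather than ingest all of them), the arithmetic for a single claim might look like this, with the journey length an assumption:

```python
# Token arithmetic for dashcam video, using the one-frame-one-token
# framing above. The journey length is an illustrative assumption.

FRAMES_PER_SECOND = 24
tokens_per_minute = FRAMES_PER_SECOND * 60   # 1,440 tokens per minute
journey_minutes = 45                         # assumed dashcam journey
tokens_per_claim = tokens_per_minute * journey_minutes

print(f"{tokens_per_claim:,} tokens for one claim")  # 64,800 tokens
```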


Cloud costs are a hidden expense for generative AI

Most serious adoptions of genAI in B2B are taking place within private cloud tech stacks in order to prevent leakage of commercially sensitive or private information. These require a licence for the LLM involved (unless one has been built in-house), which then operates inside the corporate cloud.

To avoid causing the lights to dim across the rest of the organisation’s critical tech infrastructure, genAI will need its own cloud space – and LLMs involve massive data volumes and huge compute power. During the PoC phase, these parameters will be relatively constrained and the user base well defined.

Once a model goes into operation, however, and especially if it is made available enterprise-wide, costs scale rapidly alongside adoption. Demand for access to these tools within business processes will surge, which means IT budgets could quickly take a hammering. 


Users do not optimise their prompts

The whole point of genAI can be found in the name of one of the best-known examples – ChatGPT. Interaction should feel like a casual conversation which does not require any coding, technical knowledge, or structure – you can just throw in a question, and it will try to provide the best possible answer.

One consequence of this is that little effort goes into shortening the prompt-to-outcome process. Instead of one well-crafted query that returns the best answer, multiple iterations are involved. Each time, the model runs in full and tokens get burned, as the sketch below illustrates.
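A minimal sketch (all figures assumed) of why that matters: in a conversational session, the fixed context and the accumulated history are typically resent on every call, so total tokens grow faster than the number of questions asked:

```python
# Why iterative prompting multiplies cost: each retry resends the same
# large context plus the conversation so far. Figures are assumptions.

CONTEXT_TOKENS = 20_000  # assumed fixed context resent on every turn
TURN_TOKENS = 500        # assumed new question + answer per turn

def total_tokens(turns: int) -> int:
    """Cumulative tokens when history accumulates in each prompt."""
    total, history = 0, 0
    for _ in range(turns):
        total += CONTEXT_TOKENS + history + TURN_TOKENS
        history += TURN_TOKENS  # the exchange is replayed on the next turn
    return total

print(total_tokens(1))  # 20,500  -- one well-crafted query
print(total_tokens(8))  # 178,000 -- eight iterations cost ~9x one query
```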

So why is nobody talking about these costs? Well, they may not be talking, but they are muttering in private. At a recent DataIQ Member event, one healthcare insurance provider admitted that it had turned off its private LLM specifically because of the cost. It is certainly not the only organisation to have been forced into that decision, nor will it be the last.

No doubt there will be emerging practices and solutions to help control costs and support full deployments without destroying IT’s bank balance. For the moment and in their absence, it seems likely that genAI could stall because of the funding gap between a proof of concept and full-scale deployment.