The Most Expensive AI Tool Is Often the Wrong One
By Quantiva Team

Sometime in the last two years, "AI" quietly became a synonym for "Large Language Model."
When a client asks us to "add AI" to a workflow, what they most often mean is: "Put a chatbot on it," or "Pipe the data through GPT." The assumption has become so reflexive that proposing anything else can feel like evading the question.
But "AI" was never one thing. It's an umbrella term, a toolbox. Inside it: mathematical optimization, classical machine learning, statistical forecasting, computer vision, reinforcement learning, plain deterministic rules. The LLM is one tool in that box. For a large class of business problems, it's the wrong one, and the better tools are older, cheaper, faster, more predictable, and easier to validate.
The trouble is that the most visible tool has become the default tool, regardless of fit. It's also often the most expensive and least predictable one available.
Where LLMs Measurably Fall Down
Start with the things companies often try to automate: optimization, forecasting, and numeric reasoning. These are exactly where LLMs are weakest.
Take routing, scheduling, seating charts, staffing plans, and other problems where a system has to find the best arrangement from a huge number of possible combinations. In published benchmarks, even today's most advanced LLMs perform poorly on modest routing-style problems. They miss better answers, struggle as the number of variables grows, and sometimes violate hard constraints, such as vehicle capacity or scheduling limits, because they generate plausible-looking answers rather than actually solving the problem. A dedicated solver is built for that job. It can land within a fraction of a percent of the best possible answer, consistently.
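To make "actually solving the problem" concrete, here is a minimal Python sketch of an exact solver for a toy version of the problem: one vehicle, four stops, and made-up travel times and delivery deadlines (all hypothetical). It uses brute-force search over every visit order, so it only scales to a handful of stops. Production solvers use far more sophisticated methods. The property that carries over is the one that matters here: a route that breaks a deadline is rejected by construction, and the same input always returns the same answer.

```python
from itertools import permutations

# Hypothetical travel times in minutes between a depot (index 0) and four stops (1-4).
TRAVEL = [
    [0, 10, 15, 20, 25],
    [10, 0, 35, 25, 30],
    [15, 35, 0, 30, 20],
    [20, 25, 30, 0, 15],
    [25, 30, 20, 15, 0],
]
# Hypothetical hard constraint: latest allowed arrival time (minutes) at each stop.
DEADLINE = {1: 30, 2: 90, 3: 60, 4: 120}


def route_cost(route, travel):
    """Total travel time for depot -> each stop in order -> back to depot."""
    path = [0, *route, 0]
    return sum(travel[a][b] for a, b in zip(path, path[1:]))


def meets_deadlines(route, travel, deadline):
    """True only if every stop is reached no later than its deadline."""
    clock, here = 0, 0
    for stop in route:
        clock += travel[here][stop]
        if clock > deadline[stop]:
            return False
        here = stop
    return True


def best_route(travel, deadline):
    """Exhaustive search: return the cheapest route that meets every deadline,
    or None if no feasible route exists. Exact, but only practical for tiny inputs."""
    stops = range(1, len(travel))
    feasible = (r for r in permutations(stops) if meets_deadlines(r, travel, deadline))
    return min(feasible, key=lambda r: route_cost(r, travel), default=None)


if __name__ == "__main__":
    route = best_route(TRAVEL, DEADLINE)
    print(route, route_cost(route, TRAVEL))
```

A real solver replaces the brute-force loop with smarter search, but it keeps the same contract: constraints are checked, not guessed at.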
Arithmetic is no better. A widely cited study showed that adding a single irrelevant sentence to a math word problem dropped accuracy by up to 65% across every state-of-the-art model tested. The authors concluded that these systems were not doing genuine logical reasoning in the way people often assume. They were pattern-matching against training data.
And they don't always repeat themselves. LLMs can return different answers to the same prompt, even when the settings are tightly controlled. For brainstorming, drafting, or summarizing, that flexibility can be useful. For anything that needs to be auditable, reproducible, or defensible to a regulator, it can be disqualifying.
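What "reproducible and auditable" means in practice can be sketched in a few lines of Python. The pricing rule below is hypothetical. The idea is that a deterministic function returns byte-identical output for the same input, so an auditor can store a fingerprint of both and re-run the calculation later to confirm nothing changed.

```python
import hashlib
import json


def price_quote(order):
    """Hypothetical deterministic pricing rule: unit price times quantity, with a
    10% discount at 100 units or more. Integer cents avoid floating-point drift."""
    subtotal = order["unit_price_cents"] * order["quantity"]
    if order["quantity"] >= 100:
        subtotal -= subtotal // 10
    return {"total_cents": subtotal}


def fingerprint(obj):
    """Stable SHA-256 of a JSON-serializable object (sorted keys, fixed separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def audit_record(order):
    """What an auditor might store: fingerprints of the input and the output."""
    return {"input": fingerprint(order), "output": fingerprint(price_quote(order))}


def verify(order, record):
    """Re-run the computation and confirm it reproduces the stored record exactly."""
    return audit_record(order) == record


order = {"unit_price_cents": 250, "quantity": 120}
record = audit_record(order)
print(verify(order, record))
```

The same check run against an LLM's answer would only pass if the model happened to produce identical text twice.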
Tool-Calling Is the Industry Admitting This
Here's the part that gives the game away.
Modern LLMs increasingly solve these problems by calling tools: invoking a calculator, running Python, querying a database, or sending an optimization problem to a dedicated solver. The model vendors built that capability precisely because they know the model should not be trusted to do the math, retrieve the facts, or optimize the schedule on its own.
That's worth sitting with for a moment.
When an LLM calls a solver to optimize something, the actual work is not being done by the AI everyone's excited about. It is being done by the solver. The LLM is a dispatcher: an expensive, probabilistic dispatcher standing in front of a tool that would have just worked.
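Stripped of vendor specifics, the dispatcher pattern looks roughly like the minimal Python sketch below. The JSON shape (`name` plus `arguments`) is an assumption loosely modeled on common tool-calling formats rather than any particular vendor's API. The model's reply is hard-coded instead of coming from a real model. Notice where the answer actually comes from: the ordinary `add` function.

```python
import json


def add(a, b):
    """The 'calculator' tool: exact arithmetic done by ordinary code."""
    return a + b


# Registry of tools the model is allowed to call, keyed by name.
TOOLS = {"add": add}


def dispatch(model_reply):
    """Parse a model's tool-call request and run the named tool.

    The model only chooses the tool and its arguments; the answer itself
    comes from the Python function. Unknown tools are rejected, not guessed."""
    call = json.loads(model_reply)
    tool = TOOLS.get(call["name"])
    if tool is None:
        raise ValueError(f"unknown tool: {call['name']!r}")
    return tool(**call["arguments"])


# Hypothetical model output; a real system would receive this from an LLM API,
# and the exact tool-call format varies by vendor.
reply = '{"name": "add", "arguments": {"a": 1234567, "b": 7654321}}'
print(dispatch(reply))
```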
Sometimes that dispatcher earns its keep. If a request arrives as unpredictable natural language, "sort out who sits where for the wedding, but keep my aunts away from my ex," the LLM is doing something genuinely useful: it's translating messy human intent into structured instructions. That's a real job.
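Here is a sketch of that division of labor. It assumes the LLM has already turned the wedding request into the structured `request` dictionary below, which uses a hypothetical schema and invented guest names. Everything after that point is a plain constraint search in Python. It either finds a seating with no forbidden pair at the same table or reports that none exists. Brute force is enough for six guests; a real event would call for a proper constraint solver.

```python
from itertools import permutations

# Hypothetical structured output an LLM might produce from
# "sort out who sits where, but keep my aunts away from my ex".
request = {
    "guests": ["Aunt May", "Aunt June", "Ex", "Bride", "Groom", "Friend"],
    "table_size": 3,
    "keep_apart": [("Aunt May", "Ex"), ("Aunt June", "Ex")],
}


def seat(guests, table_size, keep_apart):
    """Return a list of tables (tuples of guests) in which no keep-apart pair
    shares a table, or None if no such seating exists. Tries every ordering
    of guests, so it is only suitable for very small lists."""
    for order in permutations(guests):
        tables = [order[i:i + table_size] for i in range(0, len(order), table_size)]
        if all(not (a in t and b in t) for t in tables for a, b in keep_apart):
            return tables
    return None


print(seat(request["guests"], request["table_size"], request["keep_apart"]))
```

The LLM's contribution ends at `request`. Whether the aunts end up away from the ex is decided entirely by `seat`.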
But most business processes aren't messy or unpredictable. They're repeatable. You produce the seating chart, the delivery routes, the staff schedule, the inventory forecast, the eligibility check, the pricing, and the rest from a known set of fields, on a schedule, the same way every time. The problem is already specified: the inputs, the constraints, and the objective. So at that point, what exactly is the LLM translating?
Nothing. You already know exactly which tool to call and which variables to send.
Route it through an LLM anyway and you've inserted five new ways to fail in front of a function that doesn't fail. The model has to classify the problem, pick the right tool, populate the variables without inventing or omitting a constraint, format the call correctly, and relay the answer back faithfully. Each step adds cost, latency, and a fresh chance for a wrong answer that looks right.
It's like hiring a translator to read you a document already written in your own language. Every time. And occasionally getting a word wrong.
For a known, repeated invocation, skip the middleman. Call the tool yourself.
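In the minimal Python sketch below, built around a hypothetical nightly staffing calculation, "call the tool yourself" amounts to one function call on fields the job already has. The `from_model_args` helper shows some of the checking you would owe if those same arguments came from a model instead. Even then it guards only the "populate the variables" step, not classification, formatting, or relaying the answer back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StaffingInput:
    """Hypothetical fixed schema for a nightly staffing job: the fields are known."""
    forecast_orders: int
    orders_per_picker: int
    min_pickers: int


def pickers_needed(inp: StaffingInput) -> int:
    """The 'tool': ceiling division plus a hard floor. No model involved."""
    needed = -(-inp.forecast_orders // inp.orders_per_picker)  # integer ceiling
    return max(needed, inp.min_pickers)


def from_model_args(args: dict) -> StaffingInput:
    """If an LLM produced the arguments, every field must be checked:
    nothing missing, nothing invented, nothing of the wrong type."""
    expected = {"forecast_orders", "orders_per_picker", "min_pickers"}
    if set(args) != expected:
        raise ValueError(f"bad fields: {sorted(set(args) ^ expected)}")
    if not all(type(v) is int and v > 0 for v in args.values()):
        raise ValueError("all fields must be positive integers")
    return StaffingInput(**args)


# Direct call: the job already knows which tool to run and which values to send.
print(pickers_needed(StaffingInput(forecast_orders=1250, orders_per_picker=100, min_pickers=5)))
```

The direct call needs none of that checking, because the fields were never in doubt.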
The Tools That Actually Win
The alternatives aren't exotic. They're proven, and the evidence is decades deep.
Optimization and operations research. UPS's ORION routing system, built on operations research rather than language models, saves the company hundreds of millions of dollars a year. Airline crew scheduling, solved with mathematical optimization, has saved individual carriers tens of millions annually since the 1990s. These are the same kinds of problems people now want to throw an LLM at. The right answer was settled before LLMs existed.
Tree-based models for tabular data. For the structured, columnar data that runs most enterprises, models such as XGBoost and LightGBM often beat deep learning and LLMs outright. Multiple benchmarking studies have established this pattern, and LightGBM was central to top-performing solutions in the M5 forecasting competition using Walmart sales data.
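Gradient-boosted trees rest on one core idea: fit many tiny trees in sequence, each one to the errors the ensemble is still making. The pure-Python sketch below shows that idea. This toy uses one-split trees (stumps), squared-error loss, and a made-up six-row table. XGBoost and LightGBM add deeper trees, regularization, histogram-based split finding, and much more, so treat this as an illustration of the technique rather than of either library.

```python
def fit_stump(rows, residuals):
    """Find the single column/threshold split that best fits the residuals
    (lowest squared error). Returns (column, threshold, left_value, right_value)."""
    best = None
    for col in range(len(rows[0])):
        for threshold in sorted({r[col] for r in rows}):
            left = [e for r, e in zip(rows, residuals) if r[col] <= threshold]
            right = [e for r, e in zip(rows, residuals) if r[col] > threshold]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((x - lv) ** 2 for x in left) + sum((x - rv) ** 2 for x in right)
            if best is None or err < best[0]:
                best = (err, col, threshold, lv, rv)
    if best is None:  # no split possible: predict the mean residual everywhere
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    col, threshold, lv, rv = stump
    return lv if row[col] <= threshold else rv


def fit_boosted(rows, targets, n_rounds=200, learning_rate=0.3):
    """Gradient boosting with squared-error loss: each round fits a stump to the
    current residuals (the negative gradient) and adds a shrunken copy of it."""
    base = sum(targets) / len(targets)
    preds = [base] * len(rows)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(targets, preds)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, r) for p, r in zip(preds, rows)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Made-up tabular data: [store_size, is_weekend] -> daily sales.
rows = [[1, 0], [1, 1], [2, 0], [2, 1], [3, 0], [3, 1]]
targets = [10, 20, 12, 22, 30, 40]
model = fit_boosted(rows, targets)
print([round(predict(model, r), 2) for r in rows])
```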
Statistical forecasting. Four decades of forecasting competitions have shown that simple statistical methods can rival or beat more complex machine-learning methods on many real-world forecasting tasks, at a tiny fraction of the compute cost.
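How little machinery the simplest of these methods needs is easiest to see in code. Below is a minimal Python sketch of two classic baselines, simple exponential smoothing and the seasonal naive forecast. The smoothing constant `alpha` and the daily sales figures are illustrative assumptions, not tuned values.

```python
def simple_exponential_smoothing(series, alpha):
    """One-step-ahead forecast from simple exponential smoothing:
    level = alpha * observation + (1 - alpha) * previous level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def seasonal_naive(series, season_length, horizon):
    """Forecast each future period with the value from one season earlier."""
    last_season = series[-season_length:]
    return [last_season[h % season_length] for h in range(horizon)]


# Two weeks of made-up daily sales with a weekly pattern.
daily_sales = [120, 135, 128, 140, 180, 210, 95, 118, 137, 131, 142, 185, 205, 99]
print(simple_exponential_smoothing(daily_sales, alpha=0.3))
print(seasonal_naive(daily_sales, season_length=7, horizon=7))
```

Baselines like these are the yardstick a more complex model has to beat before it earns its compute bill.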
Deterministic rules. Where correctness is non-negotiable (eligibility logic, tax rules, safety checks, compliance thresholds), a rule engine applies its rules exactly as written every time, is fully explainable, and costs almost nothing to run.
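A rule engine can be as simple as a list of named predicates. In this minimal Python sketch the eligibility thresholds are hypothetical. The useful property is that every decision comes with the exact rules that failed, which is the explanation a customer or regulator would ask for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Applicant:
    age: int
    annual_income: int
    resident: bool


# Hypothetical eligibility rules: (reason reported on failure, predicate).
RULES = [
    ("must be 18 or older", lambda a: a.age >= 18),
    ("income must not exceed 50,000", lambda a: a.annual_income <= 50_000),
    ("must be a resident", lambda a: a.resident),
]


def check_eligibility(applicant):
    """Apply every rule; return (eligible, list of failed-rule reasons).
    The reasons double as the explanation for the decision."""
    failures = [reason for reason, rule in RULES if not rule(applicant)]
    return (not failures, failures)


print(check_eligibility(Applicant(age=17, annual_income=60_000, resident=True)))
```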
Smaller specialized models. For many classification tasks, a fine-tuned smaller model can run dramatically faster than an LLM while matching or beating it on accuracy. A fine-tuned BERT classifier, for example, can run roughly 20x faster than a small LLM on text classification: fast enough to classify a million documents in an hour instead of a day.
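The throughput figures are easy to sanity-check. Treating "a million documents" as exactly 1,000,000 and a day as 86,400 seconds:

```latex
\frac{10^{6}\ \text{docs}}{3600\ \text{s}} \approx 278\ \text{docs/s},
\qquad
\frac{10^{6}\ \text{docs}}{86\,400\ \text{s}} \approx 11.6\ \text{docs/s},
\qquad
\frac{86\,400\ \text{s}}{3600\ \text{s}} = 24
```

An hour versus a day is a 24x gap, consistent with "roughly 20x."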
When the LLM Is the Right Tool
None of this is anti-LLM.
For genuinely linguistic, unstructured problems, LLMs are extraordinary and nothing else comes close: summarizing documents, extracting data from contracts, classifying free text when there is little labeled training data, drafting, answering questions over a knowledge base, and sitting at the front of a system as a natural-language interface that translates human requests into structured calls to other tools.
The LLM is often the best front door, but it's rarely the right back office.
That distinction matters. In a well-designed AI system, the LLM handles language, ambiguity, and messy human input. The rest of the system handles math, logic, constraints, data, and execution.
Why This Matters Now
This isn't academic. Depending on whose study you read, 30% to 95% of AI and generative-AI projects fail to reach production or show measurable return. Gartner has projected that a large share of generative-AI projects will be abandoned after the proof-of-concept stage. S&P Global found the share of companies scrapping most of their AI initiatives jumped from 17% to 42% in a single year.
A meaningful slice of those failures traces back to the same root cause: a generative-AI tool deployed where a lightweight, specialized model would have done the job better, cheaper, and more reliably. The technology rarely failed. The choice of tool did.
The most valuable skill in applied AI isn't prompt engineering. It's knowing the toolbox well enough to reach past the LLM when something else is better suited, and recognizing that "we used AI!" should never be the goal. Solving the problem is.
If you are evaluating where AI belongs in your workflow, Quantiva can help separate the parts that need language intelligence from the parts that need speed, structure, accuracy, and repeatability. Get in touch.