The ‘brownie recipe problem’: why LLMs need fine-grained context to deliver real-time results

Today’s LLMs excel at reasoning, but can still struggle with context. This is especially true in real-time ordering systems like Instacart’s.

Instacart CTO Anirban Kundu calls it the “brownie recipe problem.”

It’s not as simple as telling an LLM “I want to make brownies.” To be truly helpful in meal planning, the model needs to go beyond that simple directive: it must understand the consumer’s preferences — say, organic eggs versus regular eggs — factor in what can actually be delivered in their geography so the food doesn’t spoil, and weigh other critical constraints besides.

For Instacart, the challenge is balancing the right mix of context against latency, to deliver experiences that ideally take less than a second.

“If the reasoning itself takes 15 seconds, and if every interaction is that slow, you’re going to lose the user,” Kundu said at a recent VB event.

Mixing reasoning, real-world state, and customization

In grocery delivery, there is a “world of reasoning” and a “world of state” (what is available in the real world), Kundu noted, both of which must be understood by the LLM along with user preferences. But it’s not as simple as loading a user’s entire purchase history and known interests into an inference model.

“Your LLM will blow up to a size that will be unmanageable,” Kundu said.

To avoid this, Instacart splits processing into parts. First, data is fed into a large underlying model that can understand intent and categorize products. This processed data is then fed into small language models (SLMs) designed for catalog context (the types of foods or other items that work together) and semantic understanding.
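The two-stage split can be sketched roughly as follows. This is a minimal illustration, not Instacart’s actual pipeline — all names (`large_model_pass`, `catalog_slm_pass`, the hard-coded catalog) are hypothetical stand-ins for real model calls.

```python
# Stage 1: a large model resolves user intent and coarse product categories.
# Stage 2: a small, specialized model maps those categories to the catalog.
from dataclasses import dataclass, field

@dataclass
class Intent:
    goal: str                      # what the user is trying to do
    categories: list = field(default_factory=list)  # inferred product categories

def large_model_pass(user_request: str) -> Intent:
    """Stage 1: in practice an LLM call; hard-coded here for illustration."""
    if "brownies" in user_request:
        return Intent(goal="bake brownies",
                      categories=["eggs", "flour", "cocoa", "butter"])
    return Intent(goal="unknown")

def catalog_slm_pass(intent: Intent) -> dict:
    """Stage 2: an SLM with catalog context picks concrete products."""
    catalog = {"eggs": "organic eggs (12 ct)", "flour": "all-purpose flour",
               "cocoa": "unsweetened cocoa", "butter": "unsalted butter"}
    return {c: catalog.get(c, "no match") for c in intent.categories}

intent = large_model_pass("I want to make brownies")
basket = catalog_slm_pass(intent)
```

The point of the split is that the expensive, general model runs once to fix intent, while the cheap, narrow models do the repetitive catalog work.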

In the case of catalog context, the SLM must be able to handle multiple levels of detail around the order itself as well as the different products. For example, what products go together, and what are their respective substitutes if the first choice is out of stock? Those substitutions are “very, very important” to a company like Instacart, which Kundu said has “over double-digit cases” where a product is not available in the local market.
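The substitution fallback described above can be sketched as a ranked lookup. This is an assumption-laden illustration — the `SUBSTITUTES` table and `resolve` helper are hypothetical, not Instacart’s system.

```python
# Each product carries a ranked list of substitutes; when the first choice
# is out of stock locally, fall back to the best substitute that is in stock.
from typing import Optional

SUBSTITUTES = {
    "organic eggs": ["cage-free eggs", "regular eggs"],
    "unsalted butter": ["salted butter", "margarine"],
}

def resolve(product: str, in_stock: set) -> Optional[str]:
    """Return the product if available, else its best in-stock substitute."""
    if product in in_stock:
        return product
    for alt in SUBSTITUTES.get(product, []):
        if alt in in_stock:
            return alt
    return None  # nothing suitable in this market

stock = {"cage-free eggs", "salted butter"}
first = resolve("organic eggs", stock)   # falls back to "cage-free eggs"
```

In practice the ranking itself is where the SLM earns its keep: deciding which substitutes are acceptable for a given recipe and shopper.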

In terms of semantic understanding, let’s say a buyer wants healthy snacks for children. The model needs to understand what a healthy snack is and what foods are appropriate and appealing to, say, an 8-year-old, then identify the relevant products. And when those particular products are not available in a given market, the model must also find related substitutes.

Then there is the logistical element. For example, a product like ice cream melts quickly, and frozen vegetables also don’t do well when left outside at higher temperatures. The model must have this context and calculate an acceptable delivery time.
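The logistics constraint amounts to a feasibility check: the delivery ETA must fit inside the tightest spoilage window in the order. A minimal sketch, with illustrative thresholds that are assumptions, not real Instacart parameters:

```python
# Each item class gets a maximum unrefrigerated window (in minutes);
# an order is deliverable only if the ETA fits the tightest window.
MAX_MINUTES_OUT = {
    "ice cream": 30,          # melts quickly
    "frozen vegetables": 45,  # tolerates slightly longer
    "pantry": 24 * 60,        # effectively unconstrained
}

def delivery_feasible(items, eta_minutes: int) -> bool:
    """True if the ETA respects every item's spoilage window."""
    tightest = min(MAX_MINUTES_OUT.get(i, 24 * 60) for i in items)
    return eta_minutes <= tightest

ok = delivery_feasible(["ice cream", "pantry"], eta_minutes=25)   # True
late = delivery_feasible(["ice cream", "pantry"], eta_minutes=40) # False
```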

“So you have this understanding of intent, you have this categorization, then you have this other part about logistics, how do you do it?” Kundu noted.

Avoiding “monolithic” agent systems

Like many other companies, Instacart has been experimenting with AI agents, finding that a combination of agents works better than a “single monolith” that performs multiple different tasks. The Unix philosophy of smaller, focused tools helps in dealing, for example, with different payment systems that have different failure modes, Kundu explained.

“Having to build all of that in one environment was very cumbersome,” he said. Additionally, back-end agents communicate with many third-party platforms, including point-of-sale (POS) and catalog systems. Naturally, not all of them behave in the same way; some are more reliable than others and have different update intervals and channels.

“So we can deal with all of these things, we’ve gone down this path of micro-agents rather than agents that are mostly large in nature,” Kundu said.
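The micro-agent idea, in the Unix spirit of small focused tools, can be sketched as a registry plus a thin router. The agent functions and registry below are hypothetical illustrations, not Instacart’s architecture.

```python
# Each micro-agent owns one concern (payments, catalog, ...); a thin
# router dispatches tasks instead of one monolithic agent doing everything.
def payments_agent(task: str) -> str:
    return f"payments: processed {task}"

def catalog_agent(task: str) -> str:
    return f"catalog: looked up {task}"

AGENTS = {"payment": payments_agent, "catalog": catalog_agent}

def route(kind: str, task: str) -> str:
    agent = AGENTS.get(kind)
    if agent is None:
        # A failure in one agent stays contained; the router surfaces it.
        raise ValueError(f"no micro-agent for {kind!r}")
    return agent(task)

result = route("catalog", "brownie mix")
```

The design payoff is isolation: an unreliable POS integration can fail, retry, or be swapped out without touching the payment path.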

To manage agents, Instacart integrates with Anthropic’s Model Context Protocol (MCP), which standardizes and simplifies the process of connecting AI models to various tools and data sources.

The company also uses Google’s Universal Commerce Protocol (UCP) open standard, which allows AI agents to interact directly with commerce systems.

However, Kundu’s team still faces challenges. As he noted, it’s not a question of whether integration is possible, but how reliably these integrations behave and how well they are understood by users. Discovery can be difficult not only in identifying the services available, but also in understanding which ones are suitable for which task.

Instacart had to implement MCP and UCP in “very different” cases, and the biggest problems they encountered were failure modes and latency, Kundu noted. “The response times and understanding of the two services are very, very different. I’d say we spend probably two-thirds of the time fixing those bug cases.”


