Enterprise AI agents keep failing because they forget what they learned

RAG architectures are good at one thing: surfacing semantically relevant documents. They also stop there.

A framework called the decision context graph addresses this gap by providing agents with structured memory, time-sensitive reasoning, and clear decision logic. Rippletide, a startup in the Neo4j ecosystem, built one. The key capability: non-regressive agents can freeze validated sequences of actions and combine them over time.

"The key point you want is non-regressivity: How do you make sure that when the agent generates something new, you can combine the previous discoveries?" said Jan Billien, Rippletide Co-Founder and Chief Scientific Officer.

Why RAG doesn’t go far enough

Enterprise context is scattered across ERP tools, logs, databases, vector stores, and policy documents. Generative AI tools can retrieve from any of them, through keyword searches, SQL queries, or full RAG pipelines, but retrieval has a ceiling.

Retrieved data may not be relevant to the decision, which can lead to hallucinations. And even when agents retrieve the right data, they often lack guidance for making well-reasoned decisions.

In other words, RAG retrieves documents, not decision context. "Everyone starts with RAG: Download the relevant documents, feed them into the prompt, let the model figure it out," said Wyatt Mayham of Northwest AI Consulting.

While this works well for chatbots, it "immediately breaks down" for agents that need to make decisions and take action, he said. "The biggest thing that builders struggle with is the difference between retrieval and applicability."

A retrieved document doesn’t tell the agent whether it’s still in effect, whether it’s been superseded, or whether a conflicting rule takes precedence, Mayham said. "Agents need context for decision making, not just information."

In construction, for example, this might mean knowing that a pricing exception has expired, that a safety policy only applies in certain jurisdictions, or that a standard operating procedure was updated a month ago. "Miss any of that and the agent is confidently doing the wrong thing," Mayham said.

Without a structured decision context, agents combine incompatible rules, invent constraints to fill gaps, and rely on what Billien calls "probabilistic assumptions on unconstrained data." Errors are difficult to reproduce because builders cannot track why the agent made a given choice.

The compounding error problem is also real, Mayham said: A small percentage of misses per step becomes "catastrophic" in a multi-step workflow. "This is the main reason most enterprise agents never leave the pilot phase."
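The compounding effect can be seen with simple arithmetic. The sketch below assumes every step succeeds independently with the same probability, which is a simplification. The 95% per-step accuracy and the step counts are illustrative values, not figures from Mayham.

```python
def workflow_success_rate(per_step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent steps."""
    return per_step_accuracy ** steps


# Illustrative values only: 95% per-step accuracy across longer workflows.
for steps in (1, 5, 10, 20):
    rate = workflow_success_rate(0.95, steps)
    print(f"{steps:>2} steps at 95% per step -> {rate:.1%} end-to-end")
```

Under these assumptions, a 20-step workflow completes without a single miss only about 36% of the time.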

How decision context graphs arrive at the right answer

The decision context graph solves this by encoding a structured map of what applies, what the rules are, and when they apply.

The framework is optimized for one question: "Given this situation, which context is currently applicable?" Time is treated as a first-class dimension; each rule, decision and exception is tied to the period during which it is valid.

"The goal is to explicitly address missing, inconsistent, or conflicting data when building the graph to avoid probabilistic (errors) once the agent is running," Billien said.

The system is built around three principles:

Applicability: The logic is explicitly encoded so that the agent knows which rules to recall and apply in a given situation. Context is returned only when it is appropriate for that situation.
Time-aware memory: Every rule, decision, and exception is time-bound. This allows agents to reason about "what was true then versus what is true now," and then reproduce or explain their decisions.
Decision-making pathways: The system can explain how it got from A to B and the "why" behind its rationale (e.g., why one part of the context is included and another is not). Agents are given "solution path" examples of how similar cases have been handled before.
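To make the first two principles concrete, here is a minimal sketch of an applicability lookup with validity windows. It is not Rippletide's implementation or API: the `Rule` class, the `applicable_rules` function, the rule names, and the dates are all hypothetical, and a plain in-memory list stands in for the graph.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule record with a scope and a validity window."""
    name: str
    jurisdiction: str
    valid_from: date
    valid_to: Optional[date] = None   # None means still in effect
    supersedes: Optional[str] = None  # name of the rule this one replaces


def applicable_rules(rules: List[Rule], jurisdiction: str, as_of: date) -> List[Rule]:
    """Return rules that are in scope, valid on `as_of`, and not superseded."""
    in_scope = [
        r for r in rules
        if r.jurisdiction == jurisdiction
        and r.valid_from <= as_of
        and (r.valid_to is None or as_of < r.valid_to)  # valid_to is exclusive
    ]
    superseded = {r.supersedes for r in in_scope if r.supersedes}
    return [r for r in in_scope if r.name not in superseded]


# Illustrative data only.
RULES = [
    Rule("pricing-exception", "US", date(2024, 1, 1), date(2024, 6, 30)),
    Rule("sop-v1", "US", date(2023, 1, 1)),
    Rule("sop-v2", "US", date(2024, 9, 1), supersedes="sop-v1"),
    Rule("safety-policy", "EU", date(2022, 1, 1)),
]

then = [r.name for r in applicable_rules(RULES, "US", date(2024, 3, 1))]
now = [r.name for r in applicable_rules(RULES, "US", date(2024, 10, 1))]
print("then:", then)
print("now: ", now)
```

Querying the same rule set at two dates gives the "then versus now" view: the expired pricing exception and the superseded procedure drop out of the later result, and the EU-only policy never appears for a US case.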

During setup, unstructured data is ingested and structured into an ontology: what objects exist, what rules apply, what is considered an exception. Neurosymbolic AI handles pattern recognition and encodes formal, machine-readable logic. Over time, the system refines its knowledge base as new decisions are made.

"The neurosymbolic approach has two parts: a neural part that gives agents great autonomy, and a symbolic part to reduce the amount of data needed and provide control," Billien said.

The agent is tested during build (pre-production) to validate its behavior or identify improvements. This reduces risks as well as computational needs during inference, he noted.

Agents learn rather than regress

As for non-regression, the key is combining intelligence (models) with knowledge (shared between agents), Billien said. Agents must be able to explore: when they don’t know how to complete a task, they can try different possibilities, usually in a controlled environment or simulation (such as a support bot trying multiple response patterns).

Then, "once a solution is judged to be satisfactory, the graph freezes that sequence of actions," Billien said. Future exploration then starts from this "stable base of validated behaviors" to prevent newly acquired skills from replacing previously learned good behaviors.
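One way to picture the freezing step is as a write-once store of validated action sequences. The sketch below is only an illustration under that assumption: the `FrozenBehaviors` class, its methods, and the task and action names are hypothetical and do not describe how Rippletide's graph works internally.

```python
from typing import Callable, Dict, Iterable, Tuple


class FrozenBehaviors:
    """Hypothetical store of validated action sequences, keyed by task."""

    def __init__(self) -> None:
        self._frozen: Dict[str, Tuple[str, ...]] = {}

    def freeze(self, task: str, actions: Iterable[str]) -> None:
        # Write-once: a validated sequence is never overwritten,
        # so later exploration cannot regress it.
        if task in self._frozen:
            raise ValueError(f"'{task}' is already frozen")
        self._frozen[task] = tuple(actions)

    def plan(self, task: str, explore: Callable[[str], Iterable[str]]) -> Tuple[str, ...]:
        # Reuse the frozen sequence when one exists; explore only unknown tasks.
        if task in self._frozen:
            return self._frozen[task]
        return tuple(explore(task))

    def combine(self, *tasks: str) -> Tuple[str, ...]:
        # Build a longer plan from previously validated pieces.
        return tuple(step for task in tasks for step in self._frozen[task])


def risky_exploration(task: str) -> Iterable[str]:
    # Stand-in for whatever the agent would try on an unfamiliar task.
    return ["guess"]


store = FrozenBehaviors()
store.freeze("verify_identity", ["ask_email", "send_code", "check_code"])
store.freeze("issue_refund", ["lookup_order", "refund_payment"])

print(store.plan("verify_identity", risky_exploration))  # frozen sequence reused
print(store.plan("reset_password", risky_exploration))   # unknown task, so explore
print(store.combine("verify_identity", "issue_refund"))  # validated pieces combined
```

The design choice that matters here is that `freeze` refuses to overwrite: exploration can add new behaviors, but it cannot replace one that has already been validated.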

Before an agent acts or affects a client, it checks the graph: Does the action break a rule? Is the agent hallucinating? Does it stay within the limits? Can the decision be generalized to similar cases?

At the macro level, the system evaluates results: Did the behavior improve long-term performance? Does it generalize to similar contexts? Did the agent retain its previous abilities?
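A pre-action check of this kind can be sketched as a gate that runs a proposed action through explicit checks and collects any violations. Everything in the example is an assumption made for illustration: the `Action` type, the two checks, and the 500 refund limit are hypothetical, and a real system would consult the graph rather than hard-coded functions.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Action:
    """Hypothetical action an agent proposes before touching a client."""
    kind: str
    amount: float = 0.0


# A check returns a violation message, or None when the action passes.
Check = Callable[[Action], Optional[str]]


def known_action_kind(action: Action) -> Optional[str]:
    if action.kind not in {"refund", "reply"}:
        return f"unknown action kind: {action.kind}"
    return None


def within_refund_limit(action: Action) -> Optional[str]:
    # The 500 limit is an illustrative value.
    if action.kind == "refund" and action.amount > 500:
        return "refund exceeds the 500 limit"
    return None


CHECKS: List[Check] = [known_action_kind, within_refund_limit]


def violations(action: Action, checks: List[Check]) -> List[str]:
    """Run every check and return the messages of those that fail."""
    return [msg for check in checks if (msg := check(action)) is not None]


print(violations(Action("refund", 120.0), CHECKS))  # passes every check
print(violations(Action("refund", 900.0), CHECKS))  # blocked by the limit
```

The agent would proceed only when the returned list is empty. Because each check is explicit, a blocked action comes with a reason that can be logged and reproduced.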

"This determinism is key for agents to manage reliability at scale," Billien said. The result is behavior that is more consistent, predictable and explainable, and that allows for greater control and verifiability.

"You want your agents to be able to learn on their own when they encounter something they don’t know," he said. "You want them to be able to explore and find new solutions."

Going beyond "episodic" memory

The team initially assumed it would deploy reinforcement learning (RL) everywhere, "which actually proved very difficult in a corporate environment," Billien said. "Data is sparse for some specific use cases and confusing for others."

Typically, turning raw data into reliable predictions is a manual and time-consuming challenge, but "now with agents, we’ve entered a new era where building ontologies is possible automatically," Billien said.

Classical supervised fine-tuning methods can lead to regression, where models forget the last skill they learned while learning the next one. In general, learning does not compound, the regression is "dramatic," and models improve "episodically" rather than continuously, causing them to consistently fail on new or unseen tasks.

As Billien noted, "You’ll never have a fully self-learning model if you regress every time."

In enterprise use cases such as banking, where millions of transactions are processed per day, a high level of reliability is critical, he noted. "One question I ask all customers: Is 95% enough? In many use cases, it’s not. You need 99.999%. 1% off is too much."

Decision context graphs can fill this gap, he argued: when the same customer support question is asked repeatedly, the agent will return a "satisfactory" answer predictably and without regression, while maintaining its autonomy.

Encoding applicability and temporal validity in a structured graph, rather than relying on an LLM to infer it, is a "healthy approach" to a real limitation in existing retrieval frameworks, Mayham said. The open question is whether automatic ontology generation can handle the messy, diverse data that enterprises actually have. "That’s always the hard part," he said.
