Japanese AI lab Sakana AI has introduced a new technique that allows multiple large language models (LLMs) to collaborate on a single task, effectively creating an AI "dream team." The method, called Multi-LLM AB-MCTS, lets models perform trial and error and combine their unique strengths to solve problems that are too complex for any individual model.
For businesses, this approach offers a way to develop more robust and capable AI systems. Instead of being locked into a single provider or model, companies can dynamically leverage the best aspects of different frontier models, assigning the right AI to the right part of a task to achieve superior results.
AI models are evolving rapidly. However, each model has distinct strengths and weaknesses derived from its unique training data and architecture. One might excel at coding, while another shines at creative writing. Sakana AI's researchers argue that these differences are not a bug but a feature.
"We see these biases and varied aptitudes not as limitations, but as precious resources for creating collective intelligence," the researchers stated in their blog post. They believe that just as humanity's greatest achievements come from diverse teams, AI systems can also achieve more by working together. "By pooling their intelligence, AI systems can solve problems that are insurmountable for any single model."
Sakana AI's new algorithm is an "inference-time scaling" technique (also called "test-time scaling"), an area of research that has become very popular over the past year. While most of the focus in AI has been on "training-time scaling" (making models bigger and training them on larger datasets), inference-time scaling improves performance by allocating more computational resources after a model is already trained.
One common approach involves using reinforcement learning to prompt models to generate longer, more detailed chain-of-thought (CoT) sequences, as seen in popular models such as OpenAI o3 and DeepSeek-R1. Another, simpler method is repeated sampling, where the model is given the same prompt multiple times to generate various potential solutions, similar to a brainstorming session. Sakana AI's work combines and advances these ideas.
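The repeated-sampling (Best-of-N) baseline mentioned above can be sketched in a few lines. This is an illustrative toy, not Sakana AI's code: `call_llm` and `score` are hypothetical stand-ins for a real model API and a real evaluator.

```python
import random

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call; returns a toy candidate.
    return f"candidate-{random.randint(0, 9)}"

def score(answer: str) -> float:
    # Hypothetical evaluator (in practice: unit tests, a verifier, or a reward model).
    return float(answer.split("-")[1])

def best_of_n(prompt: str, n: int = 8) -> str:
    # Repeated sampling: query the same model n times with the same prompt,
    # then keep the highest-scoring candidate.
    candidates = [call_llm(prompt) for _ in range(n)]
    return max(candidates, key=score)

answer = best_of_n("Solve the puzzle")
```

The limitation this article goes on to address is visible here: every sample starts from scratch, and every call goes to the same model.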
"Our framework offers a smarter, more strategic version of Best-of-N (a.k.a. repeated sampling)," Takuya Akiba, research scientist at Sakana AI and co-author of the paper, told VentureBeat. "It complements reasoning techniques such as long CoT through RL. By dynamically selecting the search strategy and the appropriate LLM, this approach maximizes performance within a limited number of LLM calls, delivering better results on complex tasks."
At the core of the new method is an algorithm called Adaptive Branching Monte Carlo Tree Search (AB-MCTS). It enables an LLM to effectively perform trial and error by intelligently balancing two different search strategies: "searching deeper" and "searching wider." Searching deeper involves taking a promising answer and repeatedly refining it, while searching wider means generating completely new solutions from scratch. AB-MCTS combines these approaches, allowing the system to improve on a good idea, but also to pivot and try something new if it hits a dead end or discovers another promising direction.
To accomplish this, the system uses Monte Carlo Tree Search (MCTS), a decision-making algorithm famously used by DeepMind's AlphaGo. At each step, AB-MCTS uses probability models to decide whether it is more strategic to refine an existing solution or generate a new one.
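The deeper-vs-wider trade-off can be illustrated with a deliberately simplified sketch. This is not Sakana AI's actual algorithm (which builds a full search tree with probabilistic models at every node); it only shows the core idea of using sampled success rates to choose between refining the best candidate and generating a fresh one.

```python
import random

def adaptive_search(gen_new, refine, score, budget=16):
    # Simplified sketch (assumed structure, not the real AB-MCTS): keep a pool
    # of candidate solutions and, at each step, either "go wider" (generate a
    # fresh solution) or "go deeper" (refine the current best). The choice is
    # made by Thompson sampling over each action's observed success rate.
    stats = {"wider": [1, 1], "deeper": [1, 1]}  # Beta(successes+1, failures+1)
    pool = [gen_new()]
    for _ in range(budget):
        # Draw a plausible success probability for each action; take the best draw.
        draws = {a: random.betavariate(*stats[a]) for a in stats}
        action = max(draws, key=draws.get)
        best = max(pool, key=score)
        candidate = refine(best) if action == "deeper" else gen_new()
        improved = score(candidate) > score(best)
        stats[action][0 if improved else 1] += 1  # update that action's record
        pool.append(candidate)
    return max(pool, key=score)

random.seed(0)
best = adaptive_search(
    gen_new=lambda: random.random(),                        # toy: a "solution" is a number
    refine=lambda s: min(1.0, s + 0.05 * random.random()),  # small local improvement
    score=lambda s: s,                                      # higher is better
)
```

If refinements keep paying off, the "deeper" arm accumulates successes and gets chosen more often; if they stall, the "wider" arm wins the draws and the search restarts from scratch.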

The researchers took this a step further with Multi-LLM AB-MCTS, which not only decides "what" to do (refine vs. generate), but also "which" LLM should do it. At the start of a task, the system does not know which model is best suited for the problem. It begins by trying a balanced mix of the available LLMs and, as it progresses, learns which models are more effective, allocating more of the workload to them over time.
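The "learn which model works as you go" behavior resembles a multi-armed bandit. Below is a hedged sketch of that idea (hypothetical helper names, not the TreeQuest API): each LLM is an arm, and Thompson sampling shifts calls toward whichever model succeeds most often on the task at hand.

```python
import random

def allocate_calls(models, attempt, budget=100):
    # Treat each LLM as a bandit arm; learn online which one to trust.
    stats = {name: [1, 1] for name in models}  # Beta(wins+1, losses+1) per model
    calls = {name: 0 for name in models}
    for _ in range(budget):
        # Draw a plausible success rate for each model; use the highest draw.
        pick = max(models, key=lambda m: random.betavariate(*stats[m]))
        calls[pick] += 1
        success = attempt(models[pick])
        stats[pick][0 if success else 1] += 1
    return calls

random.seed(1)
# Toy models: each is just a fixed probability of solving the problem.
toy_models = {"strong_model": 0.8, "weak_model": 0.3}
calls = allocate_calls(toy_models, attempt=lambda p: random.random() < p)
```

Early calls are spread across both models; as evidence accumulates, the stronger model receives the bulk of the budget, mirroring the workload shift the article describes.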
The researchers tested their Multi-LLM AB-MCTS system on the ARC-AGI-2 benchmark. ARC (Abstraction and Reasoning Corpus) is designed to test a human-like ability to solve novel visual reasoning problems, which makes it notoriously difficult for AI.
The team used a combination of frontier models, including o4-mini, Gemini 2.5 Pro and DeepSeek-R1.
The collective of models was able to find correct solutions for over 30% of the 120 test problems, a score that significantly outperformed any of the models working alone. The system demonstrated the ability to dynamically assign the best model for a given problem. On tasks where a clear path to a solution existed, the algorithm quickly identified the most effective LLM and used it more frequently.

More impressively, the team observed cases where the models solved problems that had previously been impossible for any single one of them. In one case, a solution generated by the o4-mini model was incorrect. However, the system passed this flawed attempt to DeepSeek-R1 and Gemini 2.5 Pro, which were able to analyze the error, correct it, and ultimately produce the right answer.
"This demonstrates that Multi-LLM AB-MCTS can flexibly combine frontier models to solve previously unsolvable problems, pushing the limits of what is achievable by using LLMs as a collective intelligence," the researchers wrote.
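The hand-off described above, where one model's flawed attempt becomes context for another model to correct, can be sketched as a simple pipeline. All names here are hypothetical stand-ins for real LLM calls and a real verifier.

```python
def cross_model_refine(problem, models, score, target=1.0):
    # The first model proposes an answer; if it falls short of the target
    # score, the flawed attempt is passed to the next model as context so
    # it can analyze the error and correct it.
    attempt = models[0](problem)
    for model in models[1:]:
        if score(attempt) >= target:
            break
        attempt = model(f"{problem}\nPrevious attempt (may be flawed):\n{attempt}")
    return attempt

# Toy stand-ins for LLM calls: the first model fails, the second fixes it.
first = lambda prompt: "answer: 41"                              # incorrect attempt
second = lambda prompt: "answer: 42" if "41" in prompt else "answer: 0"
result = cross_model_refine(
    "What is 6 * 7?",
    models=[first, second],
    score=lambda a: 1.0 if "42" in a else 0.0,
)
# result == "answer: 42"
```

The key point is that the second model sees the first model's mistake, not just the bare problem, which is what lets the ensemble recover where each model alone would fail.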

"In addition to each model's individual pros and cons, the tendency to hallucinate can differ significantly among them," Akiba said. "By creating an ensemble with a model that is less likely to hallucinate, it could be possible to achieve the best of both worlds: powerful logical capabilities and strong groundedness. Since hallucination is a major issue in a business context, this approach could be valuable for its mitigation."
To help developers and businesses apply this technique, Sakana AI has released the underlying algorithm as an open-source framework called TreeQuest, available under an Apache 2.0 license (usable for commercial purposes). TreeQuest provides a flexible API, allowing users to implement Multi-LLM AB-MCTS for their own tasks with custom scoring and logic.
"While we are in the early stages of applying AB-MCTS to specific business-oriented problems, our research reveals significant potential in several areas," Akiba said.
Beyond the ARC-AGI-2 benchmark, the team successfully applied AB-MCTS to tasks such as complex algorithmic coding and improving the accuracy of machine learning models.
"AB-MCTS could also be highly effective for problems that require iterative trial and error, such as optimizing performance metrics of existing software," Akiba said. "For example, it could be used to automatically find ways to improve the response latency of a web service."
The release of a practical, open-source tool could pave the way for a new class of more powerful and reliable AI applications.