New Inference-Time Scaling Technique Enhances Planning Accuracy in Large Language Models

Understanding Inference-Time Scaling in AI
Heading into 2025, one of the major focuses in artificial intelligence (AI) is inference-time scaling, and AI laboratories are approaching it in different ways. A notable contribution comes from Google DeepMind, which has introduced a technique called “Mind Evolution” that aims to improve the responses of large language models (LLMs) on tasks involving planning and reasoning.
What is Inference-Time Scaling?
Inference-time scaling refers to techniques designed to enhance the performance of LLMs by allowing them to perform more internal "thinking" before generating responses. Instead of generating a response all at once, models can create multiple answers, analyze them, and refine their final outputs. This approach can lead to more accurate and reliable answers, particularly for complex queries.
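A minimal illustration of this idea is majority voting over multiple samples: draw several answers from the model and return the most common one. The sketch below uses a stubbed stochastic sampler in place of a real LLM call, so the numbers are purely illustrative:

```python
import random
from collections import Counter

def sample_llm_answer(question):
    """Stub for one stochastic LLM sample; answers correctly most of the
    time, otherwise returns a nearby wrong value (illustrative only)."""
    return 42 if random.random() < 0.7 else random.randint(40, 45)

def answer_with_voting(question, n_samples=25):
    """Simple inference-time scaling: spend more compute by drawing several
    answers, then return the most frequent one instead of a single draft."""
    samples = [sample_llm_answer(question) for _ in range(n_samples)]
    return Counter(samples).most_common(1)[0][0]

random.seed(0)
final = answer_with_voting("What is 6 x 7?")
```

The extra samples cost more compute at inference time, but the aggregated answer is typically more reliable than any single draft.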
Key Components of Mind Evolution
Mind Evolution hinges on two primary elements: search algorithms and genetic algorithms.
Search Algorithms
Search algorithms play a crucial role in most inference-time scaling methods. They aid LLMs in identifying the best reasoning paths to reach optimal solutions to problems.
Genetic Algorithms
These algorithms draw inspiration from the concept of natural selection, creating and developing a group of candidate solutions based on a specific goal, known as the "fitness function." This evolution of solutions allows for more nuanced and effective responses.
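As a toy illustration of the genetic-algorithm loop (not DeepMind's implementation), the sketch below evolves integers toward a target value; the fitness function simply scores closeness to that goal:

```python
import random

def fitness(x):
    """Fitness function: how close a candidate is to the (toy) goal of 42."""
    return -abs(42 - x)

def genetic_search(pop_size=20, generations=30):
    # Start from a random population of candidate solutions.
    population = [random.randint(0, 100) for _ in range(pop_size)]
    for _ in range(generations):
        # Selection: the fitter half of the population survives to reproduce.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Offspring: average two parents (crossover) plus noise (mutation).
        children = [
            (random.choice(parents) + random.choice(parents)) // 2
            + random.randint(-3, 3)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=fitness)

random.seed(0)
best = genetic_search()
```

In Mind Evolution the candidates are natural-language plans rather than numbers, but the select-recombine-mutate loop is the same shape.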
How Mind Evolution Works
The process begins with creating a population of potential solutions in natural language. The LLM generates these solutions after being given a clear problem description and relevant instructions. It then evaluates each potential solution, making improvements as necessary.
- Selection of Parents: The algorithm chooses parent solutions based on quality, with higher-quality solutions more likely to be selected.
- Crossover and Mutation: New solutions are created through crossover, which involves combining elements from parent solutions, and mutation, where random changes are applied.
- Cycle of Refinement: This cycle of evaluation, selection, and recombination continues until an optimal solution is found or a set number of iterations has been completed.
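The cycle above can be sketched in Python. Here `llm_generate` and `llm_recombine` are placeholders for real model calls, and each plan's quality score is embedded in its string purely so the toy evaluator has something to read:

```python
import random

def quality(candidate):
    """Toy evaluator: reads the score embedded in a placeholder plan string."""
    return float(candidate.split("=")[1].rstrip(")"))

def llm_generate(problem):
    """Stub for an LLM call that drafts one candidate plan in natural language."""
    return f"plan(quality={random.random():.3f})"

def llm_recombine(parent_a, parent_b):
    """Stub for an LLM call that merges and tweaks two parent plans
    (crossover and mutation happen inside a single prompt in practice)."""
    q = max(quality(parent_a), quality(parent_b)) + random.uniform(-0.05, 0.1)
    return f"plan(quality={min(max(q, 0.0), 1.0):.3f})"

def refine(problem, pop_size=8, max_iters=20, good_enough=0.95):
    population = [llm_generate(problem) for _ in range(pop_size)]
    for _ in range(max_iters):
        best = max(population, key=quality)
        if quality(best) >= good_enough:          # stop once a plan passes
            return best
        # Selection: fitter plans are proportionally more likely to be parents.
        weights = [quality(c) for c in population]
        children = [
            llm_recombine(*random.choices(population, weights=weights, k=2))
            for _ in range(pop_size - 1)
        ]
        population = children + [best]            # keep the current best plan
    return max(population, key=quality)

random.seed(1)
solution = refine("Plan a 3-day trip to Kyoto on a $1,500 budget")
```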
Evaluating Solutions
A critical aspect of Mind Evolution is its evaluation function. Unlike methods that require translating a problem from natural language into a formal representation, Mind Evolution works with natural language directly. This simplifies the problem-solving process and provides the LLM with textual feedback alongside numerical scores, enabling targeted improvements.
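A hypothetical evaluator of this kind might look like the following: it scores a natural-language plan against a few constraints and returns both a number and textual feedback the model can act on. The constraints, weights, and cost heuristic are all invented for illustration:

```python
def evaluate_trip(plan, budget=1000, required_cities=("Kyoto", "Osaka")):
    """Toy natural-language evaluator: returns a numeric score plus textual
    feedback an LLM could use to revise the plan (illustrative only)."""
    feedback = []
    score = 0.0
    for city in required_cities:
        if city in plan:
            score += 0.4
        else:
            feedback.append(f"The plan never visits {city}.")
    cost = plan.count("$") * 100          # crude stand-in for cost parsing
    if cost <= budget:
        score += 0.2
    else:
        feedback.append(f"Estimated cost exceeds the ${budget} budget.")
    return score, " ".join(feedback) or "All constraints satisfied."

score, notes = evaluate_trip("Day 1: Kyoto temples $$. Day 2: Osaka food tour $$$.")
```

The textual half of the return value is what distinguishes this style of evaluator: a bare number says a plan is bad, but the feedback says why, which is what makes targeted revision possible.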
Exploring Diverse Solutions
Mind Evolution adopts an "island" model to explore a wider range of solutions. Solutions are split into groups that evolve independently, and at regular intervals the best solutions migrate between groups, spreading strong ideas across the whole population.
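The island model can be sketched as independent populations that periodically exchange their best members. The sketch below uses a numeric toy objective in place of LLM-generated plans, with migration over a ring topology (an assumption; the paper's exact migration scheme may differ):

```python
import random

def fitness(x):
    # Toy stand-in for plan quality: closeness to the target value 100.
    return -abs(100 - x)

def evolve_island(island, rounds=10):
    """Evolve one island independently, keeping its best member unchanged."""
    for _ in range(rounds):
        best = max(island, key=fitness)
        mutated = sorted((x + random.randint(-5, 5) for x in island),
                         key=fitness, reverse=True)
        island = [best] + mutated[:7]      # elitism plus the fittest mutants
    return island

def migrate(islands):
    """Copy each island's best member into the next island (ring topology)."""
    bests = [max(isl, key=fitness) for isl in islands]
    for i, isl in enumerate(islands):
        isl.append(bests[i - 1])           # receive from the previous island
    return islands

random.seed(2)
islands = [[random.randint(0, 50) for _ in range(8)] for _ in range(3)]
initial_best = max((x for isl in islands for x in isl), key=fitness)
for _ in range(5):                          # alternate evolution and migration
    islands = [evolve_island(isl) for isl in islands]
    islands = migrate(islands)
best = max((x for isl in islands for x in isl), key=fitness)
```

Keeping the islands mostly isolated preserves diversity, while occasional migration lets a breakthrough on one island lift the others.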
Mind Evolution and Its Application in Planning Tasks
The effectiveness of Mind Evolution was tested against several baseline techniques, including:
- 1-Pass: The model generates only one answer.
- Best-of-N: Multiple answers are produced and the best one is chosen.
- Sequential-Revision+: Candidate solutions are generated independently and then revised over numerous iterations.
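These baselines can be sketched with a stubbed sampler standing in for the LLM, where a score in [0, 1] represents answer quality (none of this is the paper's code):

```python
import random

def sample_answer():
    """Stub for one LLM draft; a score in [0, 1] stands in for answer quality."""
    return random.random()

def revise(answer):
    """Stub for an LLM revision step that usually improves a draft slightly."""
    return max(0.0, min(1.0, answer + random.uniform(-0.02, 0.08)))

def one_pass():
    # 1-Pass: a single draft, no extra inference-time compute.
    return sample_answer()

def best_of_n(n=10):
    # Best-of-N: draft independently N times, keep the best-scoring answer.
    return max(sample_answer() for _ in range(n))

def sequential_revision(n_candidates=5, n_revisions=10):
    # Sequential-Revision+: independent drafts, each revised iteratively.
    candidates = [sample_answer() for _ in range(n_candidates)]
    for _ in range(n_revisions):
        candidates = [revise(c) for c in candidates]
    return max(candidates)

random.seed(3)
scores = {
    "1-pass": one_pass(),
    "best-of-N": best_of_n(),
    "sequential-revision+": sequential_revision(),
}
```

The key structural difference from Mind Evolution is that none of these baselines recombines candidates: each answer evolves (or not) in isolation.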
In their testing, Mind Evolution significantly outperformed these baseline methods across various planning tasks, such as trip and meeting planning, illustrating its capability to generate quality solutions efficiently.
Performance Metrics
Testing found that baseline approaches using models like Gemini 1.5 Flash struggled with benchmarks such as TravelPlanner, achieving success rates of only 5.6% and 11.7%. In contrast, Mind Evolution achieved a 95% success rate on the same benchmark, demonstrating its superior capability, especially as tasks became more complex.
The researchers used Gemini 1.5 Flash for most experiments and explored a two-stage strategy for greater efficiency, invoking a higher-performing model only when necessary, which further improved cost-effectiveness.
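One way to sketch such a two-stage strategy (an assumption about the general pattern, not the paper's exact recipe) is a cascade that escalates to the stronger model only when no cheap attempt passes the evaluator:

```python
def solve_with_cascade(problem, cheap_solver, strong_solver, evaluator,
                       threshold=1.0, budget=10):
    """Try the cheaper model first, escalating to the stronger, costlier
    model only when no cheap attempt passes the evaluator."""
    for _ in range(budget):
        candidate = cheap_solver(problem)
        if evaluator(candidate) >= threshold:
            return candidate, "cheap"
    return strong_solver(problem), "strong"

# Hypothetical stand-ins: the cheap solver fails on "hard" problems.
def cheap(problem):
    return "partial plan" if "hard" in problem else "complete plan"

def strong(problem):
    return "complete plan"

def score(plan):
    return 1.0 if plan == "complete plan" else 0.5

easy = solve_with_cascade("easy trip", cheap, strong, score)
hard = solve_with_cascade("hard trip", cheap, strong, score)
```

Because most problems are resolved by the cheap stage, the expensive model's cost is paid only on the hard tail, which is where the cost savings come from.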
Mind Evolution also consumed fewer tokens than the nearest competing technique, Sequential-Revision+, while delivering high performance across a range of tests. Its consistent edge as task difficulty increased underscores its robustness and its potential for future applications in AI-driven problem-solving, highlighting the advantages of an evolutionary strategy that combines broad search with deep refinement of solutions.