Benchmark

Chinese AI Startup Manus Secures Funding from Benchmark with a Valuation of $500 Million

By DeepMind, May 1, 2025 6:32 am

DeepSeek Introduces DeepSeek-Prover-V2: Enhancing Neural Theorem Proving with Recursive Proof Search and an Innovative Benchmark

By DeepMind, April 30, 2025 10:24 pm

GenCast by Google DeepMind Establishes New Benchmark in Medium-Range Weather Forecasting

By DeepMind, April 28, 2025 4:33 pm

Benchmark Participates in $75 Million Funding Round for Manus AI in China

By DeepMind, April 28, 2025 2:29 pm (updated April 28, 2025 2:30 pm)

Manus Introduces Paid Subscription Options and a Mobile Application

Chinese AI Startup Manus Secures Funding from Benchmark with a $500M Valuation

By DeepMind, April 26, 2025 10:33 am

Startup Behind Manus AI Agent Maneuvers Through U.S.-China Tensions Following Benchmark Agreement

By DeepMind, April 26, 2025 4:26 am

Benchmark Backs Chinese Startup Developing Manus AI Agent

By DeepMind, April 25, 2025 11:08 am

OpenAI's o3 AI Model Performs Worse on Benchmark Than Earlier Suggested

By DeepMind, April 21, 2025 7:01 am

OpenAI's o3 AI Model Achieves Lower Benchmark Scores Than Previously Suggested

By DeepMind, April 21, 2025 2:57 am

o3 by OpenAI Achieves Nearly Perfect Results on Long Context Benchmark

By DeepMind, April 20, 2025 9:52 pm

Exploring the Limitations of Long-Context Large Language Models with the Michelangelo Benchmark

By DeepMind, April 19, 2025 10:47 am

Michelangelo Benchmark from DeepMind Exposes the Limitations of Long-Context LLMs

By DeepMind, April 18, 2025 9:22 am

Meta AI Unveils MLGym: A New Framework and Benchmark for Enhancing AI Research Agents

By DeepMind, April 17, 2025 6:54 pm

OpenAI Introduces BrowseComp: A New Benchmark for Assessing AI Web Search Performance

By DeepMind, April 12, 2025 4:06 pm

BrowseComp: A Benchmark for Evaluating Browsing Agents

By DeepMind, April 11, 2025 5:13 am

OpenAI Unveils BrowseComp: A New Benchmark for AI Internet Browsing

By DeepMind, April 11, 2025 3:12 am

OpenAI Releases BrowseComp: A Novel Benchmark for Evaluating AI Agents’ Web Browsing Capabilities

By DeepMind, April 11, 2025 2:11 am

DeepMind Unveils New AI Fact-Checking Benchmark Featuring Gemini as the Front Runner

By DeepMind, April 8, 2025 8:38 pm

Google Broadens Availability of Gemini 2.5 Pro Following Impressive Benchmark Performance

By DeepMind, April 5, 2025 11:22 pm

With AI Models Dominating Every Benchmark, Human Evaluation is Now Essential

By DeepMind, March 29, 2025 5:38 pm

Google Unveils Gemini 2.5, Outperforming DeepSeek R1, OpenAI o3-mini, and Others in Benchmark Tests

By DeepMind, March 26, 2025 2:54 am

Tencent's Hunyuan-T1 Reasoning Model Achieves Benchmark Parity with OpenAI's Capabilities

By DeepMind, March 24, 2025 5:16 pm

Researchers at Google DeepMind Unveil New Benchmark to Enhance Factual Accuracy and Minimize Hallucinations in Language Models

By DeepMind, March 17, 2025 2:18 am

Baidu Unveils New AI Models, Claims Advantage Over DeepSeek and OpenAI in Benchmark Tests

By DeepMind, March 16, 2025 12:00 pm