Physical Address
304 North Cardinal St.
Dorchester Center, MA 02124


Just hours after OpenAI updated its flagship GPT-5 model to GPT-5.1, promising lower overall token usage and a more user-friendly personality with more preset options, Chinese search giant Baidu unveiled its next-generation foundation model, ERNIE 5.0, along with a range of AI product upgrades and strategic international expansions.
The goal: to position itself as a global contender in the increasingly crowded enterprise AI market.
Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively multimodal model designed to jointly process and generate content across text, images, audio and video.
Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is open source under an enterprise-friendly, permissive Apache 2.0 license, ERNIE 5.0 is a proprietary model and is only available through Baidu’s ERNIE Bot website (I had to manually select it from the model selection drop-down menu) and the Qianfan Cloud Platform Application Programming Interface (API) for enterprise customers.
Alongside the model’s launch, Baidu introduced major updates to its digital human platform, code-free tools and general-purpose AI agents – all aimed at expanding its AI footprint outside of China.
The company also introduced the ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, along with the general preview model that balances between modalities.
Baidu emphasized that ERNIE 5.0 represents a shift in how intelligence is deployed at scale, with CEO Robin Li stating, “When you internalize AI, it becomes an innate ability and transforms intelligence from a cost to a source of productivity.”
ERNIE 5.0’s benchmark results suggest that Baidu has achieved parity, or near parity, with the best Western models across a wide range of tasks.
In public benchmark slides shared at the Baidu World 2025 event, ERNIE 5.0 Preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding and image-based QA, while also demonstrating strong language modeling and code execution capabilities.
The company emphasized the model’s ability to handle joint inputs and outputs across modalities rather than relying on post-hoc fusion of separate modalities, which it presented as a technical differentiator.
On visual tasks, ERNIE 5.0 achieved leading results on OCRBench, DocVQA, and ChartQA, three benchmarks that test document recognition, understanding, and reasoning over structured data.
Baidu claims the model beat both GPT-5-High and Gemini 2.5 Pro in these document- and chart-based metrics, areas it describes as essential for enterprise applications such as automated document processing and financial analysis.
In image generation, ERNIE 5.0 equaled or surpassed Google’s Veo3 in categories including semantic alignment and image quality, according to Baidu’s internal evaluation based on GenEval. Baidu claims that the model’s multimodal integration allows it to generate and interpret visual content with greater contextual awareness than models relying on modality-specific encoders.
For audio and speech tasks, ERNIE 5.0 posted competitive results on the MM-AU and TUT2017 audio-understanding benchmarks, as well as on answering questions posed as spoken input. Its audio results get less emphasis than vision or text. Even so, they suggest a broad range of capabilities designed to support full-spectrum multimodal applications.
In language tasks, the model performed strongly in following instructions, answering factual questions, and mathematical reasoning—core areas that determine the enterprise utility of large language models.
The text-tuned Preview 1022 variant of ERNIE 5.0 showed even stronger language-specific results during early developer access. Baidu does not claim across-the-board superiority in general language reasoning. However, its internal evaluations suggest that ERNIE 5.0 Preview 1022 closes the gap with top English-language models and surpasses them in Chinese-language performance.
Baidu has not publicly released full benchmark details or raw results. Still, its positioning suggests a deliberate attempt to frame ERNIE 5.0 not as a niche multimodal system but as a leading model rivaling the largest closed models in general-purpose reasoning.
Where Baidu claims a clear lead is in structured document understanding, visual diagram reasoning, and the integration of multiple modalities into a single native modeling architecture. Independent verification of these results is still pending, but the range of claimed capabilities positions ERNIE 5.0 as a serious alternative in the multimodal foundation model landscape.
ERNIE 5.0 sits at the premium end of Baidu’s model pricing structure. The company released specific pricing for API usage on its Qianfan platform, matching the price of other top-tier offerings from Chinese rivals such as Alibaba.
| Model | Input price (per 1K tokens) | Output price (per 1K tokens) | Source |
|---|---|---|---|
| ERNIE 5.0 | $0.00085 (¥0.006) | $0.0034 (¥0.024) | Qianfan |
| ERNIE 4.5 Turbo (front) | $0.00011 (¥0.0008) | $0.00045 (¥0.0032) | Qianfan |
| Qwen3 (pre-encoder) | $0.00085 (¥0.006) | $0.0034 (¥0.024) | Qianfan |
The price contrast between the ERNIE 5.0 and earlier models, such as the ERNIE 4.5 Turbo, highlights Baidu’s strategy to differentiate between high-volume, low-cost models and high-capability models designed for complex tasks and multimodal reasoning.
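The first table pairs exact yuan list prices with rounded dollar conversions, so the size of the gap is easiest to check from the yuan values. The short Python sketch below does that. The dictionary and helper names are illustrative only and not part of any Baidu SDK. On these figures, ERNIE 5.0 costs ¥6 per million input tokens and ¥24 per million output tokens, which is 7.5 times the ERNIE 4.5 Turbo rate in both directions.

```python
from decimal import Decimal

# Qianfan list prices in yuan per 1K tokens, copied from the first pricing table
PRICES_CNY_PER_1K = {
    "ERNIE 5.0": {"input": Decimal("0.006"), "output": Decimal("0.024")},
    "ERNIE 4.5 Turbo": {"input": Decimal("0.0008"), "output": Decimal("0.0032")},
}


def per_million(price_per_1k: Decimal) -> Decimal:
    """Rescale a per-1K-token price to a per-1M-token price."""
    return price_per_1k * 1000


def premium(expensive: str, cheap: str, direction: str) -> Decimal:
    """How many times more `expensive` costs than `cheap` for input or output tokens."""
    return PRICES_CNY_PER_1K[expensive][direction] / PRICES_CNY_PER_1K[cheap][direction]


for direction in ("input", "output"):
    price_m = per_million(PRICES_CNY_PER_1K["ERNIE 5.0"][direction]).normalize()
    ratio = premium("ERNIE 5.0", "ERNIE 4.5 Turbo", direction)
    print(f"{direction}: ERNIE 5.0 = ¥{price_m} per 1M tokens, {ratio}x ERNIE 4.5 Turbo")
```

Working from the dollar column instead gives ratios of about 7.7x (input) and 7.6x (output). That drift comes purely from rounding in the converted figures, which is why the sketch uses `Decimal` on the yuan prices.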
Compared with US alternatives, ERNIE 5.0 remains mid-range in pricing:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Source |
|---|---|---|---|
| GPT-5.1 | $1.25 | $10.00 | OpenAI |
| ERNIE 5.0 | $0.85 | $3.40 | Qianfan |
| ERNIE 4.5 Turbo (front) | $0.11 | $0.45 | Qianfan |
| Claude Opus 4.1 | $15.00 | $75.00 | Anthropic |
| Gemini 2.5 Pro | $1.25 (≤200k) / $2.50 (>200k) | $10.00 (≤200k) / $15.00 (>200k) | Google Vertex AI Pricing |
| Grok 4 (grok-4-0709) | $3.00 | $15.00 | xAI API |
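Per-million rates are easier to judge against a concrete bill. The hypothetical calculator below prices a fixed workload against several entries from the table; all names in it are illustrative. Gemini 2.5 Pro’s tiered pricing is modeled under one assumption, labeled in the code: the higher tier is chosen per request by prompt length and then applies to both input and output tokens of that request.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    input_per_m: float   # USD per 1M input tokens
    output_per_m: float  # USD per 1M output tokens


# Flat-rate list prices taken from the comparison table (USD per 1M tokens)
FLAT_PRICES = {
    "GPT-5.1": Price(1.25, 10.00),
    "ERNIE 5.0": Price(0.85, 3.40),
    "ERNIE 4.5 Turbo": Price(0.11, 0.45),
    "Grok 4": Price(3.00, 15.00),
}


def gemini_25_pro_price(prompt_tokens: int) -> Price:
    """Tiered Gemini 2.5 Pro rates from the table.

    Assumption: the >200k tier is selected per request by prompt length
    and then applies to both the input and output tokens of that request.
    """
    if prompt_tokens <= 200_000:
        return Price(1.25, 10.00)
    return Price(2.50, 15.00)


def request_cost(price: Price, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request."""
    return (input_tokens * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000


def workload_cost(model: str, requests: int, input_tokens: int, output_tokens: int) -> float:
    """USD cost of `requests` identical requests on the named model."""
    if model == "Gemini 2.5 Pro":
        price = gemini_25_pro_price(input_tokens)
    else:
        price = FLAT_PRICES[model]
    return requests * request_cost(price, input_tokens, output_tokens)


if __name__ == "__main__":
    # Hypothetical workload: 10,000 requests, 2,000 input + 500 output tokens each
    for name in [*FLAT_PRICES, "Gemini 2.5 Pro"]:
        print(f"{name:16} ${workload_cost(name, 10_000, 2_000, 500):,.2f}")
```

For that example workload, the bill comes to $34 on ERNIE 5.0, $75 on GPT-5.1 and $135 on Grok 4. The output-token price accounts for most of the spread, because output rates differ far more between models than input rates do.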
In tandem with the launch of the model, Baidu is expanding internationally:
GenFlow 3.0, now with 20 million users, is the company’s largest general-purpose AI agent and features enhanced memory and multimodal task processing.
Famou, a self-evolving agent capable of dynamically solving complex problems, is now commercially available by invitation.
MeDo, the international version of Baidu Miaoda’s no-code builder, is live globally via medo.dev.
Oreate, a productivity workspace with support for documents, slides, images, video and podcasts, has reached over 1.2 million users worldwide.
Baidu’s digital human platform, already launched in Brazil, is also part of the global push. According to the company, 83% of live streamers during this year’s “Double 11” shopping event in China used Baidu’s digital human technology, which contributed to a 91% increase in GMV.
Meanwhile, Baidu’s autonomous ride-hailing service Apollo Go has surpassed 17 million trips, operating fleets of driverless cars in 22 cities and claiming the title of the world’s largest robotaxi network.
Two days before the ERNIE 5.0 flagship event, Baidu also released an open-source multimodal model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking.
As reported by my VentureBeat colleague Michael Nuñez, the model activates only 3 billion parameters per token out of a total of 28 billion, using a Mixture-of-Experts (MoE) architecture for efficient inference.
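The efficiency claim rests on sparse routing. A small router scores every expert for each token, and only the highest-scoring few actually run, so per-token compute tracks active rather than total parameters. The Python sketch below illustrates generic top-k gating with hypothetical sizes (8 experts, 2 active per token). It is not Baidu’s implementation: real MoE layers use learned weight matrices, load-balancing objectives and batched tensor kernels, and the expert functions here are stand-ins that only scale their input.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert], k: int) -> Tuple[Vector, List[int]]:
    """Send one token through its top-k experts and return the gated mix."""
    # Router logit per expert: dot product of the token with that expert's router row
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in router]
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only
    gates = softmax([logits[i] for i in chosen])
    out = [0.0] * len(x)
    for gate, i in zip(gates, chosen):
        y = experts[i](x)  # only the k chosen experts are ever evaluated
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, chosen


# Toy setup (hypothetical sizes): 8 experts, 2 active per token, 4-dim tokens
random.seed(0)
DIM, N_EXPERTS, TOP_K = 4, 8, 2
router = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_EXPERTS)]
calls = [0] * N_EXPERTS


def make_expert(idx: int) -> Expert:
    """Stand-in expert that scales its input and counts how often it runs."""
    def expert(v: Vector) -> Vector:
        calls[idx] += 1
        return [(idx + 1) * vi for vi in v]
    return expert


experts = [make_expert(i) for i in range(N_EXPERTS)]
out, chosen = moe_forward([1.0, -0.5, 0.25, 2.0], router, experts, TOP_K)
print("experts used:", chosen, "expert calls:", sum(calls))
# Share of the reported 28B total parameters that is active per token (3B)
print(f"active share: {3 / 28:.1%}")
```

Only two of the eight experts execute for the token, mirroring in miniature how ERNIE-4.5-VL-28B-A3B-Thinking is reported to touch roughly 10.7% of its weights (3B of 28B) per token.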
Key technical innovations include:
« Thinking with images » that enables visual analysis based on dynamic zoom
Support for diagram interpretation, document understanding, visual grounding and temporal awareness in video
Runs on a single 80GB GPU, making it accessible to mid-sized organizations
Full compatibility with Transformers, vLLM and Baidu’s FastDeploy toolkits
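A rough weight-memory estimate shows why a single 80GB card is plausible even though only 3 billion parameters are active. In an MoE model every expert must still be resident in memory, so the 28 billion total sets the footprint. The sketch assumes 16-bit (BF16) weights as a baseline and lists 8-bit and 4-bit formats for comparison. It ignores the KV cache, activations and any model-specific overhead, so treat the numbers as a lower bound rather than a deployment guide.

```python
# Bytes per parameter for common weight formats (assumed; actual checkpoints may differ)
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

TOTAL_PARAMS = 28e9     # all experts must be loaded, not just the ~3B active ones
GPU_MEMORY_GB = 80.0    # the single-GPU budget cited above


def weight_memory_gb(params: float, fmt: str) -> float:
    """Memory needed to hold the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    weights = weight_memory_gb(TOTAL_PARAMS, fmt)
    headroom = GPU_MEMORY_GB - weights
    print(f"{fmt:5} weights ~{weights:.0f} GB, ~{headroom:.0f} GB left for KV cache and activations")
```

At BF16 the weights alone come to about 56 GB, leaving roughly 24 GB on an 80GB GPU for everything else.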
This release adds pressure on closed-source competitors. With Apache 2.0 licensing, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable foundation model for commercial applications without license restrictions – something few high-performance models in this class offer.
Following the release of ERNIE 5.0, AI developer and evaluator Lisan al Gaib (@scaling01) posted a mixed review on X. They were initially impressed by the model’s benchmark performance. However, they reported a persistent problem in which ERNIE 5.0 repeatedly called tools during SVG generation tasks, even when explicitly instructed not to.
“The ERNIE 5.0 benchmarks looked insane until I tested it…unfortunately RL’s brain is broken or they have a serious problem with their chat platform/system,” Lisan wrote.
After a few hours, Baidu’s developer-focused support account, @ErnieforDevs, responded:
“Thanks for the feedback! This is a known bug; certain syntax can constantly trigger it. We’re working on a fix. You can try paraphrasing or changing the prompt to avoid it for now.”
The quick turnaround reflects Baidu’s increasing emphasis on communication with developers, especially as it woos international users through proprietary and open-source offerings.
Baidu’s ERNIE 5.0 marks a strategic escalation in the global foundation model race. With performance claims that put it on a par with state-of-the-art systems from OpenAI and Google, and a combination of premium pricing and open-access alternatives, Baidu is signaling its ambition to become not only a local AI leader, but also a trusted global infrastructure provider.
At a time when enterprise AI buyers increasingly demand multimodal performance, flexible licensing and deployment efficiency, Baidu’s two-pronged approach could broaden its appeal to both enterprise and developer communities. That approach pairs premium hosted APIs with open-source releases.
Whether the company’s performance claims hold up to third-party testing remains to be seen. But in an environment shaped by rising costs, model complexity and computational bottlenecks, ERNIE 5.0 and its supporting ecosystem give Baidu a competitive edge in the next wave of AI deployment.