OpenAI Releases GPT-5.4, Its Most Powerful Professional Model: Native Computer Operation, Plus Plugins That Bring AI to Excel and Financial Analysis

2026-03-07 09:14:34

Just one day after the debut of GPT-5.3, a faster and more capable model series, OpenAI released a new flagship base model, GPT-5.4, on Thursday, Eastern Time, launching it simultaneously across ChatGPT, the API, and developer tools such as Codex.

OpenAI describes GPT-5.4 as “the most powerful and efficient professional frontier model to date,” aimed at enterprise office work and complex knowledge work. Compared with previous versions, the biggest change in GPT-5.4 is its stronger AI agent capabilities. In the API and Codex, GPT-5.4 offers a native “computer operation” capability for the first time, letting agents execute complex workflows that span multiple software applications.

Beyond generating text and code, GPT-5.4 is OpenAI’s first general-purpose model with native computer control: it can operate software directly, browse the web, and use the mouse and keyboard to complete tasks. It also integrates with enterprise applications such as spreadsheets and financial analysis tools, embedding directly in Microsoft Excel and Google Sheets.

In ChatGPT, GPT-5.4 can present its thought process up front, so users can adjust the direction of a task while the model is still responding. It also improves deep web search and context retention in long, logic-heavy conversations.

Industry experts believe the upgrades in GPT-5.4 mark the transition of AI models from “dialogue tools” to digital agents that execute tasks automatically, pushing further into enterprise productivity software and professional knowledge work.

OpenAI also launched two versions this Thursday: GPT-5.4 Thinking, which excels at complex reasoning, and GPT-5.4 Pro, a high-performance version, targeting paid users and high-end enterprise clients.

In the OSWorld-Verified computer control benchmark, GPT-5.4 achieved a success rate of 75.0%, surpassing the human average of 72.4% and far ahead of GPT-5.2’s 47.3%. On OpenAI’s internal investment banking benchmark, cited alongside the financial services suite released at the same time, the score jumped from 43.7% to 88.0%.

Early testing organizations have given positive feedback. Daniel Swiecki, head of AI solutions at investment firm Walleye Capital, said GPT-5.4 improved accuracy by 30 percentage points in internal financial and Excel assessments. Brendan Foody, CEO of AI talent platform Mercor, called it “the best model we’ve tried so far” and noted GPT-5.4 ranked first in Mercor’s APEX-Agents benchmark for professional services.

Native Computer Control in General Models Breaks Single-Round Q&A Limits

The most groundbreaking feature of GPT-5.4 is its native computer control capability, a first for a general-purpose model from OpenAI. Through the API and Codex, the model can operate a computer much as a human would, completing multi-step workflows across applications.

Specifically, GPT-5.4 can control a computer either by writing code with libraries such as Playwright or by reading screenshots and responding with mouse and keyboard commands. Developers can also configure custom confirmation policies to match different levels of risk tolerance.
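A confirmation policy of this kind could look like the following minimal sketch. It is not OpenAI's API: the `Risk` levels, the `Action` type, and the `run_actions` helper are hypothetical names invented for illustration, and the actions are only recorded rather than sent to a real machine.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1     # e.g. taking a screenshot or scrolling
    MEDIUM = 2  # e.g. typing into a form
    HIGH = 3    # e.g. submitting a payment or deleting a file


@dataclass(frozen=True)
class Action:
    kind: str   # hypothetical action name such as "click" or "type"
    risk: Risk


def needs_confirmation(action: Action, threshold: Risk) -> bool:
    """Return True when the action's risk meets or exceeds the threshold."""
    return action.risk.value >= threshold.value


def run_actions(actions, threshold, confirm):
    """Run actions in order, asking `confirm` before risky ones.

    `confirm` is any callable that takes an Action and returns a bool.
    Returns the kinds of the actions that were actually executed.
    """
    executed = []
    for action in actions:
        if needs_confirmation(action, threshold) and not confirm(action):
            continue  # declined by the human or the policy, so skip it
        executed.append(action.kind)  # a real agent would send the command here
    return executed


plan = [
    Action("screenshot", Risk.LOW),
    Action("type", Risk.MEDIUM),
    Action("submit_payment", Risk.HIGH),
]

# Cautious deployment: confirm anything MEDIUM or above, and decline it all.
cautious = run_actions(plan, Risk.MEDIUM, confirm=lambda a: False)

# Permissive deployment: only HIGH-risk actions need approval, which is granted.
permissive = run_actions(plan, Risk.HIGH, confirm=lambda a: True)
```

With the cautious policy only the screenshot runs; with the permissive policy all three actions run. Moving the threshold is the single knob that trades autonomy against oversight.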

Benchmark data points to substantial progress. On OSWorld-Verified, GPT-5.4’s success rate is 75.0%, well above GPT-5.2’s 47.3% and ahead of the human baseline of 72.4%. On the WebArena-Verified browser control test, it reaches 67.3%, compared with 65.4% for GPT-5.2. On Online-Mind2Web, it achieves a 92.8% success rate using only screenshots.

In web search capabilities, BrowseComp testing shows GPT-5.4 improves by 17 percentage points over GPT-5.2, with GPT-5.4 Pro setting a new high score of 89.3%.

Mainstay, a real estate tech company, reports that in tests covering about 30,000 property tax portals, GPT-5.4 achieved a 95% success rate on first try and 100% within three attempts, a significant improvement over previous computer control models (success rates around 73-79%), with speeds about three times faster and token consumption reduced by approximately 70%.

Tool Search Mechanism Overhaul Significantly Reduces Token Usage

As the tool ecosystem expands, efficient management of tool calls becomes a bottleneck for deploying agent systems. GPT-5.4 introduces a “Tool Search” mechanism in the API, fundamentally changing how tools are defined and transmitted.

Previously, every request had to preload all tool definitions into the prompt. In large systems this could consume thousands or even tens of thousands of tokens per request, raising cost and latency and diluting the context. With the new mechanism, the model receives only a lightweight list of tools and retrieves a full definition only when it is needed.
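The difference between the two approaches can be illustrated with a small sketch. This does not use OpenAI's Tool Search API; the tool registry, the `search_tools` helper, and the character-count size proxy are all assumptions made for illustration.

```python
import json

# Hypothetical tool registry; names and schemas are invented for this sketch.
TOOLS = {
    "get_invoice": {
        "description": "Fetch an invoice by its identifier.",
        "parameters": {"invoice_id": "string", "include_lines": "boolean"},
    },
    "send_email": {
        "description": "Send an email message to one recipient.",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
    "create_ticket": {
        "description": "Open a support ticket in the help desk.",
        "parameters": {"title": "string", "priority": "string"},
    },
}


def size(obj) -> int:
    """Crude size proxy: characters in the JSON encoding (not real tokens)."""
    return len(json.dumps(obj))


def preload_all() -> list:
    """Old approach: every full definition goes into every request."""
    return [{"name": name, **spec} for name, spec in TOOLS.items()]


def lightweight_index() -> list:
    """New approach, step 1: send only the tool names."""
    return list(TOOLS)


def search_tools(query: str) -> list:
    """New approach, step 2: fetch full definitions that match a keyword."""
    q = query.lower()
    return [
        {"name": name, **spec}
        for name, spec in TOOLS.items()
        if q in name.lower() or q in spec["description"].lower()
    ]


eager_size = size(preload_all())
matches = search_tools("invoice")
lazy_size = size(lightweight_index()) + size(matches)
```

Here `lazy_size` comes out smaller than `eager_size` because only one of the three full definitions is ever loaded. The gap would widen as the registry grows, which is the effect the article describes.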

OpenAI provides concrete data: in 250 tasks of the Scale MCP Atlas benchmark, with all 36 MCP servers enabled, the tool search mode reduced total token usage by 47% compared to exposing all MCP functions directly, while maintaining the same accuracy.

Wade Foster, CEO of Zapier, says GPT-5.4 performed excellently in tool-use benchmarks spanning hundreds of real workflows, calling it “the most sustainable model to date.”

Financial and Enterprise Applications: Deep Excel Integration and Investment Banking Performance Doubles

Alongside GPT-5.4, OpenAI released the “OpenAI Financial Services” suite for enterprises and financial institutions. Its core products include ChatGPT for Excel and Google Sheets (beta), which embeds ChatGPT directly into spreadsheet cells to build, analyze, and update complex financial models.

The suite integrates data partners like FactSet, MSCI, Third Bridge, and Moody’s, and introduces reusable Skills functions covering high-frequency financial tasks such as earnings previews, comparable company analysis, DCF valuation, and investment memos.

On OpenAI’s internal investment banking benchmark, GPT-5.4 Thinking’s score jumped from 43.7% to 88.0%. On spreadsheet modeling tasks that simulate the work of a junior investment analyst, GPT-5.4 scored an average of 87.3%, far above GPT-5.2’s 68.4%.

Niko Grupen, head of applied research at legal AI platform Harvey, reports that GPT-5.4 scored 91% on the company’s BigLaw Bench, stating it “outperforms other models in structured complex transaction analysis, maintaining accuracy across lengthy contracts, and providing the detailed insights legal practitioners need.”

Knowledge Work and Hallucination Suppression: Fully Benchmarking Against Professionals

OpenAI demonstrates GPT-5.4’s capabilities on several real-world professional benchmarks. On GDPval, which covers knowledge work tasks across 44 occupations (including sales demos, accounting spreadsheets, and manufacturing charts), GPT-5.4 matches or exceeds industry professionals in 83.0% of cases, up from 71.0% with GPT-5.2.

In presentation quality assessments, human reviewers prefer GPT-5.4 outputs 68.0% of the time, citing better visual aesthetics, richer diversity, and more effective image generation.

Regarding hallucinations and factual errors, OpenAI states GPT-5.4 is their “most factually accurate model to date”: on a set of de-identified prompts that users had flagged for factual errors, individual claims were 33% less likely to be false than with GPT-5.2, and full responses were 18% less likely to contain any error.

In programming, GPT-5.4 performs on par with or better than GPT-5.3-Codex on SWE-Bench Pro, with lower latency across reasoning settings. The /fast mode in Codex can boost token generation speed by up to 1.5 times, using the same model and intelligence but optimized for speed. Mario Rodriguez, GitHub’s Chief Product Officer, states GPT-5.4 excels at logical reasoning and at executing complex, tool-dependent multi-step workflows, calling it “the model enterprise should adopt from day one.”

Two Versions Cover Different User Needs, Context Window Up to 1 Million Tokens

GPT-5.4 Thinking targets general professional scenarios requiring deep reasoning, while GPT-5.4 Pro is designed for the most complex tasks, pushing performance limits.

On ChatGPT, GPT-5.4 Thinking is available from this Thursday to Plus ($20/month), Team, and Pro users, replacing GPT-5.2 Thinking, which will be retired on June 5, 2026. GPT-5.4 Pro is limited to Pro ($200/month) and Enterprise plans. Free users can access GPT-5.4 in limited capacity via system routing. Enterprise and education users can enable early access through admin settings.

On the API, GPT-5.4 is available under the gpt-5.4 identifier and GPT-5.4 Pro as gpt-5.4-pro, and both are accessible via the Codex platform. The maximum output is 128,000 tokens, consistent with previous models. Both support a maximum context window of 1 million tokens, the largest OpenAI has offered, which suits planning, executing, and verifying long multi-step tasks.
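As a rough illustration of how these two limits interact, the sketch below checks whether a request would fit. The `fits_budget` helper is hypothetical, and it assumes that prompt and output tokens share the same 1 million token window, which the article does not state.

```python
MAX_CONTEXT_TOKENS = 1_000_000  # context window quoted in the article
MAX_OUTPUT_TOKENS = 128_000     # maximum output quoted in the article


def fits_budget(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if the request respects both quoted limits.

    Assumption: prompt and output tokens count against one shared window.
    """
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= MAX_CONTEXT_TOKENS


ok_request = fits_budget(prompt_tokens=800_000, requested_output_tokens=100_000)
too_much_output = fits_budget(prompt_tokens=10_000, requested_output_tokens=200_000)
window_overflow = fits_budget(prompt_tokens=950_000, requested_output_tokens=100_000)
```

Under these assumptions the first request fits, the second exceeds the output cap, and the third overflows the window even though each number looks reasonable on its own.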

Pricing Higher Than Previous Generation, Efficiency Gains Offset Cost Increase

API pricing for GPT-5.4 is higher than GPT-5.2:

GPT-5.4: $2.50 per million input tokens, $15 per million output tokens (GPT-5.2: $1.75 input / $14 output)
GPT-5.4 Pro: $30 per million input tokens, $180 per million output tokens (GPT-5.2 Pro: $21 input / $168 output)
Batch and Flex plans are half-price; Priority processing costs double the standard rate.

Note that when the input to a single request exceeds 272,000 tokens, the excess is billed at double the standard rate. In Codex, the default compression limit is 272,000 tokens; developers can raise it manually to handle larger prompts, with the excess tokens incurring the higher charge.
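The pricing rules above can be turned into a back-of-the-envelope estimator. The sketch below uses only the figures quoted in this article. The `estimate_cost` function is hypothetical, and it makes two assumptions the article does not confirm: the long-input surcharge applies to input tokens only, and the Batch, Flex, and Priority multipliers stack on top of that surcharge.

```python
# USD per million tokens, as quoted in the article.
PRICES = {
    "gpt-5.4": {"input": 2.50, "output": 15.00},
    "gpt-5.4-pro": {"input": 30.00, "output": 180.00},
}

# Batch and Flex are half price; Priority is double the standard rate.
TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.0}

LONG_INPUT_THRESHOLD = 272_000  # input tokens above this are billed at double


def estimate_cost(model: str, input_tokens: int, output_tokens: int,
                  tier: str = "standard") -> float:
    """Estimate the cost of one request in USD under the stated assumptions."""
    price = PRICES[model]
    base_input = min(input_tokens, LONG_INPUT_THRESHOLD)
    excess_input = max(input_tokens - LONG_INPUT_THRESHOLD, 0)
    # Excess input tokens count twice because they cost double.
    input_cost = (base_input + 2 * excess_input) * price["input"] / 1_000_000
    output_cost = output_tokens * price["output"] / 1_000_000
    return round((input_cost + output_cost) * TIER_MULTIPLIER[tier], 6)


standard = estimate_cost("gpt-5.4", 100_000, 10_000)
batch = estimate_cost("gpt-5.4", 100_000, 10_000, tier="batch")
long_prompt = estimate_cost("gpt-5.4", 372_000, 0)
```

For a 100,000-token prompt with 10,000 output tokens, this gives $0.40 at the standard rate and $0.20 on Batch. A 372,000-token prompt with no output costs $1.18, because the 100,000 tokens above the threshold are billed as if they were 200,000.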

OpenAI explains the higher prices with three reasons: enhanced capabilities in programming, computer control, deep research, high-level document generation, and tool invocation; significant technological advances from their research roadmap; and more efficient reasoning mechanisms that consume fewer reasoning tokens for the same tasks, partially offsetting the price increase. They also state that even with the price hike, GPT-5.4 remains cheaper than comparable leading models from competitors.
