Silicon Valley's Top Funds Are Betting Collectively: Morgan Stanley's In-Depth Analysis of AI's Next Frontier — "World Models"

Large models have ridden the "language" paradigm to where they are today, and its boundaries are increasingly clear: they excel at writing, search, editing, and programming, but once a problem involves three-dimensional space, temporal evolution, and physical constraints, the existing paradigm starts to struggle. Morgan Stanley is betting the next growth phase on "world models"—AI that can understand, simulate, and make decisions within an environment. The applications extend beyond robotics and autonomous driving to reshaping digital content industries such as gaming, design, and film production.

According to WindTrader, Adam Jonas, a North American equity analyst at Morgan Stanley, wrote bluntly in his latest report: "AI is moving beyond language toward models that understand, simulate and navigate the physical world." The implication: the next round of competition is not about whose chatbot sounds more human, but about who can compress the laws of the real world into a usable internal representation and turn it into an interactive "imagination engine."

The evidence in the report is not visionary storytelling but engineering already underway: Waymo has run "billions of miles" of virtual testing on a world model built on DeepMind's Genie 3; Microsoft used Muse to turn 1997's "Quake II" into a fully AI-rendered, playable version; and Roblox has disclosed research on generating immersive environments and iterating on games with a self-developed world model. The players include DeepMind, Meta, Microsoft, Tesla, and NVIDIA, along with a wave of startups competing for talent and funding.

More notably, Morgan Stanley's report highlights two emerging players: Fei-Fei Li's World Labs, which aims to generate navigable 3D worlds, and Yann LeCun's AMI Labs, which focuses on learning efficient latent-space representations for prediction and reasoning. Behind these two routes lies the same fundamental question: how will AI "understand the world," and when will that understanding move from demos to productive capability?

From language to physics: world models target the shortcomings of LLMs

The report describes the "physical world" as a harder battleground: it is governed by matter, thermodynamics, fluids, lighting, and other physical laws, and it unfolds in constantly changing three-dimensional space. LLMs are trained primarily on text and its variants and excel at white-collar tasks (coding, search, writing); their shortcoming is not a lack of data but the absence of an environment representation, and of reasoning that stays consistent over long horizons and can simulate future states.

A world model is therefore defined as a "usable internal representation of the environment": it must not only reproduce what is seen but also project states forward and, when the conditioning action changes, offer different future branches—in other words, an "imagination engine" for AI.
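The core mechanic behind this "imagination engine" can be sketched in a few lines: a model of how state evolves under an action, rolled forward on candidate action sequences without touching the real world. Everything below is illustrative—a toy kinematic transition, not any vendor's model or API.

```python
# Minimal sketch of an action-conditioned "imagination engine": the same
# present state branches into different futures depending on the actions
# we condition on. The transition model here is a toy; real world models
# learn it from data.
from dataclasses import dataclass

@dataclass
class State:
    position: float
    velocity: float

def step(state: State, action: float, dt: float = 0.1) -> State:
    """Toy transition model: treat the action as an acceleration command."""
    velocity = state.velocity + action * dt
    position = state.position + velocity * dt
    return State(position, velocity)

def imagine(state: State, actions: list[float]) -> list[State]:
    """Roll a candidate action sequence forward entirely in imagination."""
    trajectory = []
    for a in actions:
        state = step(state, a)
        trajectory.append(state)
    return trajectory

# Two futures branch from one present, conditioned on different actions.
start = State(position=0.0, velocity=1.0)
brake = imagine(start, [-1.0] * 10)  # decelerate for 10 steps
coast = imagine(start, [0.0] * 10)   # no intervention for 10 steps
print(brake[-1].position < coast[-1].position)  # True: braking covers less ground
```

An agent with such a model can compare imagined futures and pick actions before acting—the "planning" use the report attributes to predictive world models.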

World models are not a single thing: five main parallel approaches

Morgan Stanley broadly sorts current approaches into five categories (noting that the boundaries will blur over time):

  • Interactive, action-conditioned world models: like “learned game engines,” where the environment changes in real-time based on agent actions (e.g., DeepMind Genie).

  • Consistent 3D world generators: emphasizing spatial geometric consistency and multi-view exploration (e.g., World Labs Marble).

  • Abstract representations/non-generative models: not aiming to generate pixel-perfect images but predicting higher-level latent structures and dynamics, focusing on efficiency and reasoning (e.g., Meta V-JEPA, AMI Labs).

  • Predictive generative world models: essentially "predict the next frame/next state," used for planning, forecasting, and inference in driving (e.g., Wayve GAIA, NVIDIA Cosmos Predict).

  • Physics-constrained simulation engines: combining world models with simulation/physics engines and data pipelines to produce more “physically consistent” synthetic data for training robots (e.g., NVIDIA Cosmos Transfer).

This classification matters in practice: the single term "world model" covers different goals—some aim to generate a navigable world, others to compress the world into computable state—and the product forms, compute profiles, and commercialization paths differ accordingly.

Initial focus on gaming and content creation: replacing the engine is tempting but not imminent

Gaming is the most "intuitive" use case in the report: world models can generate interactive environments from minimal prompts, potentially lifting content production speed to a new level. Microsoft's Muse making "Quake II" playable illustrates the shift—no traditional rendering engine, with each frame predicted from the player's input.

However, Morgan Stanley's gaming analysts (citing Matt Cost's framework) are cautious about incumbents' long-term position and sketch two scenarios: established players fold AI into their toolchains and adapt, or they are displaced or seriously disrupted by the new paradigm. The displacement scenario is no longer far-fetched, since current models can already "generate playable worlds from natural language."

The challenges sit further out: compute speed and cost may prove manageable, but "meta-systems" and latency will be harder, and problems such as determinism, memory, and updates could be stubborn under the world-model paradigm. Short-term constraints give incumbents a window; the long-term threat remains.

Autonomous driving and robotics are more pragmatic: virtual worlds for data augmentation and pre-visualization

The case for autonomous driving is clearer: move rare, dangerous, and costly edge cases into virtual environments for large-scale testing. The report cites Waymo's use of a world model built on DeepMind's Genie 3 to run "billions of miles" of virtual driving, training and validating system performance in rare edge scenarios that are hard to encounter, or risky to stage, on real roads.

For robotics, the logic is more engineering-driven: world models could address two bottlenecks—the volume of training data and reasoning before execution. Studies cited in the report suggest that training robots on data generated by world models can rival training on real interaction data. Morgan Stanley is careful to note, though, that in the short term world models and simulation data will supplement, not replace, real data pipelines.

The real sticking points are touch and friction: subtle physical quantities are often decisive—tiny fingertip forces, actuator wear, surface friction, variation in material properties, even static friction in joints—and they can open a significant gap between simulation and reality.

The hardest challenges are “long-term stability” and “controllability”: several hurdles remain

The report lists specific, candid challenges:

  • Error accumulation and drift over time: the longer the interaction, the higher the risk of objects drifting, geometry deforming, and physical rules breaking down. Even the advanced Genie 3 currently supports only "a few minutes" of continuous interaction.

  • Limited controllability: however beautiful the visuals, if the action space is limited to basic movement, product value is capped.

  • Multi-agent and social dynamics: interactions involving multiple humans, vehicles, or robots are much more complex than single-camera scenarios; DeepMind also notes this as a difficulty for Genie 3.

  • Data scale and diversity: especially in robotics, collecting real sensor data is expensive and slow.

  • Lack of unified benchmarks: there is still no standard for quantifying long-horizon interaction quality, so progress is often demonstrated through demos and task-specific tests.
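The first hurdle above—drift—has a simple mechanical cause: an autoregressive world model is repeatedly applied to its own outputs, so each step's small error is fed back and compounded. The toy calculation below (illustrative numbers, not measurements from any model) shows why a rollout that looks fine for a minute can become unusable over an hour.

```python
# Why long rollouts drift: each prediction step adds a small new error and
# amplifies whatever error is already present, because the model consumes
# its own previous output. Numbers below are purely illustrative.
def rollout_error(per_step_error: float, amplification: float, steps: int) -> float:
    """Accumulated error after `steps` autoregressive prediction steps."""
    error = 0.0
    for _ in range(steps):
        error = error * amplification + per_step_error
    return error

# 1% new error per step, mildly amplified (factor 1.05) by the feedback loop.
short_run = rollout_error(0.01, 1.05, 60)    # a short interaction
long_run = rollout_error(0.01, 1.05, 3600)   # a long one
print(short_run < long_run)  # True: error grows geometrically, not linearly
```

With amplification above 1, accumulated error grows geometrically with rollout length—which is consistent with the report's observation that even Genie 3 holds together only for "a few minutes" of continuous interaction.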

These constraints set a realistic pace: world models will likely proliferate first in fault-tolerant, fast-iterating digital content fields, then gradually penetrate industries that demand strict physical fidelity.

Fei-Fei Li’s Bet: Let AI “Understand” 3D Space

Morgan Stanley positions World Labs as a representative of “generating consistent 3D worlds.” Founded by Fei-Fei Li and her team in 2023, it emerged from stealth in 2024; its flagship product Marble was publicly released in November 2025, aiming to generate “persistent, explorable” 3D environments from text, images, short videos, or rough 3D inputs, supporting editing and expansion.

The listed features resemble a creative and production-oriented workspace: generate and modify objects, rough out models with “Chisel” before adding details, expand selections, compose multiple worlds into larger scenes, export to external 3D software or engines, and provide APIs for developers.

It also emphasizes integration with industry toolchains: export to Unreal Engine and Unity; connect with NVIDIA Isaac Sim and other simulation platforms; and demonstrate applications in architecture, robotics simulation, and more.

Capital interest is also highlighted: PitchBook estimates World Labs has raised about $1.29 billion, with a post-money valuation of approximately $5.4 billion after a February 2026 funding round.

Yann LeCun's Alternative Path: Focus on Structure, Not Rendering

AMI Labs' story leans more toward a research paradigm: founded in March 2026 by Yann LeCun, it follows the JEPA line—predicting latent embeddings of occluded or future content rather than reconstructing every pixel, learning how the world evolves through more abstract structure. Morgan Stanley files it under "abstract representations/non-generative models," emphasizing its potential in reasoning, planning, and physical AI systems (robotics in particular).
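The efficiency argument behind this route is easy to make concrete: a generative model must predict every pixel of the next frame, while a JEPA-style model predicts only a compact embedding of it. The sketch below is a dimensional illustration only—the "encoder" is a random projection standing in for a learned network, not anything from Meta's or AMI Labs' code.

```python
# Contrast between the two prediction targets the report describes:
# a generative world model predicts the full next frame, pixel by pixel,
# while a JEPA-style model predicts a small latent embedding of it.
# The encoder here is a random projection, a stand-in for a learned network.
import numpy as np

rng = np.random.default_rng(0)
frame = rng.random((64, 64, 3))  # one small RGB frame

def encode(x: np.ndarray, dim: int = 64) -> np.ndarray:
    """Stand-in encoder: project the flattened frame to a small embedding."""
    flat = x.reshape(-1)
    proj = rng.random((dim, flat.size)) / flat.size
    return proj @ flat

pixel_target = frame           # generative target: 64*64*3 = 12,288 values
latent_target = encode(frame)  # JEPA-style target: 64 values
print(pixel_target.size, latent_target.size)  # 12288 64
```

Predicting a target hundreds of times smaller per step is what lets this family of models trade rendering fidelity for cheaper prediction, reasoning, and planning.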

Details about specific products are limited; possible applications include robotics, autonomous driving, video understanding/analysis, AR/VR with cameras, and intelligent assistants. Regarding funding, the report notes AMI Labs announced a seed round exceeding $1 billion, with a post-money valuation over $4.5 billion according to PitchBook.

Capital and talent are gathering: the race for spatial intelligence is accelerating

The most significant signal from Morgan Stanley’s report may not be a particular model parameter or demo, but the broader pattern it describes: from DeepMind, Meta, Microsoft, Tesla, NVIDIA to a new wave of startups, world models are becoming the “common language” of the next stage. This explains why productivity leaps are happening in gaming, film, and design, and why autonomous driving and robotics are increasingly moving training, validation, and planning into virtual worlds.

World models are not a plug-and-play universal solution. The report's conclusion reads like a roadmap: workable scenarios are already emerging, but the key challenges—long-term stability, controllability, multi-agent interaction, physical detail, and evaluation—are still on the table. Whoever turns these hard problems into closed engineering loops will determine how far the journey from "digital" to "physical" can go.
