Sergio Arnaud

Background

Senior ML Engineer

Waymo · March 2026 – Present · Mountain View, CA

World models and multimodal foundation models for autonomous driving

Senior Research Engineer

Meta FAIR · February 2024 – February 2026 · Menlo Park, CA

World models for robotics, 3D vision-language grounding for robotic manipulation, and physical world modeling

AI Resident

Meta FAIR · September 2022 – September 2023 · Menlo Park, CA

Visual representations for robot control, language models for planning, and embodied AI research

Tech Lead (AI)

deep dive (dive.ai) · January 2018 – July 2022 · Mexico City, Mexico

Computer Vision and Natural Language Processing systems

BSc Applied Mathematics

Instituto Tecnológico Autónomo de México (ITAM) · 2020 · Mexico City, Mexico

Graduated with highest honors (Magna Cum Laude), top 3% of students

Featured Publications

World Modeling

Learning predictive models of the world for planning and decision making

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

M. Assran*, A. Bardes*, D. Fan*, Q. Garrido*, R. Howes*, M. Komeili*, M. Muckley*, A. Rizvi*, C. Roberts*, K. Sinha*, A. Zholus*, Sergio Arnaud*, et al.

arXiv 2025

paper · blog · code

Human-level Learning of Complex Novel Tasks as Theory-Based Modeling, Exploration, and Planning

P.A. Tsividis, J. Loula, J. Burga, J.P. Rodriguez, Sergio Arnaud, N. Foss, A. Campero, A. Subramanian, T. Pouncy, S.J. Gershman, J.B. Tenenbaum

Philosophical Transactions of the Royal Society A

paper

Visuo-Tactile World Models

Carolina Higuera, Sergio Arnaud, Byron Boots, Mustafa Mukadam, Francois Robert Hogan, Franziska Meier

In Press

paper · project

DreamSteer: Latent World Models Can Steer VLA Policies During Deployment

H. Cui, Sergio Arnaud, A. Majumdar, D. Dugas, E. Aljalbout, K. Desingh, K.M. Jatavallabhula, F. Meier

In Press

paper · project

Beyond Latents: Planning with Motion Cues in World Models

S. Yenamandra, Sergio Arnaud, H. Huang, T.-Y. Yang, E. Aljalbout, A. Majumdar, D. Sadigh, H. Bharadhwaj, F. Meier

In Press

Heterogeneous World Models for Cross-Embodiment Transfer

Sergio Arnaud, et al.

In Progress

3D Vision & Spatial Reasoning

Grounding language in 3D space for embodied understanding

Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D

Sergio Arnaud*, P. McVay*, A. Martin*, A. Majumdar, K.M. Jatavallabhula, P. Thomas, R. Partsey, D. Dugas, A. Gejji, A. Sax, et al.

ICML 2025 · Spotlight (Top 2.6%)

paper · blog · code · demo

From Thousands to Billions: 3D Visual Language Grounding via Render-Supervised Distillation (LiftGS)

A. Cao, Sergio Arnaud, O. Maksymets, J. Yang, A. Jain, S. Yenamandra, A. Martin, V.-P. Berges, P. McVay, R. Partsey, et al.

ICML 2025

paper · project

Unifying 2D and 3D Vision-Language Understanding (UniVLG)

A. Jain, A. Swerdlow, Y. Wang, Sergio Arnaud, A. Martin, A. Sax, F. Meier, K. Fragkiadaki

ICML 2025

paper · code · project

OpenEQA: Embodied Question Answering in the Era of Foundation Models

A. Majumdar*, A. Ajay*, X. Zhang*, P. Putta, S. Yenamandra, M. Henaff, S. Silwal, P. McVay, O. Maksymets, Sergio Arnaud, K. Yadav, Q. Li, B. Newman, et al.