RoboPearls: Editable Video Simulation for Robot Manipulation

ICCV 2025

Tao Tang, Likui Zhang, Youpeng Wen, Kaidong Zhang, Jia-Wang Bian, Xia Zhou, Tianyi Yan, Kun Zhan, Peng Jia, Hefeng Wu, Liang Lin, Xiaodan Liang

Abstract

The development of generalist robot manipulation policies has seen significant progress, driven by large-scale demonstration data across diverse environments. However, the high cost and inefficiency of collecting real-world demonstrations hinder the scalability of data acquisition. While existing simulation platforms enable controlled environments for robotic learning, the challenge of bridging the sim-to-real gap remains. To address these challenges, we propose RoboPearls, an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations from demonstration videos, and supports a wide range of simulation operators, including various object manipulations, powered by advanced modules like Incremental Semantic Distillation (ISD) and 3D regularized NNFM Loss (3D-NNFM). Moreover, by incorporating large language models (LLMs), RoboPearls automates the simulation production process in a user-friendly manner through flexible command interpretation and execution. Furthermore, RoboPearls employs a vision-language model (VLM) to analyze robotic learning issues, closing the simulation loop for performance enhancement. To demonstrate the effectiveness of RoboPearls, we conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot; the results show satisfactory simulation performance.
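The abstract names a 3D regularized NNFM Loss without giving its form. As background only, the sketch below shows the plain nearest neighbor feature matching (NNFM) idea that the name builds on: each rendered feature vector is matched to its closest reference feature under cosine distance, and the matched distances are averaged. This is an illustrative assumption, not the authors' implementation; the function names and the list-of-lists feature layout are hypothetical, and the 3D regularization term is omitted because this page does not describe it.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Small epsilon guards against division by zero for all-zero vectors.
    return 1.0 - dot / (norm_a * norm_b + 1e-12)


def nnfm_loss(rendered_feats, reference_feats):
    """Mean cosine distance from each rendered feature to its nearest reference feature.

    Both arguments are non-empty lists of feature vectors (hypothetical layout).
    The 3D regularization used by RoboPearls is NOT modeled here.
    """
    total = 0.0
    for feat in rendered_feats:
        # Nearest neighbor search: keep only the closest reference feature.
        total += min(cosine_distance(feat, ref) for ref in reference_feats)
    return total / len(rendered_feats)
```

In practice the feature vectors would come from a pretrained image encoder applied to rendered and reference views, and the loss would be computed with a differentiable tensor library so that gradients can reach the Gaussian parameters.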

Framework

Experiment

Conclusion

In this paper, we introduce RoboPearls, an automated editable video simulation framework for robotic manipulation. Leveraging Gaussian representations, RoboPearls generates highly adaptable and photorealistic simulations from demonstration videos. Moreover, RoboPearls supports a wide range of simulation operators to cover various scenarios, driven by well-designed modules such as Incremental Semantic Distillation and 3D regularized NNFM Loss. To further streamline the process, RoboPearls integrates LLMs and VLM, allowing users to generate complex simulations using only natural language commands while enabling advanced closed-loop simulation capabilities. These features facilitate robust simulations for diverse robotic tasks. Extensive experiments across multiple datasets demonstrate the framework’s simulation effectiveness, yielding significant improvements in robotic performance. Overall, RoboPearls represents a significant step toward providing a scalable, user-friendly solution for robotic simulation.

Human Cyber Physical Intelligence Integration Lab, Sun Yat-sen University

hcp@sysu.edu.cn
No. 132 Waihuan East Road, Guangzhou Higher Education Mega Center, Guangzhou
