梁小丹

Xiaodan Liang

Professor

xdliang328@gmail.com

Welcome

I am a Professor of Computer Science at Sun Yat-sen University and a joint Associate Professor in the Computer Vision Department at MBZUAI. Previously, from 2014 to 2016, I was a visiting scholar at the National University of Singapore. From 2016 to October 2018, I spent a wonderful time as a Postdoctoral Researcher in the Machine Learning Department at Carnegie Mellon University (CMU), United States, under the supervision of Professor Eric P. Xing.

My research interests are in the areas of Multi-modal Understanding and Generation, Embodied AI, and AI for Math.

Email: liangxd9@mail.sysu.edu.cn; xdliang328@gmail.com; xiaodan.liang@mbzuai.ac.ae

Google Scholar Profile: https://scholar.google.com/citations?user=voxznZAAAAAJ&hl=zh-CN

Prospective students: I am looking for self-motivated Ph.D. students, postdoctoral researchers, research assistants, and visiting scholars to work together on exciting, cutting-edge computer vision and artificial intelligence projects. If you are interested in working with me, please send me an email with your resume.

Academic service:

Serving regularly as Area Chair for top CCF-A conferences, including CVPR, ICCV, ECCV, ICLR, ICML, NeurIPS, ACM MM, and AAAI.

Serving as CVPR 2023 Ombud Chair, CVPR 2021 Workshop Chair, and ICCV 2029 Local Chair

Serving as Associate Editor of the journals The Visual Computer and Neural Networks

Organizing the Social Mobile Manipulation Challenge at the CVPR 2025 Embodied AI Workshop: https://smm-challenge.github.io/

Organizing the 2nd AI for Math Workshop at ICML 2025

Organizing the AI for Math Workshop at ICML 2024

Publications (selected):

[1] Jun Zhou, Jiahao Li, Zunnan Xu, Hanhui Li, Yiji Cheng, Fa-Ting Hong, Qin Lin, Qinglin Lu, Xiaodan Liang*. FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model. CVPR 2025.

[2] Bingqian Lin, Yunshuang Nie, Ziming Wei, Jiaqi Chen, Shikui Ma, Jianhua Han, Hang Xu, Xiaojun Chang, Xiaodan Liang*. NavCoT: Boosting LLM-based Vision-and-Language Navigation via Learning Disentangled Reasoning. TPAMI 2025.

[3] Haoyuan Li, Yanpeng Zhou, Tao Tang, Jifei Song, Yihan Zeng, Michael Kampffmeyer, Hang Xu, Xiaodan Liang*. UniGS: Unified Language-Image-3D Pretraining with Gaussian Splatting. ICLR 2025.

[4] Zheng Chong, Wenqing Zhang, Shiyue Zhang, Jun Zheng, Xiao Dong, Haoxiang Li, Yiling Wu, Dongmei Jiang, Xiaodan Liang*. CatV2TON: Taming Diffusion Transformers for Vision-Based Virtual Try-On with Temporal Concatenation. ICLR 2025.

[5] Kaidong Zhang, Pengzhen Ren, Bingqian Lin, Junfan Lin, Shikui Ma, Hang Xu, Xiaodan Liang*. PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation. NeurIPS 2025.

Full paper list: https://scholar.google.com/citations?user=voxznZAAAAAJ&hl=zh-CN