杨思蓓

Sibei Yang

Associate Professor

sibeiyang9@gmail.com

https://sibeiyang.github.io/

Biography

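Sibei Yang is an Associate Professor (talent recruitment track) and doctoral supervisor at the School of Computer Science, Sun Yat-sen University, and a Yixian Scholar. Her research covers cross-modal visual perception, understanding, generation, and interaction, with a particular focus on: 1) multimodal large models (LLM/MLLM); 2) unified vision-language understanding and generation; 3) embodied intelligence; 4) open-world visual perception and understanding. To date she has published more than 50 papers in CCF A venues and CAS Zone 1 journals, nearly 40 of them as first or corresponding author, with nearly 2,500 Google Scholar citations. She has led multiple research projects, including an NSFC General Program grant, an NSFC Young Scientists Fund grant, the Pujiang Talent Program, and the Shanghai Leading Talent (Overseas) Program. She has served as Area Chair (AC) for top conferences including ICCV, ICLR, and WACV, and is listed among the World's Top 2% Scientists.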

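Sibei Yang received her Ph.D. from The University of Hong Kong (Hong Kong Government Scholarship) in 2020 and her bachelor's degree from Zhejiang University (Chu Kochen Honors College) in 2016. From 2020 to 2021 she was a Research Assistant Professor and doctoral supervisor at The Hong Kong Polytechnic University. From 2021 to 2025 she was an Assistant Professor, Researcher, and doctoral supervisor at ShanghaiTech University. In 2012 she was selected for the Ministry of Education's Everest Program (珠峰计划).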

For details, see the personal homepage: https://sibeiyang.github.io/

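[Recruitment-2025/12]: Recruiting Research Interns in multimodal large models and embodied intelligence. Undergraduates should send their transcript and CV to sibeiyang9@gmail.com with the email subject [研究实习生-姓名] (that is, [Research Intern-Your Name]). Master's and Ph.D. students must obtain their advisor's consent before emailing and should cc their advisor. Other things being equal, preference will be given to students who plan to apply for a master's or Ph.D. position in this group.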

PS. Sibei Yang has supervised more than 20 students who published papers at CCF A conferences as first or co-first author, including 7 undergraduates.

Research Interests

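Cross-modal visual perception, understanding, generation, and interaction, especially: 1) multimodal large models (LLMs/MLLMs); 2) unified vision-language understanding and generation; 3) embodied intelligence; 4) open-world visual perception and understanding.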

News

2025/9/19: Five papers accepted to NeurIPS 2025.

Education

2012/09-2016/07: Zhejiang University, Chu Kochen Honors College, Computer Science and Technology, Bachelor's degree

2016/09-2020/09: The University of Hong Kong, Hong Kong Government Scholarship, Computer Science, Ph.D.

Work Experience

2020/10-2021/05: The Hong Kong Polytechnic University, Department of Computing, Research Assistant Professor, doctoral supervisor

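2021/06-2025/05: ShanghaiTech University, School of Information Science and Technology, Assistant Professor, Researcher, doctoral supervisor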

2025/06-present: Sun Yat-sen University, School of Data Science and Computer Science, Associate Professor, doctoral supervisor

Research Projects

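Principal investigator of multiple research projects, including an NSFC General Program grant, an NSFC Young Scientists Fund grant, the Pujiang Talent Program, the Shanghai Leading Talent (Overseas) Program, and a start-up incubation project.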

Representative Publications

(*) denotes corresponding author

[1] Sibei Yang, Guanbin Li, and Yizhou Yu. Relationship-Embedded Representation Learning for Grounding Referring Expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2021, 43: 2765-2779. [CCF A][CAS Zone 1]

[2] Sibei Yang, Meng Xia, Guanbin Li, Hong-Yu Zhou, and Yizhou Yu. Bottom-up shift and reasoning for referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2021: 11266-11275. [CCF A]

[3] Sibei Yang, Guanbin Li, and Yizhou Yu. Graph-structured referring expression reasoning in the wild. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2020: 9952-9961. [CCF A]

[4] Sibei Yang, Guanbin Li, and Yizhou Yu. Dynamic graph attention for referring expression comprehension. Proceedings of the IEEE/CVF international conference on computer vision (ICCV Oral). 2019: 4644-4653. [CCF A]

[5] Sibei Yang, Guanbin Li, and Yizhou Yu. Cross-modal relationship inference for grounding referring expressions. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2019: 4145-4154. [CCF A]

[6] Sibei Yang, Guanbin Li, and Yizhou Yu. Propagating over phrase relations for one-stage visual grounding. Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XIX 16. Springer International Publishing (ECCV), 2020: 589-605. [CCF B]

[7] Sibei Yang* (corresponding author), Ge Zheng, Jiajin Tang, Jiaye Qian, Hanzhuo Huang, Cheng Shi. Discovering Compositional Hallucination in LVLMs. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[8] Xiang He, Sibei Yang* (co-first author), Guanbin Li, Haofeng Li, Huiyou Chang, and Yizhou Yu. Non-local context encoder: Robust biomedical image segmentation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI Oral). 2019, 33(01): 8417-8424. [CCF A]

[9] Cheng Shi, Yizhou Yu, Sibei Yang* (corresponding author). Vision Function Layer in Multimodal LLMs. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[10] Yulin Zhang, Cheng Shi, Yang Wang, Sibei Yang* (corresponding author). Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[11] Jiaye Qian, Ge Zheng, Yuchen Zhu, Sibei Yang* (corresponding author). Intervene-All-Paths: Unified Mitigation of LVLM Hallucinations across Alignment Formats. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[12] Yue Xu, Chengyan Fu, Li Xiong, Sibei Yang, Wenjie Wang. Auto-Search and Refinement: An Automated Framework for Gender Bias Mitigation in Large Language Models. Advances in Neural Information Processing Systems (NeurIPS). 2025. [CCF A]

[13] Jiajin Tang, Zhengxuan Wei, Yuchen Zhu, Cheng Shi, Guanbin Li, Liang Lin, Sibei Yang* (corresponding author). Sim-DETR: Unlock DETR for Temporal Sentence Grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[14] Bin Yang, Yulin Zhang, Hong-Yu Zhou, Sibei Yang* (corresponding author). No More Sibling Rivalry: Debiasing Human-Object Interaction Detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[15] Ge Zheng, Jiaye Qian, Jiajin Tang, Sibei Yang* (corresponding author). Why LVLMs Are More Prone to Hallucinations in Longer Responses: The Role of Context. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[16] Zhengxuan Wei, Jiajin Tang, Sibei Yang* (corresponding author). Augmenting Moment Retrieval: Zero-Dependency Two-Stage Learning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[17] Jiajin Tang, Zhengxuan Wei, Sibei Yang* (corresponding author). Closed-Loop Transfer for Weakly-supervised Affordance Grounding. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[18] Qiyuan Dai, Hanzhuo Huang, Yu Wu, and Sibei Yang* (corresponding author). Adaptive Part Learning for Fine-Grained Generalized Category Discovery: A Plug-and-Play Enhancement. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[19] Qiyuan Dai and Sibei Yang* (corresponding author). Enhancing Flexibility in Test-Time Adaptation with Online EM. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[20] Yuchen Zhu, Cheng Shi, Dingyou Wang, Jiajin Tang, Zhengxuan Wei, Yu Wu, Guanbin Li, Sibei Yang* (corresponding author). Rethinking Query-based Transformer for Continual Image Segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[21] Chaoqi Chen, Yushuang Wu, Qiyuan Dai, Hong-Yu Zhou, Mutian Xu, Sibei Yang* (corresponding author), Xiaoguang Han*, and Yizhou Yu*. A survey on graph neural networks and graph transformers in computer vision: A task-oriented perspective. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024. [CCF A][CAS Zone 1]

[22] Qiyuan Dai and Sibei Yang* (corresponding author). Curriculum point prompting for weakly-supervised referring image segmentation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024: 13711-13722. [CCF A]

[23] Ge Zheng, Bin Yang, Jiajin Tang, Hong-Yu Zhou, and Sibei Yang* (corresponding author). Ddcot: Duty-distinct chain-of-thought prompting for multimodal reasoning in language models. Advances in Neural Information Processing Systems (NeurIPS), 2023, 36: 5168-5191. [CCF A]

[24] Hanzhuo Huang, Yufan Feng, Cheng Shi, Lan Xu, Jingyi Yu, and Sibei Yang* (corresponding author). Free-bloom: Zero-shot text-to-video generator with llm director and ldm animator. Advances in Neural Information Processing Systems (NeurIPS), 2023, 36: 26135-26158. [CCF A]

[25] Cheng Shi and Sibei Yang* (corresponding author). EdaDet: Open-vocabulary object detection using early dense alignment. Proceedings of the IEEE/CVF international conference on computer vision (ICCV). 2023: 15724-15734. [CCF A]

[26] Jiajin Tang, Ge Zheng, and Sibei Yang* (corresponding author). Temporal collection and distribution for referring video object segmentation. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 15466-15476. [CCF A]

[27] Cheng Shi and Sibei Yang* (corresponding author). LogoPrompt: Synthetic text images can be good visual prompts for vision-language models. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 2932-2941. [CCF A]

[28] Jiajin Tang, Ge Zheng, Jingyi Yu, and Sibei Yang* (corresponding author). CotDet: Affordance knowledge prompting for task driven object detection. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 3068-3078. [CCF A]

[29] Longwen Zhang, Qiwei Qiu, Hongyang Lin, Qixuan Zhang, Cheng Shi, Wei Yang, Ye Shi, Sibei Yang* (corresponding author), Lan Xu*, and Jingyi Yu*. DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance. ACM Transactions on Graphics (TOG), SIGGRAPH, 2023, 42(4): 1-16. [CCF A]

[30] Jiajin Tang, Ge Zheng, Cheng Shi, and Sibei Yang* (corresponding author). Contrastive grouping with transformer for referring image segmentation. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR). 2023: 23570-23580. [CCF A]

[31] Xuyang Liu, Bingbing Wen, and Sibei Yang* (corresponding author). CCQ: cross-class query network for partially labeled organ segmentation. Proceedings of the AAAI Conference on Artificial Intelligence (AAAI). 2023, 37(2): 1755-1763. [CCF A]

[32] Cheng Shi, Yuchen Zhu, and Sibei Yang* (corresponding author). Plain-Det: A Plain Multi-Dataset Object Detector. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 210-226. [CCF B]

[33] Cheng Shi, Yulin Zhang, Bin Yang, Jiajin Tang, Yuexin Ma, and Sibei Yang* (corresponding author). Part2Object: Hierarchical Unsupervised 3D Instance Segmentation. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 1-18. [CCF B]

[34] Cheng Shi and Sibei Yang* (corresponding author). Spatial and visual perspective-taking via view rotation and relation reasoning for embodied reference understanding. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2022: 201-218. [CCF B]

[35] Hanzhuo Huang, Yuan Liu, Ge Zheng, Jiepeng Wang, Zhiyang Dou, and Sibei Yang* (corresponding author). MVTokenFlow: High-quality 4D Content Generation using Multiview Token Flow. The Thirteenth International Conference on Learning Representations (ICLR). 2025. [Tsinghua A]

[36] Cheng Shi and Sibei Yang* (corresponding author). The Devil is in the Object Boundary: Towards Annotation-free Instance Segmentation using Foundation Models. The Twelfth International Conference on Learning Representations (ICLR). 2024. [Tsinghua A]

[37] Haoyang Xu, Tianhao Zhao, Sibei Yang, Yutian Lin. Penalizing Boundary Activation for Object Completeness in Diffusion Models. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[38] Ruifei Zhang, Wei Zhang, Xiao Tan, Sibei Yang, Xiang Wan, Xiaonan Luo, Guanbin Li. VLDrive: Vision-Augmented Lightweight MLLMs for Efficient Language-grounded Autonomous Driving. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2025. [CCF A]

[39] Hong-Yu Zhou, Chixiang Lu, Chaoqi Chen, Sibei Yang, and Yizhou Yu. A unified visual information preservation framework for self-supervised pre-training in medical image analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023, 45(7): 8020-8035. [CCF A][CAS Zone 1]

[40] Zijian He, Yuwei Ning, Yipeng Qin, Guangrun Wang, Sibei Yang, Liang Lin, and Guanbin Li. VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[41] Chunlin Yu, Hanqing Wang, Ye Shi, Haoyang Luo, Sibei Yang, Jingyi Yu, and Jingya Wang. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[42] Yingdong Shi, Changming Li, Yifan Wang, Yongxiang Zhao, Anqi Pang, Sibei Yang, Jingyi Yu, and Kan Ren. Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2025. [CCF A]

[43] Yumeng Liu, Yaxun Yang, Youzhuo Wang, Xiaofei Wu, Jiamin Wang, Yichen Yao, Sören Schwertfeger, Sibei Yang, Wenping Wang, Jingyi Yu, Xuming He, and Yuexin Ma. RealDex: towards human-like grasping for robotic dexterous hand. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence (IJCAI). 2024: 6859-6867. [CCF A]

[44] Han Liang, Jiacheng Bao, Ruichi Zhang, Sihan Ren, Yuecheng Xu, Sibei Yang, Xin Chen, Jingyi Yu, and Lan Xu. OMG: Towards open-vocabulary motion generation via mixture of controllers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2024: 482-493. [CCF A]

[45] Yu Wu, Yana Wei, Haozhe Wang, Yongfei Liu, Sibei Yang, and Xuming He. Grounded image text matching with mismatched relation reasoning. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2023: 2976-2987. [CCF A]

[46] Hong-Yu Zhou, Chixiang Lu, Sibei Yang, and Yizhou Yu. Convnets vs. transformers: Whose visual representations are more transferable? Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021: 2230-2238. [CCF A]

[47] Hong-Yu Zhou, Chixiang Lu, Sibei Yang, Xiaoguang Han, and Yizhou Yu. Preservational learning improves self-supervised medical image models by reconstructing diverse contexts. Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). 2021: 3499-3509. [CCF A]

[48] Weifeng Ge, Sibei Yang, and Yizhou Yu. Multi-evidence filtering and fusion for multi-label classification, object detection and semantic segmentation based on weakly supervised learning. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR). 2018: 1277-1286. [CCF A]

[49] Zhenxiang Lin, Xidong Peng, Peishan Cong, Yuenan Hou, Xinge Zhu, Sibei Yang, and Yuexin Ma. Wildrefer: 3d object localization in large-scale dynamic scenes with multi-modal visual data and natural language. European Conference on Computer Vision. Cham: Springer Nature Switzerland (ECCV), 2024: 456-473. [CCF B]

[50] Liang Lin, Pengxiang Yan, Xiaoqian Xu, Sibei Yang, Kun Zeng, and Guanbin Li. Structured attention network for referring image segmentation. IEEE Transactions on Multimedia (TMM), 2021, 24: 1922-1932. [CCF B]

[51] Jinpeng Li, Haiping Wang, Jiabin Chen, Yuan Liu, Zhiyang Dou, Yuexin Ma, Sibei Yang, Yuan Li, Wenping Wang, Zhen Dong, Bisheng Yang. CityAnchor: City-scale 3D Visual Grounding with Multi-modality LLMs. The Thirteenth International Conference on Learning Representations (ICLR). 2025. [Tsinghua A]

[52] Yifan Wang, Yifei Liu, Yingdong Shi, Changming Li, Anqi Pang, Sibei Yang, Jingyi Yu, Kan Ren. Discovering Influential Neuron Path in Vision Transformers. The Thirteenth International Conference on Learning Representations (ICLR). 2025. [Tsinghua A]