Hierarchically controlled deformable 3D gaussians for talking head synthesis

Proceedings of the AAAI Conference on Artificial Intelligence

Zhenhua Wu, Linxuan Jiang, Xiang Li, Chaowei Fang, Yipeng Qin, Guanbin Li

Abstract

Audio-driven talking head synthesis is a critical task in digital human modeling. While recent advances using diffusion models and Neural Radiance Fields (NeRF) have improved visual quality, they often require substantial computational resources, limiting practical deployment. We present a novel framework for audio-driven talking head synthesis, namely Hierarchically Controlled Deformable 3D Gaussians (HiCoDe), which achieves state-of-the-art performance with significantly reduced computational costs. Our key contribution is a hierarchical control strategy that effectively bridges the gap between sparse audio features and dense 3D Gaussian point clouds. Specifically, this strategy comprises two control levels: i) coarse-level control based on a 3D Morphable Model (3DMM) and ii) fine-level control using facial landmarks. Extensive experiments on the HDTF dataset and additional test sets demonstrate that our method outperforms existing approaches in visual quality, facial landmark accuracy, and audio-visual synchronization while being more computationally efficient in both training and inference.

Framework
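
The two control levels named in the abstract can be illustrated with a small geometric sketch. This is not the authors' implementation: the page does not describe the networks or the audio-to-control mapping, so the code assumes the 3DMM vertices and landmark displacements have already been predicted from audio, and it only shows how a coarse signal and a fine signal might be combined to move Gaussian centers. All function names, the nearest-vertex binding, and the Gaussian-kernel falloff (`sigma`) are hypothetical choices made for illustration.

```python
import numpy as np


def coarse_offsets(gauss_xyz, verts_rest, verts_posed):
    """Coarse level (assumed): each Gaussian copies the displacement
    of its nearest 3DMM vertex in the rest pose."""
    # Pairwise squared distances between Gaussians and vertices, shape (N, V).
    d2 = ((gauss_xyz[:, None, :] - verts_rest[None, :, :]) ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)
    return (verts_posed - verts_rest)[nearest]


def fine_offsets(gauss_xyz, lmk_rest, lmk_delta, sigma=0.05):
    """Fine level (assumed): blend landmark displacements with a
    Gaussian kernel so the correction fades away from each landmark."""
    d2 = ((gauss_xyz[:, None, :] - lmk_rest[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    # Normalise only where kernels overlap, so distant points receive ~0.
    w = w / np.maximum(w.sum(axis=1, keepdims=True), 1.0)
    return w @ lmk_delta


def deform(gauss_xyz, verts_rest, verts_posed, lmk_rest, lmk_delta, sigma=0.05):
    """Apply the coarse offset, then add the fine landmark correction."""
    coarse = coarse_offsets(gauss_xyz, verts_rest, verts_posed)
    fine = fine_offsets(gauss_xyz, lmk_rest, lmk_delta, sigma)
    return gauss_xyz + coarse + fine
```

Calling `deform` with `(N, 3)` Gaussian centers, `(V, 3)` rest and posed vertex arrays, and `(L, 3)` landmark positions and displacements returns the `(N, 3)` deformed centers. The split mirrors the idea that a low-dimensional face model sets the overall motion while landmarks refine local regions such as the mouth.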

Experiment

Conclusion

We present a novel framework for audio-driven talking head synthesis based on deformable 3D Gaussians, addressing the challenge of achieving high visual quality while maintaining computational efficiency. The key contribution of our work is the hierarchical control strategy, which bridges the gap between sparse audio inputs and dense 3D Gaussians and leverages the complementary strengths of 3DMM and facial landmark control signals. Extensive experiments on the HDTF dataset and additional test sets demonstrate that our method outperforms existing state-of-the-art approaches in visual quality while requiring lower computational cost.

Human Cyber Physical Intelligence Integration Lab, Sun Yat-sen University

hcp@sysu.edu.cn
No. 132, Waihuan East Road, Guangzhou Higher Education Mega Center, Guangzhou
