Audio-driven Facial Animation Synthesis

Problem

Generate realistic facial animation directly from raw audio while preserving lip-sync accuracy and expressive motion over time.

Method

The approach uses a deep learning pipeline that maps audio features to facial dynamics with temporal modeling to improve coherence across frames.

My Role

I designed the model architecture and training strategy, especially around lip-sync quality, temporal consistency, and controllable expression behavior.

Focus

Temporal coherence
Cross-modal alignment between speech and motion
Natural and controllable facial dynamics