Problem

Generate realistic facial animation directly from raw audio while preserving lip-sync accuracy and expressive motion over time.

Method

The approach uses a deep learning pipeline that maps audio features to facial dynamics with temporal modeling to improve coherence across frames.

My Role

I designed the model architecture and training strategy, especially around lip-sync quality, temporal consistency, and controllable expression behavior.

Focus

  • Temporal coherence
  • Cross-modal alignment between speech and motion
  • Natural and controllable facial dynamics

Updated: