Pose-Aware Speech Driven Facial Landmark Animation Pipeline for Automated Dubbing

Mc Donnell, Rachel

dc.contributor.author	Mc Donnell, Rachel
dc.date.accessioned	2024-02-21T06:40:26Z
dc.date.available	2024-02-21T06:40:26Z
dc.date.issued	2022
dc.date.submitted	2022	en
dc.identifier.citation	Bigioi, D. and Jordan, H. and Jain, R. and Mcdonnell, R. and Corcoran, P., Pose-Aware Speech Driven Facial Landmark Animation Pipeline for Automated Dubbing, IEEE Access, 10, 2022, 133357-133369	en
dc.identifier.other	Y
dc.description.abstract	A novel neural pipeline is proposed for automatic dubbing that generates pose-aware 3D animated facial landmarks synchronised to a target speech signal. The goal is to automatically synchronise a target actor's lips and facial motion to an unseen speech sequence while maintaining the quality of the original performance. Given a 3D facial keypoint sequence extracted from any reference video and a target audio clip, the pipeline learns to generate head-pose-aware, identity-aware landmarks and outputs accurate 3D lip motion directly at the inference stage. These generated landmarks can be used to render a photo-realistic video via an additional image-to-image conversion stage. A novel data augmentation technique is also introduced that increases the size of the training dataset from N audio/visual pairs to up to N×N unique pairs. The trained inference pipeline employs an LSTM-based network that takes Mel coefficients from an unseen speech sequence as input, combined with head pose and identity parameters extracted from a reference video, to generate a new set of pose-aware 3D landmarks synchronised with the unseen speech.	en
dc.format.extent	133357-133369	en
dc.language.iso	en	en
dc.relation.ispartofseries	IEEE Access;
dc.relation.ispartofseries	10;
dc.rights	Y	en
dc.subject	speech	en
dc.subject	novel neural pipeline	en
dc.subject	automatic dubbing	en
dc.subject.lcsh	speech	en
dc.subject.lcsh	novel neural pipeline	en
dc.subject.lcsh	automatic dubbing	en
dc.title	Pose-Aware Speech Driven Facial Landmark Animation Pipeline for Automated Dubbing	en
dc.type	Journal Article	en
dc.type.supercollection	scholarly_publications	en
dc.type.supercollection	refereed_publications	en
dc.identifier.peoplefinderurl	http://people.tcd.ie/ramcdonn
dc.identifier.rssinternalid	251191
dc.identifier.doi	http://dx.doi.org/10.1109/ACCESS.2022.3231137
dc.rights.ecaccessrights	openAccess
dc.identifier.orcid_id	0000-0002-1957-2506
dc.identifier.uri	http://hdl.handle.net/2262/105583

Files in this item

Name: Pose-Aware_Speech_Driven_Facia ...
Size: 2.519Mb
Format: PDF

Name: license.txt
Size: 3.463Kb
Format: Text file

This item appears in the following Collection(s)

Computer Science (Scholarly Publications)