GESTURE SYNTHESIS USING MULTIMODAL SUPERVISED LEARNING

dc.contributor.author: BHANDARI, PRASUN
dc.contributor.author: ROUNIYAR, RAHUL
dc.contributor.author: SHRESTHA, RONAB
dc.contributor.author: GURAU, SUNIL
dc.date.accessioned: 2023-07-31T10:15:04Z
dc.date.available: 2023-07-31T10:15:04Z
dc.date.issued: 2023-05
dc.description: One of the long-standing ambitions of modern science and engineering has been to create a non-human entity that manifests human-like intelligence and behavior. One step toward that goal is communicating just as humans do.
dc.description.abstract: One of the long-standing ambitions of modern science and engineering has been to create a non-human entity that manifests human-like intelligence and behavior. One step toward that goal is communicating just as humans do. Human speech is often accompanied by a variety of gestures that add rich non-verbal information to the message the speaker is trying to convey. Gestures clarify the speaker's intentions and emotions and enhance speech by adding visual cues alongside the audio signal. Our project aims to synthesize co-speech gestures by learning an individual speaker's style. We follow a data-driven approach rather than a rule-based one, as the audio-gesture relation is poorly captured by rule-based systems due to issues such as asynchrony and multimodality. Following the current trend, we train the model on in-the-wild videos with embedded audio instead of relying on motion capture of subjects in a lab for video annotation. To establish ground truth for the dataset of video frames, we rely on an automatic pose detection system. Although this ground truth signal tends to be less accurate than manually annotated frames, the approach spares us the time and labor of manual annotation. We perform cross-modal translation from a single speaker's monologue speech to their hand and arm motion by learning the temporal correlation between pose sequences and audio samples.
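The pipeline the abstract describes — pseudo ground-truth poses produced by an automatic pose detector, then a supervised audio-to-pose regression — can be sketched minimally. This is an illustrative toy, not the report's actual model: the linear mapping, the feature dimensions, and the synthetic "detected" poses are all assumptions for clarity; the real system would use per-frame audio features and a learned sequence model.

```python
# Toy sketch of supervised cross-modal regression: audio features -> pose
# keypoints, fit on pseudo ground truth. All shapes and the linear model
# are illustrative assumptions, not the report's architecture.
import numpy as np

rng = np.random.default_rng(0)

T, AUDIO_DIM, POSE_DIM = 200, 16, 10  # frames, audio feature size, pose coords

# Stand-in for pseudo ground truth: pose per frame as "detected" from video.
true_W = rng.normal(size=(AUDIO_DIM, POSE_DIM))
audio = rng.normal(size=(T, AUDIO_DIM))              # per-frame audio features
pose = audio @ true_W + 0.01 * rng.normal(size=(T, POSE_DIM))  # noisy labels

# Fit a linear audio->pose mapping by gradient descent on mean squared error.
W = np.zeros((AUDIO_DIM, POSE_DIM))
lr = 0.1
for _ in range(500):
    residual = audio @ W - pose
    W -= lr * (audio.T @ residual) / T               # gradient of MSE w.r.t. W

mse = float(np.mean((audio @ W - pose) ** 2))
print(f"final training MSE: {mse:.4f}")
```

The point of the sketch is the supervision signal: the model never sees hand-labeled poses, only detector output, so label noise (here the additive Gaussian term) bounds how low the training error can go.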
dc.identifier.uri: https://elibrary.tucl.edu.np/handle/20.500.14540/18866
dc.language.iso: en
dc.publisher: I.O.E. Pulchowk Campus
dc.subject: Gesture synthesis
dc.subject: Supervised learning
dc.subject: Human Computer Interaction
dc.title: GESTURE SYNTHESIS USING MULTIMODAL SUPERVISED LEARNING
dc.type: Report
local.academic.level: Bachelor
local.affiliatedinstitute.title: Pulchowk Campus
local.institute.title: Institute of Engineering
Files
Original bundle
Name: Prasun Bhandari et al. be report electronics may 2023.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Description: Item-specific license agreed upon to submission