GESTURE SYNTHESIS USING MULTIMODAL SUPERVISED LEARNING

dc.contributor.author: BHANDARI, PRASUN
dc.contributor.author: ROUNIYAR, RAHUL
dc.contributor.author: SHRESTHA, RONAB
dc.contributor.author: GURAU, SUNIL
dc.date.accessioned: 2023-07-31T10:15:04Z
dc.date.available: 2023-07-31T10:15:04Z
dc.date.issued: 2023-05
dc.description: One of the long-standing ambitions of modern science and engineering has been to create a non-human entity that manifests human-like intelligence and behavior. One step toward that goal is communicating just as humans do.
dc.description.abstract: One of the long-standing ambitions of modern science and engineering has been to create a non-human entity that manifests human-like intelligence and behavior. One step toward that goal is communicating just as humans do. Human speech is often accompanied by a variety of gestures that add rich non-verbal information to the message the speaker is trying to convey. Gestures clarify the speaker's intentions and emotions and enhance speech by adding visual cues alongside the audio signal. Our project aims to synthesize co-speech gestures by learning an individual speaker's style. We follow a data-driven approach rather than a rule-based one, because the audio-gesture relation is poorly captured by rule-based systems due to issues such as asynchrony and multimodality. Following current practice, we train the model on in-the-wild videos with embedded audio instead of relying on motion capture of subjects in a lab for video annotation. To establish ground truth for the dataset of video frames, we rely on an automatic pose detection system. Although this ground-truth signal tends to be less accurate than manually annotated frames, the approach spares us the expense of time and labor. We perform cross-modal translation from the monologue speech of a single speaker to their hand and arm motion by learning the temporal correlation between the pose sequence and the audio samples.
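As an illustration of the cross-modal audio-to-pose translation described in the abstract, below is a minimal PyTorch sketch. The architecture (a small temporal convolutional network), the feature choices (log-mel audio input, 14 two-dimensional hand/arm keypoints), and the L1 regression loss are all illustrative assumptions for exposition; the report's actual design may differ.

    # Minimal sketch: regress a pose-keypoint sequence from an audio-feature
    # sequence. All sizes and the architecture are assumptions, not the
    # report's actual model.
    import torch
    import torch.nn as nn

    class AudioToPose(nn.Module):
        """Translate a sequence of audio features into a sequence of 2-D pose keypoints."""
        def __init__(self, n_mels=64, n_keypoints=14, hidden=256):
            super().__init__()
            # Temporal (1-D) convolutions model the correlation between the
            # audio signal and the pose sequence over time.
            self.net = nn.Sequential(
                nn.Conv1d(n_mels, hidden, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
                nn.ReLU(),
                nn.Conv1d(hidden, n_keypoints * 2, kernel_size=5, padding=2),
            )

        def forward(self, audio):          # audio: (batch, n_mels, frames)
            out = self.net(audio)          # (batch, n_keypoints * 2, frames)
            b, _, t = out.shape
            return out.view(b, -1, 2, t)   # (batch, n_keypoints, x/y, frames)

    model = AudioToPose()
    audio = torch.randn(8, 64, 128)             # hypothetical batch of log-mel features
    pred = model(audio)                         # predicted hand/arm keypoints per frame
    target = torch.randn_like(pred)             # stand-in for poses from an automatic detector
    loss = nn.functional.l1_loss(pred, target)  # regress against the detected "ground truth"
    loss.backward()

Training against keypoints produced by an off-the-shelf pose detector, as the abstract describes, means the regression target is noisy pseudo ground truth rather than hand-annotated poses; that trade-off is exactly what makes learning from in-the-wild video affordable.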
dc.identifier.uri: https://hdl.handle.net/20.500.14540/18866
dc.language.iso: en
dc.publisher: I.O.E. Pulchowk Campus
dc.subject: Gesture synthesis
dc.subject: Supervised learning
dc.subject: Human Computer Interaction
dc.title: GESTURE SYNTHESIS USING MULTIMODAL SUPERVISED LEARNING
dc.type: Report
local.academic.level: Bachelor
local.affiliatedinstitute.title: Pulchowk Campus
local.institute.title: Institute of Engineering
Files

Original bundle
Name: Prasun Bhandari et al. be report electronics may 2023.pdf
Size: 1.77 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission