LIVE CAMERA FEED SCENE DESCRIPTOR FOR VISUALLY IMPAIRED

Abstract
Visually impaired people want to interact with nature, their surroundings, and other people. They want to experience the events happening around them but cannot perceive them visually. Our product aims to assist visually impaired individuals in navigating Pulchowk Campus and to describe the activities taking place on campus. It uses a live camera feed and vision transformer techniques to generate a descriptive caption and a corresponding audio output, giving the user an accurate and timely description of their surroundings. The product is designed to work on selected campus landmarks and a wide range of activities. We propose a model obtained by fine-tuning the pre-trained Git-base-Vatex model on our campus video dataset to describe the surrounding scene.
Description
Visually impaired people want to interact with nature, their surroundings, and other people. They want to experience the events happening around them but cannot perceive them visually.
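
The pipeline outlined in the abstract (live camera feed, caption generation, audio output) could be wired together roughly as in the sketch below. It is an illustration, not the project's actual code: `SceneDescriber`, `sample_frame_indices`, `captioner`, and `speaker` are hypothetical names, and the buffer size of 30 frames and clip length of 6 frames are assumed values. The `captioner` callable stands in for the fine-tuned Git-base-Vatex model and `speaker` for whatever text-to-speech engine is used; neither is implemented here.

```python
from collections import deque
from typing import Any, Callable, List, Optional


def sample_frame_indices(buffer_len: int, num_frames: int = 6) -> List[int]:
    """Pick up to num_frames evenly spaced indices from a buffer of buffer_len frames."""
    if buffer_len <= 0 or num_frames <= 0:
        return []
    if buffer_len <= num_frames:
        return list(range(buffer_len))
    if num_frames == 1:
        return [buffer_len // 2]
    # Integer arithmetic keeps the first and last frame and spreads the rest evenly.
    return [(i * (buffer_len - 1)) // (num_frames - 1) for i in range(num_frames)]


class SceneDescriber:
    """Buffers live frames, captions a sampled clip, and speaks new captions.

    captioner: hypothetical callable mapping a list of frames to a caption string
               (in the project this would wrap the fine-tuned captioning model).
    speaker:   hypothetical callable that turns a caption string into audio.
    """

    def __init__(
        self,
        captioner: Callable[[List[Any]], str],
        speaker: Callable[[str], None],
        buffer_size: int = 30,  # assumed: frames collected before captioning
        num_frames: int = 6,    # assumed: frames passed to the captioner per clip
    ) -> None:
        self.captioner = captioner
        self.speaker = speaker
        self.num_frames = num_frames
        self.buffer: deque = deque(maxlen=buffer_size)
        self.last_caption: Optional[str] = None

    def push(self, frame: Any) -> Optional[str]:
        """Add one frame; return a caption once the buffer is full, else None."""
        self.buffer.append(frame)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        frames = list(self.buffer)
        clip = [frames[i] for i in sample_frame_indices(len(frames), self.num_frames)]
        self.buffer.clear()
        caption = self.captioner(clip)
        # Speak only when the description changes, to avoid repeating audio.
        if caption != self.last_caption:
            self.speaker(caption)
            self.last_caption = caption
        return caption
```

In use, each frame read from the camera would be passed to `push`, which returns `None` until enough frames have been collected and then returns the caption for that clip. Suppressing repeated captions is a design choice made here for illustration; a real deployment may prefer to repeat descriptions at intervals.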
Citation