Title: LIVE CAMERA FEED SCENE DESCRIPTOR FOR VISUALLY IMPAIRED
Author: SUBEDI, AADITYA MANI et al.
Dates: 2023-07-30; 2023-07-30; 2023-04-30
URI: https://hdl.handle.net/20.500.14540/18796
Language: en
Keywords: Video Captioning; Generative Image-To-Text Transformer; processor; Vatex dataset
Type: Report

Abstract:
Visually impaired people want to interact with nature, their surroundings, and other people. They want to perceive the events happening around them but are deprived of doing so. Our product aims to assist visually impaired individuals in navigating Pulchowk Campus and in describing the activities taking place on the campus. It uses a live camera feed and vision-transformer techniques to generate descriptive captions and corresponding audio output, giving the user a proper and timely description of their surroundings. The product is designed to work at selected landmarks of the campus and across a wide range of activities. We propose a model fine-tuned from the pre-trained GIT-base-Vatex model on our campus video dataset to describe the surrounding scene.
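The fine-tuning base named above (GIT, a Generative Image-to-Text transformer pre-trained on the Vatex video dataset) can be used for clip captioning roughly as sketched below. This is a minimal illustration, not the report's implementation: the Hugging Face checkpoint name `microsoft/git-base-vatex`, the 6-frame sampling, and all function names are assumptions for illustration only.

```python
# Sketch: caption a short clip from a camera/video feed with a GIT
# checkpoint. The report fine-tunes a similar model on campus video data;
# this stand-in uses the public `microsoft/git-base-vatex` weights.

def sample_frame_indices(num_frames: int, clip_len: int) -> list[int]:
    """Pick `clip_len` evenly spaced frame indices from a clip of
    `num_frames` frames, the usual preprocessing for video captioning."""
    if num_frames <= 0 or clip_len <= 0:
        return []
    step = num_frames / clip_len
    return [min(int(i * step), num_frames - 1) for i in range(clip_len)]

def describe_clip(frames):
    """Generate a one-sentence caption for a list of RGB frames.
    Heavy imports are kept local so the sampler above stays lightweight;
    the model is downloaded on first use."""
    from transformers import AutoModelForCausalLM, AutoProcessor
    processor = AutoProcessor.from_pretrained("microsoft/git-base-vatex")
    model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-vatex")
    inputs = processor(images=frames, return_tensors="pt")
    ids = model.generate(pixel_values=inputs.pixel_values, max_length=50)
    return processor.batch_decode(ids, skip_special_tokens=True)[0]
```

The returned caption string would then be passed to a text-to-speech engine to produce the audio output the abstract describes.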