Electronics and Communication Engineering

Recent Submissions

Now showing 1 - 11 of 11
  • Item
    GESTURE SYNTHESIS USING MULTIMODAL SUPERVISED LEARNING
    (I.O.E. Pulchowk Campus, 2023-05) BHANDARI, PRASUN; ROUNIYAR, RAHUL; SHRESTHA, RONAB; GURAU, SUNIL
    One of the long-standing ambitions of modern science and engineering has been to create a non-human entity that manifests human-like intelligence and behavior. One step toward that goal is communicating the way humans do. Human speech is often accompanied by a variety of gestures which add rich non-verbal information to the message the speaker is trying to convey. Gestures clarify the speaker’s intentions and emotions and enhance speech by adding visual cues alongside the audio signal. Our project aims to synthesize co-speech gestures by learning an individual speaker’s style. We follow a data-driven approach instead of a rule-based one, since the audio-gesture relation is poorly captured by rule-based systems due to issues like asynchrony and multi-modality. Following the current trend, we train the model on in-the-wild videos with embedded audio instead of relying on motion capture of subjects in a lab for video annotation. To establish ground truth for the dataset of video frames, we rely on an automatic pose detection system. Although this ground-truth signal is not as accurate as manually annotated frames, the approach relieves us of the time and labor expense. We perform a cross-modal translation from the monologue speech of a single speaker to their hand and arm motion by learning the temporal correlation between the pose sequence and the audio samples.
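
    As an illustration of the cross-modal setup described above, the sketch below maps a sequence of per-frame audio features to a sequence of 2D hand and arm keypoints. This is a minimal simplification for illustration only; the feature dimension, keypoint count, and layer sizes are assumptions, not the project's actual architecture.

```python
# Illustrative audio-to-pose regressor: a 1D-conv audio encoder followed by a
# GRU decoder that predicts one pose vector (K keypoints x 2 coords) per frame.
# All dimensions are assumptions chosen for the sketch, not the project's values.
import torch
import torch.nn as nn

class AudioToPose(nn.Module):
    def __init__(self, n_audio_feats=40, hidden=256, n_keypoints=14):
        super().__init__()
        self.encoder = nn.Sequential(                 # local acoustic context
            nn.Conv1d(n_audio_feats, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(hidden, hidden, kernel_size=5, padding=2),
            nn.ReLU(),
        )
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)  # temporal model
        self.head = nn.Linear(hidden, n_keypoints * 2)           # (x, y) per joint

    def forward(self, audio_feats):                   # (batch, frames, n_audio_feats)
        x = self.encoder(audio_feats.transpose(1, 2)).transpose(1, 2)
        x, _ = self.decoder(x)
        return self.head(x)                           # (batch, frames, n_keypoints * 2)

if __name__ == "__main__":
    model = AudioToPose()
    fake_audio = torch.randn(2, 64, 40)               # 2 clips, 64 frames, 40 features
    poses = model(fake_audio)
    print(poses.shape)                                # torch.Size([2, 64, 28])
```

    Training such a regressor would typically minimize an L1 or L2 loss between the predicted poses and the automatically detected ground-truth poses.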
  • Item
    TRAFFIC VIOLATION DETECTION WITH COMPUTER VISION
    (I.O.E. Pulchowk Campus, 2023-04) DAHAL, NEHA; G.C., SIMON; MAINALI, SUBASH; SUBEDI, UDAYA RAJ
    In this project, we used YOLOv5s trained on a custom dataset we collected, consisting of 2,193 images across 6 classes, which was augmented to 5,259 images and split 70:20:10 into training, validation, and test sets. For tracking the detected objects in the video, we used DeepSORT, which outputs a bounding box for each object with its respective track ID. If a detected and tracked object has violated a traffic light, the corresponding license plate of the object in question is sent as input to a segmentation program. The license-plate image undergoes HSV color-space conversion, color masking, and perspective transformation, in that order, before being preprocessed to profile the different types of license plates in the dataset. The image then undergoes horizontal and vertical projection profiling, which is validated to separate the characters of the license plate.
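
    The sketch below illustrates projection-profile character segmentation on a plate that has already been cropped, color-masked, and perspective-corrected. The binarization method and the cut-off thresholds are assumptions for the sketch, not the exact values used in this project.

```python
# Illustrative projection-profile segmentation of a rectified license-plate image.
import cv2
import numpy as np

def segment_characters(plate_bgr, row_thresh=0.05, col_thresh=0.05):
    gray = cv2.cvtColor(plate_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Horizontal projection: foreground pixels per row -> locate the text band.
    row_profile = binary.sum(axis=1) / 255.0
    rows = np.where(row_profile > row_thresh * binary.shape[1])[0]
    if rows.size == 0:
        return []
    band = binary[rows.min():rows.max() + 1, :]

    # Vertical projection: foreground pixels per column -> gaps between characters.
    col_profile = band.sum(axis=0) / 255.0
    is_char = col_profile > col_thresh * band.shape[0]

    chars, start = [], None
    for x, on in enumerate(is_char):
        if on and start is None:
            start = x
        elif not on and start is not None:
            chars.append(band[:, start:x])
            start = None
    if start is not None:
        chars.append(band[:, start:])
    return chars                       # list of single-character crops
```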
  • Item
    SMART TRIAL SYSTEM
    (I.O.E. Pulchowk Campus, 2023-03) SHAKYA, AYUSH; ATAL, DEWASHISH; KARN, PRASHANNA; PANDIT, PRASHANNA RAJ
    Driving is a common and important mode of transportation in people’s daily lives. It allows people to move freely and independently from one place to another. To drive legally in Nepal, as in most countries, a person requires a driving license, for which they have to undergo a series of tests at a government office. In Nepal, a candidate applying for a license has to drive over a fixed path in front of the authorities. The candidate has to complete the test in accordance with specific rules and is disqualified if they fail to do so. These authorities monitor the candidates’ errors manually. In addition, bribery in obtaining a driver’s license is a significant problem in Nepal. To address this issue, we have proposed an AI-based solution named CarSight. CarSight is a Computer Vision system designed to analyze visual data from cameras and verify whether a driver has passed or failed the driving test. It uses YOLO-v5 to detect and track the motion of the trial vehicles in real-time video obtained from cameras placed around the model track. The system gives an instant pass-or-fail result based on the driver’s performance once they finish the trial exam. Our experimental results demonstrate that the system detects and evaluates the driver’s performance accurately. The proposed system can help bring fairness to the process of obtaining a driver’s license.
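
    As a rough illustration of how tracked detections can be turned into a pass-or-fail decision, the sketch below fails a trial if the centre of the tracked vehicle's bounding box leaves an allowed track polygon. The polygon coordinates and this single rule are assumptions for illustration, not CarSight's actual rule set.

```python
# Illustrative rule check on per-frame tracked bounding boxes for one track ID.
import cv2
import numpy as np

TRACK_POLYGON = np.array([[100, 100], [900, 100], [900, 600], [100, 600]],
                         dtype=np.int32).reshape(-1, 1, 2)   # assumed boundary (pixels)

def box_center(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def evaluate_trial(track_boxes):
    """track_boxes: list of (x1, y1, x2, y2) for one tracked vehicle over time."""
    for box in track_boxes:
        inside = cv2.pointPolygonTest(TRACK_POLYGON, box_center(box), False)
        if inside < 0:                               # centre left the allowed track
            return "FAIL"
    return "PASS"

print(evaluate_trial([(400, 300, 460, 360), (950, 300, 1010, 360)]))  # FAIL
```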
  • Item
    GUIDANCE, NAVIGATION AND CONTROL OF A VTOL VEHICLE TO MAKE IT FOLLOW A PREDETERMINED TRAJECTORY
    (I.O.E. Pulchowk Campus, 2023-04-30) PATHAK, SAKAR; ACHARYA, SAMUNDRA; SILWAL, SHREEJAN SINGH; DHAKAL, SWAYAM
    In interplanetary missions, the landing of space vehicles is typically accomplished using parachutes. However, this simple method is not without its challenges, as these vehicles are prone to parachute drifts that are difficult to predict, especially on planets with dense atmospheres like Earth. As a result, significant attention has recently been given to the development of active control systems for space vehicles, allowing for precise guidance, navigation, and control over predetermined trajectories and enabling soft, accurate landings on planetary surfaces. The ability to follow a predetermined path and land softly and precisely using real-time onboard control algorithms would greatly enhance the capabilities of vehicles for interplanetary travel, while also increasing the reusability of space vehicles. This not only benefits interplanetary travel but also improves space payload delivery systems by reducing costs and increasing efficiency. To this end, this project aims to implement control algorithms on an Electric Ducted Fan (EDF) powered model of a Vertical Take-Off and Landing (VTOL) vehicle, enabling it to follow a fixed trajectory. A small CanSat payload will be attached to the vehicle and deployed at a specific altitude, simulating the tasks required of a full-scale vehicle. By utilizing these advanced control systems, space vehicles can navigate more accurately and efficiently, reducing the risks and costs associated with interplanetary travel. With its focus on trajectory control and precision landing, this project aims to contribute to the ongoing efforts to enhance space exploration and technology development.
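
    As a minimal sketch of one element of such a guidance, navigation, and control stack, the loop below tracks a ramped altitude reference with a PID controller on a toy point-mass plant. The plant model, gains, and reference trajectory are illustrative assumptions, not the controller actually flown in this project.

```python
# Illustrative altitude-tracking PID loop for an EDF-powered vehicle model.

class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_err = 0.0, 0.0

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_err) / self.dt
        self.prev_err = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

dt, mass, g = 0.02, 1.0, 9.81          # 50 Hz loop, 1 kg vehicle (assumed)
pid = PID(kp=6.0, ki=1.5, kd=3.0, dt=dt)
alt, vel = 0.0, 0.0

for step in range(500):                 # 10 s point-mass simulation
    t = step * dt
    ref = min(5.0, 1.0 * t)             # ramp up to a 5 m hover set-point
    thrust = mass * g + pid.update(ref - alt)     # gravity feed-forward + PID
    accel = (thrust - mass * g) / mass
    vel += accel * dt
    alt += vel * dt

print(f"final altitude: {alt:.2f} m (reference 5.00 m)")
```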
  • Item
    AUTOMATIC MUSIC GENERATION
    (I.O.E. Pulchowk Campus, 2023-05) KHANAL, PRAVESH; BHANDARI, SANJEEV; LAMICHHANE, SRIJANA; TAMANG, SUDIP
    ‘Automatic Music Generation’ composes short pieces of music using parameters such as notes, pitch intervals, and chords. This project uses a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units to generate music. The traditional way of composing music requires much trial and error; with automatic music generation, we can predict suitable follow-up music using AI rather than testing music in a studio, effectively saving time. The main focus of this project is to use an LSTM-based model to generate music while ensuring that two separate outputs remain synchronized. The dataset used in this project was sourced from the ESAC Folk database[1]. The dataset was originally in the .kern file format and was converted into MIDI for use in this project. The MIDI files were used as the music data source, encoded into a time-series notation format, and used to train the model. The model is trained to capture temporal dependencies in the time-series music sequence. The trained model can generate time-series notation, which is then decoded to obtain a music file.
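
    The sketch below shows an LSTM next-symbol predictor over a time-series note encoding of the kind described above. The vocabulary size, sequence length, and layer sizes are assumptions for the sketch, not this project's exact configuration.

```python
# Illustrative LSTM next-symbol predictor over a time-series note encoding.
import torch
import torch.nn as nn

class MelodyLSTM(nn.Module):
    def __init__(self, vocab_size=38, embed_dim=64, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                      # (batch, seq_len) of symbol ids
        x = self.embed(tokens)
        x, _ = self.lstm(x)
        return self.out(x)                          # logits for the next symbol

model = MelodyLSTM()
seq = torch.randint(0, 38, (8, 64))                 # 8 sequences of 64 symbols
logits = model(seq)
# Next-symbol training objective: predict token t+1 from tokens up to t.
loss = nn.CrossEntropyLoss()(logits[:, :-1].reshape(-1, 38), seq[:, 1:].reshape(-1))
print(logits.shape, loss.item())
```

    At generation time, symbols are sampled step by step from the output distribution and the resulting time-series notation is decoded back into MIDI.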
  • Item
    MUSIC RECOGNITION USING DEEP LEARNING
    (I.O.E. Pulchowk Campus, 2022-04) KHANAL, PRADEEP KUMAR; KAYASTHA, ACHYUT; KHATAKHO, ASHISH
    In our daily lives, we often listen to songs that we like and enjoy. However, there are instances when we are in transit or at venues such as clubs or restaurants and hear a song playing in the background that catches our attention. We may want to listen to the song again later, but because we do not know its title, we are unable to find it. Our project, titled “Music Recognition Using Deep Learning,” aims to provide a convenient solution for identifying songs heard in various locations. While several popular applications, such as Shazam, SoundHound, and Google Sound Search, already offer music recognition services, we conducted an in-depth study of papers related to these apps to identify appropriate technologies and algorithms for our project. Our project employs a deep neural network that leverages a contrastive learning approach for song recognition. Initially, a large collection of songs is gathered and subjected to signal-processing techniques, including the Short-Time Fourier Transform (STFT), a mel filter bank, and conversion to the decibel scale, to generate log mel-spectrograms. These log mel-spectrograms are then fed into the neural network, which is trained to generate a fingerprint for each song at the segment level. These fingerprints are stored in a database.
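
    The sketch below shows the log mel-spectrogram front end of such a pipeline using librosa. The STFT and mel-filter parameters are assumptions for illustration, not the values used in this project.

```python
# Illustrative log mel-spectrogram front end for segment-level fingerprinting.
import librosa
import numpy as np

def log_mel_segments(path, sr=8000, n_fft=1024, hop=256, n_mels=64, seg_seconds=1.0):
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=n_fft,
                                         hop_length=hop, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)          # decibel scale

    frames_per_seg = int(seg_seconds * sr / hop)
    segments = [log_mel[:, i:i + frames_per_seg]
                for i in range(0, log_mel.shape[1] - frames_per_seg + 1,
                               frames_per_seg)]
    return np.stack(segments) if segments else np.empty((0, n_mels, frames_per_seg))

# Each (n_mels x frames) segment would then be embedded by the contrastively
# trained network and its fingerprint stored in the database for lookup.
```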
  • Item
    PREDICTION OF LYSINE SUCCINYLATION SITE USING PROTEIN LANGUAGE MODEL
    (I.O.E. Pulchowk Campus, 2023-03) THAKUR, ANANDA; SIMKHADA, PRADIPTI; ADHIKARI, SHAMBHAVI; MAHARJAN, SUBIN
    Lysine succinylation is an important post-translational modification (PTM) that controls protein shape, function, and physiochemical properties, and it affects metabolic processes as well as the incidence and progression of many diseases. Several experimental and computational approaches have been proposed to identify succinylation sites. Our method uses protein language models (pLMs) to extract features from protein sequences and a convolutional neural network (CNN) combined with artificial neural networks to predict succinylation. Two protein language models, ProtBert and ProtT5, are used. The protein sequences are fed to each model to produce different sets of protein embeddings, and the embeddings from the two pLMs were found to have negligible similarity. The embeddings are fed to 1D CNN networks, the outputs of the networks are stacked, and an ANN is trained on top of that. This stacking ensemble improved the performance of our proposed architecture. In a comparison on a benchmark dataset, our method was comparable with other state-of-the-art models on nearly every metric.
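
    The sketch below illustrates the stacking idea: one 1D-CNN branch per pLM embedding, with the branch outputs concatenated and passed to a small ANN meta-classifier. Embedding sizes, the sequence window length, and layer widths are assumptions for the sketch, not the architecture reported in this work.

```python
# Illustrative stacking of two 1D-CNN branches with an ANN meta-classifier.
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    def __init__(self, embed_dim=1024, window=33):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(embed_dim, 128, kernel_size=7, padding=3), nn.ReLU(),
            nn.AdaptiveMaxPool1d(1), nn.Flatten(),
            nn.Linear(128, 1),                       # branch-level succinylation score
        )

    def forward(self, x):                            # (batch, window, embed_dim)
        return self.net(x.transpose(1, 2))

class StackedPredictor(nn.Module):
    def __init__(self):
        super().__init__()
        self.protbert_branch = CNNBranch()
        self.prott5_branch = CNNBranch()
        self.meta = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1))

    def forward(self, protbert_emb, prott5_emb):
        stacked = torch.cat([self.protbert_branch(protbert_emb),
                             self.prott5_branch(prott5_emb)], dim=1)
        return torch.sigmoid(self.meta(stacked))     # probability of succinylation

model = StackedPredictor()
prob = model(torch.randn(4, 33, 1024), torch.randn(4, 33, 1024))
print(prob.shape)                                    # torch.Size([4, 1])
```

    In a proper stacking setup the base branches are trained first and the meta-learner is then fit on their held-out predictions rather than end to end.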
  • Item
    TEXT SUMMARIZATION USING LSA WITH TRANSFORMERS
    (I.O.E. Pulchowk Campus, 2023-03) NEPAL, ABHAY; TRIPATHI, DIPESH; ADHIKARI, GOKARNA; DHAKAL, KSHITIZ
    This project presents an implementation of a text summarization method that combines Latent Semantic Analysis (LSA) with Transformers. The primary objective of the proposed approach is to create a brief summary of an input text while retaining its fundamental meaning. The summarization model is assessed with two metrics, BLEU and ROUGE scores, which gauge the model’s efficacy in generating a succinct and accurate summary. The project comprises several steps, including text preprocessing, feature extraction using LSA, and summary generation using Transformers. The resulting summary is evaluated by comparing it against a reference summary, and its quality is measured by the BLEU and ROUGE scores. The evaluation results reveal that the proposed approach yields high scores on both metrics, indicating its effectiveness in generating precise and concise summaries. Moreover, the project includes an analysis of the impact of various parameters on the performance of the summarization model, providing valuable insights into the optimal parameter settings.
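
    The sketch below shows one common way the LSA feature-extraction step can be realized: a TF-IDF term-sentence matrix is decomposed with truncated SVD and sentences are ranked by their weight in the leading topics, producing an extract that a transformer summarizer could then rewrite. The sentence splitter, number of topics, and extract length are assumptions, not the exact pipeline used in this project.

```python
# Illustrative LSA sentence scoring via TF-IDF + truncated SVD.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_extract(text, n_topics=2, n_sentences=2):
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
    svd = TruncatedSVD(n_components=max(1, min(n_topics, tfidf.shape[1] - 1)))
    topic_weights = svd.fit_transform(tfidf)         # (n_sentences, n_topics)
    scores = np.linalg.norm(topic_weights, axis=1)   # salience of each sentence
    top = sorted(np.argsort(scores)[-n_sentences:])  # keep original sentence order
    return ". ".join(sentences[i] for i in top) + "."

# The extracted sentences could then be passed to a transformer summarizer
# (e.g. a Hugging Face summarization pipeline) for abstractive rewriting.
```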
  • Item
    PARAPHRASE GENERATION OF NEPALI LANGUAGE IN DEVANAGARI SCRIPT USING NATURAL LANGUAGE PROCESSING
    (I.O.E. Pulchowk Campus, 2023-04-30) SAPKOTA, AAJAY; ACHARYA, ABINASH; BADE, ANISH; UPRETI, MAHESH
    The project aims to develop a system for generating paraphrases using transformer-based models. Pre-trained models were fine-tuned on a large-scale dataset of sentence pairs, each consisting of a source sentence and its corresponding paraphrase, and their performance was evaluated on several benchmarks. To accomplish the project’s objectives, several tasks were undertaken, such as researching and allocating resources, collecting and translating datasets, sampling, filtering, and analyzing the feasibility of the model. This comprehensive approach has enabled the development of a powerful tool for generating high-quality paraphrases, which could enhance the natural language processing and generation capabilities of various applications. Paraphrase quality was assessed using mathematical and statistical metrics such as BLEU and ROUGE scores. The model demonstrated excellent performance on different datasets, showcasing its ability to generalize across different types of test sets. However, the zero-shot evaluation produced weaker results than expected, with a low recall score on new sentences, highlighting the need for further improvements in the model. The model also faces significant challenges such as entity mismatches, semantic and syntactic differences, and exact-match problems between the input sentences and their corresponding generated sentences. Furthermore, the implementation of a web application enabled users to input sentences and receive their paraphrases in real time, demonstrating the practicality of our approach. Nonetheless, this research emphasizes the vast potential of advanced language models to enhance natural language processing capabilities in low-resource languages.
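
    The sketch below shows how a fine-tuned seq2seq checkpoint could serve paraphrases through the Hugging Face transformers API, which is one way the real-time web application could be backed. The checkpoint path is a placeholder assumption; the project's own fine-tuned Nepali model would be loaded in its place.

```python
# Illustrative paraphrase generation with a fine-tuned seq2seq model.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "path/to/fine-tuned-nepali-paraphraser"   # placeholder, not a real model id
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def paraphrase(sentence, num_outputs=3):
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, num_beams=5, num_return_sequences=num_outputs,
                             max_new_tokens=64)
    return [tokenizer.decode(o, skip_special_tokens=True) for o in outputs]

# Example (Devanagari input): paraphrase("म विद्यालय जान्छु ।")
```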
  • Item
    AUGMENTING SELF-LEARNING AGENT IN FIRST-PERSON SHOOTER GAME USING REINFORCEMENT LEARNING
    (I.O.E. Pulchowk Campus, 2023-04-30) SINGH, SAMRAT; NEUPANE, SKEIN; PANDEY, SUSHANT; JOSHI, YACHU RAJA
    This group project highlights the effectiveness of using reinforcement learning (RL) with the Proximal Policy Optimization (PPO) algorithm to train an agent to play a Wolfenstein 3D-like game with multiple levels. The agent exhibited exceptional performance in terms of reward, time efficiency, and overall effectiveness. An in-depth analysis of its performance indicated marked improvements in the reward curves, strategic navigation throughout the game levels, and expeditious completion of each level. The study highlights the potential of RL and PPO for training agents in complex video games with multiple levels, as well as in other applications such as agent-based modeling and machine learning.
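
    The sketch below shows the shape of a PPO training loop using stable-baselines3. CartPole-v1 stands in for the Wolfenstein 3D-like game environment, which is not publicly available; a frame-based game agent would instead use a CNN policy over stacked screen observations, and the hyperparameters here are illustrative.

```python
# Illustrative PPO training and evaluation loop with stable-baselines3.
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")
model = PPO("MlpPolicy", env, n_steps=2048, batch_size=64,
            learning_rate=3e-4, verbose=1)
model.learn(total_timesteps=100_000)                # PPO clipped-surrogate updates

# Evaluate the trained agent for one episode.
obs, _ = env.reset()
done, total_reward = False, 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    total_reward += reward
    done = terminated or truncated
print("episode reward:", total_reward)
```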
  • Item
    DEEP LEARNING APPROACH FOR HEART RATE PREDICTION USING PPG SIGNAL
    (I.O.E. Pulchowk Campus, 2023-04) BASTAKOTI, NAYAN; LAMSAL, SUMAN; POUDEL, SURAJ; SHARMA, YUKTA
    Photoplethysmography (PPG) is a low-cost optical technique that measures changes in blood volume in the microvascular tissue bed at the skin’s surface. It has been used in commercial medical devices to gauge peripheral vascular disease and autonomic function by monitoring blood pressure, heart rate, and oxygen saturation. Because of motion artifacts during exercise, measuring heart rate from the PPG signal is difficult. In this work, a machine-learning-based approach is used to monitor heart rate (HR) from wrist-type PPG signals. By combining a 1D CNN and a bidirectional LSTM, the model benefits from the strengths of both architectures, capturing both local and long-term patterns in the input data. The proposed model exhibits an average absolute error of less than 1.5 bpm across all training and test datasets. The model shows promising results with fewer than 300 thousand network parameters.
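
    The sketch below shows a 1D-CNN followed by a bidirectional LSTM regressing heart rate from a fixed-length PPG window. The window length, channel counts, and layer sizes are assumptions for the sketch, not the exact architecture reported above.

```python
# Illustrative 1D-CNN + bidirectional-LSTM heart-rate regressor over PPG windows.
import torch
import torch.nn as nn

class PPGHeartRateNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(                    # local waveform features
            nn.Conv1d(1, 32, kernel_size=7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)         # heart rate in bpm

    def forward(self, ppg):                          # (batch, window_len)
        x = self.cnn(ppg.unsqueeze(1))               # (batch, 64, window_len // 4)
        x, _ = self.bilstm(x.transpose(1, 2))        # long-range temporal patterns
        return self.head(x[:, -1])                   # regression from last timestep

model = PPGHeartRateNet()
hr = model(torch.randn(8, 256))                      # 8 PPG windows of 256 samples
print(hr.shape, sum(p.numel() for p in model.parameters()))
```

    Such a regressor would typically be trained with an L1 (mean absolute error) loss against reference heart rates derived from a simultaneously recorded ECG.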