“Design of Stock Trading Agent Using Deep Reinforcement Learning”

Date
2022-09
Authors
Lal, Janak Kumar
Publisher
IOE Pulchowk Campus
Abstract
This study adopts the Double Deep Q-learning algorithm to design trading strategies for the stocks of four commercial banks listed on NEPSE. The reinforcement learning agent takes discrete actions and receives a positive or negative reward from the environment. A CNN forms the policy network, and a separate target network is used to mitigate the instability of the Deep Q Network. Experience replay is employed to sample batches of experience uniformly at random from memory and train the network. The performance of the Double Deep Q-learning agent was compared with various baseline trading strategies in terms of annualised expected trade return. The maximum annualised expected trade return obtained with the traditional baseline methods was 103% on the NABIL testing data, while on the same data the Double Deep Q-learning agent obtained an annualised expected trade return of 114.44%. The experiments showed that the Double Deep Q-learning agent with experience replay achieved a higher annualised expected trade return than the baseline trading strategies.
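The core mechanics described above (a replay memory sampled uniformly at random, and a Double DQN target in which the online network selects the next action while the target network evaluates it) can be sketched as follows. This is a minimal illustration, not the thesis implementation: the CNN policy network is stood in for by an arbitrary Q-function callable, and all names (`ReplayBuffer`, `double_dqn_targets`) are hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-capacity memory; training batches are sampled uniformly at random,
    which breaks the temporal correlation between consecutive transitions."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones


def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network picks the greedy next action,
    the (periodically synced) target network supplies its value estimate."""
    best_actions = np.argmax(q_online(next_states), axis=1)          # action selection
    batch_idx = np.arange(len(best_actions))
    next_q = q_target(next_states)[batch_idx, best_actions]          # action evaluation
    return rewards + gamma * next_q * (1.0 - dones)                  # bootstrap unless terminal
```

In practice the online network is trained to regress its Q-values toward these targets, and the target network's weights are copied from the online network every few thousand steps; decoupling selection from evaluation is what reduces the overestimation bias of plain Deep Q-learning.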
Description
Malcolm Gladwell, in his book “Outliers: The Story of Success” [1], writes that it takes 10,000 hours of intensive practice to achieve mastery of complex skills. Can an amateur violinist become an expert by playing the same song for 10,000 hours? Does a person who has tossed an unbiased coin 10,000 times predict the next outcome more accurately than a person who has never tossed one? In the case of an unbiased coin, no matter how many times a person has tossed it, the outcome of the next toss remains random and unpredictable.
Keywords
Reinforcement learning, Double Deep Q-Learning