“Design of Stock Trading Agent Using Deep Reinforcement Learning”
Date
2022-09
Authors
Lal, Janak Kumar
Publisher
IOE Pulchowk Campus
Abstract
This study adopts the Double Deep Q-learning algorithm to design strategies for
trading the stocks of four commercial banks listed on NEPSE. The reinforcement
learning agent takes discrete actions and receives a positive or negative reward
from the environment. A CNN forms the policy network, and a target network is
used to mitigate the instability of the Deep Q-Network. Experience replay is
used to randomly sample batches of experience from memory and train the
network. The performance of the Double Deep Q-learning agent was compared with
various baseline trading strategies in terms of annualised expected trade
return. The maximum annualised expected trade return obtained with the
traditional baseline methods was 103% on the NABIL test data, while on the same
data the agent using the Double Deep Q-learning algorithm obtained an
annualised expected trade return of 114.44%. The experiments showed that the
Double Deep Q-learning agent with experience replay achieved a higher
annualised expected trade return than the baseline trading strategies.
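The two mechanisms the abstract names, experience replay and the Double DQN target, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the networks are stood in for by plain functions `q_online` and `q_target`, and all names here are hypothetical.

```python
import random
from collections import deque

import numpy as np


class ReplayBuffer:
    """Fixed-size memory; batches of experience are sampled uniformly at random."""

    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)  # old experiences are evicted first

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = map(np.array, zip(*batch))
        return states, actions, rewards, next_states, dones


def double_dqn_targets(q_online, q_target, rewards, next_states, dones, gamma=0.99):
    """Double DQN target: the online network *selects* the next action,
    the target network *evaluates* it, which mitigates the over-estimation
    that destabilises a plain Deep Q-Network."""
    best_actions = np.argmax(q_online(next_states), axis=1)
    next_q = q_target(next_states)[np.arange(len(best_actions)), best_actions]
    return rewards + gamma * (1.0 - dones) * next_q
```

In a training loop, transitions observed from the trading environment would be pushed into the buffer, and each gradient step would regress the online network's Q-values toward `double_dqn_targets(...)` computed on a sampled batch, with the target network's weights periodically copied from the online network.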
Description
Malcolm Gladwell, in his book “Outliers: The Story of Success” [1], writes that
it takes 10,000 hours of intensive practice to achieve mastery of complex
skills. Can an amateur violinist become an expert by playing the same song for
10,000 hours? Does a person who has tossed an unbiased coin 10,000 times
predict the next outcome more accurately than a person who has not? In the case
of an unbiased coin, no matter how many times a person has tossed it, the
outcome of the next toss will be random and unpredictable.
Keywords
Reinforcement learning, Double Deep Q-Learning