Abstract
Demand Response (DR) is a useful tool for balancing available generation and load in a smart grid environment. Various price-based schemes exist to implement DR and flatten the load profile. For customers to benefit, proper load scheduling is required to reduce electricity usage during peak-load periods and thereby lower their electricity cost. This work formulates load scheduling as a multi-stage decision-making problem, i.e., a Markov Decision Problem (MDP). Reinforcement learning (RL) has been used to solve many decision-making problems in stochastic environments, and the ε-greedy algorithm is one of the most widely used exploration methods in RL. In this paper, a pursuit algorithm is developed to balance the exploration and exploitation processes of RL. The performance of the two algorithms is compared, and the results show the superiority of the pursuit algorithm over the ε-greedy algorithm.
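The abstract does not state the exploration rules themselves. The following is a minimal Python sketch of how the two compared strategies differ, assuming the standard ε-greedy rule and the classical pursuit rule from learning automata. Under the pursuit rule, an action-probability vector p is moved toward the current greedy action with step size β, p ← p + β(e_greedy − p), and actions are sampled from p. The single-state toy problem, the time-slot rewards, the function names (`greedy`, `epsilon_greedy_action`, `pursuit_update`, `pursuit_action`, `run`) and all parameter values are hypothetical illustrations, not the paper's load-scheduling model.

```python
import random

def greedy(q_values):
    # Index of the highest estimated value (ties go to the lowest index).
    return max(range(len(q_values)), key=lambda a: q_values[a])

def epsilon_greedy_action(q_values, epsilon, rng):
    # Explore uniformly at random with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return greedy(q_values)

def pursuit_update(probs, q_values, beta):
    # Pursuit rule: shift probability mass toward the current greedy action,
    # p <- p + beta * (e_greedy - p). The vector stays normalised.
    g = greedy(q_values)
    return [p + beta * ((1.0 if a == g else 0.0) - p) for a, p in enumerate(probs)]

def pursuit_action(probs, rng):
    # Sample an action in proportion to its current probability.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def run(method, mean_rewards, steps=2000, epsilon=0.1, beta=0.01, seed=0):
    # Hypothetical single-state toy: each action is a time slot whose reward
    # is the negative of a noisy electricity cost with the given mean.
    rng = random.Random(seed)
    n = len(mean_rewards)
    q = [0.0] * n
    counts = [0] * n
    probs = [1.0 / n] * n
    total = 0.0
    for _ in range(steps):
        if method == "epsilon":
            a = epsilon_greedy_action(q, epsilon, rng)
        else:
            a = pursuit_action(probs, rng)
        r = rng.gauss(mean_rewards[a], 1.0)
        counts[a] += 1
        q[a] += (r - q[a]) / counts[a]  # sample-average value estimate
        if method == "pursuit":
            probs = pursuit_update(probs, q, beta)
        total += r
    return total / steps, q, probs

if __name__ == "__main__":
    slots = [-3.0, -1.0, -2.0]  # hypothetical mean rewards (negative costs)
    print("epsilon-greedy:", run("epsilon", slots)[0])
    print("pursuit:       ", run("pursuit", slots)[0])
```

The design difference is visible in the code. ε-greedy keeps exploring at a fixed rate ε regardless of how confident the value estimates are. The pursuit rule instead concentrates its sampling probability on the apparently best action gradually, at a rate set by β.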