Niloofar Alavi
Multi-Agent Reinforcement Learning via Domain Adaptation
Abstract
Reinforcement Learning (RL) is a kind of Machine Learning in which an agent learns by interacting with its environment, observing the results of those interactions, and receiving a positive or negative reward accordingly. RL has many applications in multi-agent systems, especially in dynamic and unknown environments. Applications of multi-agent reinforcement learning (MARL) range from robot soccer, networks, cloud computing, and job scheduling to optimal reactive power dispatch. However, most MARL algorithms in these environments suffer from several problems, in particular the exponential computational complexity of the joint state-action space, which limits the scalability of the algorithms in realistic multi-agent problems. Consequently, this study presents two novel algorithms, RKT-MARL and SR-MARL, both based on the Markov decision process (MDP) model. Unlike traditional reinforcement learning methods, both algorithms exploit sparse interactions and knowledge transfer to achieve an equilibrium across agents. Both also use negotiation to find the equilibrium set, apply the minimum variance method to select the best action in that set, and transfer the knowledge of state-action values across agents. In addition, RKT-MARL initializes the Q-values in coordinate states as a combination, weighted by coefficients, of current environmental information and previous knowledge. SR-MARL further improves the performance of RKT-MARL by creating a predictive map of the environment using the successor representation. This predictive map gives the expected occupancy of future states, starting from the current state. To evaluate the performance of the presented algorithms, a set of experiments is conducted on five grid world games, and the results show fast convergence of RKT-MARL and SR-MARL. Fast convergence of these algorithms indicates that agents quickly solve the reinforcement learning problem and approach their goal.
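To make the notion of a predictive map concrete, the following is a minimal sketch of the standard successor representation for a fixed policy, M = I + γPM, whose entry M[s][s'] is the expected discounted number of future visits to s' when starting from s. It is not the SR-MARL implementation; the function name, the three-state corridor, the transition matrix P, and the discount factor gamma are all hypothetical choices made for illustration.

```python
def successor_representation(P, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Iterate M <- I + gamma * P * M until the update is smaller than tol.

    P[i][j] is the probability of moving from state i to state j under a
    fixed policy (an assumption of this sketch).
    """
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(max_iters):
        new_M = [
            [
                identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
        diff = max(abs(new_M[i][j] - M[i][j]) for i in range(n) for j in range(n))
        M = new_M
        if diff < tol:
            break
    return M


# Hypothetical corridor: state 0 -> state 1 -> state 2, where state 2 is absorbing.
P = [
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
]
M = successor_representation(P, gamma=0.5)
print([round(x, 3) for x in M[0]])  # approximately [1.0, 0.5, 0.5]
```

Each row of M sums to 1 / (1 - gamma), since the discounted occupancy over all states adds up to the total discounted time. In this toy example the row for state 0 shows that the absorbing state 2 already carries as much discounted occupancy as the next state 1, which is the kind of look-ahead information a predictive map provides.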
Keywords: Domain adaptation, Machine learning, Multi-agent reinforcement learning, Knowledge transfer, Sparse interactions, Successor representation, Negotiation, Equilibrium, Regularization.