We present hierarchical Deep Q-Network with Forgetting (HDQF), which took first place in the MineRL competition. HDQF works with imperfect demonstrations and utilizes the hierarchical structure of expert trajectories, extracting effective sequences of meta-actions and subgoals. We introduce a structured, task-dependent replay buffer and a forgetting technique that allow the HDQF agent to gradually erase poor-quality expert data from the buffer. In this paper we present the details of the HDQF algorithm and give experimental results in the Minecraft domain.

Deep reinforcement learning (RL) has achieved compelling success on many complex sequential decision-making problems, especially in game domains. In examples such as AlphaStar [6], AlphaZero [2], and OpenAI Five, human or super-human levels of performance were attained. However, RL algorithms usually require a huge number of environment samples to reach good performance [1]. Learning from demonstration is a well-known alternative, but until now this approach has not achieved serious success in complex, multi-task environments. This is largely because obtaining high-quality expert demonstrations in sufficient quantity in sample-limited, real-world domains is itself a non-trivial problem.

Minecraft was recently introduced as a compelling domain for the development of reinforcement- and imitation-learning-based methods [5]. It presents unique challenges because Minecraft is a 3D, first-person, open-world game in which the agent must gather resources and craft structures and items to achieve its goals. Due to the game's popularity, it proved possible to collect a large number of expert trajectories in which individual subtasks are solved, which made the MineRL competition possible. The organizers released the largest-ever dataset of human demonstrations in a Minecraft domain. The primary goal of the competition is to foster the development of algorithms that can efficiently leverage human priors to drastically reduce the number of samples needed to solve complex, hierarchical, sparse-reward environments.

The main difficulty in solving the MineRL problem was the imperfection of the demonstrations and the presence of hierarchical relationships between subtasks. In this paper we present hierarchical Deep Q-Network with Forgetting (HDQF), which allowed us to take first place in the MineRL competition [4]. HDQF works with imperfect demonstrations and utilizes the hierarchical structure of expert trajectories, extracting effective sequences of meta-actions and subgoals. Each subtask is solved by its own simple strategy, which extends the DQfD approach [7], relies on a structured buffer, and gradually forgets poor-quality expert data. Below we present the details of our algorithm and give experimental results showing that the HDQF agent can play Minecraft at a human level.

One way to explore the domain with the use of expert data is behavioral cloning (BC). Pure supervised learning methods based on BC suffer from distribution shift: because the agent greedily imitates demonstrated actions, it can drift away from demonstrated states due to error accumulation. Another way to use expert data in the search for an exploration policy is to apply conventional RL methods such as PPO or DDDQN and guide exploration by enforcing occupancy-measure matching between the learned policy and the demonstrations.
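To make the BC failure mode concrete, the following is a minimal sketch of a behavioral-cloning update in PyTorch: the policy is trained by supervised classification on expert state-action pairs, which is exactly the greedy imitation that accumulates error once the agent leaves the demonstrated state distribution. The function name and tensor shapes are our illustrative assumptions, not code from the competition entry.

import torch
import torch.nn.functional as F

def bc_update(policy, optimizer, states, expert_actions):
    # One behavioral-cloning step: fit the policy to the expert's
    # actions by cross-entropy. Nothing corrects the agent in states
    # outside the demonstrations, which is the source of drift.
    logits = policy(states)                # [batch, num_actions]
    loss = F.cross_entropy(logits, expert_actions)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()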
The main approach is to use demonstration trajectories sampled from an expert policy to guide the learning procedure, either by putting the demonstrations into a replay buffer or by using them to pretrain the policy in a supervised manner. The organizers of the MineRL competition provided several baselines: standard DQfD [3] reaches a maximum score of 64 after 1000 episodes, while PPO and Rainbow both reach a maximum of 55 after 800 episodes of training. Our best solution exploits the method of injecting expert data into the agent's replay buffer. DQfD, on which our method is based, is an advanced approach to reinforcement learning with additional expert demonstrations. The main idea of DQfD is to take the Deep Q-Network (DQN) algorithm and combine several terms into a single loss function J(Q) = J_DQ(Q) + λ1 J_n(Q) + λ2 J_E(Q) + λ3 J_L2(Q), where J_DQ is the one-step double-DQN loss, J_n an n-step return loss, J_E a supervised large-margin classification loss on expert actions, and J_L2 an L2 regularization term.
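As an illustration of this combined objective, the sketch below implements the one-step TD term and the large-margin supervised term of a DQfD-style loss in PyTorch; the n-step and L2 terms are omitted for brevity, and the margin value, the weighting lambda_e, and the function signature are assumed names and hyperparameters rather than the exact values used in our solution.

import torch
import torch.nn.functional as F

def dqfd_loss(q_net, target_net, batch, is_expert, margin=0.8,
              lambda_e=1.0, gamma=0.99):
    # Sketch of a DQfD-style loss: one-step double-DQN TD loss plus
    # a large-margin classification loss on expert transitions only.
    # (The n-step and L2 terms of the full J(Q) are omitted here.)
    s, a, r, s2, done = batch                       # done: float mask
    q = q_net(s)                                    # Q(s, .), [B, A]
    q_sa = q.gather(1, a.unsqueeze(1)).squeeze(1)   # Q(s, a), [B]

    # One-step double-DQN target: online net picks the action,
    # target net evaluates it.
    with torch.no_grad():
        a2 = q_net(s2).argmax(dim=1, keepdim=True)
        target = r + gamma * (1 - done) * \
            target_net(s2).gather(1, a2).squeeze(1)
    j_dq = F.smooth_l1_loss(q_sa, target)

    # Large-margin loss: max_a [Q(s,a) + margin*1(a != a_E)] - Q(s,a_E),
    # applied only where is_expert == 1 (the taken action is a_E there).
    margins = torch.full_like(q, margin)
    margins.scatter_(1, a.unsqueeze(1), 0.0)        # zero margin at a_E
    j_e = ((q + margins).max(dim=1).values - q_sa) * is_expert
    return j_dq + lambda_e * j_e.mean()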