For this tutorial in my Reinforcement Learning series, we are going to be exploring a family of RL algorithms called Q-Learning algorithms. We'll be learning how to solve the OpenAI FrozenLake environment. These are a little different than the policy-based algorithms that will be looked at in the following tutorials (Parts 1–3). Instead of starting with a complex and unwieldy deep neural network, we will begin by implementing a simple lookup-table version of the algorithm, and then show how to implement a neural-network equivalent using Tensorflow. Given that we are going back to basics, it may be best to think of this as Part-0 of the series. It will hopefully give an intuition into what is really happening in Q-Learning that we can then build on going forward when we eventually combine the policy gradient and Q-learning approaches to build state-of-the-art RL agents. (If you are more interested in Policy Networks, or already have a grasp on Q-Learning, feel free to start the tutorial series here instead.)

Unlike policy gradient methods, which attempt to learn functions that directly map an observation to an action, Q-Learning attempts to learn the value of being in a given state and taking a specific action there. While both approaches ultimately allow us to take intelligent actions given a situation, the means of getting to that action differ significantly.
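To make the lookup-table idea concrete before the full walkthrough, here is a minimal sketch of tabular Q-Learning on FrozenLake. It assumes the classic gym API (where `env.reset()` returns a state and `env.step()` returns four values); the environment id `FrozenLake-v0`, the learning rate, and the discount factor are illustrative choices, not fixed by this series.

```python
import gym
import numpy as np

env = gym.make('FrozenLake-v0')

# One row per state, one column per action; start with all values at zero.
Q = np.zeros([env.observation_space.n, env.action_space.n])

lr = 0.8          # learning rate (illustrative)
gamma = 0.95      # discount factor (illustrative)
num_episodes = 2000

for i in range(num_episodes):
    s = env.reset()
    done = False
    while not done:
        # Pick the greedy action, with random noise that decays over
        # episodes so the agent still explores early on.
        a = np.argmax(Q[s, :] + np.random.randn(1, env.action_space.n) * (1.0 / (i + 1)))
        s1, r, done, _ = env.step(a)
        # Bellman update: nudge Q(s, a) toward the observed reward plus
        # the discounted value of the best action in the next state.
        Q[s, a] += lr * (r + gamma * np.max(Q[s1, :]) - Q[s, a])
        s = s1
```

Noisy greedy action selection is just one simple exploration strategy; an epsilon-greedy policy would work equally well here. The neural-network version later in the tutorial replaces the table `Q` with a learned function approximator.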