Below is a 3x2 version of the reinforcement learning task from class. Other than being a smaller world, the details are the same. The agent can execute the actions up, down, left, or right. When executing an action, the agent has an 80% chance of actually moving in that direction, a 10% chance of moving in the -90 degree direction, and a 10% chance of moving in the +90 degree direction. If the agent attempts to move into a wall, it stays in the same location. If the agent moves into location (3,1), it receives a +1 reward and the task is over. If the agent moves into location (3,2), it receives a -1 reward and the task is over. For all other actions, the agent receives a -0.04 reward.

[Figure: 3x2 grid with columns 1 to 3 and rows 1 to 2, showing the policy; cell (3,2) is marked -1 and cell (3,1) is marked +1.]

(a) Show the utility equations for U(1,1), U(1,2), U(2,1), and U(2,2) for the policy in the above picture, assuming the discount factor gamma = 0.9.

(b) Show the final utility values for U(1,1), U(1,2), U(2,1), and U(2,2) for this policy. You do not need to show the computations, just the final values rounded to two-digit precision.
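For a fixed policy pi, each utility equation in part (a) has the form U(s) = sum over s' of P(s' | s, pi(s)) * [R(s, s') + gamma * U(s')], where entering (3,1) or (3,2) pays +1 or -1 and ends the task (so no future utility is added), and every other move, including bumping a wall, pays -0.04. Part (b) is then the solution of those four linear equations. The Python sketch below finds it by iterative policy evaluation. It rests on several assumptions that are not in the original: coordinates are (column, row) with "up" increasing the row number, there are no interior obstacles, rewards are attached to the move (your course may instead use the R(s) + gamma * sum P U(s') convention), and `HYPOTHETICAL_POLICY` is a placeholder to be replaced with the arrows from the picture.

```python
# Iterative policy evaluation for the 3x2 grid world (a sketch under stated assumptions).
# Coordinates are (column, row); "up" is ASSUMED to increase the row number.

GAMMA = 0.9
STEP_REWARD = -0.04
TERMINALS = {(3, 1): 1.0, (3, 2): -1.0}
STATES = [(x, y) for x in range(1, 4) for y in range(1, 3)]  # no interior walls assumed
NON_TERMINALS = [s for s in STATES if s not in TERMINALS]

MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# The two directions perpendicular to each intended action (10% chance each).
PERPENDICULAR = {
    "up": ("left", "right"),
    "down": ("right", "left"),
    "left": ("down", "up"),
    "right": ("up", "down"),
}

# HYPOTHETICAL placeholder policy: replace with the arrows shown in the picture.
HYPOTHETICAL_POLICY = {
    (1, 1): "right",
    (2, 1): "right",
    (1, 2): "down",
    (2, 2): "left",
}


def step(state, direction):
    """Return the cell reached by moving one square; bumping a wall means staying put."""
    dx, dy = MOVES[direction]
    nxt = (state[0] + dx, state[1] + dy)
    return nxt if nxt in STATES else state


def outcomes(state, action):
    """List (probability, next_state) pairs: 0.8 intended, 0.1 for each perpendicular."""
    side_a, side_b = PERPENDICULAR[action]
    return [
        (0.8, step(state, action)),
        (0.1, step(state, side_a)),
        (0.1, step(state, side_b)),
    ]


def backup(state, action, utilities):
    """Right-hand side of the utility equation for `state` under `action`."""
    total = 0.0
    for prob, nxt in outcomes(state, action):
        if nxt in TERMINALS:
            # Entering a terminal pays its reward and ends the task.
            total += prob * TERMINALS[nxt]
        else:
            total += prob * (STEP_REWARD + GAMMA * utilities[nxt])
    return total


def evaluate_policy(policy, tol=1e-12):
    """Repeat U(s) <- backup(s, policy[s], U) until the largest change is below tol."""
    utilities = {s: 0.0 for s in NON_TERMINALS}
    while True:
        new = {s: backup(s, policy[s], utilities) for s in NON_TERMINALS}
        delta = max(abs(new[s] - utilities[s]) for s in NON_TERMINALS)
        utilities = new
        if delta < tol:
            return utilities


if __name__ == "__main__":
    values = evaluate_policy(HYPOTHETICAL_POLICY)
    for s in sorted(values):
        print(f"U{s} = {values[s]:.2f}")
```

Because gamma = 0.9 makes each sweep a contraction, the loop converges to the unique solution of the four equations; solving the same system directly with a linear solver gives the same values.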