Question

Below is a 3x2 version of the reinforcement learning task from class. Other than being a smaller world, the details are the same. The agent can execute four actions: up, down, left, or right. When executing an action, the agent has an 80% chance of actually moving in that direction, a 10% chance of moving in the -90 degrees direction, and a 10% chance of moving in the +90 degrees direction. If the agent attempts to move into a wall, it stays in the same location. If the agent moves into location (3,1), it receives a +1 reward and the task is over. If the agent moves into location (3,2), it receives a -1 reward and the task is over. For all other actions, the agent receives a -0.04 reward.
[Figure: a 3x2 grid world; columns are numbered 1-3 (left to right) and rows 1-2 (bottom to top). The +1 terminal is the bottom-right cell (3,1) and the -1 terminal is the top-right cell (3,2). The policy arrows drawn in the original figure did not survive transcription.]
(a) Show the utility equations for U(1,1), U(1,2), U(2,1) and U(2,2) for the policy in the above picture assuming the discount factor gamma = 0.9.
(b) Show the final utility values for U(1,1), U(1,2), U(2,1), and U(2,2) for this policy. You do not need to show the computations, just the final values rounded to two-digit precision.
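
For reference, this is how the equations in part (a) are typically set up. For a fixed policy pi, each non-terminal state's utility satisfies the policy-evaluation form of the Bellman equation, with terminal utilities taken to be 0:

U^{\pi}(s) = \sum_{s'} P(s' \mid s, \pi(s)) \left[ R(s, s') + \gamma \, U^{\pi}(s') \right]

where R(s,s') = +1 if s' = (3,1), -1 if s' = (3,2), and -0.04 otherwise. As a hypothetical instance (the actual arrows come from the lost figure), if pi(2,1) were "right", then with gamma = 0.9:

U(2,1) = 0.8(+1) + 0.1\left(-0.04 + 0.9\,U(2,2)\right) + 0.1\left(-0.04 + 0.9\,U(2,1)\right)

since the intended move reaches the +1 terminal, the 10% slip "up" lands in (2,2), and the 10% slip "down" hits the bottom wall, leaving the agent at (2,1).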
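
And here is a minimal Python sketch of iterative policy evaluation for this grid, useful for checking part (b). The POLICY dictionary below is an assumption chosen only to illustrate the computation (the arrows in the original figure were not recoverable); substitute the actual policy from the picture before reading off the values.

    # Iterative policy evaluation for the 3x2 grid world described above.
    GAMMA = 0.9
    STEP_REWARD = -0.04
    TERMINALS = {(3, 1): 1.0, (3, 2): -1.0}   # (column, row): terminal reward
    STATES = [(1, 1), (2, 1), (1, 2), (2, 2)]  # non-terminal states

    # ASSUMED policy (head toward the +1 cell); NOT taken from the lost figure.
    POLICY = {(1, 1): "right", (2, 1): "right", (1, 2): "down", (2, 2): "down"}

    MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
    # Perpendicular slips for each intended action (each with probability 0.1).
    SLIPS = {"up": ("left", "right"), "down": ("left", "right"),
             "left": ("up", "down"), "right": ("up", "down")}

    def step(state, direction):
        """Deterministic move; bumping into a wall leaves the state unchanged."""
        dx, dy = MOVES[direction]
        nxt = (state[0] + dx, state[1] + dy)
        return nxt if 1 <= nxt[0] <= 3 and 1 <= nxt[1] <= 2 else state

    def backup(s, a, u):
        """One-step expected return for taking action a in state s under utilities u."""
        outcomes = [(0.8, step(s, a))] + [(0.1, step(s, d)) for d in SLIPS[a]]
        total = 0.0
        for p, nxt in outcomes:
            if nxt in TERMINALS:
                total += p * TERMINALS[nxt]  # terminal reward; U(terminal) = 0
            else:
                total += p * (STEP_REWARD + GAMMA * u[nxt])
        return total

    def evaluate(policy, sweeps=100):
        """Synchronous sweeps of the fixed-policy Bellman equation."""
        u = {s: 0.0 for s in STATES}
        for _ in range(sweeps):
            u = {s: backup(s, policy[s], u) for s in STATES}
        return u

    print({s: round(v, 2) for s, v in evaluate(POLICY).items()})

With a fixed policy the update is a linear system over four unknowns, so 100 synchronous sweeps converge well past the two-decimal precision part (b) asks for; the same values could also be obtained by solving the four simultaneous equations from part (a) directly.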