Below is a 3x2 version of the reinforcement learning task from class. Other than being a smaller world, the details are the same. The agent can execute four actions: up, down, left, or right. When executing an action, the agent has an 80% chance of actually moving in that direction, a 10% chance of moving in the -90 degrees direction, and a 10% chance of moving in the +90 degrees direction. If the agent attempts to move into a wall, it stays in the same location. If the agent moves into location (3,1), it receives a +1 reward and the task is over. If the agent moves into location (3,2), it receives a -1 reward and the task is over. For all other actions, the agent receives a -0.04 reward.

[Figure: 3x2 grid world; columns 1 to 3, rows 1 to 2; -1 at (3,2), +1 at (3,1); policy arrows not shown]

(a) Show the utility equations for U(1,1), U(1,2), U(2,1), and U(2,2) for the policy in the above picture, assuming the discount factor gamma = 0.9.

(b) Show the final utility values for U(1,1), U(1,2), U(2,1), and U(2,2) for this policy. You do not need to show the computations, just the final values rounded to two-digit precision.
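Because the policy picture is not available here, the following is only a minimal sketch of how parts (a) and (b) could be worked for some policy. It assumes the usual textbook form of the utility equation, U(s) = R(s) + gamma * sum over s' of P(s' | s, pi(s)) * U(s'), with R(s) = -0.04 for non-terminal states, U(3,1) = +1, and U(3,2) = -1. The `POLICY` dictionary is hypothetical and must be replaced by the arrows from the picture; coordinates are read as (column, row) with row 1 at the bottom, which is also an assumption. For example, under the hypothetical policy the equation for (1,1) would be U(1,1) = -0.04 + 0.9 * [0.8 U(2,1) + 0.1 U(1,2) + 0.1 U(1,1)].

```python
GAMMA = 0.9            # discount factor from the question
STEP_REWARD = -0.04    # reward assumed for every non-terminal state
TERMINAL = {(3, 1): 1.0, (3, 2): -1.0}      # (column, row) -> fixed utility
STATES = [(1, 1), (1, 2), (2, 1), (2, 2)]   # non-terminal states
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# The two directions at -90 and +90 degrees from each intended action.
SIDEWAYS = {"up": ("left", "right"), "down": ("left", "right"),
            "left": ("up", "down"), "right": ("up", "down")}

# ASSUMPTION: hypothetical policy; replace with the arrows from the picture.
POLICY = {(1, 1): "right", (2, 1): "right", (1, 2): "down", (2, 2): "down"}


def step(state, direction):
    """Deterministic result of moving one cell; walls leave the state unchanged."""
    dx, dy = MOVES[direction]
    x, y = state[0] + dx, state[1] + dy
    return (x, y) if 1 <= x <= 3 and 1 <= y <= 2 else state


def transitions(state, action):
    """Map each successor state to its probability under the 80/10/10 model."""
    side_a, side_b = SIDEWAYS[action]
    probs = {}
    for direction, p in ((action, 0.8), (side_a, 0.1), (side_b, 0.1)):
        nxt = step(state, direction)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def evaluate(policy, sweeps=1000):
    """Iterate the utility equations until they settle (gamma < 1 ensures convergence)."""
    u = {s: 0.0 for s in STATES}
    u.update(TERMINAL)
    for _ in range(sweeps):
        for s in STATES:
            u[s] = STEP_REWARD + GAMMA * sum(
                p * u[nxt] for nxt, p in transitions(s, policy[s]).items())
    return {s: u[s] for s in STATES}


utilities = evaluate(POLICY)
for s in STATES:
    print(s, round(utilities[s], 2))
```

Repeated substitution is used instead of solving the four linear equations directly only to keep the sketch free of third-party dependencies; both approaches give the same fixed point.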

Similar questions

The injured football player. Bad news everyone! There is 1 second left in the game, and Tom Brady has injured himself. The matrices below depict the relative probabilities of winning given an offensive and a defensive play call. (The row player is the New England Patriots and the column player is the opponent.) How much has the all-star's home team's probability of winning decreased due to the injury?

Matrix 1 (entries: Patriots, Opponent)
          D Pass      D Run
Pass      .4, .6      .9, .1
Run       .8, .2      .5, .5

Matrix 2 (entries: Patriots, Opponent)
          D Pass      D Run
Pass      .06, .94    .32, .68
Run       .8, .2      .5, .5
Cost-Benefit Analysis. Suppose you can take one of two summer jobs. In the first job, as a flight attendant with a salary of $5,000, you estimate the probability you will die is 1 in 40,000. Alternatively, you could drive a truck transporting hazardous materials, which pays $12,000 and for which the probability of death is 1 in 10,000. Suppose that you're indifferent between the two jobs except for the pay and the chance of death. If you choose the job as a flight attendant, what does this say about the value you place on your life?
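One way to read this question is as a break-even comparison: the truck job pays more but carries extra risk, so choosing the safer job implies the extra pay is worth less than the extra expected loss of life. The sketch below assumes a simple expected-value (risk-neutral) comparison; the function name is made up for illustration.

```python
from fractions import Fraction


def implied_value_of_life(pay_safe, risk_safe, pay_risky, risk_risky):
    """Break-even value V where extra pay equals extra risk times V."""
    extra_pay = pay_risky - pay_safe
    extra_risk = risk_risky - risk_safe
    return extra_pay / extra_risk


threshold = implied_value_of_life(
    5000, Fraction(1, 40000), 12000, Fraction(1, 10000))
# Choosing the flight attendant job implies valuing one's life at or above this.
print(f"${float(threshold):,.2f}")
```

Under these assumptions the threshold is 7,000 / (3/40,000), roughly $93.3 million.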
Bill owes Bob $36. Just before Bill pays him the money, he gives Bob the opportunity to play a dice game to potentially win more money. The rules of this game are as follows: If Bob rolls doubles (probability 1/6), Bill will pay Bob double ($72). If he misses doubles on the first try, he can try again or settle for half the money ($18). If he makes doubles on the second try, Bill will again pay double ($72), but if Bob misses doubles on the second try, Bill will only pay him one-third ($12). Should Bob decide to play the dice game with Bill, or insist that he pay the $36 now? Use a decision tree to support your answer.
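The decision tree can be rolled back from the last choice to the first. The sketch below assumes Bob is risk-neutral and simply compares expected dollar amounts; the variable names are illustrative.

```python
from fractions import Fraction

P_DOUBLES = Fraction(1, 6)

# Second roll: $72 on doubles, $12 otherwise.
second_roll = P_DOUBLES * 72 + (1 - P_DOUBLES) * 12

# After missing the first roll, Bob picks the better of settling or rolling again.
after_miss = max(Fraction(18), second_roll)

# First roll: $72 on doubles, otherwise the value of the best follow-up choice.
play_game = P_DOUBLES * 72 + (1 - P_DOUBLES) * after_miss

print(float(second_roll), float(play_game))
```

With these numbers the second roll is worth $22 (better than settling for $18), and playing is worth about $30.33, which is below the certain $36, so a risk-neutral Bob would insist on being paid now.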
Suppose that an individual is just willing to accept a gamble to win or lose $1000 if the probability of winning is 0.6. Suppose that the utility gained if the individual wins is 100 utils. How much utility does one lose if one loses the gamble?
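If "just willing" is read as indifference, the expected change in utility from the gamble is zero: p * gain - (1 - p) * loss = 0. A small sketch under that assumption (the function name is hypothetical):

```python
from fractions import Fraction


def utility_loss_at_indifference(p_win, utility_gain):
    """Solve p * gain - (1 - p) * loss = 0 for loss."""
    return p_win * utility_gain / (1 - p_win)


loss = utility_loss_at_indifference(Fraction(3, 5), 100)
print(loss)
```

Here 0.6 * 100 = 0.4 * loss, so the loss is 150 utils.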
Ayça and Barış are playing a game, and the following payoff matrix gives the payoffs of Ayça. Answer the questions according to the payoff matrix. a) What is the probability that the value of the game is 10?
Suppose that a car-rental agency offers insurance for a week that costs $125. A minor fender bender will cost $3,400, whereas a major accident might cost $16,000 in repairs. Without the insurance, you would be personally liable for any damages. There are two decision alternatives: take the insurance, or do not take the insurance. You researched insurance industry statistics and found out that the probability of a major accident is 0.04% and that the probability of a fender bender is 0.18%. The expected payoff if you buy the insurance is $125.00. The expected payoff if you do not buy the insurance is $12.52. Develop a utility function for the payoffs associated with this decision for a risk-averse person. Determine the decision that would result using the utilities instead of the payoffs. Based on the expected payoffs, the best decision is to not purchase the insurance. Are these two decisions consistent?
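The sketch below checks the stated expected cost and then compares the two alternatives under one possible risk-averse utility. It assumes a fender-bender cost of $3,400, the value consistent with the stated $12.52 expected payoff, and an exponential utility u(x) = -exp(-x / R) where the risk tolerance R is a made-up parameter; a different utility function could change the result.

```python
import math

P_MAJOR, P_MINOR = 0.0004, 0.0018           # 0.04% and 0.18%
COST_MAJOR, COST_MINOR, PREMIUM = 16000, 3400, 125   # COST_MINOR is an assumption

expected_cost_uninsured = P_MAJOR * COST_MAJOR + P_MINOR * COST_MINOR


def utility(payoff, risk_tolerance):
    """Exponential (risk-averse) utility; risk_tolerance is hypothetical."""
    return -math.exp(-payoff / risk_tolerance)


def expected_utilities(risk_tolerance):
    """Return (insured, uninsured) expected utility; payoffs are negative costs."""
    insured = utility(-PREMIUM, risk_tolerance)
    uninsured = (P_MAJOR * utility(-COST_MAJOR, risk_tolerance)
                 + P_MINOR * utility(-COST_MINOR, risk_tolerance)
                 + (1 - P_MAJOR - P_MINOR) * utility(0, risk_tolerance))
    return insured, uninsured


print(round(expected_cost_uninsured, 2))
for r in (2000, 5000):
    insured, uninsured = expected_utilities(r)
    print(r, "buy insurance" if insured > uninsured else "do not buy")
```

With these illustrative values, the more risk-averse setting (R = 2000) favors buying the insurance, while the milder one (R = 5000) agrees with the expected-payoff decision not to buy. Whether the two decisions are consistent therefore depends on how risk-averse the utility function is.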
A business owner must decide to build sofas, chairs, or bed mattresses with the materials the company owns. The workers at the factory can build 100 sofas, 150 bed mattresses, or 200 chairs. The cost of manufacturing the sofas is more than chairs but less than the cost of manufacturing the mattresses. The profit margin on the mattresses is potentially highest. However, the mattresses require the most time to make. Making the chairs costs the least but uses the most resources. Taking all of these factors into consideration, the owner has to decide which type of furniture to manufacture. Based on the scenario, the business owner needs to mainly determine which of these?
The scarcity of resources for each option
The value of manufacturing each option
The sales taxes for manufacturing each option
The maximum utility from each option
A clothing manufacturer must decide which of two clothing lines to emphasize for the spring season, her usual line or a budget line. Her success with each line depends on the sta

                      Budget Line    Usual Line
Strong Economy        15,000         35,000
In-between Economy    18,000         28,000
Weak Economy          27,000         10,000

Economists believe that there is a 5% chance of a strong economy next year, a 75% chance of a weak economy, and a 20% chance of an in-between economy. Use the payoff m
O A. Emphasize the usual line
O B. Wait and see
O C. Emphasize the budget line
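Since the question text is cut off, the following sketch assumes it asks for the line with the higher expected payoff, and it reads the payoff figures as rows for strong, in-between, and weak economies with columns for the budget and usual lines. Probabilities are kept as whole percentages to avoid rounding noise.

```python
# Probabilities in percent: strong, in-between, weak.
PERCENT = {"strong": 5, "in_between": 20, "weak": 75}

PAYOFFS = {
    "budget": {"strong": 15000, "in_between": 18000, "weak": 27000},
    "usual": {"strong": 35000, "in_between": 28000, "weak": 10000},
}


def expected_payoff(line):
    """Probability-weighted payoff for one clothing line."""
    return sum(PERCENT[state] * PAYOFFS[line][state] for state in PERCENT) / 100


expected = {line: expected_payoff(line) for line in PAYOFFS}
best = max(expected, key=expected.get)
print(expected, best)
```

Under this reading the budget line has the higher expected payoff (24,600 versus 14,850), driven by the 75% chance of a weak economy.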
To go from Location 1 to Location 2, you can either take a car or take transit. Your utility function is: U = -1*X_minutes - 5*X_dollars + 0.13*X_car (i.e., 0.13 is the car constant). Car = 15 minutes and $8. Transit = 40 minutes and $4. What is your probability of taking transit given the conditions above? What is your probability of taking transit if the number of buses on the route were doubled, meaning the headways are halved? Remember to include units.
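A common way to turn utilities into choice probabilities is the binary logit model, P(mode) = exp(U_mode) / sum of exp(U) over all modes. The question does not name the model, so using logit here is an assumption, as is treating X_car as 1 for car and 0 for transit. The sketch covers only the first part, because the utility function as given has no headway or waiting-time term.

```python
import math


def utility(minutes, dollars, is_car):
    """Utility from the question: -1*minutes - 5*dollars + 0.13*car."""
    return -1.0 * minutes - 5.0 * dollars + 0.13 * is_car


def logit_probabilities(utilities):
    """Logit shares; subtracting the maximum keeps exp() numerically stable."""
    top = max(utilities.values())
    weights = {mode: math.exp(u - top) for mode, u in utilities.items()}
    total = sum(weights.values())
    return {mode: w / total for mode, w in weights.items()}


utilities = {
    "car": utility(15, 8, 1),       # -54.87
    "transit": utility(40, 4, 0),   # -60.0
}
shares = logit_probabilities(utilities)
print(round(shares["transit"], 4))
```

Under these assumptions the transit probability is about 0.006, that is, roughly 0.6%, a dimensionless share.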
Problem 7. A casino offers people the chance to play the following game: flip two fair coins. If both come up heads, the gambler wins $1. If both come up tails, the gambler wins $3. If one is heads and one is tails, the gambler gets nothing. The game costs $1.25 to play. Your friend, Richard, who has not taken a probability course and thus doesn't know any better, goes to this casino and plays the game 600 times. Estimate the probability that your friend loses between $132 and $195 over the course of the 600 games. (You need to provide a number instead of an expression involving NA(a,b)).
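With 600 independent plays, the total net result is approximately normal by the central limit theorem. The sketch below computes the per-game mean and variance of the net winnings and applies a normal approximation without a continuity correction, which is an assumption about the intended method.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


COST = 1.25
N_GAMES = 600
# (net result per game, probability): HH pays $1, TT pays $3, mixed pays $0.
OUTCOMES = [(1 - COST, 0.25), (3 - COST, 0.25), (0 - COST, 0.5)]

mean = sum(x * p for x, p in OUTCOMES)
variance = sum(p * (x - mean) ** 2 for x, p in OUTCOMES)

total_mean = N_GAMES * mean
total_sd = math.sqrt(N_GAMES * variance)

# Losing between $132 and $195 means a net total between -195 and -132.
z_low = (-195 - total_mean) / total_sd
z_high = (-132 - total_mean) / total_sd
probability = normal_cdf(z_high) - normal_cdf(z_low)
print(round(probability, 4))
```

Here the expected net total is -$150 with a standard deviation of $30, so the bounds become z = -1.5 and z = 0.6, giving a probability of about 0.66.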