
## Reusing Risk-Aware Stochastic Abstract Policies in Robotic Navigation Learning

### Citations

1894 | Markov Decision Processes: Discrete Stochastic Dynamic Programming
- Puterman
- 1994
Citation Context ...ssible actions a; T_a : S × S → [0, 1] is a conditional probability distribution over S given each state in S; μ : S → [0, 1] is an initial probability distribution; and r : S × A → ℝ is a reward function [16]. An MDP starts at a state s₀ ∈ S, selected with probability μ(s₀). If the process is in state i at time t and action a is applied, then the next state is j with probability T_a(i, j) and a reward r(i, a)...
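The MDP definition in the excerpt above (transition kernel T_a, initial distribution μ, reward r) can be sketched as a small tabular simulator. This is a minimal illustration with made-up states, actions, and probabilities, not the paper's actual domain:

```python
import random

# Hypothetical tabular MDP matching the cited definition: states S,
# actions A, transition kernel T_a(i, j), initial distribution mu,
# and reward function r(i, a). All numbers here are assumed.
S = [0, 1, 2]
A = ["left", "right"]
mu = {0: 0.8, 1: 0.2, 2: 0.0}                # initial distribution over S
T = {("left", 0):  {0: 0.9, 1: 0.1},         # each row T_a(i, .) sums to 1
     ("left", 1):  {0: 0.5, 1: 0.5},
     ("left", 2):  {2: 1.0},
     ("right", 0): {1: 0.7, 2: 0.3},
     ("right", 1): {2: 1.0},
     ("right", 2): {2: 1.0}}

def r(i, a):
    """Reward function r : S x A -> R (toy values)."""
    return 1.0 if (i, a) == (1, "right") else 0.0

def sample_next(dist):
    """Draw one outcome from a {state: probability} mapping."""
    u, acc = random.random(), 0.0
    for s, p in dist.items():
        acc += p
        if u < acc:
            return s
    return s  # guard against floating-point rounding

def rollout(policy, horizon=10):
    """Sample a history h and return its accumulated reward R(h)."""
    s, total = sample_next(mu), 0.0
    for _ in range(horizon):
        a = policy(s)
        total += r(s, a)
        s = sample_next(T[(a, s)])
    return total
```

A deterministic policy is just a function from states to actions here, e.g. `rollout(lambda s: "right")`.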

254 | All of Statistics: A Concise Course in Statistical Inference. Springer Texts in Statistics
- Wasserman
- 2004
Citation Context ...self, and adding arcs from sG to each one of the other states, associating each arc from sG to s with μ(s). We say the SSP problem is recurrent if each M_π (for each π) defines a recurrent Markov chain [20]. Recall that a finite Markov chain is recurrent if it is irreducible (any state can reach any state with positive probability); and if a finite Markov chain is recurrent, it is positive recurrent (th...
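The irreducibility condition recalled in the excerpt (every state can reach every state with positive probability) can be checked by a plain graph search over the edges with positive transition probability. A minimal sketch, with made-up two-state chains as examples:

```python
def is_irreducible(P):
    """Return True iff every state can reach every state with positive
    probability in the finite chain with row-stochastic matrix P
    (given as a list of rows)."""
    n = len(P)

    def reachable(i):
        # Depth-first search over edges with P[u][v] > 0.
        seen, stack = {i}, [i]
        while stack:
            u = stack.pop()
            for v in range(n):
                if P[u][v] > 0 and v not in seen:
                    seen.add(v)
                    stack.append(v)
        return seen

    return all(len(reachable(i)) == n for i in range(n))

# A two-state chain that keeps switching is irreducible...
assert is_irreducible([[0.0, 1.0], [1.0, 0.0]])
# ...while a chain with an absorbing state is not.
assert not is_irreducible([[0.5, 0.5], [0.0, 1.0]])
```

Since the excerpt also notes that a recurrent finite chain is positive recurrent, this single reachability test is enough to classify a finite SSP chain as recurrent.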

158 | Learning Without State-Estimation in Partially Observable Markovian Decision Processes.
- Singh, Jaakkola, et al.
- 1994
Citation Context ...stochastic policies are appropriate, because, as they are more flexible by offering more than one choice of action per state, they can be arbitrarily better than deterministic policies [18,17]. We define a memoryless stochastic abstract policy as π_ab : S_ab × A_ab → [0, 1], with P(α|σ) = π_ab(σ, α), σ ∈ S_ab, α ∈ A_ab. AbsProb-PI [17] is an algorithm that, given an MDP, uses cumulative discoun...
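A memoryless stochastic abstract policy as defined above is just a conditional distribution over abstract actions given the current abstract state. A minimal sketch, with hypothetical abstract state and action names (not from the paper):

```python
import random

# Hypothetical pi_ab: for each abstract state sigma, a distribution
# over abstract actions alpha, i.e. P(alpha | sigma) = pi_ab(sigma, alpha).
pi_ab = {
    "nearWall": {"turn": 0.7, "forward": 0.3},
    "corridor": {"turn": 0.1, "forward": 0.9},
}

def sample_action(sigma):
    """Draw an abstract action alpha with probability pi_ab(sigma, alpha)."""
    u, acc = random.random(), 0.0
    for alpha, p in pi_ab[sigma].items():
        acc += p
        if u < acc:
            return alpha
    return alpha  # guard against floating-point rounding
```

Because the policy is memoryless, the sampled action depends only on the current abstract observation, never on the history, which is what makes such policies transferable across concrete tasks that share the abstraction.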

128 | An analysis of stochastic shortest path problems
- Bertsekas, Tsitsiklis
- 1991
Citation Context ...E[u(R(h))] = Σ_h P(h|π) u(R(h)). (1) Clearly if u(x) = λx for some λ > 0, we return to the usual E[R(h)] = Σ_h P(h|π) R(h). If instead u(x) = e^{λx} (exponential utility), then: E[u(R(h))] = Σ_h P(h|π) e^{λR(h)}. (2) An optimal policy is a policy that yields the highest expected utility [16]. V. F. Silva, M. L. Koga, F. G. Cozman, and A. H. R. Costa. 2.2 Stochastic Shortest Path Problem. A special case of MDPs ...
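Equation (2) above can be illustrated on a toy lottery over histories: with exponential utility, the certainty equivalent ln(E[e^{λR}])/λ sits below the risk-neutral expectation when λ < 0 (risk-averse) and above it when λ > 0 (risk-prone). The two histories and their probabilities below are assumed values, purely for illustration:

```python
import math

# Toy lottery: pairs (P(h|pi), R(h)) for two hypothetical histories.
histories = [(0.5, 0.0), (0.5, 10.0)]

def expected_return(hs):
    """Risk-neutral E[R(h)] = sum_h P(h|pi) * R(h)."""
    return sum(p * ret for p, ret in hs)

def expected_exp_utility(hs, lam):
    """E[u(R(h))] with u(x) = exp(lam * x), as in Eq. (2)."""
    return sum(p * math.exp(lam * ret) for p, ret in hs)

def certainty_equivalent(hs, lam):
    """The sure return whose utility equals the lottery's expected utility."""
    return math.log(expected_exp_utility(hs, lam)) / lam
```

For this lottery the risk-neutral expectation is 5.0; `certainty_equivalent(histories, -0.5)` falls well below 5.0, while a positive λ pushes it above, matching the averse/prone attitudes discussed in the surrounding text.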

107 | Memoryless policies: Theoretical limitations and practical results.
- Littman
- 1994
Citation Context ...lving MDPs with AbsProb-PI. The task of the agent in an MDP is to find a policy. In a concrete fully-observable MDP, the set of deterministic memoryless policies, π : S → A, contains an optimal policy [16,10]. When considering abstract states and actions, stochastic policies are appropriate, because, as they are more flexible by offering more than one choic...

69 | Risk-sensitive Markov decision processes
- Howard, Matheson
- 1972
Citation Context ...isk, when learning features of a new target task. Introducing explicit preferences over risk requires us to measure subjective attitudes towards risk and to evaluate policies as risky or conservative [6,15]. A decision maker can have three distinct attitudes towards risk: averse, neutral, or prone [6]. Relatively little activity in AI and Robotics addresses risk-awareness in policy construction, despite the...

66 | Transfer learning via inter-task mappings for temporal difference learning.
- Taylor, Stone, et al.
- 2007
Citation Context ...Substantial previous work can be found on transfer learning methods. One must decide what to transfer: value functions [12,19], features extracted from the value functions [1,8], heuristics and policies [3,5]. We focus on policy transfer; all transferred information is encoded by policies. An advantage of policy-based transf...

61 | Towards a unified theory of state abstraction for MDPs. Paper presented at the 9th international symposium on artificial intelligence and mathematics
- Li, Walsh
- 2006
Citation Context ...tates/actions of the target task; this amounts to less information than required by methods that transfer value functions [5]. Policy-search is in fact quite appropriate when coupled with abstraction [9]. Our approach is to transfer learning by exploiting abstract policies encoded through relational representations that compactly capture domains in terms of relations and objects [14]. Each abstract s...

47 | Value-Function-Based Transfer for Reinforcement Learning Using Structure Mapping.
- Liu, Stone
- 2006
Citation Context ...Substantial previous work can be found on transfer learning methods. One must decide what to transfer: value functions [12,19], features extracted from the value functions [1,8], heuristics and policies [3,5]. We focus on policy transfer; all transferred information is encoded by policies. An advantage of policy-based transf...

28 | General game learning using knowledge transfer
- Banerjee, Stone
- 2007
Citation Context ...Substantial previous work can be found on transfer learning methods. One must decide what to transfer: value functions [12,19], features extracted from the value functions [1,8], heuristics and policies [3,5]. We focus on policy transfer; all transferred information is encoded by policies. An advantage of policy-based transfer is that it requires only a mapping between the s...

28 | Percentile Optimization for Markov Decision Processes with Parameter Uncertainty.
- Delage, Mannor
- 2010
Citation Context ...ctical importance of this issue, possibly due to the difficulty in optimizing risk-aware measures. Liu and Koenig [11] consider probabilistic planning with nonlinear utility functions. Delage and Mannor [4] consider a relaxed version of minimax by defining probabilistic constraints on minimal performance. Mannor and Tsitsiklis [13] consider a trade-off between the expected value and variance of accumulated reward...

16 | Mean-Variance Optimization in Markov Decision Processes
- Mannor, Tsitsiklis
- 2011
Citation Context ...robabilistic planning with nonlinear utility functions. Delage and Mannor [4] consider a relaxed version of minimax by defining probabilistic constraints on minimal performance. Mannor and Tsitsiklis [13] consider a trade-off between the expected value and variance of accumulated rewards. In all of these approaches, optimal policies are non-stationary in general. We focus on robotic navigation problems modeled ...

14 | Probabilistic policy reuse for inter-task transfer learning. Robotics and Autonomous Systems,
- Fernandez, Garcia, et al.
- 2010
Citation Context ...stantial previous work can be found on transfer learning methods. One must decide what to transfer: value functions [12,19], features extracted from the value functions [1,8], heuristics and policies [3,5]. We focus on policy transfer; all transferred information is encoded by policies. An advantage of policy-based transfer is that it requires only a mapping between the states/actions of the source tas...

13 | Accelerating autonomous learning by using a heuristic selection of actions
- Bianchi, Ribeiro, et al.
Citation Context ...stantial previous work can be found on transfer learning methods. One must decide what to transfer: value functions [12,19], features extracted from the value functions [1,8], heuristics and policies [3,5]. We focus on policy transfer; all transferred information is encoded by policies. An advantage of policy-based transfer is that it requires only a mapping between the states/actions of the source tas...

9 | Transfer in reinforcement learning via shared features
- Konidaris, Scheidwasser, et al.
- 2012
Citation Context ...Substantial previous work can be found on transfer learning methods. One must decide what to transfer: value functions [12,19], features extracted from the value functions [1,8], heuristics and policies [3,5]. We focus on policy transfer; all transferred information is encoded by policies. An advantage of policy-based transfer is that it requires only a mapping between the s...

4 | Probabilistic planning with nonlinear utility functions
- Liu, Koenig
- 2006
Citation Context ...ivity in AI and Robotics addresses risk-awareness in policy construction, despite the practical importance of this issue, possibly due to the difficulty in optimizing risk-aware measures. Liu and Koenig [11] consider probabilistic planning with nonlinear utility functions. Delage and Mannor [4] consider a relaxed version of minimax by defining probabilistic constraints on minimal performance. Mannor and ...

2 | Simultaneous Abstract and Concrete Reinforcement Learning
- Matos, Bergamo, et al.
- 2011
Citation Context ...with abstraction [9]. Our approach is to transfer learning by exploiting abstract policies encoded through relational representations that compactly capture domains in terms of relations and objects [14]. Each abstract state aggregates a set of concrete states; given a single abstract state, one cannot know which concrete state obtains, but each concrete state is mapped to a unique abstract state. We...
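The many-to-one aggregation described above (each concrete state maps to exactly one abstract state, while an abstract state covers several concrete ones) can be sketched as a plain mapping. The grid positions and abstract state names below are hypothetical, just to make the direction of the mapping concrete:

```python
# Hypothetical abstraction: concrete grid positions -> abstract states.
# The forward map is a function; the inverse is only a set, reflecting
# that a single abstract state cannot identify the concrete state.
concrete_to_abstract = {
    (0, 0): "nearWall", (0, 1): "nearWall",
    (3, 4): "corridor", (3, 5): "corridor",
}

def abstract_of(s):
    """Each concrete state has exactly one abstract state."""
    return concrete_to_abstract[s]

def concretes_of(sigma):
    """All concrete states aggregated by sigma; not uniquely invertible."""
    return {s for s, ab in concrete_to_abstract.items() if ab == sigma}
```

An abstract policy defined over `"nearWall"` and `"corridor"` then applies unchanged to any concrete task whose states admit the same mapping, which is the transfer mechanism the excerpt describes.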

2 | Finding memoryless probabilistic relational policies for inter-task reuse
- Silva, Pereira, et al.
- 2012
Citation Context ...nnot know which concrete state obtains, but each concrete state is mapped to a unique abstract state. We consider abstract policies that are memoryless and stochastic. We use the AbsProb-PI algorithm [17] to construct such abstract policies in such a way that abstraction captures the main features of the source task. A memoryless policy chooses actions only according to the current observation of the ...

2 | Why discount? The rationale of discounting in optimisation problems
- Whittle
- 1996
Citation Context ...other interpretations of γ; for instance, γ may reflect the fact that the agent focuses more intensely on nearby rewards; another interpretation is that γ is the probability of surviving one more time step [21]. Obviously, if the original problem already has γ = 1, then the agent already has a risk-neutral attitude. But if γ < 1 in the original problem, we can say that the agent has a risk-prone attitude. Now s...
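The survival interpretation of γ mentioned in the excerpt can be checked numerically: discounting a constant per-step reward c by γ gives c/(1 − γ), which equals the expected undiscounted total when the episode instead terminates with probability 1 − γ at each step. A small sketch with assumed values c = 1 and γ = 0.9:

```python
import random

def discounted_value(c, gamma, horizon=10_000):
    """Sum of c * gamma^t, approximating c / (1 - gamma)."""
    return sum(c * gamma**t for t in range(horizon))

def simulated_survival_value(c, gamma, episodes=20_000, seed=0):
    """Average undiscounted total reward when the agent 'dies' with
    probability 1 - gamma after each step (geometric episode length)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(episodes):
        while True:
            total += c
            if rng.random() > gamma:  # terminate with probability 1 - gamma
                break
    return total / episodes
```

With c = 1 and γ = 0.9 both quantities are close to 10, which is why a discounted risk-neutral agent behaves like an undiscounted agent facing a per-step survival probability of γ.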

1 | Speeding-up reinforcement learning tasks through abstraction and transfer learning
- Koga, Silva, et al.
- 2013
Citation Context ...et S of a relational MDP is defined as the set of possible ground conjunctions over P_S and C, and the action set A is the set of possible ground atoms over P_A and C (for more details, please refer to [14,7]). When S and A in a relational MDP are ground sets, we call this a concrete MDP. Note that a relational representation enables us to aggregate states and actions by using variables instead of constan...

1 | Shortest stochastic path with risk sensitive evaluation
- Minami, Silva
Citation Context ...isk, when learning features of a new target task. Introducing explicit preferences over risk requires us to measure subjective attitudes towards risk and to evaluate policies as risky or conservative [6,15]. A decision maker can have three distinct attitudes towards risk: averse, neutral, or prone [6]. Relatively little activity in AI and Robotics addresses risk-awareness in policy construction, despite the...