To make solving MDPs efficient, the immediate cost function is typically assumed to depend only on the current state and action. Although the immediate cost function may be probabilistic, this report assumes that the function is deterministic for simplicity. The cost of executing action a from state s is denoted C(s, a).
1. In other texts, the expected sum of instantaneous costs is often called "cost-to-go."
One of the challenges when solving an MDP is deciding how to represent the policy. If the number of states is finite, then the mapping from states to actions can be represented using a table. For problems with continuous state variables, as is the case with collision avoidance, it is no longer possible to enumerate every possible state. The field of approximate dynamic programming has studied several different approaches [20], including the use of neural networks [21] and decision trees [22]. This report focuses on a representation that uses a grid-based discretization of the state space [23].

The first step in computing the optimal policy using a grid-based representation is to choose suitable cut points along the dimensions of the state space. The Cartesian product of the sets of cut points defines the vertices of the grid. These vertices correspond to the discrete states in the representation. One limitation of a grid-based method is that the number of discrete states grows exponentially with the number of state variables, making the storage and computation of the policy expensive. Hence, when designing an MDP to model a problem, it is important to include only the relevant state variables to help prevent the discrete state space from becoming too large.
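As an illustration of this construction, the sketch below builds the discrete states from per-dimension cut points. The three state variables, their ranges, and the variable names are placeholders for the purpose of the example, not the discretization actually used in this report.

```python
import itertools
import numpy as np

# Hypothetical cut points for a three-variable state space (illustrative only).
cut_points = {
    "h":    np.linspace(-1000.0, 1000.0, 9),   # relative altitude (ft)
    "hdot": np.linspace(-50.0, 50.0, 5),       # relative vertical rate (ft/s)
    "tau":  np.arange(0.0, 41.0, 5.0),         # time to closest approach (s)
}

# The Cartesian product of the per-dimension cut points defines the grid
# vertices, which serve as the discrete states of the representation.
vertices = list(itertools.product(*cut_points.values()))

# The number of discrete states is the product of the per-dimension counts,
# so it grows exponentially with the number of state variables.
num_discrete_states = np.prod([len(c) for c in cut_points.values()])
print(len(vertices), num_discrete_states)  # 405 405 for this toy grid
```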
Once a discretization scheme has been chosen, the original continuous transition function T must be redefined over the discrete state space. Given a start state and action to be executed, the resulting state distribution from the original transition function may include states that are not part of the discrete state space. Different strategies exist for arranging the resulting state distribution to have support only over the set of discrete states. The approach taken in this report is to sample from the resulting state distribution and apportion probability mass to the discrete states associated with the vertices of the cell enclosing the sample. Discrete states that are closer to the sample are assigned more weight [12,23].

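A minimal sketch of this sampling scheme is given below. It uses multilinear interpolation weights over the vertices of the enclosing cell, which is one common way of assigning more weight to nearby vertices; `sample_next_state` is a stand-in for a draw from the continuous transition model, and the function names are illustrative rather than taken from the report.

```python
import itertools
import numpy as np

def interpolation_weights(sample, cut_points):
    """Spread one unit of probability mass from a continuous sample over the
    vertices of the grid cell enclosing it, giving more weight to vertices
    closer to the sample (multilinear interpolation).

    cut_points is a list of sorted 1-D arrays, one per state variable.
    Returns (vertex index tuple, weight) pairs.
    """
    lo, frac = [], []
    for x, cuts in zip(sample, cut_points):
        i = int(np.searchsorted(cuts, x)) - 1
        i = min(max(i, 0), len(cuts) - 2)              # clamp to a valid cell
        f = (x - cuts[i]) / (cuts[i + 1] - cuts[i])
        lo.append(i)
        frac.append(min(max(f, 0.0), 1.0))             # clip samples off the grid

    weights = []
    for corner in itertools.product((0, 1), repeat=len(lo)):
        w = 1.0
        for d, c in enumerate(corner):
            w *= frac[d] if c else 1.0 - frac[d]
        weights.append((tuple(lo[d] + c for d, c in enumerate(corner)), w))
    return weights

def discretized_transition(sample_next_state, cut_points, num_samples=1000):
    """Estimate the discrete transition distribution for one (state, action)
    pair by sampling the continuous model and apportioning each sample's
    mass over the vertices of its enclosing cell."""
    T = {}
    for _ in range(num_samples):
        for idx, w in interpolation_weights(sample_next_state(), cut_points):
            T[idx] = T.get(idx, 0.0) + w / num_samples
    return T
```
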
DP can be used to compute the expected cost from every discrete state when following an optimal K-step horizon policy π*_K. The optimal expected cost function J_K can be used to determine the optimal policy. Computing J_K is done using an iterative process, starting with initializing the function J_0(s) = 0 for all states s. Given the function J_{k-1}, the function J_k is determined as follows:

    J_k(s) = \min_a \Big[ C(s, a) + \sum_{s'} T(s, a, s')\, J_{k-1}(s') \Big].    (1)

This iteration is continued to the desired horizon. The state-action cost J_K(s, a) with a K-step horizon is given by

    J_K(s, a) = C(s, a) + \sum_{s'} T(s, a, s')\, J_{K-1}(s').    (2)

An optimal K-step policy satisfies

    \pi^*_K(s) = \arg\min_a J_K(s, a).    (3)
In general, there may be multiple actions that satisfy the equality above from a particular state. In this work, it is assumed that ties are broken according to some ordering over the set of available actions, resulting in a unique optimal policy. In the context of collision avoidance, it may make sense to give preference to less extreme advisories if they have the same expected cost as more extreme advisories.
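The iteration in Equations 1-3, together with the tie-breaking rule just described, can be sketched as follows. The dense-array layout of C and T is an assumption made for brevity; actions are indexed in order of preference (least extreme first), so the first minimizer returned by argmin implements the tie-breaking.

```python
import numpy as np

def value_iteration(C, T, K):
    """Compute J_K(s), the state-action costs J_K(s, a), and the optimal
    K-step policy for a discretized MDP.

    C -- array (num_states, num_actions): immediate cost C(s, a)
    T -- list of (num_states, num_states) arrays: T[a][s, s'] (assumed layout)
    """
    assert K >= 1
    num_states, num_actions = C.shape
    J = np.zeros(num_states)                       # J_0(s) = 0 for all s
    for _ in range(K):
        # State-action cost at the current horizon (Equation 2).
        Q = np.stack([C[:, a] + T[a] @ J for a in range(num_actions)], axis=1)
        J = Q.min(axis=1)                          # Equation 1
    # Equation 3; argmin returns the first minimizer, so ordering the actions
    # from least to most extreme encodes the preference on ties.
    policy = Q.argmin(axis=1)
    return J, Q, policy
```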
Because the actions are categorical, one cannot simply interpolate the optimal policy directly to determine the optimal action from non-discrete states. Instead, one can interpolate the state-action function and from that determine the optimal action. Storing the optimal state-action function requires |S| × |A| entries, whereas storing the optimal policy requires only |S| entries. Because the number of actions available from a particular state is typically small in the collision avoidance domain, the storage of the state-action table can be compressed. The implementation used for the experiments in this report stores the state-action costs only for valid state-action pairs and uses an index file for fast lookups.
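To make the interpolation step concrete, the sketch below evaluates the state-action cost at a non-discrete state from precomputed cell weights (for example, produced by the interpolation_weights sketch above) and then minimizes over the actions valid in that state. The argument names, the dense Q array, and the valid_actions input are assumptions; the index-file compression used in the actual implementation is omitted.

```python
import numpy as np

def best_action(cell_weights, Q, valid_actions):
    """Choose the optimal action from a non-discrete state by interpolating
    the state-action costs rather than the (categorical) policy.

    cell_weights  -- (flat discrete-state index, weight) pairs for the vertices
                     of the enclosing cell, e.g. the vertex tuples from the
                     sketch above flattened with np.ravel_multi_index.
    Q             -- array (num_states, num_actions) of costs J_K(s, a).
    valid_actions -- indices of actions available from this state, listed in
                     order of preference (least extreme first).
    """
    # Interpolated J_K(s, a) for every action at the continuous state.
    costs = sum(w * Q[idx] for idx, w in cell_weights)
    # Minimize over the valid actions; argmin keeps the first (preferred)
    # action when costs tie.
    valid = np.asarray(valid_actions)
    return int(valid[np.argmin(costs[valid])])
```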
 