Policy Graph File Format

Input and output format for policy graphs

About this Page

This page describes the format of the policy graph file written by the 'pomdp-solve' program (the file usually has the suffix ".pg").

About Policy Graphs

If the solution to an infinite horizon POMDP problem converges, then a finite state controller can be created from the value function's partitioning of the belief space. With this finite state controller, one can execute the optimal policy without needing to track the belief state. Using it first requires knowing which node of the policy graph to start in. This is found by taking the alpha vector whose dot product with the initial belief state is maximal. Because the alpha vectors line up with the nodes in the output policy graph, that "best" alpha vector determines the starting node in the finite state controller. The current node dictates the action to take. The observation then received is used to look up the next node in the policy graph, and hence the next action to take. Repeating these steps executes the optimal policy.
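The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the in-memory layout (a list of alpha vectors, a list of per-node actions, and a list of per-node successor lists) and the 0-based observation indexing are all hypothetical choices, not something 'pomdp-solve' provides.

```python
def best_node(alphas, belief):
    """Return the index of the alpha vector whose dot product with the
    belief is largest. Assumes alphas[i] corresponds to policy graph node i."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    return max(range(len(alphas)), key=lambda i: dot(alphas[i], belief))


def run_controller(actions, successors, start, observations):
    """Follow the policy graph from node `start`, returning the actions taken.

    actions[n]       -- action index attached to node n
    successors[n][z] -- node that follows n after observation z
                        (0-based z is an assumption of this sketch)
    """
    node = start
    taken = []
    for obs in observations:
        taken.append(actions[node])   # act according to the current node
        node = successors[node][obs]  # the observation selects the next node
    taken.append(actions[node])       # action of the node finally reached
    return taken


# Hypothetical two-state, two-observation example (values are made up).
alphas = [[1.0, 0.0], [0.0, 1.0]]
actions = [0, 1]
successors = [[0, 1], [1, 0]]

start = best_node(alphas, [0.3, 0.7])
taken = run_controller(actions, successors, start, [1, 0])
```

Note that the belief is needed only once, to pick the starting node; after that the loop uses nothing but the observations.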

Each line of the file represents one node of the policy graph and its contents are:

N A  Z1 Z2 Z3 ...

Here 'N' is a node ID giving the node a unique name. Node IDs are numbered sequentially, in the same order as the value function vectors in the corresponding output '.alpha' file (see ).

The 'A' is the action number defined for this node; it is an integer referring to an action in the POMDP file by its 0-based index.

These are followed by a list of node IDs, one for each observation. Thus the list will have a length equal to the number of observations in the POMDP. This list specifies the transitions in the policy graph. The n'th number in the list will be the index of the node that follows this one when the observation received is 'n'.
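As a rough illustration of the line layout described above, the following Python sketch reads such a file into a dictionary. The function name 'parse_policy_graph', the sample file contents and the 0-based indexing of the observation list are assumptions made for this example. The sketch also assumes every field is an integer.

```python
def parse_policy_graph(text):
    """Parse policy graph text into {node_id: (action, successors)}.

    Each non-empty line is expected to look like 'N A  Z1 Z2 Z3 ...',
    where successors[z] is the node that follows after observation z.
    """
    nodes = {}
    for line in text.splitlines():
        fields = line.split()          # any run of whitespace separates fields
        if not fields:
            continue                   # skip blank lines
        node_id = int(fields[0])
        action = int(fields[1])
        successors = [int(z) for z in fields[2:]]
        nodes[node_id] = (action, successors)
    return nodes


# Made-up file contents for a POMDP with two observations.
example = """\
0 1  0 1
1 0  1 0
"""

graph = parse_policy_graph(example)
action, successors = graph[0]   # node 0 takes action 1
next_node = successors[1]       # node reached from node 0 after observation 1
```

With a real file, the text would be read from the ".pg" file and the length of each successor list could be checked against the number of observations in the POMDP.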