POMDP File Format

How to encode a POMDP problem for pomdp-solve

About this Format

This page describes the file format for encoding a POMDP problem to be used as an input file for the 'pomdp-solve' program. Others have since adopted this same format for their code, so this specification is now more widely applicable. Examples of this format can be seen on the POMDP Example Domains Page.

POMDP File Grammar

For a more detailed and formal specification of the syntax of this file format see the POMDP File Grammar Page.

General Requirements

  • All floating point numbers must be specified with at least one digit before and one digit after the decimal point.
  • Everything from a '#' symbol to the end of the line is treated as a comment. Comments can appear anywhere in the file (see the fragment below).
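For illustration, the following fragment (the particular discount value is arbitrary) satisfies both requirements:

# a full-line comment
discount: 0.95   # valid float; '.95' or '0.' would not be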

Preamble

The following 5 lines must appear at the beginning of the file. They may appear in any order as long as they precede all specifications of transition probabilities, observation probabilities and rewards.

discount: %f
values: [ reward, cost ]
states: [ %d, <list-of-states> ]
actions: [ %d, <list-of-actions> ]
observations: [ %d, <list-of-observations> ]

The definition of states, actions and/or observations can be given either as a number indicating how many there are, or as a list of mnemonic strings, one for each entry. These mnemonics cannot begin with a digit. For instance, both:

actions: 4
actions: north south east west

will result in 4 actions being defined. The only difference is that, in the latter, the actions can then be referenced in this file by their mnemonic names. Even when mnemonic names are used, later references may use a number instead, as long as it corresponds to the positional numbering of the list, where numbers are assigned consecutively from left to right starting with zero.

When listing states, actions or observations, one or more whitespace characters (space, tab or newline) act as delimiters. When a number is given instead of an enumeration, the individual elements are referred to by consecutive integers starting at 0.
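Putting the pieces together, a complete preamble for a small, hypothetical two-state problem (all names and values here are made up for illustration) could look like:

discount: 0.95
values: reward
states: tiger-left tiger-right
actions: listen open-left open-right
observations: hear-left hear-right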

Starting Belief State (optional)

After the preamble, there is the optional specification of the starting belief state. (Note that this is ignored by some exact solution algorithms.) There are a number of different formats for the starting state. You can either:

  • enumerate the probabilities for each state,
  • specify a single starting state,
  • give a uniform distribution over states, or
  • give a uniform distribution over a subset of states.
For the last one, you can either specify a list of states to be included, or a list of states to be excluded. Examples of these forms are:

start: 0.3 0.1 0.0 0.2 0.4

start: uniform

start: first-state

start: 5

start include: first-state third-state

start include: 1 3

start exclude: fifth-state seventh-state

State Transition Probabilities

After the initial five lines and optional starting state, the specifications of transition probabilities, observation probabilities and rewards appear. These specifications may appear in any order and can be intermixed. Any probabilities or rewards not specified in the file are assumed to be zero.

You may also specify a particular probability or reward more than once. The definition that appears last in the file is the one that will take effect. This is convenient for specifying exceptions to a more general specification.

To specify a single, individual transition probability:

T: <action> : <start-state> : <end-state> %f

Anywhere an action, state or observation can appear in this format, you can also put the wildcard character '*' which means that this is true for all possible entries that could appear here. For example:

T: 5 : * : 0 1.0

is interpreted as action 5 always moving the system to state 0, no matter what the starting state was (i.e., for all possible starting states).

To specify a single row of a particular action's transition matrix:

T: <action> : <start-state>
%f %f ... %f

Where there is exactly one probability entry for each possible next state. This allows defining the transition probabilities for a particular starting state only. Instead of a list of probabilities, the mnemonic word 'uniform' may appear; in this case, each next state will be assigned the probability 1/#states. Again, an asterisk in either the action or start-state position indicates all possible entries that could appear in that position.
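For example, in a hypothetical problem with 3 states and an action named 'north', a single row could be written as:

T: north : 0
0.8 0.1 0.1   # probabilities of ending in states 0, 1 and 2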

To specify an entire transition matrix for a particular action:

T: <action>
%f %f ... %f
%f %f ... %f
...
%f %f ... %f

Where each row corresponds to one of the start states and each column to one of the ending states. Each entry must be separated from the next with one or more whitespace characters. The state numbers go from left to right for the ending states and top to bottom for the starting states. The new-lines are just for formatting convenience and do not affect the final matrix. The only restriction is that there must be N x N values specified, where 'N' is the number of states.
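For instance, in a hypothetical 3-state problem, an entire transition matrix for an action named 'east' might be given as:

T: east
0.9 0.1 0.0
0.0 0.9 0.1
0.0 0.0 1.0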

In addition, there are a few mnemonic conventions that can be used in place of the full, explicit matrix:

  • identity
  • uniform

Note that uniform means that each row of the transition matrix will be set to a uniform distribution. The identity mnemonic will result in a transition matrix that leaves the underlying state unchanged for all possible starting states (i.e., the identity matrix).
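For example (with a made-up action name), the following declares that the action 'stay' never changes the underlying state:

T: stay
identity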

Observation Probabilities

The observation probabilities are specified in a manner similar to the transition probabilities. To specify individual observation probabilities:

O : <action> : <end-state> : <observation> %f

The asterisk wildcard is allowed in any of the positions.
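For example, using made-up names from a two-state problem, the following says that after taking action 'listen' and ending in state 'tiger-left', observation 'hear-left' occurs with probability 0.85:

O : listen : tiger-left : hear-left 0.85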

To specify a row of a particular action's observation probability matrix:

O : <action> : <end-state>
%f %f ... %f

This specifies the probability of observing each possible observation for a particular action and ending state. The mnemonic shortcut 'uniform' may also appear in this place; it encodes an observation that yields no information about the underlying state.
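For example, assuming a hypothetical problem with two observations, a single row might be:

O : listen : tiger-left
0.85 0.15   # probabilities of observations 0 and 1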

To specify an entire observation probability matrix for an action:

O: <action>
%f %f ... %f
%f %f ... %f
...
%f %f ... %f

The format is similar to the transition matrices, except the number of entries must be N x O, where 'N' is the number of states and 'O' is the number of observations. Here too the 'uniform' mnemonic can be substituted for an entire matrix. In this case it will assign each entry of each row the probability 1/#observations.
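For instance, for a hypothetical problem with two states and two observations, an entire observation matrix could be written as:

O: listen
0.85 0.15   # row for ending state 0
0.15 0.85   # row for ending state 1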

Immediate Rewards

To specify an individual reward:

R: <action> : <start-state> : <end-state> : <observation> %f

For any of the entries, an asterisk for the state, action, or observation indicates a wildcard that will be expanded to all existing entities.
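For example (hypothetical names again), a reward that depends only on the action and the starting state can use wildcards for the remaining positions:

R: open-left : tiger-left : * : * -100.0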

There are two other forms to specify rewards:

R: <action> : <start-state> : <end-state>
%f %f ... %f

This specifies a particular row of a reward matrix for a particular action, start state and end state, with one reward value for each possible observation.
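For example, assuming two observations, rewards that vary with the observation for a fixed action, start state and end state could be written as:

R: listen : tiger-left : tiger-left
-1.0 -0.5   # one reward per observation (values are made up)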

The last reward specification form is

R: <action> : <start-state>
%f %f ... %f
%f %f ... %f
...
%f %f ... %f

which lets you specify an entire reward matrix for a particular action and start state combination. The rows of the matrix correspond to the ending states and the columns to the observations, so there must be N x O values, where 'N' is the number of states and 'O' is the number of observations.
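As a final illustration, for a hypothetical problem with two states and two observations, an entire reward matrix for one action and start state could be given as:

R: open-right : tiger-left
10.0 10.0   # row for ending state 0, one entry per observation
-5.0 -5.0   # row for ending state 1 (all values are made up)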