This repository is the reference implementation of the paper "Multi-agent active perception with prediction rewards".
Some further information is available in this blog post.
If you find the work useful, please cite it as: Mikko Lauri and Frans A. Oliehoek. "Multi-agent active perception with prediction rewards", in Advances in Neural Information Processing Systems 33, 2020.
BibTeX entry:
@inproceedings{lauri2020multiagent,
author = {Mikko Lauri and Frans A. Oliehoek},
title = {Multi-agent active perception with prediction rewards},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2020}
}
The code consists of a C++ backend for solving Dec-POMDPs and a Python frontend that implements the APAS algorithm presented in the paper. Follow the steps below to install the requirements and compile the planner.
Install the required system libraries on an Ubuntu system with:
sudo apt-get install libboost-all-dev libeigen3-dev
Additionally, you need a C++ compiler that supports C++17, and CMake version 3.0 or later.
You can compile the C++ backend by executing:
cd solver && mkdir build && cd build
cmake ..
make
Note: this will download and compile the MADP toolbox version 0.4.1, which usually takes quite a long time.
If you already have MADP installed, you can save a lot of time by specifying where to find it: cmake .. -DMADPPATH=/path/to/your/madp/installation.
Use Python 3.
The only required package is numpy.
If it is not already installed, run:
pip install -r requirements.txt
You can solve the MAV domain with horizon 5 using the experimental settings from the paper by running:
python apas.py --horizon 5 `pwd`/problems/mav.dpomdp --verbose
The --verbose flag enables progress printouts in the terminal.
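To run the same problem for several horizons, the command above can be assembled programmatically. The following is a minimal sketch that only uses the script name and flags shown above; the helper name apas_command and the list of horizons are hypothetical and not part of the repository.

```python
import os
import sys


def apas_command(problem_file, horizon, verbose=True, script="apas.py"):
    """Build the argument list for one APAS run, mirroring the command above.

    apas_command is a hypothetical helper, not part of the repository.
    The problem path is made absolute, as `pwd`/problems/... does in the shell.
    """
    cmd = [sys.executable, script, "--horizon", str(horizon),
           os.path.abspath(problem_file)]
    if verbose:
        cmd.append("--verbose")
    return cmd


# Example sweep over horizons (the horizon values are an arbitrary choice).
commands = [apas_command("problems/mav.dpomdp", h) for h in (3, 4, 5)]
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) from the directory that contains apas.py. This section does not say whether successive runs overwrite the results subfolder, so it may be worth moving it aside between runs.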
Results will be stored in the subfolder results. There you will find the following contents:
- apas_policy.out: indicates where to find the best individual policies found by APAS for each agent
- apas_value.npy: a file that can be loaded using np.load, containing the value of the best policy found by APAS
- beliefs_XYZ.txt: text files containing on each row a belief state used as a linearization point for the final reward at iteration XYZ of APAS
- policy_values.npy: loadable numpy file with the value of the policy found at each iteration of APAS
- Subfolders pgi_XY containing the best individual policies and all individual policies considered by policy graph improvement for each agent at iteration XY of APAS. The files are in .dot format and can be visualized using xdot.
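The numeric outputs listed above can be inspected from Python. This is a minimal sketch: the function name load_apas_results is hypothetical, and it assumes that the belief files hold whitespace-separated numbers, which is not confirmed here.

```python
import glob
import os

import numpy as np


def load_apas_results(results_dir="results"):
    """Load the value files and belief files from an APAS results folder.

    Hypothetical helper. Assumes beliefs_XYZ.txt contains whitespace-separated
    numbers, one belief state per row.
    """
    best_value = np.load(os.path.join(results_dir, "apas_value.npy"))
    policy_values = np.load(os.path.join(results_dir, "policy_values.npy"))

    beliefs = {}
    for path in sorted(glob.glob(os.path.join(results_dir, "beliefs_*.txt"))):
        name = os.path.basename(path)
        iteration = name[len("beliefs_"):-len(".txt")]  # the XYZ part
        # ndmin=2 keeps a single-row file as a 2-D array (rows = beliefs).
        beliefs[iteration] = np.loadtxt(path, ndmin=2)
    return best_value, policy_values, beliefs
```

Calling load_apas_results("results") returns the best value, the per-iteration values, and a dictionary mapping each iteration label to an array with one belief state per row.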
All stored values are exact. The piecewise linear approximation is used only during planning, not when evaluating the policies.
The archives linked below contain the raw data corresponding to the results presented in the paper and supplementary material. The format is similar to that described above.
The software uses the parser from the MADP toolbox to read problems formatted as .dpomdp files.
You can specify your own problems in this format.
See this example problem for a description of the format.
However, note that the .dpomdp format can only express rewards that are functions of the hidden state and actions, which are linear in the belief state. Rewards that are nonlinear in the belief state cannot be specified in this format.
The planner software implicitly assumes you wish to solve a Dec-rhoPOMDP with negative entropy as the final reward.
If you want to use a different final reward, modify DecPOMDPConversions.hpp.
You will need to add functionality for getting the linearizing hyperplanes of your (convex and bounded) final reward function; see LinearizedNegEntropy.hpp for an example.
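For intuition, the linearizing hyperplanes of negative entropy can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a port of LinearizedNegEntropy.hpp: the function names are hypothetical, and the eps clipping of zero probabilities is a numerical choice made here. The tangent of f(b) = sum_s b(s) log b(s) at a belief q is the hyperplane with coefficients log q(s), and since f is convex every tangent is a lower bound on f.

```python
import math


def neg_entropy(belief):
    """Negative entropy sum_s b(s) log b(s), with 0 log 0 taken as 0."""
    return sum(p * math.log(p) for p in belief if p > 0.0)


def tangent_hyperplane(belief, eps=1e-12):
    """Coefficients alpha(s) = log q(s) of the tangent at the belief q.

    Zero entries are clipped to eps and the result renormalized (an assumption
    of this sketch) so that the coefficients stay finite and the hyperplane
    remains a valid lower bound.
    """
    clipped = [max(p, eps) for p in belief]
    total = sum(clipped)
    return [math.log(q / total) for q in clipped]


def linearized_value(hyperplanes, belief):
    """Piecewise linear lower bound: max over hyperplanes of alpha . b."""
    return max(sum(a * p for a, p in zip(alpha, belief))
               for alpha in hyperplanes)


points = [[0.5, 0.5], [0.9, 0.1], [1.0, 0.0]]
planes = [tangent_hyperplane(b) for b in points]
query = [0.7, 0.3]
print(linearized_value(planes, query), "<=", neg_entropy(query))
```

A different convex and bounded final reward would need its own version of tangent_hyperplane, while the max over hyperplanes stays the same.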
The conversion from Definition 3 in the paper is implemented in DecPOMDPConversions.hpp.
The main part of the Dec-POMDP solver is implemented in BackwardPass.hpp.
We use particle-based PGI, modified so that UCB1 is applied to optimize node configurations.
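As a reminder of how UCB1 works, here is a generic textbook sketch in Python. It is not the code in BackwardPass.hpp. Reading each "arm" as a candidate configuration of a policy graph node and each reward as a particle-based value estimate is an interpretation assumed here, and the function names are hypothetical.

```python
import math


def ucb1_select(counts, means, c=math.sqrt(2.0)):
    """Return the index of the arm to try next under UCB1.

    counts[i] is how often arm i was tried, means[i] its average reward.
    Untried arms are selected first; otherwise the arm maximizing
    mean + c * sqrt(ln(total) / count) is chosen.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    scores = [m + c * math.sqrt(math.log(total) / n)
              for m, n in zip(means, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def ucb1_update(counts, means, arm, reward):
    """Record one observed reward for the chosen arm (running average)."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]


# Toy usage with fixed rewards per arm (hypothetical numbers).
true_rewards = [0.2, 0.8, 0.5]
counts = [0] * len(true_rewards)
means = [0.0] * len(true_rewards)
for _ in range(200):
    arm = ucb1_select(counts, means)
    ucb1_update(counts, means, arm, true_rewards[arm])
print(counts)
```

The exploration bonus shrinks as an arm is tried more often, so evaluations concentrate on the configurations that look best while every alternative is still tried occasionally.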
Pull requests are welcome, although there are no plans for active further development as of now.
Licensed under the MIT license - see LICENSE for details.