Repository: https://github.com/Alee08/maprl
Paper (ECAI 2025): https://doi.org/10.3233/FAIA251253
Companion library (required): https://github.com/Alee08/multiagent-rl-rm
> **Note**
> MAPRL explicitly depends on `multiagent-rl-rm`. This dependency is declared in `setup.py`, so `pip install -e .` will automatically pull it from PyPI.
MAPRL fuses partial-order planning (POP) with Reward Machines (RMs) to tackle strongly-coupled cooperative MARL.
The planner produces a POP, the POP is compiled into one RM per agent, and each agent runs an off-the-shelf learner (e.g., Q-Learning) guided by its RM state while public joint actions fire when their preconditions are satisfied.
This repository provides the MAPRL library, which performs POP→RM synthesis and offers utilities to coordinate distributed learners via RMs, together with grid-based environments and thin wrappers that expose RM state to the learners.
*Pipeline of MAPRL: the off-line planning phase yields a partial-order plan, from which reward machines are extracted to guide on-line decentralized learning.*
- Transforms a multi-agent planning problem (MAP) into a partial-order plan (POP), then into one Reward Machine (RM) per agent.
- Explicitly models concurrency and synchronization points in the RM, so joint actions are triggered automatically when their preconditions hold.
- Supports agents with private (stochastic) and public (deterministic) actions in a cooperative MARL setting.
- Dramatically reduces sample complexity on strongly-coupled tasks by combining planning and distributed RL.
*From left to right: Office World, Concurrent Maze, Temple Quest.*
The “planning” side of MAPRL is implemented using the multi-agent extension of Unified Planning:
1. For each domain (maze, office world, temple quest) we define a `MultiAgentProblem` (a toy sketch follows this list) with:
   - high-level public actions (e.g., `move`, `start_open_employer_door`, `push_button`),
   - shared environment fluents (e.g., room connectivity, door status, buttons),
   - and a joint goal (e.g., all agents reaching a target room while the system is “free”).
2. This planning model is exported to MA-PDDL and solved with FMAP, which returns a partial-order plan (POP) capturing causal and concurrency constraints between joint actions.
3. MAPRL then compiles the POP into one Reward Machine per agent:
   - each RM encodes that agent's sequence of high-level responsibilities (doors to open, buttons to press, rooms to reach),
   - and is later used to guide QRM during learning.
The scripts that build and solve the planning models, and that turn POPs into RMs, live under:
`multiagentplanning_rl/environments/integration_planning_and_learning/planning_utils/`
Most users will just run the provided experiments (which load pre-generated POP→RM artifacts), but the full planning pipeline is available if you want to regenerate or modify the tasks.
- `multiagentplanning_rl/multi_agent/reward_machine.py` — RM semantics shared by all environments.
- `multiagentplanning_rl/environments/integration_planning_and_learning/ma_environment.py` — base environment that wires POP artifacts, agents, and RMs.
- `multiagentplanning_rl/multi_agent/wrappers/rm_environment_wrapper.py` — observation wrapper adding RM state to the agents.
- `multiagentplanning_rl/render/render.py` — optional Pygame renderer for grid maps, agents, and interactable objects.
- `multiagentplanning_rl/utils/` — helper utilities for grounding, RM simplification, state evaluation, and sequential simulation.
- `multiagentplanning_rl/environments/integration_planning_and_learning/` — shared environment utilities, per-experiment configs, and the Unified Planning + FMAP pipeline under `planning_utils/` (MAP → POP → per-agent RMs).
  - `concurrent_maze/maze_main.py` — entry point for the maze tasks.
  - `concurrent_office_world/office_world_main.py` — entry point for the office world tasks.
  - `concurrent_temple_quest/temple_quest_main.py` — entry point for the temple quest tasks.
- `requirements.txt` — dependency pins; the companion `multiagent-rl-rm` library is auto-installed via `setup.py`.
- Python ≥ 3.10
- `multiagent-rl-rm` (installed automatically by `pip install -e .`)
- Standard scientific Python stack (NumPy, etc.)
- Optional: Weights & Biases for logging (`--wandb_enabled`)
MAPRL is not yet on PyPI. Install it in editable (development) mode.
```bash
# Clone MAPRL
git clone https://github.com/Alee08/maprl
cd maprl
pip install -e .
```
The command above automatically installs the dependency `multiagent-rl-rm` from PyPI (as specified in `setup.py`).
You can run the experiments using the command-line interface.
All three environments share the same CLI contract:
- `--num_episodes` (default `20000`) — length of the training loop.
- `--wandb_enabled` — opt-in Weights & Biases logging (off by default).
- `--experiment` (`exp1`, `exp2`, or `exp3`) — selects the task variant for the chosen domain.
Each domain (maze, office world, and temple quest) defines three predefined experiments: `exp1`, `exp2`, and `exp3` (see the paper for the exact setup of each variant).
Run each experiment after `pip install -e .`. The most direct approach is to `cd` into the folder containing the entry point and execute the script (e.g., `python office_world_main.py`). If you prefer, stay at the repo root and invoke the module with `python -m ...`, which resolves imports the same way.
**Office World**
```bash
cd multiagentplanning_rl/environments/integration_planning_and_learning/concurrent_office_world
python office_world_main.py --num_episodes 20000 --experiment exp1 --wandb_enabled
```

**Concurrent Maze**
```bash
cd multiagentplanning_rl/environments/integration_planning_and_learning/concurrent_maze
python maze_main.py --num_episodes 20000 --experiment exp1 --wandb_enabled
```

**Temple Quest**
```bash
cd multiagentplanning_rl/environments/integration_planning_and_learning/concurrent_temple_quest
python temple_quest_main.py --num_episodes 20000 --experiment exp1 --wandb_enabled
```

Each entry point runs its predefined scenario without extra flags; drop `--wandb_enabled` if you do not want to send metrics to W&B.
The environment is defined using a grid-like structure with various objects such as walls, plants, and goals. The `map_1` string in `office_world_main.py` represents the grid layout, where:
- 🟩 - Empty space
- ⛔ - Wall
- 🚪 - Door
- 🥤 - Coffee station
- 🪴 - Plant
- ✉️ - Letter
- `A`, `B`, `C`, `O`, etc. - Specific goal locations
The positions and connections of the grid are parsed using custom parsing functions, such as `parse_office_world_`.
Below is the emoji-based map used in office_world_main.py (parsed into rooms, doors, walls, and goal locations):
map_1 = """
B 🟩 🟩 ⛔ 🟩 🥤 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩
🟩 🟩 🟩 🚪 🟩 🟩 🟩 🚪 🟩 🪴 🟩 🚪 🟩 🟩 🟩
🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ O 🪴 🟩
⛔ 🚪 ⛔ ⛔ ⛔ 🚪 ⛔ ⛔ ⛔ 🚪 ⛔ ⛔ ⛔ ⛔ 🚪
🟩 🟩 🟩 ⛔ ✉️ 🪴 🟩 🚪 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩
🟩 🪴 🟩 🚪 🟩 🪴 🟩 ⛔ 🪴 🪴 🟩 🚪 🟩 🪴 🪴
🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 🚪 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩
⛔ 🚪 ⛔ ⛔ ⛔ 🚪 ⛔ ⛔ ⛔ 🚪 ⛔ ⛔ ⛔ ⛔ 🚪
🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🪴 🟩
🟩 🟩 🟩 🚪 🟩 🪴 🟩 🚪 🟩 🪴 🟩 🚪 🟩 🪴 🟩
🟩 🟩 🟩 ⛔ 🟩 🪴 🟩 ⛔ 🥤 🟩 🟩 ⛔ 🟩 🟩 🟩
🚪 ⛔ ⛔ ⛔ 🚪 ⛔ ⛔ ⛔ ⛔ 🚪 ⛔ ⛔ ⛔ 🚪 ⛔
🟩 🟩 🟩 🚪 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩
🟩 A 🟩 ⛔ 🪴 🪴 🟩 🚪 🟩 🪴 🟩 🚪 🟩 🪴 🟩
🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🟩 🟩 ⛔ 🟩 🟩 C
"""The agents can perform various actions, including:
- `up`, `down`, `left`, `right` — Move between rooms.
- `cross_door_up`, `cross_door_down`, `cross_door_left`, `cross_door_right` — Cross the employee's door.
- `manager_door_up`, `manager_door_down`, `manager_door_left`, `manager_door_right` — Cross the manager's door.
- `cell_up`, `cell_down`, `cell_left`, `cell_right` — Move within cells of rooms.
- `wait` — Stay in the current position.
You can also use MAPRL as a Python library to build your own experiments:
```python
from multiagentplanning_rl.environments.integration_planning_and_learning.ma_environment import MAP_RL_Env
from multiagentplanning_rl.multi_agent.wrappers.rm_environment_wrapper import RMEnvironmentWrapper

env = MAP_RL_Env(width=4, height=4, walls=..., plant=..., cell_size=3)
rm_env = RMEnvironmentWrapper(env, agents=[...])

obs, infos = rm_env.reset()
for _ in range(100):
    actions = {ag.name: ag.sample_action(obs[ag.name]) for ag in rm_env.agents}
    obs, rewards, done, trunc, infos = rm_env.step(actions)
    if all(done.values()):
        break
```

If you use this code or ideas from the paper, please cite:
Alessandro Trapasso and Anders Jonsson.
Concurrent Multiagent Reinforcement Learning with Reward Machines.
ECAI 2025, Frontiers in Artificial Intelligence and Applications.
```bibtex
@inproceedings{Trapasso2025MAPRL,
author = {Trapasso, Alessandro and Jonsson, Anders},
title = {Concurrent Multiagent Reinforcement Learning with Reward Machines},
booktitle = {{ECAI} 2025 - 28th European Conference on Artificial Intelligence},
series = {Frontiers in Artificial Intelligence and Applications},
volume = {413},
pages = {3735--3742},
publisher = {IOS Press},
year = {2025},
doi = {10.3233/FAIA251253},
}
```

MAPRL is released under the Apache 2.0 License.
See the LICENSE file for details.



