Description
I have created a custom PettingZoo Parallel API environment with MultiDiscrete action spaces; the `env.action_spec()` call succeeds.
I am following TorchRL's Multi-Agent PPO tutorial, but I'm struggling to work out how to modify the architecture to support MultiDiscrete action spaces. Specifically, I'd like to know how to correctly adapt the `MultiAgentMLP`, `TensorDictModule`, and `ProbabilisticActor` so that the policy network outputs a MultiDiscrete (equivalently, MultiCategorical) action distribution for each agent.
Should I create as many `ProbabilisticActor` modules as there are dimensions in the MultiDiscrete action space? If a single `ProbabilisticActor` module is used instead, which distribution class should replace `Categorical` to support a MultiDiscrete action space? And is there an existing script or tutorial in TorchRL that demonstrates how to handle MultiDiscrete action spaces (or MultiCategorical distributions) in a multi-agent setup?
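For concreteness, here is a minimal sketch of the direction I have been exploring: a single actor whose network outputs the concatenated logits of all sub-actions, consumed by a hand-rolled `MultiCategorical` distribution that treats the heads as independent. All dimensions (`nvec`, `n_agents`, `obs_dim`) are hypothetical placeholders, and `MultiCategorical` below is my own helper, not a TorchRL class:

```python
import torch
from tensordict.nn import TensorDictModule
from torch import nn
from torch.distributions import Categorical
from torchrl.modules import MultiAgentMLP, ProbabilisticActor


class MultiCategorical(torch.distributions.Distribution):
    """Joint distribution over several independent Categorical heads,
    one per MultiDiscrete sub-action (hand-rolled, not a TorchRL built-in)."""

    arg_constraints = {}

    def __init__(self, logits, nvec):
        self.nvec = nvec
        # Split the flat logit vector into one chunk per sub-action.
        self.dists = [
            Categorical(logits=chunk) for chunk in logits.split(list(nvec), dim=-1)
        ]
        super().__init__(batch_shape=logits.shape[:-1], validate_args=False)

    def sample(self, sample_shape=torch.Size()):
        return torch.stack([d.sample(sample_shape) for d in self.dists], dim=-1)

    def log_prob(self, value):
        # Independent heads: the joint log-prob is the sum over sub-actions.
        return sum(d.log_prob(value[..., i]) for i, d in enumerate(self.dists))

    def entropy(self):
        # Summed head entropies, used by PPO's entropy bonus.
        return sum(d.entropy() for d in self.dists)

    @property
    def mode(self):
        # Greedy action per head, for deterministic evaluation.
        return torch.stack([d.mode for d in self.dists], dim=-1)


# Hypothetical dimensions; replace with the values from your env's specs.
nvec = [3, 5, 2]   # a per-agent MultiDiscrete([3, 5, 2]) action space
n_agents = 4
obs_dim = 16

# One network per agent emitting the concatenated logits of all heads.
net = MultiAgentMLP(
    n_agent_inputs=obs_dim,
    n_agent_outputs=sum(nvec),  # 3 + 5 + 2 logits
    n_agents=n_agents,
    centralised=False,
    share_params=True,
    depth=2,
    num_cells=256,
    activation_class=nn.Tanh,
)
module = TensorDictModule(
    net, in_keys=[("agents", "observation")], out_keys=[("agents", "logits")]
)
policy = ProbabilisticActor(
    module=module,
    in_keys=[("agents", "logits")],
    out_keys=[("agents", "action")],
    distribution_class=MultiCategorical,
    distribution_kwargs={"nvec": nvec},
    return_log_prob=True,
)
```

As far as I can tell, splitting one logit vector per agent keeps the tutorial's single `("agents", "action")` / `"sample_log_prob"` layout intact, so the rest of the PPO pipeline shouldn't need changes. I'm also aware that tensordict ships a `CompositeDistribution` that can aggregate several `Categorical` heads under separate action keys, but I'm unsure whether that is the intended approach here.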