Actor loss function #4

@SimonBurmer

Description

Hi Coac, I really like your BicNet implementation! My goal is to run it on an environment where every agent gets a -1 reward for each time step it takes to finish the episode. But there seems to be a problem with your actor loss implementation: since the loss of the actor is defined as the prediction of the critic, the reward needs to converge to zero if the agents perform perfectly, doesn't it?

loss_actor = -self.critic(state_batches, clear_action_batches).mean()

Can you explain why you implemented it this way? Also, is it possible for the reward not to converge to 0 when the agents perform well (like in the environment I mentioned above)?
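
For reference, this looks like the standard deterministic-policy-gradient (DDPG-style) actor update, where the actor is trained to maximise the critic's Q-estimate of its own actions. Here is a minimal single-agent sketch of that pattern, to make sure I understand it correctly; the dimensions, network definitions and optimizer settings are assumptions for illustration, not taken from this repository:

import torch
import torch.nn as nn

# Minimal DDPG-style actor update sketch. Network sizes, dimensions and
# variable names are illustrative assumptions, not taken from this repo.
state_dim, action_dim, batch_size = 8, 2, 32

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)

state_batch = torch.randn(batch_size, state_dim)  # stand-in for a replay batch

# The actor is trained to output actions that maximise the critic's Q-estimate,
# so the actor loss is the negative mean Q-value of its own (noise-free) actions.
actions = actor(state_batch)
loss_actor = -critic(torch.cat([state_batch, actions], dim=-1)).mean()

actor_optimizer.zero_grad()
loss_actor.backward()
actor_optimizer.step()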
