-
Notifications
You must be signed in to change notification settings - Fork 6.9k
[rllib] Improve test_single_agent_env_runner to prevent flaky tests
#58397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[rllib] Improve test_single_agent_env_runner to prevent flaky tests
#58397
Conversation
Signed-off-by: Mark Towers <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request improves the tests for SingleAgentEnvRunner.sample to prevent flakiness and updates the documentation to reflect the correct behavior. The changes are generally good and address the stated goals. I've identified a couple of areas in the tests where the assertions could be more precise to better validate the functionality. My review includes suggestions to tighten these assertions.
Signed-off-by: Mark Towers <[email protected]>
test_single_agent_env_runner to prevent flaky teststest_single_agent_env_runner to prevent flaky tests
Signed-off-by: Mark Towers <[email protected]>
simonsays1980
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for the refinement @pseudo-rnd-thoughts !
|
|
||
| # Sample n timesteps. | ||
| if num_timesteps is not None: | ||
| assert num_timesteps >= 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
| num_timesteps=10, num_episodes=10, random_actions=True | ||
| ), | ||
| ) | ||
| # Verify that an error is raised if a negative number is used |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
| env_runner.stop() | ||
|
|
||
| def test_env_context(self): | ||
| """Tests, whether SingleAgentEnvRunner can pass kwargs to the environments correctly.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! Thanks for adding this. This was missing but is very important.
…_env` (#58410) ## Description Allow users to use environments that are already vectorized for `SingleAgentEnvRunner` With `gymnasium.make_vec`, users have the option to either use the `SyncVectorEnv` to vectorize a base environment or to directly create a vector environment using the `vectorize_mode: gymnasium.VectorizeMode`. This PR utilises the `env_runners(gym_env_vectorize_mode=...)` argument to support `VectorizeMode.VECTOR_ENTRY_POINT` ``` import gymnasium as gym config = ... config.env_runners( gym_env_vectorize_mode=gym.VectorizeMode.VECTOR_ENTRY_POINT, ) ``` An important change related to this PR is that the values accepted for the vectorize mode is either the enum (`VectorizedMode.ASYNC`, etc) or the enum values (`"async"`, etc) as before it was the string version was the enum name (`"ASYNC"`) rather than the enum value itself. ## Related issues Completion of #57643, ## Additional information #58397 must be merged first We should apply a similar change to the `MultiAgentEnvRunner.make_env` --------- Signed-off-by: Mark Towers <[email protected]> Co-authored-by: Mark Towers <[email protected]>
Description
In improving the
SingleEnvRunner.make_env, I found that some of the tests could be flaky.This PR improves the testing, in particular, to
sampleto ensure that the tests don't fail occasionally and the documentation to reflect this.The primary flaky problem I found is that
sample(num_timesteps=X)will not always return a total ofXtimesteps, rather at least X timesteps up to the number of environments more.I'm updated the documentation to clarify this for users.
In addition, I've added tests for when neither the number of timesteps or episodes are given and for the
force_resetargument