RL Training #7
Conversation
scripts/run_training.sh
Outdated
# Colocated GRPO training+generation for Qwen3-8B on the SWE-Bench task.
# Uses 1 node with 8 GPUs.
# uv run --isolated examples/mini_swe_agent/preprocess_swegym.py --output_dir ~/data/swe_gym_subset
This file is in the SkyRL repo - is it intentional, or just a copy-paste error?
Sorry, leftover from a skyrl example
Removed commented instructions and notes related to colocated GRPO training.
llm=LLM(
    service_id="agent",
    model=litellm_model_name,
    base_url=f"http://localhost:8080/v1/",
Suggested change:
-    base_url=f"http://localhost:8080/v1/",
+    base_url=litellm_base_url,
messages = []

agent = get_default_agent(
    llm=LLM(
We could also pass litellm_extra_body = {"return_token_ids": true} to avoid retokenization drift in vllm, and then we could also avoid doing the tokenization ourselves.
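For reference, a minimal sketch of that suggestion, assuming the agent-sdk LLM config forwards litellm_extra_body to the underlying completion call (names reused from the snippet above):

llm = LLM(
    service_id="agent",
    model=litellm_model_name,
    base_url=litellm_base_url,
    # Assumption: this dict is passed through to vllm's OpenAI-compatible server,
    # which then returns prompt_token_ids / token_ids alongside the generated text.
    litellm_extra_body={"return_token_ids": True},
)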
initial_input_len = 0
past_trajectory_len = 0
for idx, message in enumerate(messages):
    current_prompt_ids = message["prompt_tokens_ids"]
I am a bit confused here: where does this field come from?
vllm returns prompt_token_ids and token_ids if we pass return_token_ids: true as litellm_extra_body, but since you are not doing so, I suppose you need to actually run tokenization?
I am hardcoding return_token_ids=True in the openhands submodule. prompt_token_ids and token_ids are the tokens that the vllm inference engine sees and generates, so that's why I didn't have to retokenize.
I had to make a custom event processor (also hardcoded in the agent-sdk submodule directory) so I can get the keys by doing this:
messages = [msg for msg in messages if msg["kind"] == "TokenEvent"]
I will prepare a PR for openhands so that it's cleaner. But for now, the code is doing what you suggested.
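A rough sketch of that flow (field names taken from this thread; the exact event shape produced by the custom processor in the agent-sdk submodule may differ):

# Keep only the events that carry raw engine token ids, then read the
# vllm-returned fields from each turn.
token_events = [msg for msg in messages if msg["kind"] == "TokenEvent"]

for event in token_events:
    prompt_ids = event["prompt_token_ids"]  # tokens the vllm engine saw as input for this turn
    response_ids = event["token_ids"]       # tokens the engine generated for this turn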
> I am hardcoding return_token_ids=True in the openhands submodule

Gotcha. You might want to try the litellm_extra_body = {"return_token_ids": true} LLM config - I think it should just work.
Also, shouldn't it be prompt_token_ids instead of prompt_tokens_ids and token_ids instead of response_token_ids? Did you rename them in the agent-sdk submodule (sorry, I don't know how to view the diff there)?
I see that this was a recent update (or at least new compared to the version of openhands I was using). Will make changes.
yep it was a recent update I made
trajectory_ids = current_prompt_ids[initial_input_len:]
past_response_len = len(response_ids_list[idx-1]) if idx > 0 else 0
past_response_observation_ids = trajectory_ids[past_trajectory_len+past_response_len:]
Did you test if this works? I was playing around with vllm, and it seems their token_ids field includes the EOS token at the end, which could make this offset off by one.
It does not error out. Is the EOS supposed to be masked or not? I will need to adjust this accordingly.
The problem is not (just) about masking. trajectory_ids won't contain this EOS token in the middle, but response_ids does, which means you may need to change this line to
past_response_observation_ids = trajectory_ids[past_trajectory_len + past_response_len - 1:]
if the finish reason was "stop".
That being said, I am not 100% sure about this.
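A toy illustration of that hypothesis, with made-up token ids (EOS shown as 2; assumes the EOS is not re-fed into the next prompt):

initial_prompt_ids = [101, 102]
prev_response_ids  = [201, 202, 2]                    # turn-0 output ends with EOS
next_prompt_ids    = [101, 102, 201, 202, 301, 302]   # prompt + response (EOS dropped) + observation

past_response_len = len(prev_response_ids)                  # 3, counts the EOS
trajectory_ids = next_prompt_ids[len(initial_prompt_ids):]  # [201, 202, 301, 302]
print(trajectory_ids[past_response_len:])       # [302]      -> first observation token lost
print(trajectory_ids[past_response_len - 1:])   # [301, 302] -> the suggested correction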
I created an issue in vllm for clarification: vllm-project/vllm#28185
Sorry, I might be totally wrong. I tested using the vllm library directly and found EOS tokens appearing normally in prompt_token_ids.
Maybe litellm was doing something weird, or maybe I did something stupid... let's ignore it for now.
My bad, I used a wrong chat template during testing. You should be good!