CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries

This is the official implementation of CLARIFY (ICML 2025 poster; arXiv, OpenReview).

Requirements

Experiments require MuJoCo. Please follow the instructions in the mujoco-py repository to install it.
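
To verify the installation, a quick import check (a minimal sanity test, not part of this repo's scripts) confirms that the bindings compile:

python -c "import mujoco_py"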

Then, dependencies can be installed with the following command:

conda env create -f conda_env.yml
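
Then activate the environment. The environment name comes from the name: field in conda_env.yml; clarify below is only an assumed placeholder:

conda activate clarify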

Downloading datasets

Please download the datasets from LiRE, and place them in ./data/metaworld_data/<task_name> and ./data/dmcontrol_data/<task_name>.
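
For example, with the hammer (Meta-World) and walker-walk (DMControl) tasks used below, the expected layout is:

data/
├── metaworld_data/
│   └── hammer/
└── dmcontrol_data/
    └── walker-walk/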

Run experiments

Train the reward model using CLARIFY, for the hammer task with $\epsilon=0.5$:

python train_contrastive_reward.py --env hammer --gpu <GPU number> --seed <seed> \
    --max_feedback 1000 --teacher_eps_skip 0.5 --feed_type "c"

Train the reward model using OPRL, for the walker-walk task with $\epsilon=0.7$:

python train_contrastive_reward.py --env walker-walk --gpu <GPU number> --seed <seed> \
    --max_feedback 200 --teacher_eps_skip 0.7 --feed_type "d"
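
Both commands share the same interface, so sweeping over seeds is a simple shell loop (the GPU id and seed range below are illustrative placeholders):

for seed in 0 1 2; do
    python train_contrastive_reward.py --env hammer --gpu 0 --seed $seed \
        --max_feedback 1000 --teacher_eps_skip 0.5 --feed_type "c"
done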

Train the offline policy based on CLARIFY's reward model, for the hammer task with $\epsilon=0.5$. First build the reward-model name mapping, then train the policy:

python scripts/reward_model_mapping.py
python policy_learning/oprl_policy.py --env hammer --gpu <GPU number> --seed <seed> \
    --teacher_eps_skip 0.5 --feed_type "c" \
    --reward_model_name_mapping "scripts/reward_model_map_q50.json"

Train the offline policy based on OPRL's reward model, for the walker-walk task with $\epsilon=0.7$:

python scripts/reward_model_mapping.py
python policy_learning/oprl_policy.py --env walker-walk --gpu <GPU number> --seed <seed> \
    --teacher_eps_skip 0.7 --feed_type "d" \
    --reward_model_name_mapping "scripts/reward_model_map_q50.json"
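
Putting the steps together, an end-to-end CLARIFY run for the hammer task looks like this (GPU id 0 and seed 0 are illustrative placeholders):

python train_contrastive_reward.py --env hammer --gpu 0 --seed 0 \
    --max_feedback 1000 --teacher_eps_skip 0.5 --feed_type "c"
python scripts/reward_model_mapping.py
python policy_learning/oprl_policy.py --env hammer --gpu 0 --seed 0 \
    --teacher_eps_skip 0.5 --feed_type "c" \
    --reward_model_name_mapping "scripts/reward_model_map_q50.json"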

Acknowledgement

This repo benefits from LiRE, HIM, and BPref. Thanks for their wonderful work.

Citation

If you find this project helpful, please consider citing the following paper:

@inproceedings{mu2025clarify,
    title={CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries},
    author={Mu, Ni and Hu, Hao and Hu, Xiao and Yang, Yiqin and Xu, Bo and Jia, Qing-Shan},
    booktitle={Forty-second International Conference on Machine Learning},
    year={2025}
}
