This is the official implementation of CLARIFY (ICML 2025 poster; see the arXiv and OpenReview pages).
Experiments require MuJoCo. Please follow the instructions in the mujoco-py repository to install it.
Then, dependencies can be installed with the following command:
conda env create -f conda_env.yml
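After the environment is created, activate it before running any of the commands below. The environment name is defined by the name: field in conda_env.yml; the name clarify used here is only an assumption:
# activate the conda environment created from conda_env.yml
# ("clarify" is an assumed name -- substitute the name: field from conda_env.yml if it differs)
conda activate clarify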
Please download the datasets from LiRE and place them under ./data/metaworld_data/<task_name> and ./data/dmcontrol_data/<task_name>.
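For example, for the hammer (Meta-World) and walker-walk (DMControl) tasks used in the commands below, the layout is roughly the following sketch; the exact file names inside each task folder depend on the LiRE release:
data/
├── metaworld_data/
│   └── hammer/          # preference dataset for the Meta-World hammer task
└── dmcontrol_data/
    └── walker-walk/     # preference dataset for the DMControl walker-walk task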
Train the reward model using CLARIFY, for the hammer task, with:
python train_contrastive_reward.py --env hammer --gpu <GPU number> --seed <seed> \
--max_feedback 1000 --teacher_eps_skip 0.5 --feed_type "c"
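As a concrete usage example, the same command can be swept over several seeds; the GPU id 0 and seeds 0-2 below are arbitrary placeholders:
# sweep CLARIFY reward learning on the hammer task over three seeds on GPU 0 (placeholder values)
for seed in 0 1 2; do
  python train_contrastive_reward.py --env hammer --gpu 0 --seed $seed \
    --max_feedback 1000 --teacher_eps_skip 0.5 --feed_type "c"
done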
Train the reward model using OPRL, for the walker-walk task, with:
python train_contrastive_reward.py --env walker-walk --gpu <GPU number> --seed <seed> \
--max_feedback 200 --teacher_eps_skip 0.7 --feed_type "d"
Train the offline policy based on CLARIFY's reward model, for the hammer task, with:
python scripts/reward_model_mapping.py
python policy_learning/oprl_policy.py --env hammer --gpu <GPU number> --seed <seed> \
--teacher_eps_skip 0.5 --feed_type "c" \
--reward_model_name_mapping "scripts/reward_model_map_q50.json"
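For instance, with GPU 0 and seed 0 as placeholder values, the two steps above run as:
# step 1: build the reward-model name mapping (assumed to produce scripts/reward_model_map_q50.json)
python scripts/reward_model_mapping.py
# step 2: train the offline policy on hammer with the mapped CLARIFY reward model
python policy_learning/oprl_policy.py --env hammer --gpu 0 --seed 0 \
  --teacher_eps_skip 0.5 --feed_type "c" \
  --reward_model_name_mapping "scripts/reward_model_map_q50.json"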
Train the offline policy based on OPRL's reward model, for the walker-walk task, with:
python scripts/reward_model_mapping.py
python policy_learning/oprl_policy.py --env walker-walk --gpu <GPU number> --seed <seed> \
--teacher_eps_skip 0.7 --feed_type "d" \
--reward_model_name_mapping "scripts/reward_model_map_q50.json"
This repo benefits from LiRE, HIM, and BPref. Thanks for their wonderful work.
If you find this project helpful, please consider citing the following paper:
@inproceedings{mu2025clarify,
title={CLARIFY: Contrastive Preference Reinforcement Learning for Untangling Ambiguous Queries},
author={Mu, Ni and Hu, Hao and Hu, Xiao and Yang, Yiqin and XU, Bo and Jia, Qing-Shan},
booktitle={Forty-second International Conference on Machine Learning},
year={2025}
}