TL;DR: Using offline goal-conditioned RL to isolate the effects of exploration and objective design, this study shows that hyperparameter landscapes can be surprisingly “benign.” Quasimetric learning (QRL) stays stable with just one knob (the learning rate) that matters, while bootstrapped TD-learning (HIQL) exhibits sharp optima that drift as data quality shifts, forcing practitioners to retune.