TL;DR: New toy benchmarks make it easier to study how RL agents perform, and they let us compare agents against ground-truth optimal policies.