VLA Simulation Benchmarks · Task & Robustness Atlases

The Benchmark Atlas Index

Jie Wang · everloom-129.github.io · GRASP Lab, UPenn · Co-authored with Claude Code · 2026.6.1

A single entry point to every dashboard in this collection: interactive, single-file, blueprint-style atlases of the task suites and leaderboards used to evaluate Vision-Language-Action (VLA) models. The atlases are grouped into tiers that run from capability (what models do in-distribution) through robustness and memory to the sim ↔ real reality gap. Click any card to open its atlas in a new tab.

20+ benchmarks