Abstract

This paper presents a comprehensive survey of deep learning approaches for dexterous robotic grasping, emphasizing recent progress enabled by multi-modal models and data-driven techniques. These developments have enabled the generation and execution of stable, context-aware grasps that can be conditioned on natural language, generalize across robot embodiments, and perform effectively in real-world settings. We organize our survey into three parts: (1) Datasets, the foundation for data-driven approaches, covering large-scale efforts that support learning-based grasping; (2) Grasp Synthesis, including diverse representation methods, generative modeling, and optimization-based techniques; and (3) Grasp Execution, encompassing reinforcement learning, imitation learning, heuristic control, and hybrid frameworks that translate grasps into executable actions. We also examine existing benchmarks and metrics for evaluating grasp plausibility, stability, and task alignment. We identify persistent challenges that bottleneck progress and discuss promising future directions to guide researchers toward building more general-purpose, robust dexterous manipulation systems.

Grasp Datasets

A summary of key dexterous grasping datasets, highlighting their source, scale, and unique features.

| Dataset (Year) | Type | Hand | #Objects | #Grasps/Data | Modalities | Eval. Annotations | Unique Features |
|---|---|---|---|---|---|---|---|
| DexGraspNet (2022) | Sim | ShadowHand | 5,355 | 1.32M grasps | None (poses only) | Force closure (optimized) | Very large simulated dataset; validated via an Isaac Gym checker |
| DexGrasp Anything (2024) | Hybrid (Real+Sim) | ShadowHand (+ human) | 15,698 | 3.40M grasps | Meshes, 6-DOF poses | Force closure, penetration | Largest dexterous dataset to date; curated and physics-expanded |
| RealDex (2024) | Real | ShadowHand | 52 | ~59K grasps | RGB-D (multi-view) | Human annotation (quality) | Teleoperated dexterity; multi-view capture; human-like patterns |
| MultiDex (2023) | Sim | ShadowHand | 58 | 16K grasps | None | Optimized grasps | Small-scale, high-quality set for dexterous pre-training |
| UniDexGrasp (2023) | Sim | ShadowHand | 5,519 | 1.12M grasps | None | Optimized grasps | Million-scale dataset; 200+ grasps per object |
| GRAB (2020) | Real | Human (MANO retarget) | 51 | 1.64M grasps | RGB-D (multi-view) | Human grasp labels | Real human grasps; learn hand–object interactions |
| HO3D (2021) | Real | Human (MANO) | 10 | 77K frames | RGB-D (hand+object) | 6D hand+object poses | Egocentric videos; synchronized 6D pose annotations |
| DexYCB (2021) | Real | Human (MANO) | 20 | 582K frames | RGB-D | 6D hand+object poses | Multi-camera capture; dense annotations |
Figure – Dataset coverage at a glance: a visual overview of key dexterous-grasp datasets.

Grasp Synthesis

Methods for generating grasp candidates from object and task information.

Grasp Representation

Contact Maps

  • Represent likely contact regions as heatmaps over the object surface.
  • Guide subsequent grasp optimization toward the predicted contacts.

Hand Parameters

  • Directly predict wrist pose and joint angles.
  • Reduce the output space, enabling faster training.

Geometric

  • Embed object geometry directly into the model.
  • Use signed distance fields (SDFs) or point-cloud distances; see the sketch below.
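
As a concrete illustration of the contact-map representation, the minimal sketch below scores contact regions from nearest-neighbor point-cloud distances between hand and object; the exponential soft-contact score and the 1 cm scale `tau` are illustrative choices, not a specific paper's formulation.

```python
import numpy as np
from scipy.spatial import cKDTree

def contact_heatmap(object_pts, hand_pts, tau=0.01):
    """Soft contact map over the object surface.

    object_pts: (N, 3) object surface points; hand_pts: (M, 3) hand surface points.
    tau: distance scale in meters (illustrative). Returns (N,) scores in (0, 1];
    values near 1 mark likely contact regions.
    """
    # Distance from every object point to its nearest hand point.
    dists, _ = cKDTree(hand_pts).query(object_pts, k=1)
    # Exponential fall-off: 1 at zero distance, near 0 far from the hand.
    return np.exp(-dists / tau)
```

Thresholding this map yields binary contact regions; replacing the unsigned nearest-neighbor distance with an object SDF gives the geometric variant described above.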

Grasp Execution Paradigms

Grasp execution strategies fall into two main families: classical methods that rely on explicit models and rules, and learning-based methods that acquire skills from data.

What makes a good Grasp? Metrics and Evaluation

Commonly used metrics for stability, accuracy, task alignment, and diversity in dexterous grasping.

| Category | Metric | Description (incl. formula) | Goal |
|---|---|---|---|
| Stability & Plausibility | Simulation Success Rate | % of grasps that stay stable in a physics engine under gravity and disturbances. | ↑ |
| | Q1 / Epsilon Quality | Radius of the largest origin-centered ball contained in the grasp-wrench space: \( Q_1 = \max\,\{ r : \{ w : \lVert w \rVert \le r \} \subset W \} \) | ↑ |
| | Penetration Vol./Dist. | Overlap between hand and object meshes: \( PD(H,O) = \max_{x \in H \cap O} d(x) \), \( V_{\mathrm{int}} = \mathrm{Vol}(H \cap O) \) | ↓ |
| Dataset Accuracy | Chamfer Distance (CD) | Point-cloud distance: \( \frac{1}{n}\sum_{x \in X}\min_{y \in Y}\lVert x - y \rVert^2 + \frac{1}{m}\sum_{y \in Y}\min_{x \in X}\lVert y - x \rVert^2 \) | ↓ |
| | MPJPE / MRRPE | Pose error for MANO / human hands: \( \mathrm{MPJPE} = \frac{1}{K}\sum_{k}\lVert (\hat J^k - \hat J^r) - (J^k - J^r) \rVert_2 \) | ↓ |
| Task Alignment | Fréchet Inception Distance (FID) | Distributional gap between generated and reference grasps (images or 3-D). | ↓ |
| | LLM / VLM Score | Vision- or language-model rating of grasp ↔ prompt consistency. | ↑ |
| Diversity | Std. Dev. of Joint Angles | Variation among generated grasps for a single input. | ↑ |

↑ higher is better  |  ↓ lower is better
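
The Chamfer distance above maps directly to a few lines of numpy/scipy; a minimal sketch of the squared-distance form in the table:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(X, Y):
    """Symmetric squared Chamfer distance between point sets X (n, 3) and Y (m, 3)."""
    d_xy, _ = cKDTree(Y).query(X, k=1)  # nearest neighbor in Y for each x
    d_yx, _ = cKDTree(X).query(Y, k=1)  # nearest neighbor in X for each y
    return float((d_xy ** 2).mean() + (d_yx ** 2).mean())
```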

Common Reward Components in Dexterous Manipulation

Typical shaping terms used in reinforcement-learning or optimization-based grasp controllers. All rewards are maximized.

| Reward Type | Purpose | Example Formulation |
|---|---|---|
| Approach | Move the hand toward a demonstrated pre-grasp pose | \( r_{\text{approach}} = -\lVert \hat{\mathbf p}_{\text{pre}} - \mathbf p_{\text{hand}} \rVert_2 \), applied while the hand is more than 0.2 m from the pre-grasp |
| Pre-grasp Imitation | Align the hand pose with a demonstrated pre-grasp | \( r_{\text{pre}} = -\left( \lVert \mathbf p - \hat{\mathbf p} \rVert_2 + \lVert \mathbf q - \hat{\mathbf q} \rVert_2 \right) \) |
| Trajectory Following | Track a desired object trajectory after the grasp | \( r_{\text{obj}} = -\lVert \mathbf x_{\text{obj}} - \mathbf x_{\text{target}} \rVert_2 \) for \( t \ge \lambda \) |
| Contact | Ensure multi-finger contact with the object | \( r_{\text{contact}} = \mathbf 1[\text{thumb and two or more fingers in contact}] \) |
| Force | Encourage proper force distribution on graspable surfaces | Weighted sum of contact forces on graspable vs. non-graspable regions |
| Lift | Promote upward motion of the grasped object | \( r_{\text{lift}} = z_{\text{obj}} - z_{\text{goal}} \) |
| Anatomical & Reg. | Maintain joint limits and smooth motions | Penalties on joint-limit violations and high velocities |
| Success Bonus | Reward final task completion | \( r_{\text{success}} = b \cdot \mathbf 1[\text{goal reached}] \) |
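
In practice these terms are combined as a weighted sum. The sketch below is illustrative only: the weights, the 0.2 m approach cutoff, and the state-dictionary fields are assumptions rather than values from any specific controller.

```python
import numpy as np

# Illustrative weights; not taken from a specific paper.
W = {"approach": 1.0, "pre": 0.5, "contact": 1.0,
     "track": 0.5, "lift": 2.0, "success": 10.0}

def shaped_reward(s: dict) -> float:
    """Weighted sum of the shaping terms in the table above.

    `s` is an assumed state dict with hand/object poses and contact flags.
    """
    r = 0.0
    d_pre = np.linalg.norm(s["pre_grasp_pos"] - s["hand_pos"])
    if d_pre > 0.2:
        # Far from the pre-grasp: pure approach term.
        r += W["approach"] * -d_pre
    else:
        # Near the pre-grasp: imitate both wrist position and joint angles.
        d_joints = np.linalg.norm(s["pre_grasp_joints"] - s["joints"])
        r += W["pre"] * -(d_pre + d_joints)
    # Thumb plus at least two other fingers must touch the object.
    r += W["contact"] * float(s["thumb_contact"] and s["n_finger_contacts"] >= 2)
    if s["t"] >= s["grasp_time"]:
        # After the grasp is established: track the target and reward lifting.
        r += W["track"] * -np.linalg.norm(s["obj_pos"] - s["target_obj_pos"])
        r += W["lift"] * (s["obj_pos"][2] - s["goal_height"])
    # Sparse terminal bonus.
    r += W["success"] * float(s["goal_reached"])
    return r
```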

Evaluating Grasps

Evaluator Outcomes

Simulation success uses a height criterion: a grasp counts as successful if max_height_achieved − z_table ≥ 0.25 m at any timestep.

| Evaluator | Modality | Primary metrics | Pass condition(s) | Pass (%) | Mean latency |
|---|---|---|---|---|---|
| Simulation (comprehensive) | Physics | Success rate; max height; slip rate | height ≥ 0.25 m | 58.7 | 86 ms/grasp |
| Mesh Plausibility (SDF) | Geometry | Penetration volume; contact area; #contacts | penetration ≤ 1×10⁻⁵; contacts ≥ 3 | 72.4 | 19 ms/grasp |
| Intention (VLM) | Semantics | Alignment; visual consistency; task score (0–1) | each score ≥ 0.70 | 69.6 | 2.3 s/grasp |

Composite GraspEval: 0.5·Sim + 0.3·Mesh + 0.2·Intention = 0.5·0.587 + 0.3·0.724 + 0.2·0.696 ≈ 0.650.
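
The composite score is simply a weighted mean of the three evaluator pass rates, expressed as fractions:

```python
# Weighted mean of evaluator pass rates (fractions in [0, 1]).
weights = {"sim": 0.5, "mesh": 0.3, "intention": 0.2}
passed  = {"sim": 0.587, "mesh": 0.724, "intention": 0.696}
grasp_eval = sum(weights[k] * passed[k] for k in weights)
print(f"{grasp_eval:.3f}")  # 0.650
```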

Simulation Outcomes

Success: max_height_achieved − z_table ≥ 0.25 m at any timestep. Slip: a post-lift COM velocity spike above 0.2 m/s followed by a height drop greater than 0.15 m.

| Dataset (subset) | Hand(s) | YCB objects | Success (%) | Med. height (m) | Slip (%) |
|---|---|---|---|---|---|
| GraspXL (ECCV 2024) | Allegro, LEAP | 200 | 51 | 0.35 | 8.2 |
| MultiDex v1.0 (CVPR 2023) | Allegro, LEAP | 29 | 48 | 0.33 | 9.0 |
| DexFuncGrasp (AAAI 2024) | Allegro, LEAP | 280 | 44 | 0.32 | 9.7 |
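
A minimal sketch of the success and slip criteria above, assuming a logged trajectory of object COM heights and speeds (the array names and the idea of scanning every post-lift velocity spike are assumptions about the logging setup):

```python
import numpy as np

def classify_lift(z_obj, speed, z_table, lift=0.25, vel=0.2, drop=0.15):
    """Label one trajectory as success and/or slip.

    z_obj: (T,) object COM heights in m; speed: (T,) COM speeds in m/s.
    """
    height = z_obj - z_table
    success = bool(height.max() >= lift)              # lifted 0.25 m at any timestep
    slip = False
    lifted = np.flatnonzero(height >= lift)
    if lifted.size:
        # Check each post-lift velocity spike for a subsequent large height drop.
        for t in np.flatnonzero(speed[lifted[0]:] > vel) + lifted[0]:
            if z_obj[t] - z_obj[t:].min() > drop:
                slip = True
                break
    return success, slip
```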

Metric Definitions

| Metric | Definition | Pass condition | Units |
|---|---|---|---|
| Sim success | Object lifted relative to table height | max_t z_obj(t) − z_table ≥ 0.25 | m |
| Slip rate | Post-lift instability event | COM velocity spike > 0.2 m/s with drop > 0.15 m | m/s; m |
| Penetration volume | Integrated negative SDF inside the mesh | ≤ 1×10⁻⁵ | — |
| Contact area | Union of fingertip–object contact patches | report only; used by the mesh check | cm² |
| #Contacts | Distinct fingertip contacts | ≥ 3 | count |
| Intention scores | VLM alignment, visual consistency, task proxy (each in [0,1]) | each ≥ 0.70 | unitless |
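
The penetration-volume check can be approximated by Monte-Carlo sampling; here is a sketch assuming watertight `trimesh` meshes. Note that trimesh's `signed_distance` is positive for points inside a mesh, so the "integrated negative SDF" above corresponds to positive values under this convention.

```python
import trimesh

def penetration_volume(hand: trimesh.Trimesh, obj: trimesh.Trimesh,
                       n_samples: int = 20000) -> float:
    """Monte-Carlo estimate of Vol(hand ∩ object) in mesh units cubed."""
    # Rejection-sample points inside the hand mesh (may return < n_samples).
    pts = trimesh.sample.volume_mesh(hand, n_samples)
    if len(pts) == 0:
        return 0.0
    # trimesh convention: signed distance > 0 for points inside `obj`.
    inside = trimesh.proximity.signed_distance(obj, pts) > 0
    return float(inside.mean() * hand.volume)
```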

BibTeX

@misc{2025deeplearningdexgrasp,
  title  = {Deep Learning for Dexterous Robot Grasping},
  author = {Leen, Hrishit and Aneja, Kunal and
            Reddy, Chetan and Tamilselvan, Priyadarshini and
            Nguyen, Nhi and Chakaravarthy, Sri Siddarth and
            Collins, Jeremy and Bogdanovic, Miroslav and Garg, Animesh},
  note   = {Under review at Advanced Robotics Research},
  year   = {2025}
}