WARP — Geometric Whole-Body Retargeting

Whole-Body Retargeting for Learning from Offline Human Demonstrations

Anonymous project page for review

TL;DR: WARP converts offline human demonstrations into precise, consistent whole-body robot actions in closed form — enabling open-loop replay and policy learning for whole-body mobile manipulation, with no teleoperation in the loop.

Abstract

Direct transfer from human demonstration to learnable robot action is a crucial step towards scalable whole-body mobile manipulation. While human data scales better than mobile teleoperation, it requires overcoming significant embodiment gaps. Existing retargeting methods yield imprecise or inconsistent solutions, causing action multi-modality that prevents supervised policies from reliably converging. We present Whole-body-Aware Retargeting from human Pose (WARP), an offline pipeline that explicitly models embodiment differences to extract precise, unique whole-body actions. WARP leverages a closed-form Shoulder–Elbow–Wrist (SEW) geometric solver for exact end-effector tracking while preserving whole-body structural intent. Paired with lazy mobile-base control, it extracts accurate, consistent robot trajectories. Evaluations show WARP provides highly reliable data for open-loop real-world replay. To our knowledge, WARP is the first framework to achieve zero-shot whole-body mobile manipulation directly from offline human demonstrations, eliminating the need for human-in-the-loop teleoperation action data.

Why Offline Whole-Body Retargeting?

Whole-body demonstrations are the bottleneck. Teleoperation is expensive, slow, and unnatural; offline human data is cheap, fast, and natural — with no robot in the loop.

Online teleoperation closes the loop in real time; offline retargeting must replay open-loop, with no operator to correct errors.

Why Existing Offline Retargeting Falls Short

Open-loop replay is unforgiving: the retargeted trajectory is the supervision. Existing retargeters break in two ways:

Failure mode 1 · Imprecision: weighted IK objectives trade end-effector accuracy against whole-body similarity. Whichever way the weights tip, retargeting error becomes supervision error.
Failure mode 2 · Inconsistency: redundant humanoids map nearly identical human poses to different robot configurations across solver seeds, so similar observations receive divergent actions.
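The inconsistency failure mode has a textbook analogue. Even a planar two-link arm has two exact IK solutions (the elbow on either side) for one hand position, and a local numerical solver may land on either depending on its seed. The sketch below is a generic illustration, not WARP or MINK code; the function names are hypothetical. A 7-DoF arm replaces this discrete pair with a continuous family of elbow positions.

```python
import math

def two_link_ik(x, y, l1, l2, branch=1):
    """Closed-form IK for a planar 2-link arm reaching (x, y).

    branch=+1 and branch=-1 select the two mirror-image elbow solutions.
    Returns joint angles (q1, q2) in radians.
    """
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:
        raise ValueError("target out of reach")
    s2 = branch * math.sqrt(1.0 - c2 * c2)
    q2 = math.atan2(s2, c2)
    q1 = math.atan2(y, x) - math.atan2(l2 * s2, l1 + l2 * c2)
    return q1, q2

def two_link_fk(q1, q2, l1, l2):
    """Forward kinematics: hand position for joint angles (q1, q2)."""
    return (l1 * math.cos(q1) + l2 * math.cos(q1 + q2),
            l1 * math.sin(q1) + l2 * math.sin(q1 + q2))

# Same hand target, two different joint configurations.
target = (0.5, 0.3)
sol_a = two_link_ik(*target, 0.3, 0.3, branch=1)
sol_b = two_link_ik(*target, 0.3, 0.3, branch=-1)
```

Both solutions reach the target exactly, yet their joint angles differ, which is exactly the kind of action multi-modality that a supervised policy cannot average away.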

End-effector-only retargeting (MINK-EF) yields imprecise, non-human-like poses; adding elbow and torso targets (MINK-TE) makes the baseline inconsistent. WARP stays precise and consistent.

Method

WARP's core, c-SEW, solves retargeting in closed form over the Shoulder–Elbow–Wrist representation: the palm is matched exactly as a hard constraint, the remaining freedom preserves the human pose, and joint angles follow analytically — unique, exact, microseconds per frame, no calibration.
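To make the geometry concrete, here is a minimal sketch of one SEW step under our own assumptions: the robot shoulder S and wrist W are already known (the wrist fixed by the palm target), so the elbow must lie on the circle of points at upper-arm length from S and forearm length from W, and we choose the point on that circle that lies in the human's shoulder–elbow–wrist half-plane. The function `sew_elbow` and its arguments are hypothetical; this is not the authors' c-SEW code, and recovering joint angles from S, E, W is left out.

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def norm(a): return math.sqrt(dot(a, a))

def sew_elbow(S, W, l_upper, l_fore, S_h, E_h, W_h, eps=1e-9):
    """Place the robot elbow given robot shoulder S and wrist W (3-tuples).

    The elbow lies on a circle around the S-W axis; pick the point on it
    that falls in the human's shoulder-elbow-wrist half-plane.
    """
    sw = sub(W, S)
    d = norm(sw)
    if not abs(l_upper - l_fore) <= d <= l_upper + l_fore:
        raise ValueError("wrist target unreachable for these link lengths")
    n = scale(sw, 1.0 / d)                             # unit shoulder->wrist axis
    a = (l_upper**2 - l_fore**2 + d**2) / (2 * d)      # circle centre offset along n
    r = math.sqrt(max(l_upper**2 - a**2, 0.0))         # circle radius
    C = add(S, scale(n, a))                            # circle centre
    # Human swivel direction: part of (E_h - S_h) orthogonal to the human S-W axis.
    n_h = sub(W_h, S_h)
    n_h = scale(n_h, 1.0 / norm(n_h))
    u_h = sub(E_h, S_h)
    u_h = sub(u_h, scale(n_h, dot(u_h, n_h)))
    # Re-project onto the plane orthogonal to the robot axis, then normalise.
    u = sub(u_h, scale(n, dot(u_h, n)))
    nu = norm(u)
    if nu < eps:
        raise ValueError("human arm is straight: swivel direction undefined")
    return add(C, scale(u, r / nu))

# Example: robot arm shorter than the human's, same elbow swivel direction.
E = sew_elbow((0.0, 0.0, 0.0), (0.4, 0.0, 0.0), 0.3, 0.25,
              (0.0, 0.0, 0.0), (0.2, -0.1, 0.05), (0.45, 0.0, 0.0))
```

Because the wrist is pinned first and the elbow is then determined by a single geometric choice, the result is unique for any non-straight human arm; the only undefined case (a fully extended human arm) is reported rather than guessed.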

Animated walkthrough of c-SEW: from the SEW skeleton and end-effector mismatch, through the elbow nullspace, to the unique solved robot configuration.

Adaptive offset: shift the robot upper-body origin so the palm centroids coincide with the human's; link-length differences are absorbed in closed form.
Palm alignment: hand orientation transfers directly, the wrist follows analytically, and the elbow lands on the human's SEW half-plane, leaving at most one valid configuration.
Lazy mobile base: the 6-DoF torso absorbs small adjustments and the base moves only for genuine relocation, giving stable manipulation and smooth base motion.
Hierarchical policy: one flow-matching head with block-causal attention over base ≼ torso ≼ arm ≼ hand, at no added inference cost.
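Two of these components reduce to a few lines. The sketch below assumes (our assumption, not stated on this page) that lazy base control behaves like a deadband on the base's planar position: the base stays put while the desired body position is within a hypothetical `radius` of it, and otherwise moves just far enough to bring that position back onto the boundary. It also builds a block-causal attention mask in which each token attends to its own block and all earlier blocks in the order base, torso, arm, hand. The names `lazy_base_step` and `block_causal_mask`, the radius value, and the block sizes are all illustrative; yaw handling is omitted.

```python
import math

def lazy_base_step(base_xy, target_xy, radius=0.15):
    """Hypothetical deadband base update: move only when the target leaves the radius."""
    dx = target_xy[0] - base_xy[0]
    dy = target_xy[1] - base_xy[1]
    dist = math.hypot(dx, dy)
    if dist <= radius:
        return base_xy  # small offset: left to the torso, base stays still
    k = (dist - radius) / dist  # advance just enough to put the target on the boundary
    return (base_xy[0] + k * dx, base_xy[1] + k * dy)

def block_causal_mask(block_sizes):
    """Boolean mask; mask[i][j] is True when token i may attend to token j.

    Token i sees every token whose block index is <= its own block index,
    i.e. its own block plus all earlier blocks.
    """
    blocks = [b for b, n in enumerate(block_sizes) for _ in range(n)]
    return [[bj <= bi for bj in blocks] for bi in blocks]

# Example: base (1 token), torso (1), arm (2), hand (1).
mask = block_causal_mask([1, 1, 2, 1])
```

A deadband keeps the base still during fine manipulation, since the torso covers the residual offset, while a single mask over one head lets coarse body parts condition the finer ones without running separate models.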

Data: a single Meta Quest captures whole body and hands at 60 Hz, with no mocap rig or robot in the loop.
Robot: RB-Y1 (holonomic base, 6-DoF torso, two 7-DoF arms, XHands) at 100 Hz.

Results

Retargeting Quality

514 BONES-SEED-SOMA manipulation clips · baselines: MINK-EF (EEF-only), MINK-TE (+torso/elbow), SEW-M · JL = joint limits off/on · lower is better.

              | Tracking error ↓           | Feasibility ↓           | Solver variation ↓
Method   JL   | Palm     P95      Ori.     | Torso   Limit   Coll.   | NNAD    PCA eig.   RMS
              | (mm)     (mm)     (deg)    | (frac.) (frac.) (frac.) |                    (deg)
SEW-M    off  | 178.979  201.056  7.89e-6  | 0.000   0.0126  0.243   | 0.488   1.20e-25   6.83e-14
MINK-EF  off  | 0.701    1.853    0.0107   | 0.625   0.1610  0.977   | 0.368   173.49     3.117
MINK-TE  off  | 18.557   73.980   0.157    | 0.027   0.0852  0.640   | 1.026   1119.11    6.106
WARP     off  | 0.0046   0.046    8.74e-6  | 0.000   0.0047  0.163   | 0.289   1.14e-25   6.66e-14
SEW-M    on   | 215.641  272.842  6.304    | 0.162   0.0131  0.084   | 0.442   1.09e-25   6.56e-14
MINK-EF  on   | 0.751    2.478    0.0112   | 0.611   0.1247  0.222   | 0.354   227.29     3.169
MINK-TE  on   | 19.492   69.345   0.188    | 0.049   0.0420  0.478   | 0.454   613.47     4.964
WARP     on   | 24.048   82.036   3.259    | 0.130   0.0060  0.017   | 0.266   1.06e-25   6.46e-14

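With joint limits off, WARP cuts palm error by more than 150× relative to MINK-EF (0.0046 mm vs 0.701 mm) at machine-precision orientation error. In both settings it has the lowest joint-limit-violation and self-collision fractions, and its solver variation is orders of magnitude below the MINK baselines. It also solves ~30× faster (an hour vs a day for SEED).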
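The headline ratio can be read straight off the table. A quick check of the joint-limits-off rows (values copied from the table; the script only does arithmetic):

```python
import math

# Numbers copied from the retargeting table (JL = off rows).
palm_mm = {"MINK-EF": 0.701, "WARP": 0.0046}
pca_eig = {"MINK-EF": 173.49, "MINK-TE": 1119.11, "WARP": 1.14e-25}

palm_ratio = palm_mm["MINK-EF"] / palm_mm["WARP"]
pca_orders = math.log10(pca_eig["MINK-EF"] / pca_eig["WARP"])

print(f"palm error ratio: {palm_ratio:.0f}x")                         # → palm error ratio: 152x
print(f"PCA eigenvalue gap: ~{pca_orders:.0f} orders of magnitude")  # → PCA eigenvalue gap: ~27 orders of magnitude
```

With joint limits on, the palm comparison does not carry over: WARP's palm error is 24.048 mm against 0.751 mm for MINK-EF.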

Qualitative comparison: the same human clip retargeted with each method. Left to right: human input, WARP (ours), SEW-M, MINK-EF, MINK-TE.

Policy Learning in Simulation

WARP retargets one robot motion to different robot embodiments

200 DexMimicGen demos per task, retargeted GR1 → RB-Y1 with WARP and with MINK; the same policy is trained on each retargeter's data. Replay success is comparable, but policies trained on WARP data succeed 12 percentage points more often on average (71% vs 59%). Nuances invisible at replay become decisive downstream.

Method   can_sort          pouring           coffee            average
         replay   policy   replay   policy   replay   policy   replay   policy
MINK     99.5%    94%      88.5%    74%      50.5%    8%       79.5%    59%
WARP     98.5%    100%     90.5%    78%      51.0%    34%      80.0%    71%

Replay and policy rollout success (%) on DexMimicGen tasks.
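The average column and the 12-point policy gap follow directly from the per-task numbers, as a quick recomputation shows (plain arithmetic on the table values, unweighted across the three tasks):

```python
# Per-task (replay %, policy %) success, copied from the table above.
results = {
    "MINK": {"can_sort": (99.5, 94), "pouring": (88.5, 74), "coffee": (50.5, 8)},
    "WARP": {"can_sort": (98.5, 100), "pouring": (90.5, 78), "coffee": (51.0, 34)},
}

def mean_success(per_task):
    """Unweighted mean over tasks of (replay, policy) success rates."""
    n = len(per_task)
    replay = sum(r for r, _ in per_task.values()) / n
    policy = sum(p for _, p in per_task.values()) / n
    return replay, policy

avg = {name: mean_success(tasks) for name, tasks in results.items()}
gap = avg["WARP"][1] - avg["MINK"][1]

for name, (r, p) in avg.items():
    print(f"{name}: replay {r:.1f}%, policy {p:.0f}%")
# → MINK: replay 79.5%, policy 59%
# → WARP: replay 80.0%, policy 71%
print(f"policy gap: {gap:.1f} points")  # → policy gap: 12.0 points
```

Most of the gap comes from coffee, where replay is nearly identical (50.5% vs 51.0%) but policy success differs by 26 points.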

Real-World Evaluation

Human demonstrations and robot executions across four real-world tasks

50 human demos per task: laundry (bimanual wrists), cart (base–arm contact), box (torso–arm twist), and fridge, which is closed with the elbow, a contact that EEF-only methods cannot even express.

Real-world replay and policy rollout results, 10 trials each

WARP wins on all four tasks, reaching 90% replay on the elbow-contact fridge task and training stronger policies even at equal replay rates (65% vs 40% on rotate-box). Motion quality, not replay completion, governs downstream success.

Limitations

Our current policy training lacks image observations, which limits the range of tasks we can attempt. We plan to investigate visual-motor policy learning next, and how WARP-retargeted offline human data can support it.

BibTeX

@inproceedings{warp2026anonymous,
      title     = {WARP: Whole-Body Retargeting for Learning from Offline Human Demonstrations},
      author    = {Anonymous Authors},
      booktitle = {Conference on Robot Learning (CoRL)},
      year      = {2026},
      note      = {Under review}
}