WARP

Whole-Body Retargeting for Learning from Offline Human Demonstrations

Anonymous project page for review


TL;DR: WARP converts offline human demonstrations into precise, consistent whole-body robot actions in closed form — enabling open-loop replay and policy learning for whole-body mobile manipulation, with no teleoperation in the loop.

Abstract

Direct transfer from human demonstration to learnable robot action is a crucial step towards scalable whole-body mobile manipulation. While human data scales better than mobile teleoperation, it requires overcoming significant embodiment gaps. Existing retargeting methods yield imprecise or inconsistent solutions, causing action multi-modality that prevents supervised policies from reliably converging. We present Whole-body-Aware Retargeting from human Pose (WARP), an offline pipeline that explicitly models embodiment differences to extract precise, unique whole-body actions. WARP leverages a closed-form Shoulder–Elbow–Wrist (SEW) geometric solver for exact end-effector tracking while preserving whole-body structural intent. Paired with lazy mobile-base control, it extracts accurate, consistent robot trajectories. Evaluations show WARP provides highly reliable data for open-loop real-world replay. To our knowledge, WARP is the first framework to achieve zero-shot whole-body mobile manipulation directly from offline human demonstrations, eliminating the need for human-in-the-loop teleoperation action data.

Why Offline Whole-Body Retargeting?

Whole-body demonstrations are the bottleneck. Teleoperation is expensive, slow, and unnatural; offline human data is cheap, fast, and natural — with no robot in the loop.

Online teleoperation closes the loop in real time; offline retargeting must replay open-loop, with no operator to correct errors.

Why Existing Offline Retargeting Falls Short

Open-loop replay is unforgiving: the retargeted trajectory is the supervision. Existing retargeters break in two ways:

Failure mode 1 · Imprecision: Weighted IK objectives trade end-effector accuracy against whole-body similarity. Either way, retargeting error becomes supervision error.

Failure mode 2 · Inconsistency: On redundant humanoids, IK solvers map nearly identical human poses to different robot configurations across solver seeds, so similar observations receive divergent actions.
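The inconsistency failure mode comes from kinematic redundancy, which a toy example can make concrete. The sketch below is not the MINK solver or any baseline from this page: it uses a hypothetical planar 3-link arm, where a position-only target leaves one free parameter (the tip angle `phi`), so two different joint vectors reach the same point. The link lengths, target, and function names are assumptions chosen for illustration.

```python
import numpy as np

def fk_tip(q, links):
    """Tip position of a planar serial arm with joint angles q."""
    angles = np.cumsum(q)
    return np.array([np.sum(links * np.cos(angles)),
                     np.sum(links * np.sin(angles))])

def ik_given_tip_angle(target, phi, links):
    """Closed-form IK for a planar 3-link arm once the tip angle phi is fixed."""
    l1, l2, l3 = links
    # Step back along the last link to find where the third joint must sit.
    wx = target[0] - l3 * np.cos(phi)
    wy = target[1] - l3 * np.sin(phi)
    c2 = (wx**2 + wy**2 - l1**2 - l2**2) / (2.0 * l1 * l2)
    if abs(c2) > 1.0:
        raise ValueError("target unreachable for this tip angle")
    q2 = np.arccos(c2)
    q1 = np.arctan2(wy, wx) - np.arctan2(l2 * np.sin(q2), l1 + l2 * np.cos(q2))
    q3 = phi - q1 - q2
    return np.array([q1, q2, q3])

links = np.array([0.3, 0.3, 0.1])   # hypothetical link lengths (m)
target = np.array([0.4, 0.2])       # hypothetical tip target (m)

q_a = ik_given_tip_angle(target, 0.0, links)
q_b = ik_given_tip_angle(target, 1.0, links)
print("same tip:", fk_tip(q_a, links), fk_tip(q_b, links))
print("different joints:", q_a, q_b)
```

A solver that does not pin down the free parameter can return either configuration for the same input, which is the kind of label ambiguity that a supervised policy then has to fit.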

End-effector-only retargeting (MINK-EF) yields imprecise, non-human-like poses; adding elbow/torso targets makes the baseline inconsistent. WARP stays precise and consistent.

Method

WARP's core, c-SEW, solves retargeting in closed form over the Shoulder–Elbow–Wrist representation: the palm is matched exactly as a hard constraint, the remaining freedom preserves the human pose, and joint angles follow analytically — unique, exact, microseconds per frame, no calibration.

Animated walkthrough of c-SEW: from the SEW skeleton and end-effector mismatch, through the elbow nullspace, to the unique solved robot configuration.

Adaptive offset: Shift the robot upper-body origin so that the palm centroids coincide with the human's; link-length differences are absorbed in closed form.

Palm alignment: Hand orientation transfers directly, the wrist follows analytically, and the elbow lands on the human's SEW half-plane, leaving at most one valid configuration.
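The half-plane rule for the elbow can be illustrated with plain vector geometry. The sketch below is not the authors' c-SEW solver and does not compute joint angles: it only places an elbow point, assuming the shoulder and wrist positions are already fixed and the robot's upper-arm and forearm lengths are known. The function name `elbow_on_sew_half_plane` and all numeric values are hypothetical.

```python
import numpy as np

def elbow_on_sew_half_plane(shoulder, wrist, human_elbow, upper_len, fore_len):
    """Place the robot elbow on the half-plane bounded by the shoulder-wrist
    line and containing the human elbow (illustrative geometry only)."""
    s, w, e_h = (np.asarray(v, dtype=float) for v in (shoulder, wrist, human_elbow))
    sw = w - s
    d = np.linalg.norm(sw)
    if not (abs(upper_len - fore_len) <= d <= upper_len + fore_len) or d == 0.0:
        raise ValueError("wrist is out of reach for these link lengths")
    n = sw / d
    # Part of the human elbow direction that is perpendicular to the axis.
    ref = (e_h - s) - np.dot(e_h - s, n) * n
    ref_norm = np.linalg.norm(ref)
    if ref_norm < 1e-9:
        raise ValueError("human elbow lies on the shoulder-wrist line")
    u = ref / ref_norm
    # Elbow circle: center at distance x along the axis, radius r.
    x = (upper_len**2 - fore_len**2 + d**2) / (2.0 * d)
    r = np.sqrt(max(upper_len**2 - x**2, 0.0))
    return s + x * n + r * u

# Hypothetical pose: wrist 0.4 m in front of the shoulder, human elbow below.
elbow = elbow_on_sew_half_plane([0, 0, 0], [0.4, 0, 0], [0.2, 0, -0.2], 0.30, 0.25)
print(elbow)
```

Because the half-plane selects a single point on the elbow circle, the same human pose always yields the same elbow position, which is consistent with the "at most one valid configuration" property described above.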

Lazy mobile base: The 6-DoF torso absorbs small adjustments and the base moves only for genuine relocation, which yields stable manipulation and smooth base motion.
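One simple way to picture a lazy base is a deadband follower. The page does not specify the actual rule, so everything in this sketch is an assumption: the planar simplification, the `slack` radius standing in for what the torso can absorb, and the function name `lazy_base_step`.

```python
import numpy as np

def lazy_base_step(base_xy, target_xy, slack):
    """Move the base only when the target leaves a disc of radius `slack`.

    Returns the new base position and the offset left for the torso.
    Hypothetical rule for illustration, not the rule used by WARP.
    """
    base_xy = np.asarray(base_xy, dtype=float)
    target_xy = np.asarray(target_xy, dtype=float)
    offset = target_xy - base_xy
    dist = np.linalg.norm(offset)
    if dist <= slack:
        # Small adjustment: the base stays put and the torso takes it all.
        return base_xy.copy(), offset
    # Genuine relocation: move just far enough to bring the target
    # back onto the edge of the slack disc.
    new_base = target_xy - slack * offset / dist
    return new_base, target_xy - new_base

base, torso = lazy_base_step([0.0, 0.0], [0.05, 0.0], slack=0.1)
print(base, torso)
base, torso = lazy_base_step([0.0, 0.0], [0.5, 0.0], slack=0.1)
print(base, torso)
```

Under this rule the base trajectory stays constant during fine manipulation and changes only when the upper body would otherwise run out of range.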

Hierarchical policy: One flow-matching head with block-causal attention over base ≼ torso ≼ arm ≼ hand, at the same inference cost.
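A block-causal mask over the four body groups can be built in a few lines. This is a minimal sketch under two assumptions that the page does not confirm: that each group attends to itself and to every group before it in the order base, torso, arm, hand, and that `True` means "may attend". The token counts per group are hypothetical.

```python
import numpy as np

def block_causal_mask(block_sizes):
    """Boolean mask where token i may attend to token j if j's block
    comes no later than i's block in the given order."""
    block_ids = np.repeat(np.arange(len(block_sizes)), block_sizes)
    return block_ids[:, None] >= block_ids[None, :]

# Hypothetical token counts for base, torso, arm, hand.
mask = block_causal_mask([1, 2, 2, 1])
print(mask.astype(int))
```

Tokens inside one block see each other in both directions, unlike a token-level causal mask, while the ordering between blocks stays strict. Since only the mask changes, the attention computation itself has the same shape as in the unmasked case.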

Data: a single Meta Quest — whole body + hands at 60 Hz, no mocap rig or robot in the loop. Robot: RB-Y1 (holonomic base, 6-DoF torso, two 7-DoF arms, XHands) at 100 Hz.

Results

Retargeting Quality

514 BONES-SEED-SOMA manipulation clips · baselines: MINK-EF (EEF-only), MINK-TE (+torso/elbow), SEW-M · lower is better, bold = best, underline = second.

Column groups: tracking error ↓ (Palm, P95, Ori.), feasibility ↓ (Torso, Limit, Coll.), solver variation ↓ (NNAD, PCA eig., RMS).
Method	JL	Palm (mm)	P95 (mm)	Ori. (deg)	Torso frac.	Limit frac.	Coll. frac.	NNAD	PCA eig.	RMS (deg)
SEW-M	off	178.979	201.056	7.89e-6	0.000	0.0126	0.243	0.488	1.20e-25	6.83e-14
MINK-EF	off	0.701	1.853	0.0107	0.625	0.1610	0.977	0.368	173.49	3.117
MINK-TE	off	18.557	73.980	0.157	0.027	0.0852	0.640	1.026	1119.11	6.106
WARP	off	0.0046	0.046	8.74e-6	0.000	0.0047	0.163	0.289	1.14e-25	6.66e-14
SEW-M	on	215.641	272.842	6.304	0.162	0.0131	0.084	0.442	1.09e-25	6.56e-14
MINK-EF	on	0.751	2.478	0.0112	0.611	0.1247	0.222	0.354	227.29	3.169
MINK-TE	on	19.492	69.345	0.188	0.049	0.0420	0.478	0.454	613.47	4.964
WARP	on	24.048	82.036	3.259	0.130	0.0060	0.017	0.266	1.06e-25	6.46e-14

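With JL off, WARP cuts palm error by more than 150× relative to MINK-EF (0.0046 mm vs 0.701 mm) while keeping orientation error at machine precision. In both JL settings it has the lowest joint-limit and self-collision fractions, and its PCA-eigenvalue and RMS solver variation are many orders of magnitude below the MINK baselines. It also solves ~30× faster (an hour vs a day for SEED).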
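The tracking columns can be reproduced from per-frame positions with a short helper. The page does not define the columns, so this sketch assumes that "Palm" is the mean Euclidean palm position error and "P95" its 95th percentile, both over frames, with positions given in meters. The function name `palm_tracking_stats` and the sample data are hypothetical.

```python
import numpy as np

def palm_tracking_stats(target_xyz_m, achieved_xyz_m):
    """Mean and 95th-percentile palm position error in millimeters."""
    diff = np.asarray(achieved_xyz_m, dtype=float) - np.asarray(target_xyz_m, dtype=float)
    err_mm = 1000.0 * np.linalg.norm(diff, axis=-1)
    return float(err_mm.mean()), float(np.percentile(err_mm, 95))

# Hypothetical five-frame clip with errors of 1 to 5 mm along x.
target = np.zeros((5, 3))
achieved = target.copy()
achieved[:, 0] = [0.001, 0.002, 0.003, 0.004, 0.005]
mean_mm, p95_mm = palm_tracking_stats(target, achieved)
print(mean_mm, p95_mm)
```

Reporting the percentile next to the mean shows whether a low average hides occasional large misses, which matters for open-loop replay where a single bad frame can break contact.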

Qualitative comparison

Retargeting the same human clip with each method. Left to right: Human input, WARP (ours), SEW-M, MINK-EF, MINK-TE.

Policy Learning in Simulation

WARP retargets one robot motion to different robot embodiments

200 DexMimicGen demos per task, retargeted GR1 → RB-Y1 with WARP and MINK; identical policy per retargeter. Replay success is comparable (80.0% vs 79.5% on average), but policies trained on WARP data succeed 12 percentage points more often on average (71% vs 59%). Nuances invisible at replay become decisive downstream.

Method	can_sort replay	can_sort policy	pouring replay	pouring policy	coffee replay	coffee policy	average replay	average policy
MINK	99.5%	94%	88.5%	74%	50.5%	8%	79.5%	59%
WARP	98.5%	100%	90.5%	78%	51.0%	34%	80.0%	71%

Replay and policy rollout success (%) on DexMimicGen tasks.
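The average policy columns can be checked directly from the per-task numbers. The snippet assumes the average is the unweighted mean over the three tasks, which matches the reported values after rounding.

```python
# Policy rollout success rates (%) copied from the DexMimicGen results table.
policy = {
    "MINK": {"can_sort": 94, "pouring": 74, "coffee": 8},
    "WARP": {"can_sort": 100, "pouring": 78, "coffee": 34},
}

def mean_success(per_task):
    """Unweighted mean success rate over tasks."""
    return sum(per_task.values()) / len(per_task)

averages = {method: mean_success(tasks) for method, tasks in policy.items()}
gap_points = averages["WARP"] - averages["MINK"]
print(averages, gap_points)
```

The gap is a difference in percentage points, not a relative improvement, and most of it comes from the coffee task (34% vs 8%).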

Real-World Evaluation

Human demonstrations and robot executions across four real-world tasks

50 human demos per task: laundry (bimanual wrists), cart (base–arm contact), box (torso–arm twist), and fridge (closed with the elbow, which EEF-only methods cannot even express).

Real-world replay and policy rollout results, 10 trials each

WARP wins on all four tasks: 90% replay on the elbow-contact fridge task, and stronger policies even at equal replay rates (65% vs 40% on rotate-box) — motion quality, not replay completion, governs downstream success.

Limitations

Our current policy training lacks image observations, which limits the range of tasks we can attempt. We plan to investigate visuomotor policy learning next, and how WARP-retargeted offline human data can support it.

BibTeX

@inproceedings{warp2026anonymous,
      title     = {WARP: Whole-Body Retargeting for Learning from Offline Human Demonstrations},
      author    = {Anonymous Authors},
      booktitle = {Conference on Robot Learning (CoRL)},
      year      = {2026},
      note      = {Under review}
}