Yihao Wang*, Yang Miao*, Wenshuai Zhao, Wenyan Yang, Zihan Wang, Joni Pajarinen, Luc Van Gool, Danda Pani Paudel, Juho Kannala, Xi Wang†, Arno Solin
*Equal contribution †Co-advisor
Aalto University | INSAIT, Sofia University | ETH Zurich | TU Munich | MCML | ELLIS Institute Finland | University of Oulu
PAWS perceives object articulations from in-the-wild egocentric video using hand-interaction and geometric cues, enabling downstream applications such as articulation-model fine-tuning and robot manipulation.
We propose PAWS, a method that directly extracts object articulations from hand–object interactions in large-scale, in-the-wild egocentric videos. PAWS is an unsupervised articulation-detection pipeline that relies only on hand interactions and sparse 3D information, requiring no annotated data, and it produces articulation labels at scale across a wide range of objects and environments. We evaluate the method on the HD-EPIC and Arti4D datasets, where it achieves significant improvements over baselines, and we further demonstrate downstream applications in 3D articulation prediction and real-world robot manipulation.
See the project website at https://aaltoml.github.io/PAWS/.
- HaWoR — World-Space Hand Motion Reconstruction from Egocentric Videos (CVPR 2025)
- VidBot — Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation (CVPR 2025)
- Articulate3D — Zero-Shot Text-Driven 3D Object Posing
Coming soon.