PICO: Reconstructing 3D People In Contact with Objects

CVPR 2025

*equal contribution · project lead
1 Max Planck Institute for Intelligent Systems, Tübingen, Germany · 2 Meshcapade
3 Carnegie Mellon University, USA · 4 UT Austin, USA · 5 University of Amsterdam, the Netherlands

We present PICO, a novel framework for joint human-object reconstruction in 3D.
PICO includes PICO-db, a unique dataset that pairs natural images with dense vertex-level 3D contact correspondences on both the human and the object. We leverage this dataset to build PICO-fit, an optimization-based method that fits 3D body and object meshes to an image, guided by rich contact constraints. Here, we show reconstruction results of PICO-fit: 3D human pose and shape (blue), 3D object pose and shape (orange), and contact correspondences.

PICO-db Dataset Examples

Left to right: color image, and contacts annotated on the body and the object. The annotations establish bijective body-object correspondences, indicated by matching colors.
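To make the pairing concrete, one can think of each annotation as matched vertex-index lists, one entry per contact patch. The layout below is purely illustrative; the field names and paths are hypothetical, not the released PICO-db format.

```python
# Hypothetical layout of a single PICO-db annotation (illustrative only;
# field names and paths are made up, not the released format).
annotation = {
    "image": "damon/img_000123.jpg",         # source DAMON image
    "object_mesh": "objects/chair_042.obj",  # retrieved 3D object mesh
    "contact_patches": [
        {   # one entry per color-coded patch
            "body_vertices": [3021, 3022, 3048],  # SMPL-X vertex ids
            "object_vertices": [118, 119, 131],   # matching object vertex ids
        },
        # ...
    ],
}
```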

You can explore all annotations of the dataset here.

Method Overview

Overview of PICO-fit, a novel method for fitting interacting 3D body and object meshes to an image. It initializes 3D body shape and pose via OSX, 3D object shape via OpenShape, and body-object contacts via retrieval from PICO-db. It then takes three steps: (1) it exploits contacts to solve for the object pose that registers the object to the body; (2) it refines the object pose and (3) the body pose to align each with the object and human masks, respectively, detected in the image, while satisfying contacts and avoiding penetrations. For every stage we show inputs, outputs, losses, and optimizable variables.
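In the simplest rigid case, step (1) amounts to a Procrustes/Kabsch alignment of the corresponding contact vertices. The snippet below is a minimal sketch of that idea, not the actual PICO-fit solver, which handles multiple contact patches and additional constraints.

```python
import numpy as np

def register_object_to_body(obj_pts: np.ndarray, body_pts: np.ndarray):
    """Rigidly align object contact vertices to body contact vertices.

    obj_pts, body_pts: (N, 3) arrays of corresponding contact points.
    Returns a rotation R (3, 3) and translation t (3,) such that
    R @ obj_pts[i] + t approximates body_pts[i] in the least-squares sense.
    """
    mu_o, mu_b = obj_pts.mean(axis=0), body_pts.mean(axis=0)
    H = (obj_pts - mu_o).T @ (body_pts - mu_b)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_b - R @ mu_o
    return R, t
```

The resulting pose then seeds the mask-driven refinement of steps (2) and (3).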

Reconstruction Examples

Comparison with SOTA:

From left to right: color image, reconstructions from CONTHO, PHOSA*, and PICO-fit*.

PICO-fit* examples:

Reconstruction examples of PICO-fit*, using PICO-db contact annotations.

PICO-fit examples:

Reconstruction examples of PICO-fit, relying on contact lookup from PICO-db.

Abstract

Recovering 3D Human-Object Interaction (HOI) from single images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only limited object classes. Instead, we need methods that generalize to natural images and novel object classes.

We tackle this in two main ways:

(1) We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact correspondences on both body and object meshes. To this end, we use images from the recent DAMON dataset, which are paired with annotated contacts, but only on a canonical 3D body. In contrast, we seek contact labels on both the body and the object. To infer these, given an image, we retrieve an appropriate 3D object mesh from a database by leveraging vision foundation models (see the retrieval sketch below). Then, we project DAMON's body contact patches onto the object via a novel method requiring only two clicks per patch. This minimal human input establishes rich contact correspondences between bodies and objects.

(2) We exploit our new dataset in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction. PICO-fit infers contacts on the SMPL-X body, retrieves a likely 3D object mesh and its contacts from PICO-db, and uses these correspondences to iteratively fit the 3D body and object meshes to image evidence via optimization (see the contact-loss sketch below). Uniquely, PICO-fit works well for many object classes that no existing method can tackle. This is crucial for scaling HOI understanding in the wild.
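As a rough illustration of the retrieval step in (1), finding an appropriate mesh can be framed as nearest-neighbor search in a shared image-shape embedding space, which OpenShape-style foundation models provide. The sketch below is a minimal version under that assumption; the function and variable names are hypothetical, not the actual pipeline.

```python
import numpy as np

def retrieve_object_mesh(image_emb: np.ndarray,
                         mesh_embs: np.ndarray,
                         mesh_ids: list[str]) -> str:
    """Return the id of the database mesh that best matches the image.

    Assumes image_emb (D,) and mesh_embs (M, D) already live in a shared
    image/shape embedding space (e.g., from an OpenShape-style encoder).
    """
    img = image_emb / np.linalg.norm(image_emb)
    meshes = mesh_embs / np.linalg.norm(mesh_embs, axis=1, keepdims=True)
    scores = meshes @ img                      # cosine similarities, (M,)
    return mesh_ids[int(np.argmax(scores))]   # best-matching mesh
```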
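And to make the fitting in (2) concrete, the contact term can be pictured as a simple correspondence loss over the matched vertices. The PyTorch sketch below is illustrative only; it omits the mask-alignment, penetration, and prior terms of the full objective.

```python
import torch

def contact_loss(body_verts, obj_verts, body_idx, obj_idx):
    """Mean squared distance between corresponding contact vertices.

    body_verts: (Vb, 3) SMPL-X vertices and obj_verts: (Vo, 3) object
    vertices, both differentiable w.r.t. the pose parameters being
    optimized; body_idx / obj_idx: matching (N,) long tensors of vertex
    indices, as a PICO-db annotation provides.
    """
    return ((body_verts[body_idx] - obj_verts[obj_idx]) ** 2).sum(-1).mean()
```

Minimizing such a term alongside the mask and penetration losses of stages (2) and (3) in the method overview pulls the interacting surfaces together while keeping both meshes consistent with the image.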

Intro Video

Coming soon!

Acknowledgments & Disclosure

We thank Felix Grüninger for advice on mesh preprocessing, Jean-Claude Passy and Valkyrie Felso for advice on the data collection, and Xianghui Xie for advice on HDM evaluation. We also thank Tsvetelina Alexiadis, Taylor Obersat, Claudia Gallatz, Asuka Bertler, Arina Kuznetcova, Suraj Bhor, Tithi Rakshit, Tomasz Niewiadomski, Valerian Fourel and Florentin Doll for their immense help in the data collection and verification process, Benjamin Pellkofer for IT support, and Nikos Athanasiou for the helpful discussions. This work was funded in part by the International Max Planck Research School for Intelligent Systems (IMPRS-IS). D. Tzionas is supported by the ERC Starting Grant (project STRIPES, 101165317).

DT has received a research gift fund from Google. While MJB is a co-founder and Chief Scientist at Meshcapade, his research in this project was performed solely at, and funded solely by, the Max Planck Society.

Contact

For technical questions, please contact pico@tue.mpg.de
For commercial licensing, please contact ps-licensing@tue.mpg.de

BibTeX

@inproceedings{cseke_tripathi_2025_cvpr_pico,
    title     = {{PICO}: Reconstructing {3D} People In Contact with Objects},
    author    = {Alp\'{a}r Cseke and Shashank Tripathi and Sai Kumar Dwivedi and Arjun Lakshmipathy and Agniv Chatterjee and Michael J. Black and Dimitrios Tzionas},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
}