PICO: Reconstructing 3D People In Contact with Objects

CVPR 2025

*equal contribution · project lead
1 Max Planck Institute for Intelligent Systems, Tübingen, Germany · 2 Meshcapade
3 Carnegie Mellon University, USA · 4 UT Austin, USA · 5 University of Amsterdam, the Netherlands

We present PICO, a novel framework for joint human-object reconstruction in 3D.
PICO includes PICO-db, a unique dataset that pairs natural images with dense vertex-level 3D contact correspondences on both the human and the object. We leverage this dataset to build PICO-fit, an optimization-based method that fits 3D body and object meshes to an image, guided by rich contact constraints. Here, we show reconstruction results of PICO-fit: 3D human pose and shape (blue), 3D object pose and shape (orange), and contact correspondences.

PICO-db Dataset Examples

Left to right: color image, and contacts annotated on the body and the object. The annotations establish bijective body-object correspondences, indicated by matching colors.
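To make the pairing concrete, one can think of each annotation as matched vertex-index lists, one entry per contact patch. The layout below is purely illustrative; the field names and paths are hypothetical, not the released PICO-db format.

```python
# Hypothetical layout of a single PICO-db annotation (illustrative only;
# field names and paths are made up, not the released format).
annotation = {
    "image": "damon/img_000123.jpg",         # source DAMON image
    "object_mesh": "objects/chair_042.obj",  # retrieved 3D object mesh
    "contact_patches": [
        {   # one entry per color-coded patch
            "body_vertices": [3021, 3022, 3048],  # SMPL-X vertex ids
            "object_vertices": [118, 119, 131],   # matching object vertex ids
        },
        # ...
    ],
}
```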

You can explore all annotations of the dataset here.

Method Overview

Overview of PICO-fit, a novel method for fitting interacting 3D body and object meshes to an image. It initializes 3D body shape and pose via OSX, 3D object shape via OpenShape, and body-object contacts via retrieval from PICO-db. It then takes three steps: (1) it exploits contacts to solve for the object pose that registers the object to the body; (2) it refines the object pose and (3) the body pose to align each with the object and human masks, respectively, detected in the image, while satisfying contacts and avoiding penetrations. For every stage we show inputs, outputs, losses, and optimizable variables.
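In the simplest rigid case, step (1) amounts to a Procrustes/Kabsch alignment of the corresponding contact vertices. The snippet below is a minimal sketch of that idea, not the actual PICO-fit solver, which handles multiple contact patches and additional constraints.

```python
import numpy as np

def register_object_to_body(obj_pts: np.ndarray, body_pts: np.ndarray):
    """Rigidly align object contact vertices to body contact vertices.

    obj_pts, body_pts: (N, 3) arrays of corresponding contact points.
    Returns a rotation R (3, 3) and translation t (3,) such that
    R @ obj_pts[i] + t approximates body_pts[i] in the least-squares sense.
    """
    mu_o, mu_b = obj_pts.mean(axis=0), body_pts.mean(axis=0)
    H = (obj_pts - mu_o).T @ (body_pts - mu_b)   # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))       # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_b - R @ mu_o
    return R, t
```

The resulting pose then seeds the mask-driven refinement of steps (2) and (3).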

Reconstruction Examples

Comparison with SOTA:

From left to right: color image, reconstructions from CONTHO, PHOSA*, and PICO-fit*.

PICO-fit* examples:

Reconstruction examples of PICO-fit*, using PICO-db contact annotations.

PICO-fit examples:

Reconstruction examples of PICO-fit, relying on contact lookup from PICO-db.

Abstract

Recovering 3D Human-Object Interaction (HOI) from single images is challenging due to depth ambiguities, occlusions, and the huge variation in object shape and appearance. Thus, past work requires controlled settings such as known object shapes and contacts, and tackles only limited object classes. Instead, we need methods that generalize to natural images and novel object classes.

We tackle this in two main ways:

(1) We collect PICO-db, a new dataset of natural images uniquely paired with dense 3D contact correspondences on both body and object meshes. To this end, we use images from the recent DAMON dataset, which are paired with annotated contacts, but only on a canonical 3D body. In contrast, we seek contact labels on both the body and the object. To infer these, given an image, we retrieve an appropriate 3D object mesh from a database by leveraging vision foundation models (see the retrieval sketch below). Then, we project DAMON's body contact patches onto the object via a novel method requiring only two clicks per patch. This minimal human input establishes rich contact correspondences between bodies and objects.

(2) We exploit our new dataset in a novel render-and-compare fitting method, called PICO-fit, to recover 3D body and object meshes in interaction. PICO-fit infers contacts on the SMPL-X body, retrieves a likely 3D object mesh and its contacts from PICO-db, and uses these correspondences to iteratively fit the 3D body and object meshes to image evidence via optimization (see the contact-loss sketch below). Uniquely, PICO-fit works well for many object classes that no existing method can tackle. This is crucial for scaling HOI understanding in the wild.
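As a rough illustration of the retrieval step in (1), finding an appropriate mesh can be framed as nearest-neighbor search in a shared image-shape embedding space, which OpenShape-style foundation models provide. The sketch below is a minimal version under that assumption; the function and variable names are hypothetical, not the actual pipeline.

```python
import numpy as np

def retrieve_object_mesh(image_emb: np.ndarray,
                         mesh_embs: np.ndarray,
                         mesh_ids: list[str]) -> str:
    """Return the id of the database mesh that best matches the image.

    Assumes image_emb (D,) and mesh_embs (M, D) already live in a shared
    image/shape embedding space (e.g., from an OpenShape-style encoder).
    """
    img = image_emb / np.linalg.norm(image_emb)
    meshes = mesh_embs / np.linalg.norm(mesh_embs, axis=1, keepdims=True)
    scores = meshes @ img                      # cosine similarities, (M,)
    return mesh_ids[int(np.argmax(scores))]   # best-matching mesh
```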
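And to make the fitting in (2) concrete, the contact term can be pictured as a simple correspondence loss over the matched vertices. The PyTorch sketch below is illustrative only; it omits the mask-alignment, penetration, and prior terms of the full objective.

```python
import torch

def contact_loss(body_verts, obj_verts, body_idx, obj_idx):
    """Mean squared distance between corresponding contact vertices.

    body_verts: (Vb, 3) SMPL-X vertices and obj_verts: (Vo, 3) object
    vertices, both differentiable w.r.t. the pose parameters being
    optimized; body_idx / obj_idx: matching (N,) long tensors of vertex
    indices, as a PICO-db annotation provides.
    """
    return ((body_verts[body_idx] - obj_verts[obj_idx]) ** 2).sum(-1).mean()
```

Minimizing such a term alongside the mask and penetration losses of stages (2) and (3) in the method overview pulls the interacting surfaces together while keeping both meshes consistent with the image.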

Intro Video

Coming soon!

Acknowledgments & Disclosure

We thank Felix Grüninger for advice on mesh preprocessing, Jean-Claude Passy and Valkyrie Felso for advice on the data collection, and Xianghui Xie for advice on HDM evaluation. We also thank Tsvetelina Alexiadis, Taylor Obersat, Claudia Gallatz, Asuka Bertler, Arina Kuznetcova, Suraj Bhor, Tithi Rakshit, Tomasz Niewiadomski, Valerian Fourel and Florentin Doll for their immense help in the data collection and verification process, Benjamin Pellkofer for IT support, and Nikos Athanasiou for the helpful discussions. This work was funded in part by the International Max Planck Research School for Intelligent Systems (IMPRS-IS). D. Tzionas is supported by the ERC Starting Grant (project STRIPES, 101165317).

DT has received a research gift fund from Google. While MJB is a co-founder and Chief Scientist at Meshcapade, his research in this project was performed solely at, and funded solely by, the Max Planck Society.

Contact

For technical questions, please contact pico@tue.mpg.de
For commercial licensing, please contact ps-licensing@tue.mpg.de

BibTeX

@inproceedings{cseke_tripathi_2025_cvpr_pico,
    title     = {{PICO}: Reconstructing {3D} People In Contact with Objects},
    author    = {Alp\'{a}r Cseke and Shashank Tripathi and Sai Kumar Dwivedi and Arjun Lakshmipathy and Agniv Chatterjee and Michael J. Black and Dimitrios Tzionas},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
}