ScoreHMR:
Score-Guided Diffusion for 3D Human Recovery

Anastasis Stathopoulos, Ligong Han, Dimitris Metaxas
Rutgers University
CVPR 2024

Paper
Code (GitHub)
Google Colab demo

We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems in 3D human pose and shape reconstruction. ScoreHMR mimics model-fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. Here, we show our approach applied to videos, using keypoint detections and score guidance with keypoint reprojection and temporal smoothness terms.
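The two guidance terms named above can be illustrated with a small loss function. This is a minimal sketch under stated assumptions, not the ScoreHMR implementation: it assumes 3D joints already expressed in camera coordinates, a pinhole camera with focal length `focal` and the principal point at the origin, detections given as `(u, v, confidence)` triples, plain squared errors (the actual method may use a robust penalty instead), and a hypothetical weight `w_smooth`. All function names are illustrative.

```python
from typing import List, Tuple

# Hypothetical data layout: joints are 3D points in camera coordinates,
# detections are (u, v, confidence) triples in pixels.
Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]
Detection = Tuple[float, float, float]


def project(joint: Point3, focal: float) -> Point2:
    """Pinhole projection, assuming the principal point is at the origin."""
    x, y, z = joint
    return (focal * x / z, focal * y / z)


def reprojection_loss(joints: List[Point3], detections: List[Detection], focal: float) -> float:
    """Confidence-weighted squared 2D distance between projected joints and detections."""
    total = 0.0
    for joint, (u, v, conf) in zip(joints, detections):
        pu, pv = project(joint, focal)
        total += conf * ((pu - u) ** 2 + (pv - v) ** 2)
    return total


def smoothness_loss(sequence: List[List[Point3]]) -> float:
    """Squared displacement of every joint between consecutive frames."""
    total = 0.0
    for prev, curr in zip(sequence, sequence[1:]):
        for a, b in zip(prev, curr):
            total += sum((bi - ai) ** 2 for ai, bi in zip(a, b))
    return total


def video_guidance_loss(
    sequence: List[List[Point3]],
    detections_per_frame: List[List[Detection]],
    focal: float,
    w_smooth: float = 1.0,  # hypothetical weight of the temporal term
) -> float:
    """Keypoint reprojection summed over frames plus a weighted temporal smoothness term."""
    data = sum(
        reprojection_loss(joints, dets, focal)
        for joints, dets in zip(sequence, detections_per_frame)
    )
    return data + w_smooth * smoothness_loss(sequence)
```

In a guided sampler, it is the gradient of such a loss with respect to the noisy latent (taken through the body model and the predicted clean sample) that steers denoising; this sketch only evaluates the loss and does not differentiate it.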



Abstract

We present Score-Guided Human Mesh Recovery (ScoreHMR), an approach for solving inverse problems in 3D human pose and shape reconstruction. These inverse problems involve fitting a human body model to image observations and are traditionally solved through optimization. ScoreHMR mimics model-fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The diffusion model is trained to capture the conditional distribution of the human model parameters given an input image. By guiding its denoising process with a task-specific score, ScoreHMR effectively solves inverse problems for various applications without retraining the task-agnostic diffusion model. We evaluate our approach in three settings: (i) single-frame model fitting; (ii) reconstruction from multiple uncalibrated views; and (iii) reconstruction of humans in video sequences. ScoreHMR consistently outperforms all optimization baselines on popular benchmarks across all settings.
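Score guidance can be written compactly. The equations below follow a common DPS-style formulation (DPS is credited in the acknowledgements as an inspiration); treat them as an illustrative assumption rather than the exact ScoreHMR update. Here $x_t$ is the noisy latent at step $t$, $c$ the image condition, $\epsilon_\theta$ the diffusion model's noise prediction, $\bar{\alpha}_t$ the cumulative noise schedule, $y$ the observation, $L$ a task-specific loss, and $\rho$ a guidance scale:

```latex
\begin{aligned}
\hat{x}_0(x_t) &= \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t, c)}{\sqrt{\bar{\alpha}_t}} \\
\tilde{\epsilon}_t &= \epsilon_\theta(x_t, t, c) + \rho\,\sqrt{1-\bar{\alpha}_t}\,\nabla_{x_t} L\bigl(\hat{x}_0(x_t), y\bigr) \\
x_{t-1} &= \sqrt{\bar{\alpha}_{t-1}}\,\frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\tilde{\epsilon}_t}{\sqrt{\bar{\alpha}_t}} + \sqrt{1-\bar{\alpha}_{t-1}}\,\tilde{\epsilon}_t
\end{aligned}
```

The second line corresponds to approximating $\nabla_{x_t}\log p(y \mid x_t) \approx -\rho\,\nabla_{x_t} L(\hat{x}_0(x_t), y)$ in classifier-style guidance. Only the loss $L$ changes from task to task, while $\epsilon_\theta$ stays fixed, which is why the diffusion model does not need retraining.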



Approach

Top row: Overview of ScoreHMR, which iteratively refines an initial regression estimate in a loop that alternates DDIM inversion and DDIM guided sampling, until the human body model aligns with the available observation. Bottom row: Applications. (a) Body model fitting to 2D keypoints. (b) Multi-view refinement of individual per-frame predictions with cross-view consistency guidance. (c) Recovery of smooth and consistent 3D human motion from a video, given initial per-frame estimates.
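The inversion and guided-sampling loop in the overview can be sketched on a toy problem. This is a hypothetical, self-contained illustration, not the ScoreHMR code: the trained image-conditioned diffusion model is replaced by the exact (MMSE) noise predictor for a standard-normal prior, the schedule `ALPHA_BAR` has only four steps, the observation `y` is observed directly with loss L = 0.5 * ||x0_hat - y||^2, and `t_start`, `rounds`, and `scale` are illustrative values. The guided step uses the DPS-style update eps_guided = eps + scale * sqrt(1 - alpha_bar_t) * grad L.

```python
import math
from typing import List

Vector = List[float]

# Toy noise schedule (an assumption): ALPHA_BAR[0] = 1 is clean, noise grows with t.
ALPHA_BAR = [1.0, 0.9, 0.7, 0.5]


def toy_eps_model(x_t: Vector, t: int) -> Vector:
    """MMSE noise predictor for a standard-normal prior x_0 ~ N(0, I).

    Stands in for the trained, image-conditioned diffusion model (hypothetical).
    """
    return [math.sqrt(1.0 - ALPHA_BAR[t]) * v for v in x_t]


def predict_x0(x_t: Vector, eps: Vector, t: int) -> Vector:
    """Clean-sample estimate implied by a noise prediction at step t."""
    a = ALPHA_BAR[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(x_t, eps)]


def ddim_move(x0: Vector, eps: Vector, t_new: int) -> Vector:
    """Deterministic DDIM update: re-noise x0 with eps to noise level t_new."""
    a = ALPHA_BAR[t_new]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]


def ddim_invert(x0: Vector, t_stop: int) -> Vector:
    """DDIM inversion: walk a clean estimate up to noise level t_stop."""
    x = list(x0)
    for t in range(t_stop):  # t -> t + 1, towards more noise
        eps = toy_eps_model(x, t)
        x = ddim_move(predict_x0(x, eps, t), eps, t + 1)
    return x


def guided_sample(x_t: Vector, t_start: int, y: Vector, scale: float) -> Vector:
    """DDIM sampling from t_start to 0 with the noise prediction shifted by the loss gradient."""
    x = list(x_t)
    for t in range(t_start, 0, -1):  # t -> t - 1, towards less noise
        a = ALPHA_BAR[t]
        eps = toy_eps_model(x, t)
        x0_hat = predict_x0(x, eps, t)
        # Gradient of L = 0.5 * ||x0_hat - y||^2 w.r.t. x_t. In this toy model
        # x0_hat = sqrt(a) * x_t, so the chain rule contributes a factor sqrt(a).
        grad = [math.sqrt(a) * (p - target) for p, target in zip(x0_hat, y)]
        eps_guided = [e + scale * math.sqrt(1.0 - a) * g for e, g in zip(eps, grad)]
        x = ddim_move(predict_x0(x, eps_guided, t), eps_guided, t - 1)
    return x


def refine(x_init: Vector, y: Vector, t_start: int = 3, rounds: int = 3, scale: float = 1.0) -> Vector:
    """Alternate DDIM inversion and guided sampling, starting from a regression estimate."""
    x = list(x_init)
    for _ in range(rounds):
        x = guided_sample(ddim_invert(x, t_start), t_start, y, scale)
    return x
```

For example, `refine([0.0], [2.0])` moves the estimate from 0 toward the observation 2 but stops short of it, because every inversion and sampling round trip also pulls toward the toy prior's mean of 0. This balance between the data term and the prior mirrors, in miniature, how a learned prior regularizes model fitting.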


Results

Comparisons with HMR 2.0 and ProHMR-fitting

We present comparisons to an optimization approach (ProHMR-fitting) for temporal model fitting to 2D keypoint detections. ScoreHMR and ProHMR-fitting are initialized with the regression estimate of HMR 2.0b. ScoreHMR reconstructions are temporally stable and better aligned with the input video than those of HMR 2.0b and ProHMR-fitting. ProHMR-fitting has more jitter and can sometimes fail on hard poses or unusual viewpoints.
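Jitter is commonly quantified by the magnitude of joint acceleration, computed with second-order finite differences across frames. The page does not state which metric, if any, underlies these observations, so the sketch below is an assumption: it measures the mean acceleration of predicted joints only (benchmarks often report acceleration error against ground truth instead), with joints given as hypothetical per-frame lists of 3D points.

```python
import math
from typing import List, Tuple

Point3 = Tuple[float, float, float]


def mean_acceleration(sequence: List[List[Point3]]) -> float:
    """Mean L2 norm of per-joint second finite differences across frames.

    A common proxy for jitter; units follow the input (e.g. mm per frame^2).
    Returns 0.0 when there are fewer than three frames.
    """
    norms = []
    for prev, curr, nxt in zip(sequence, sequence[1:], sequence[2:]):
        for a, b, c in zip(prev, curr, nxt):
            # Second difference x[t+1] - 2 x[t] + x[t-1] for each coordinate.
            accel = [ci - 2.0 * bi + ai for ai, bi, ci in zip(a, b, c)]
            norms.append(math.sqrt(sum(v * v for v in accel)))
    return sum(norms) / len(norms) if norms else 0.0
```

A joint moving at constant velocity scores 0, while a joint that jumps forward and back between frames scores high, which matches the visual notion of jitter.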

Comparisons with ProHMR-regression and ProHMR-fitting

We present comparisons to an optimization approach (ProHMR-fitting) for temporal model fitting to 2D keypoint detections. ScoreHMR and ProHMR-fitting are run on top of ProHMR-regression. ScoreHMR can effectively refine the less accurate ProHMR-regression estimate, yielding more faithful 3D reconstructions than the baselines. ProHMR-fitting has more jitter and can sometimes fail on hard poses or unusual viewpoints.

Comparison with single-frame model fitting methods

We compare our approach (green) with ProHMR-fitting (blue) and SMPLify (grey). All model-fitting algorithms are initialized with the regression estimate from ProHMR (pink) or HMR 2.0b (white). ScoreHMR achieves more faithful reconstructions than the optimization baselines.


Citation



Acknowledgements

This project was inspired by ProHMR and DPS. This webpage template was borrowed from some colorful folks. Icons: Flaticon.