HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics

Weiqi Li1, Zehao Zhang2, Liang Lin1,3,4, Guangrun Wang1,3,4
1Sun Yat-sen University, 2Yonsei University, 3X-Era AI Lab, 4Guangdong Key Laboratory of Big Data Analysis and Processing

Compelling applications of our proposed method in human dynamics synthesis highlight four key strengths: temporal consistency, geometric plausibility, expressive motion handling, and seamless integration into target scenes. These capabilities enable a wide range of applications, including but not limited to: (a) animating characters with novel motions—either drawn from motion capture datasets such as AMASS or synthesized from textual descriptions; and (b) video reenactment, where characters are smoothly inserted into target scenes and animated according to predefined motion trajectories.

Abstract

Synthetic human dynamics aims to generate photorealistic videos of human subjects performing expressive, intention-driven motions. However, current approaches face two core challenges: (1) geometric inconsistency and coarse reconstruction, due to limited 3D modeling and detail preservation; and (2) limited motion generalization and scene inharmonization, stemming from weak generative capabilities. To address these, we present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents: (1) Reconstructor builds 3D-consistent human-scene representations from monocular video using 3D Gaussian Splatting and deformation decomposition. (2) Critique Agent enhances reconstruction fidelity by identifying and refining poorly reconstructed regions via multi-round MLLM-based reflection. (3) Pose Guider enables motion generalization by generating expressive pose sequences using time-aware parametric encoders. (4) Video Harmonizer synthesizes photorealistic, coherent video via a hybrid rendering pipeline with diffusion, refining the Reconstructor through a Back-to-4D feedback loop. HumanGenesis achieves state-of-the-art performance on tasks including text-guided synthesis, video reenactment, and novel-pose generalization, significantly improving expressiveness, geometric fidelity, and scene integration.

Method

The Reconstructor first recovers the 3D human and scene from monocular video by decomposing motion into rigid and non-rigid deformations. The Critique Agent then evaluates the rendered outputs to identify and refine low-quality regions, enabling fine-grained reconstruction. Next, the Pose Guider generates temporally aware embeddings from novel parametric pose sequences using a time-aware encoder, allowing expressive motion synthesis. Finally, the Video Harmonizer applies Spatial Feature Transform (SFT) within a video diffusion pipeline to produce photorealistic sequences and feeds its outputs back to the Reconstructor through the Back-to-4D loop. Illustrative sketches of each stage follow below.
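As a concrete illustration of the Reconstructor's deformation decomposition, the sketch below deforms canonical Gaussian centers with a linear-blend-skinning rigid term plus a pose-conditioned non-rigid offset predicted by a small MLP. The module name, shapes, and network sizes are our assumptions for exposition, not the paper's implementation.

```python
import torch
import torch.nn as nn

class DeformationDecomposition(nn.Module):
    """Illustrative split of a Gaussian's motion into a rigid (skinning)
    term and a learned non-rigid residual, as in the Reconstructor."""

    def __init__(self, num_joints: int = 24, hidden: int = 128):
        super().__init__()
        # Non-rigid branch: canonical position + pose -> small residual offset.
        self.offset_mlp = nn.Sequential(
            nn.Linear(3 + num_joints * 3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x_canon, skin_weights, bone_transforms, pose):
        """
        x_canon:         (N, 3)    canonical Gaussian centers
        skin_weights:    (N, J)    per-Gaussian skinning weights
        bone_transforms: (J, 4, 4) per-bone rigid transforms for this frame
        pose:            (J * 3,)  flattened pose parameters (e.g. axis-angle)
        """
        N = x_canon.shape[0]
        # Rigid term: blend bone transforms with skinning weights (LBS).
        T = torch.einsum("nj,jab->nab", skin_weights, bone_transforms)  # (N,4,4)
        ones = torch.ones(N, 1, device=x_canon.device)
        x_h = torch.cat([x_canon, ones], dim=-1)                        # (N,4)
        x_rigid = torch.einsum("nab,nb->na", T, x_h)[:, :3]
        # Non-rigid term: pose-conditioned residual in canonical space.
        pose_in = pose.unsqueeze(0).expand(N, -1)
        offset = self.offset_mlp(torch.cat([x_canon, pose_in], dim=-1))
        return x_rigid + offset
```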
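The Critique Agent's multi-round reflection can be pictured as the loop below. `mllm_critique` and `refine_region` are hypothetical placeholders, not real APIs from the paper or any library; only the render-critique-refine control flow is taken from the text.

```python
# Hypothetical skeleton of the Critique Agent's multi-round reflection loop.
# `mllm_critique` and `refine_region` stand in for an MLLM call and a local
# re-optimization step; both are assumptions for illustration.

def critique_loop(render_fn, refine_region, mllm_critique,
                  max_rounds: int = 3, quality_threshold: float = 0.9):
    """Render, ask the MLLM to flag low-quality regions, refine, repeat."""
    for _ in range(max_rounds):
        frame = render_fn()
        # The MLLM returns (region, score) pairs; low scores mean poor fidelity.
        critiques = mllm_critique(frame)
        bad_regions = [r for r, score in critiques if score < quality_threshold]
        if not bad_regions:
            break  # reconstruction judged acceptable; stop reflecting
        for region in bad_regions:
            refine_region(region)  # e.g. densify Gaussians / re-optimize locally
    return render_fn()
```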
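For the Pose Guider, one plausible reading of a time-aware parametric encoder is to fuse each pose vector in a sequence with a sinusoidal embedding of its normalized timestep, as sketched below. The dimensions (e.g. a 72-D axis-angle pose, as in SMPL) and the architecture are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

class TimeAwarePoseEncoder(nn.Module):
    """Sketch of a time-aware parametric encoder: each pose vector in a
    sequence is fused with a sinusoidal embedding of its timestep."""

    def __init__(self, pose_dim: int = 72, time_dim: int = 32, out_dim: int = 256):
        super().__init__()
        self.time_dim = time_dim  # must be even (sin/cos halves)
        self.proj = nn.Sequential(
            nn.Linear(pose_dim + time_dim, out_dim), nn.ReLU(),
            nn.Linear(out_dim, out_dim),
        )

    def time_embedding(self, t: torch.Tensor) -> torch.Tensor:
        # Standard sinusoidal embedding over normalized timesteps in [0, 1].
        half = self.time_dim // 2
        freqs = torch.exp(
            torch.arange(half, device=t.device, dtype=torch.float32)
            * (-math.log(1e4) / half)
        )
        angles = t[:, None] * freqs[None, :]
        return torch.cat([angles.sin(), angles.cos()], dim=-1)

    def forward(self, poses: torch.Tensor) -> torch.Tensor:
        """poses: (T, pose_dim), e.g. 24 joints x 3 axis-angle values."""
        T = poses.shape[0]
        t = torch.linspace(0.0, 1.0, T, device=poses.device)
        emb = self.time_embedding(t)                 # (T, time_dim)
        return self.proj(torch.cat([poses, emb], dim=-1))
```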
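Spatial Feature Transform itself is a known conditioning mechanism: per-pixel scale and shift maps are predicted from a guidance signal and modulate intermediate features. The minimal layer below shows the operation; how HumanGenesis wires it into its video diffusion backbone, and the exact condition inputs, are assumptions here, and two independent condition branches are used in place of the original shared trunk for brevity.

```python
import torch
import torch.nn as nn

class SFTLayer(nn.Module):
    """Spatial Feature Transform: modulate diffusion features with per-pixel
    scale and shift predicted from a condition map (e.g. the hybrid render)."""

    def __init__(self, feat_ch: int, cond_ch: int, hidden: int = 64):
        super().__init__()
        def branch():
            return nn.Sequential(
                nn.Conv2d(cond_ch, hidden, 3, padding=1), nn.ReLU(),
                nn.Conv2d(hidden, feat_ch, 3, padding=1),
            )
        self.to_gamma = branch()  # per-pixel scale
        self.to_beta = branch()   # per-pixel shift

    def forward(self, feat: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        """feat: (B, feat_ch, H, W) diffusion features;
        cond: (B, cond_ch, H, W) rendered guidance at the same resolution."""
        gamma = self.to_gamma(cond)
        beta = self.to_beta(cond)
        # Residual formulation (1 + gamma) keeps the identity map easy to learn.
        return feat * (1.0 + gamma) + beta
```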

Comparisons with State-of-the-Art Methods

Qualitative comparisons with state-of-the-art methods.