Li Weiqi  李伟祺
M.S. Student · Computer Technology (direct recommendation, 推免)
Sun Yat-sen University, Guangzhou, China

I am a second-year M.S. student at Sun Yat-sen University (SYSU), admitted via the direct-recommendation (推免) pathway. I received my B.S. in Software Engineering from South China University of Technology (SCUT) in 2024. Currently, I am a research intern at Tencent RoboticX, focusing on VLA-based mobile manipulation.

My research interests lie in Embodied AI, Vision-Language-Action (VLA) models, Multimodal Large Language Models, and Controllable Video Generation. I aim to build robust, generalizable embodied agents that can seamlessly operate in diverse real-world environments.

Embodied AI · VLA Models · Mobile Manipulation · Multimodal LLMs · Video Diffusion · 3D Vision

News

  • 2025.12 🎉 One paper accepted at CVPR 2026 (CCF-A): VLA Models Are More Generalizable Than You Think.
  • 2025.12 📄 New preprint: ACD submitted to IJCV. [arXiv:2512.21268]
  • 2025.09 📄 HumanGenesis submitted to NeurIPS 2026. [arXiv:2508.09858]
  • 2025.05 🤖 Joined Tencent RoboticX as a research intern, working on VLA-based mobile manipulation.
  • 2024.09 🎓 Started M.S. at Sun Yat-sen University (direct recommendation / 推免).

Publications

* denotes equal contribution  |  underline denotes corresponding author

VLA Models Are More Generalizable Than You Think: Revisiting Physical and Spatial Modeling
Li Weiqi, et al.
CVPR 2026 · CCF-A
Investigates why VLA models (e.g., π0.5) fail dramatically under novel viewpoints. Decouples the issue into physical vs. spatial modeling failures, showing that the pretrained model retains strong physical understanding while spatial representation mismatch is the key bottleneck. Proposes two lightweight adaptation methods — FTM (token affine modulation) and FLA (low-rank ViT update) — that recover cross-viewpoint performance by updating only 4K–4.7M parameters, achieving 90.8% success rate on the LIBERO-V benchmark with a 99× parameter efficiency gain over LoRA. One-shot sim-to-real transfer validated on a real Franka arm.

Collaborative Real2Sim–Sim2Real Agential Learning for Geometric and Generative Human Dynamics
Li Weiqi, et al. (HumanGenesis)
Proposes HumanGenesis, a multi-agent collaborative framework that unifies Real2Sim and Sim2Real in a closed loop for human dynamics modeling. The system integrates a 3DGS+SMPL+Learnable-LBS Reconstructor, a Qwen2.5-VL-driven Critique Agent with multi-round self-reflection for fine-grained reconstruction refinement, and a Video Harmonizer that enhances human-scene consistency and temporal coherence in rendered videos. Achieves state-of-the-art results on HumanVid and NeuMan benchmarks.

ACD: Direct Conditional Control for Video Diffusion Models via Attention Supervision
..., Li Weiqi, et al.
Proposes Attention-Conditional Diffusion (ACD), a controllable video generation framework built on CogVideoX. Unlike traditional guidance-level conditioning, ACD directly supervises cross-attention maps inside the diffusion model via a dual-branch (masked/unmasked) shared-parameter fine-tuning scheme, moving conditioning from output-level to attention-level and eliminating common artifacts. Uses sparse 3D-aware object layouts as control signals with a layout ControlNet, supported by an automated annotation pipeline on 20K RealEstate training clips. Outperforms AC3D and other baselines on FID/FVD and camera error metrics.

Research Experience

Tencent RoboticX
Research Intern — Embodied AI
Working on Vision-Language-Action (VLA) models for mobile manipulation tasks, investigating generalization, spatial understanding, and sim-to-real transfer in whole-body robot control pipelines.
2025 – Present
Sun Yat-sen University — Graduate Research
M.S. Researcher — Embodied AI & Multimodal Generation
Research on generalizable VLA models (CVPR 2026) and multi-agent frameworks for human dynamics modeling (ICML 2026 submission).
2024.09 – Present

Education

Sun Yat-sen University (中山大学)
M.S. in Computer Technology  ·  Direct Recommendation (推免)
School of Computer Science and Engineering
Sep 2024 – Jun 2027
South China University of Technology (华南理工大学)
B.S. in Software Engineering
School of Software Engineering
Sep 2020 – Jun 2024

Technical Skills

Frameworks & Tools: PyTorch · Hugging Face Transformers · Diffusers · LoRA / PEFT · 3D Gaussian Splatting · SMPL
Research Areas: Embodied AI · VLA Policy Learning · Multimodal Modeling · Controllable Video Generation · Large Model Fine-tuning
Languages: Python · C++ · CUDA

Last updated: May 2026