Sicheng Mo

I am a second-year PhD student in Computer Science at UCLA, advised by Prof. Bolei Zhou. I earned my Bachelor’s degree in Computer Science and Applied Mathematics from the University of Wisconsin-Madison, where I conducted research under the guidance of Prof. Yin Li and Prof. Fred Sala.

My research lies in Computer Vision. I am particularly interested in large generative models for visual content generation and understanding. I have also worked on scalable language-driven video understanding and 3D reconstruction.

In the past, my work also extended to machine learning, specifically causal inference for out-of-distribution generalization.

News

Feb 21, 2026	Two papers accepted to CVPR 2026! Check out our GroupDiff and Relational Visual Similarity.
Jun 25, 2025	X-Fusion has been accepted to ICCV 2025 and received the Best Paper Award at the CVPR 2025 T4V Workshop.
Jun 09, 2025	I will join Adobe Research again as an intern this summer, working with Dr. Yuheng Li.

Selected publications

* indicates equal contribution

CVPR

Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration

Sicheng Mo, Thao Nguyen, Richard Zhang, Nicholas Kolkin, Siddharth Srinivasan Iyer, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, and Yuheng Li

In Computer Vision and Pattern Recognition (CVPR), 2026

Website PDF Code

arXiv

Dreamland: Controllable World Creation with Simulator and Generative Models

Sicheng Mo*, Ziyang Leng*, Leon Liu, Weizhen Wang, Honglin He, Zhang Huizhi, and Bolei Zhou

In arXiv, 2025

Website PDF

arXiv

X-Fusion: Introducing New Modality to Frozen Large Language Models

Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, and Yuheng Li

In International Conference on Computer Vision (ICCV), 2025

Best Paper at CVPR 2025 T4V Workshop

Website PDF

NeurIPS

Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance

Kuan Heng Lin*, Sicheng Mo*, Ben Klingher, Fangzhou Mu, and Bolei Zhou

In Advances in Neural Information Processing Systems (NeurIPS), 2024

Website PDF

CVPR

FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition

Sicheng Mo*, Fangzhou Mu*, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, and Bolei Zhou

In Computer Vision and Pattern Recognition (CVPR), 2024

Website PDF Code

CVPR

SnAG: Scalable and Accurate Video Grounding

Fangzhou Mu*, Sicheng Mo*, and Yin Li

In Computer Vision and Pattern Recognition (CVPR), 2024

Website PDF