Sicheng Mo

I am a second-year PhD student in Computer Science at UCLA, advised by Prof. Bolei Zhou. I earned my Bachelor’s degree in Computer Science and Applied Mathematics from the University of Wisconsin-Madison, where I conducted research under the guidance of Prof. Yin Li and Prof. Fred Sala.

My research lies in Computer Vision. I am particularly interested in large generative models for visual content generation and understanding. I have also worked on scalable language-driven video understanding and 3D reconstruction.

My earlier work extended to machine learning, specifically causal inference for out-of-distribution generalization.

News

Jun 25, 2025 X-Fusion has been accepted to ICCV 2025 and received the Best Paper Award at the CVPR 2025 T4V Workshop.
Jun 09, 2025 I will join Adobe Research again as an intern this summer, working with Dr. Yuheng Li.
Oct 01, 2024 Two papers accepted to NeurIPS 2024! Check out our Ctrl-X and SimGen.

Selected publications

* indicates equal contribution

  1. groupdiff_thumbnail.jpg
    arXiv
    Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration
    Sicheng Mo, Thao Nguyen, Richard Zhang, Nicholas Kolkin, Siddharth Srinivasan Iyer, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, and Yuheng Li
    In arXiv, 2025
  2. dreamland_thumbnail.png
    arXiv
    Dreamland: Controllable World Creation with Simulator and Generative Models
    Sicheng Mo*, Ziyang Leng*, Leon Liu, Weizhen Wang, Honglin He, Zhang Huizhi, and Bolei Zhou
    In arXiv, 2025
  3. xfusion_thumbnail.png
    arXiv
    X-Fusion: Introducing New Modality to Frozen Large Language Models
    Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, and Yuheng Li
    In International Conference on Computer Vision (ICCV), 2025
    Best Paper at CVPR 2025 T4V Workshop
  4. ctrl-x.jpg
    NeurIPS
    Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
    Kuan Heng Lin*, Sicheng Mo*, Ben Klingher, Fangzhou Mu, and Bolei Zhou
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  5. freecontrol1.jpg
    CVPR
    FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
    Sicheng Mo*, Fangzhou Mu*, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, and Bolei Zhou
    In Computer Vision and Pattern Recognition (CVPR), 2024
  6. snag.png
    CVPR
    SnAG: Scalable and Accurate Video Grounding
    Fangzhou Mu*, Sicheng Mo*, and Yin Li
    In Computer Vision and Pattern Recognition (CVPR), 2024