publications

2025

  1. arXiv
    Group Diffusion: Enhancing Image Generation by Unlocking Cross-Sample Collaboration
    Sicheng Mo, Thao Nguyen, Richard Zhang, Nicholas Kolkin, Siddharth Srinivasan Iyer, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, and Yuheng Li
    In arXiv, 2025
  2. arXiv
    Relational Visual Similarity
    Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, and Yuheng Li
    In arXiv, 2025
  3. arXiv
    Dreamland: Controllable World Creation with Simulator and Generative Models
    Sicheng Mo*, Ziyang Leng*, Leon Liu, Weizhen Wang, Honglin He, Zhang Huizhi, and Bolei Zhou
    In arXiv, 2025
  4. arXiv
    X-Fusion: Introducing New Modality to Frozen Large Language Models
    Sicheng Mo, Thao Nguyen, Xun Huang, Siddharth Srinivasan Iyer, Yijun Li, Yuchen Liu, Abhishek Tandon, Eli Shechtman, Krishna Kumar Singh, Yong Jae Lee, Bolei Zhou, and Yuheng Li
    In International Conference on Computer Vision (ICCV), 2025
    Best Paper at CVPR 2025 T4V Workshop

2024

  1. NeurIPS
    SimGen: Simulator-conditioned Driving Scene Generation
    Yunsong Zhou, Michael Simon, Zhenghao Peng, Sicheng Mo, Hongzhi Zhu, Minyi Guo, and Bolei Zhou
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  2. NeurIPS
    Ctrl-X: Controlling Structure and Appearance for Text-To-Image Generation Without Guidance
    Kuan Heng Lin*, Sicheng Mo*, Ben Klingher, Fangzhou Mu, and Bolei Zhou
    In Advances in Neural Information Processing Systems (NeurIPS), 2024
  3. CVPR
    FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition
    Sicheng Mo*, Fangzhou Mu*, Kuan Heng Lin, Yanli Liu, Bochen Guan, Yin Li, and Bolei Zhou
    In Computer Vision and Pattern Recognition (CVPR), 2024
  4. CVPR
    SnAG: Scalable and Accurate Video Grounding
    Fangzhou Mu*, Sicheng Mo*, and Yin Li
    In Computer Vision and Pattern Recognition (CVPR), 2024

2022

  1. arXiv
    A Simple Transformer-Based Model for Ego4D Natural Language Queries Challenge
    Sicheng Mo, Fangzhou Mu, and Yin Li
    In arXiv, Oct 2022
  2. arXiv
    Where a Strong Backbone Meets Strong Features – ActionFormer for Ego4D Moment Queries Challenge
    Fangzhou Mu, Sicheng Mo, Gillian Wang, and Yin Li
    In arXiv, Oct 2022
  3. ICMLW
    Causal Omnivore: Fusing Noisy Estimates of Spurious Correlations
    Dyah Adila, Sonia Cromp, Sicheng Mo, and Frederic Sala
    In ICML’22 Workshop on Spurious Correlations, Invariance, and Stability, Jun 2022
  4. TPAMI
    Physics to the Rescue: Deep Non-line-of-sight Reconstruction for High-speed Imaging
    Fangzhou Mu, Sicheng Mo, Jiayong Peng, Xiaochun Liu, Ji Hyun Nam, Siddeshwar Raghavan, Andreas Velten, and Yin Li
    In ICCP/TPAMI, Jun 2022