Zuyan Liu (刘祖炎)

I am a third-year Ph.D. student at the Intelligent Vision Group (IVG), Department of Automation, Tsinghua University, advised by Prof. Jiwen Lu. Prior to that, I received my Bachelor's degree from the Department of Automation, Tsinghua University, in 2023 (ranked 1/170).

I am broadly interested in large language models and computer vision. My current research focuses on multi-modal large language models and large vision models.

Email  /  Google Scholar  /  GitHub


Publications

* indicates equal contribution

Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu*, Yuhao Dong*, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao
International Conference on Learning Representations (ICLR), 2025
[arXiv] [Code] [Project Page] [中文解读]

Oryx offers an on-demand solution to seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths.

Ola: Pushing the Frontiers of Omni-Modal Language Model
Zuyan Liu*, Yuhao Dong*, Jiahui Wang, Ziwei Liu, Winston Hu, Jiwen Lu, Yongming Rao
arXiv, 2025
[arXiv] [Code] [Project Page] [中文解读] [Ranked 1st on OpenCompass Leaderboard (<15B)]

Ola is an omni-modal language model that achieves performance competitive with specialized models across image, video, and audio understanding, pushing the frontiers of omni-modal language models.

SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs
Jiahui Wang*, Zuyan Liu*, Yongming Rao, Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2025
[arXiv] [Code] [Project Page] [中文解读]

SparseMM observes that only a sparse subset of attention heads in vision-language multi-modal models, termed visual heads, responds to visual concepts, and applies asymmetric operations to these heads to achieve model pruning and inference acceleration.

Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
Yuhao Dong*, Zuyan Liu*, Hailong Sun, Jingkang Yang, Winston Hu, Yongming Rao, Ziwei Liu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Highlight
[arXiv] [Code] [中文解读]

Insight-V is a multi-agent system consisting of a reasoning agent dedicated to performing long-chain reasoning and a summary agent trained to judge and summarize reasoning results.

Efficient High-Order Spatial Interactions for Visual Perception
Zuyan Liu, Yongming Rao, Wenliang Zhao, Jie Zhou, Jiwen Lu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF:18.6), 2025
[Paper] [Code] [Project Page]

We propose the HorNet family, including HorNet, Hor3D, and HorCLIP, as general fundamental visual architectures with a better performance-efficiency trade-off.

Efficient Inference of Vision Instruction-Following Models with Elastic Cache
Zuyan Liu, Benlin Liu, Jiahui Wang, Yuhao Dong, Guangyi Chen, Ranjay Krishna, Yongming Rao, Jiwen Lu
European Conference on Computer Vision (ECCV), 2024
[arXiv] [Code] [Project Page]

Elastic Cache is an approach for accelerating the KV cache in multi-modal large language models that applies distinct acceleration methods to the instruction-encoding and output-generation stages.

Chain-of-Spot: Interactive Reasoning Improves Large Vision-Language Models
Zuyan Liu*, Yuhao Dong*, Yongming Rao, Jie Zhou, Jiwen Lu
arXiv, 2024
[arXiv] [Code] [Project Page]

Chain-of-Spot (CoS) enhances feature extraction by focusing on the key regions of interest (ROI) in the image that correspond to the posed questions or instructions.

Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao*, Yongming Rao*, Zuyan Liu*, Benlin Liu, Jie Zhou, Jiwen Lu
IEEE International Conference on Computer Vision (ICCV), 2023
[arXiv] [Code] [Project Page] [Ranked 1st on NYUv2 Depth Estimation]

VPD (Visual Perception with Pre-trained Diffusion Models) is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.

Dynamic Spatial Sparsification for Efficient Vision Transformers and Convolutional Neural Networks
Yongming Rao*, Zuyan Liu*, Wenliang Zhao*, Jie Zhou, Jiwen Lu
IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI, IF:24.31), 2023
[arXiv] [Code] [Project Page]

The dynamic spatial sparsification framework can be applied to general visual architectures (e.g., vision Transformers, ConvNeXt, and Swin Transformers) and visual tasks (e.g., classification, object detection, and semantic segmentation) for efficient inference.

DiffSwap: High-Fidelity and Controllable Face Swapping via 3D-Aware Masked Diffusion
Wenliang Zhao*, Yongming Rao*, Weikang Shi, Zuyan Liu, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2023
[arXiv] [Code]

DiffSwap is a diffusion-model-based framework for high-fidelity and controllable face swapping.

PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers
Xumin Yu*, Yongming Rao*, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou
IEEE International Conference on Computer Vision (ICCV), 2021
Oral Presentation
[arXiv] [Code] [中文解读]

PoinTr is a transformer-based framework that reformulates point cloud completion as a set-to-set translation problem.

Experiences
Tencent Hunyuan
Multi-Modal Model Group, Research Intern
Topic: Multi-Modal
ByteDance Seed
Seed Vision Group, Research Intern
Topic: Video Generation
Beijing Academy of Artificial Intelligence
Vision Model Research Center, Research Intern
Topic: Multi-Modal
ByteDance
Intelligent Creation Group, Research Intern
Topic: Human AIGC
Honors and Awards

  • 2025 China National Scholarship (PhD Student) / 国家奖学金(博士生)
  • 2025 Hunyuan Scholarship, Tencent / 混元学者(中国电子学会-腾讯博士生科研激励计划)
  • 2023 Outstanding Undergraduate, Tsinghua University / 清华大学优秀毕业生
  • 2023 Outstanding Undergraduate, Beijing / 北京市优秀毕业生
  • 2022 China National Scholarship (Undergraduate) / 国家奖学金(本科生)
  • 2021 Jiang Nanxiang Scholarship, Tsinghua University / 蒋南翔奖学金
  • 2020 December 9th Scholarship, Tsinghua University / 一二·九奖学金
  • 2022, 2021, 2020 Comprehensive Excellence Scholarship, Tsinghua University / 清华大学综合优秀奖学金

Academic Services

  • Conference Reviewer: ICLR 2026, WACV 2026, NeurIPS 2025, ICLR 2025, ICCV 2025, CVPR 2025, ECCV 2024, CVPR 2024, ICCV 2023
  • Journal Reviewer: IEEE Transactions on Multimedia, Pattern Recognition



    © Zuyan Liu | Last updated: May 25, 2025