Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

RISE-SDF: a Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering

Conference: The 12th International Conference on 3D Vision (3DV 2025)

Authors: Deheng Zhang*, Jingyu Wang*, Shaofei Wang, Marko Mihajlovic, Sergey Prokudin, Hendrik P.A. Lensch, Siyu Tang (*equal contribution)

We present RISE-SDF, a method for reconstructing the geometry and material of glossy objects while achieving high-quality relighting.

DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Conference: The Thirteenth International Conference on Learning Representations (ICLR 2025)

Authors: Kaifeng Zhao, Gen Li, Siyu Tang

DART is a Diffusion-based Autoregressive motion model for Real-time Text-driven motion control. It also supports motion generation with spatial constraints and goals, including motion in-betweening, waypoint reaching, and human-scene interaction generation.

SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

Conference: The Thirteenth International Conference on Learning Representations (ICLR 2025)

Authors: Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang

We analyze how novel view synthesis methods perform under challenging out-of-distribution (OOD) camera views and introduce SplatFormer, a data-driven 3D transformer that refines 3D Gaussian splatting primitives to improve quality in extreme camera scenarios.

Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang

DOMA is an implicit motion field modeled by a spatiotemporal SIREN network. The learned motion field can predict how novel points move within that field.

DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang

Diffusion Noise Optimization (DNO) leverages existing human motion diffusion models as universal motion priors. We demonstrate its capability in motion editing tasks, where DNO preserves the content of the original motion while accommodating a diverse range of editing modes, including changing the trajectory, pose, or joint locations, and avoiding newly added obstacles.

RoHM: Robust Human Motion Reconstruction via Diffusion

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024), oral presentation

Authors: Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo

Conditioned on noisy and occluded input data, RoHM reconstructs complete, plausible motions in consistent global coordinates.

EgoGen: An Egocentric Synthetic Data Generator

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024), oral presentation

Authors: Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang

EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.

Latest News

Here’s what we've been up to recently.