Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

Neural Texture Splatting: Expressive 3D Gaussian Splatting for View Synthesis, Geometry, and Dynamic Reconstruction

Conference: SIGGRAPH Asia 2025 Conference Track

Authors: Yiming Wang, Shaofei Wang, Marko Mihajlovic, Siyu Tang

Neural Texture Splatting is an expressive extension of 3D Gaussian Splatting that introduces a local neural RGBA field for each primitive.

Learning Efficient Fuse-and-Refine for Feed-Forward 3D Gaussian Splatting

Conference: NeurIPS 2025

Authors: Yiming Wang, Lucy Chai, Xuan Luo, Michael Niemeyer, Manuel Lagunas, Stephen Lombardi, Siyu Tang, Tiancheng Sun

SplatVoxel is a hybrid Splat-Voxel representation that fuses and refines Gaussian Splatting, improving static scene reconstruction and enabling history-aware streaming reconstruction in a zero-shot manner.

DNF-Avatar: Distilling Neural Fields for Real-time Animatable Avatar Relighting

Conference: ICCV 2025 Findings Workshop

Authors: Zeren Jiang, Shaofei Wang, Siyu Tang

DNF-Avatar is a novel framework that distills knowledge from an implicit model into an explicit one for real-time rendering and relighting.

UniPhys: Unified Planner and Controller with Diffusion for Flexible Physics-Based Character Control

Conference: International Conference on Computer Vision (ICCV 2025), Highlight

Authors: Yan Wu, Korrawe Karunratanakul, Zhengyi Luo, Siyu Tang

UniPhys is a diffusion-based unified planner and text-driven controller for physics-based character control. It generalizes across diverse tasks using a single model, from short-term reactive control to long-term planning, without requiring task-specific training.


DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Rui Wang, Quentin Lohmeyer, Mirko Meboldt, Siyu Tang

Through Gaussian-Splatting-based self-supervised dynamic-static decomposition, DeGauss achieves state-of-the-art distractor-free static scene reconstruction from occluded inputs such as casually captured images and challenging egocentric videos, while simultaneously yielding a high-quality and efficient dynamic scene representation.

VolumetricSMPL: A Neural Volumetric Body Model for Efficient Interactions, Contacts, and Collisions

Conference: International Conference on Computer Vision (ICCV 2025), Highlight

Authors: Marko Mihajlovic, Siwei Zhang, Gen Li, Kaifeng Zhao, Lea Müller, Siyu Tang

VolumetricSMPL is a lightweight extension that adds volumetric capabilities to SMPL(-X) models for efficient 3D interactions and collision detection.

EgoM2P: Egocentric Multimodal Multitask Pretraining

Conference: International Conference on Computer Vision (ICCV 2025)

Authors: Gen Li, Yutong Chen*, Yiqian Wu*, Kaifeng Zhao*, Marc Pollefeys, Siyu Tang (*equal contribution)

EgoM2P is a large-scale egocentric multimodal and multitask model, pretrained on eight extensive egocentric datasets. It incorporates four modalities (RGB and depth video, gaze dynamics, and camera trajectories) to handle challenging tasks such as monocular egocentric depth estimation, camera tracking, gaze estimation, and conditional egocentric video synthesis.

Latest News

Here’s what we've been up to recently.