Welcome to the Computer Vision and Learning Group.


Our group conducts research in Computer Vision, focusing on perceiving and modeling humans.

We study computational models that enable machines to perceive and analyze human activities from visual input. We leverage machine learning and optimization techniques to build statistical models of humans and their behaviors. Our goal is to advance algorithmic foundations of scalable and reliable human digitalization, enabling a broad class of real-world applications. Our group is part of the Institute for Visual Computing (IVC) at the Department of Computer Science of ETH Zurich.

Featured Projects

In-depth look at our work.

RISE-SDF: a Relightable Information-Shared Signed Distance Field for Glossy Object Inverse Rendering

Conference: The 12th International Conference on 3D Vision (3DV 2025)

Authors: Deheng Zhang*, Jingyu Wang*, Shaofei Wang, Marko Mihajlovic, Sergey Prokudin, Hendrik P.A. Lensch, Siyu Tang (*equal contribution)

We present RISE-SDF, a method for reconstructing the geometry and material of glossy objects while achieving high-quality relighting.

DART: A Diffusion-Based Autoregressive Motion Model for Real-Time Text-Driven Motion Control

Conference: The Thirteenth International Conference on Learning Representations (ICLR 2025)

Authors: Kaifeng Zhao, Gen Li, Siyu Tang

DART is a Diffusion-based Autoregressive motion model for Real-time Text-driven motion control. It also supports motion generation with spatial constraints and goals, including motion in-betweening, waypoint reaching, and human-scene interaction generation.

SplatFormer: Point Transformer for Robust 3D Gaussian Splatting

Conference: The Thirteenth International Conference on Learning Representations (ICLR 2025)

Authors: Yutong Chen, Marko Mihajlovic, Xiyi Chen, Yiming Wang, Sergey Prokudin, Siyu Tang

We analyze how novel view synthesis methods perform under challenging out-of-distribution (OOD) camera views and introduce SplatFormer, a data-driven 3D transformer that refines 3D Gaussian splatting primitives to improve quality in extreme camera scenarios.

Degrees of Freedom Matter: Inferring Dynamics from Point Trajectories

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Yan Zhang, Sergey Prokudin, Marko Mihajlovic, Qianli Ma, Siyu Tang

DOMA is an implicit motion field modeled by a spatiotemporal SIREN network. The learned motion field can predict how novel points move within that field.

DNO: Optimizing Diffusion Noise Can Serve As Universal Motion Priors

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024)

Authors: Korrawe Karunratanakul, Konpat Preechakul, Emre Aksan, Thabo Beeler, Supasorn Suwajanakorn, Siyu Tang

Diffusion Noise Optimization (DNO) leverages existing human motion diffusion models as universal motion priors. We demonstrate its capability in motion editing tasks, where DNO preserves the content of the original motion while accommodating a diverse range of editing modes, including changing the trajectory, pose, or joint locations, and avoiding newly added obstacles.

RoHM: Robust Human Motion Reconstruction via Diffusion

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024), oral presentation

Authors: Siwei Zhang, Bharat Lal Bhatnagar, Yuanlu Xu, Alexander Winkler, Petr Kadlecek, Siyu Tang, Federica Bogo

Conditioned on noisy and occluded input data, RoHM reconstructs complete, plausible motions in consistent global coordinates.

EgoGen: An Egocentric Synthetic Data Generator

Conference: Conference on Computer Vision and Pattern Recognition (CVPR 2024), oral presentation

Authors: Gen Li, Kaifeng Zhao, Siwei Zhang, Xiaozhong Lyu, Mihai Dusmanu, Yan Zhang, Marc Pollefeys, Siyu Tang

EgoGen is a new synthetic data generator that can produce accurate and rich ground-truth training data for egocentric perception tasks.

Latest News

Here’s what we've been up to recently.