Xiongjun Guan's Homepage

Xiongjun Guan

I am a fourth-year Ph.D. student in the i-VisionGroup in the Department of Automation at Tsinghua University, advised by Prof. Jianjiang Feng and Prof. Jie Zhou.
In 2021, I received my B.S. degree from the Department of Automation and a minor from the Academy of Arts and Design, Tsinghua University.

I have a broad interest in computer vision, pattern recognition, and human-computer interaction. At present, my research mainly focuses on MLLMs, computer vision, and image retrieval.

Email / Google Scholar / GitHub

News

2025-07: 1 paper on fingerprint indexing is submitted to T-BIOM.

2025-07: We achieved 2nd place in MER25 @ ACM MM (multi-modal affective computing).

2025-07: 1 paper on finger photo pose estimation is accepted by IJCB 2025 as Oral.

2025-06: 1 paper about our runner-up solution for the Ego4D EgoSchema Challenge @ CVPR 2025 is accepted by EgoVis @ CVPR 2025.

2025-06: We achieved 2nd place in CVRR @ CVPR 2025 (complex video reasoning & robustness evaluation).

2025-05: We achieved 2nd place in the Ego4D EgoSchema Challenge @ CVPR 2025 (very long-form video understanding & question-answering).

2025-05: 1 paper on fixed-length fingerprint descriptor is submitted to T-IFS.

2025-05: 1 paper on pose estimation for under-screen fingerprint sensor is submitted to T-IFS.

2025-05: 1 paper on finger pose interaction is submitted to T-MC.

2024-11: 1 paper on partial fingerprint is accepted by T-IFS.

2024-06: 1 paper on latent fingerprint descriptor is accepted by IJCB 2024 as Poster.

2024-05: 1 paper on fingerprint dense registration is accepted by T-IFS.

2023-07: 1 paper on fingerprint distortion rectification is accepted by T-IFS.

2022-08: 1 paper on fingerprint distortion rectification is accepted by IJCB 2022 as Oral.

2021-09: 1 paper on 3D fingerprint unfolding and visualization is accepted by CCBR 2021 as Oral.

2021-09: Joined Intelligent Vision Group (i-VisionGroup) as a Ph.D. candidate supervised by Prof. Jianjiang Feng & Prof. Jie Zhou.

Research

* indicates equal contribution

Preprints

	Minutiae-Anchored Local Dense Representation for Fingerprint Matching Zhiyu Pan, Xiongjun Guan, Jianjiang Feng, Jie Zhou [Paper] [Code] We propose DMD, a minutiae-anchored local dense representation that captures both fine-grained ridge textures and discriminative minutiae features in a spatially structured manner. Specifically, descriptors are extracted from local patches centered and oriented on each detected minutia, forming a three-dimensional tensor in which two dimensions represent spatial locations on the fingerprint plane and the third encodes semantic features.
	ZeroES: Zero-Shot Ensemble for Open-Vocabulary Video Emotion Recognition with Large Multimodal Models 2nd place in International Conference on Multimedia (MM) competition, 2025 Jun Xie, Xiaohui Fan, Zhenghao Zhang, Feng Chen, Hongzhu Yi, Yinjian Zhu, Xiongjun Guan, Xinming Wang, Yue Bi, Tao Zhang, Zhepeng Wang [Paper] [Challenge] Emotion recognition has long grappled with the inherent subjectivity and open-ended nature of human affect, where predefined taxonomies falter against the vast, evolving spectrum of emotional expression. We present ZeroES, a Zero-Shot Ensemble framework that redefines open-vocabulary video emotion recognition by leveraging the raw capacity of large-scale vision-language models (VLMs) without task-specific optimization.
	More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment Jun Xie, Yingjian Zhu, Feng Chen, Zhenghao Zhang, Xiaohui Fan, Hongzhu Yi, Xinming Wang, Chen Yu, Yue Bi, Zhaoran Zhao, Xiongjun Guan (corresponding author), Zhepeng Wang 2nd place in International Conference on Multimedia (MM) competition, 2025 [Paper] [Code] [Challenge] We propose a robust Mixture of Experts (MoE) emotion recognition framework that integrates diverse modalities, including Vision-Language Models and Action Unit information. Using consensus-based pseudo-labeling and a two-stage training process, we enhance label quality and reduce bias through multi-expert voting and rule-based re-ranking for human-aligned predictions.
	Team of One: Cracking Complex Video QA with Model Synergy Jun Xie, Zhaoran Zhao, Xiongjun Guan, Yingjian Zhu, Hongzhu Yi, Xinming Wang, Feng Chen, Zhepeng Wang 2nd place in Computer Vision and Pattern Recognition (CVPR) competition, 2025 [Paper] [Challenge] We propose a novel framework for open-ended video question answering that enhances reasoning depth and robustness in complex real-world scenarios, as benchmarked on the CVRR-ES dataset.
	Finger Pose Estimation for Under-screen Fingerprint Sensor Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou [Paper] [Code] We introduce a partial fingerprint pose estimation framework that leverages the collaborative potential of Dual-modal guidance from Ridge patches And Capacitive images to Optimize the feature extraction, fusion and representation. Several simple but effective strategies and mechanisms are introduced, including knowledge transfer, MoE, and decoupled probability distribution, to enhance the network's capacity for information mining and interaction.
	Fixed-Length Dense Fingerprint Representation Zhiyu Pan, Xiongjun Guan, Jianjiang Feng, Jie Zhou [Paper] [Code] In this work, we propose a fixed-length dense descriptor of fingerprints and introduce FLARE, a fingerprint matching framework that integrates the Fixed-Length dense descriptor with pose-based Alignment and Robust Enhancement. This fixed-length representation employs a three-dimensional dense descriptor to effectively capture spatial relationships among fingerprint ridge structures, enabling robust and locally discriminative representations.
	BiFingerPose: Bimodal Finger Pose Estimation for Touch Device Interaction Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou (under review) In this paper, we estimate the 2D finger pose using a multimodal network and map it to a standardized UV space, followed by nearly lossless mapping to 3D space using simple polynomial functions. We further highlight the applicability and appeal of finger pose in enhancing interactive experiences, and develop several prototypes to demonstrate the potential for interaction.

Publications

	Contactless Fingerprint Recognition Guided by 3D Finger Pose Haoxiang Pei, Zhiyu Pan, Xiongjun Guan, Jianjiang Feng, Jie Zhou International Joint Conference on Biometrics (IJCB), 2025 Oral Presentation [Paper] [Code] We demonstrate that 3D pose information of contactless fingerprints can be utilized to enhance the robustness and performance of existing recognition systems by guiding the acquisition process and constraining finger poses.
	Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles Jun Xie, Xiongjun Guan (first student author), Yingjian Zhu, Zhaoran Zhao, Xinming Wang, Hongzhu Yi, Feng Chen, Zhepeng Wang 2nd place in Computer Vision and Pattern Recognition (CVPR) competition, 2025 [Paper] [Challenge] [Code] We present the runner-up solution for the Ego4D EgoSchema Challenge at CVPR 2025 (confirmed on May 20, 2025). Inspired by the success of large models, we evaluate and leverage leading accessible multimodal large models and adapt them to video understanding tasks via few-shot learning and model ensemble strategies.
	Joint Identity Verification and Pose Alignment for Partial Fingerprints Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou IEEE Transactions on Information Forensics and Security (T-IFS), 2025 [Paper] [Code] We propose a novel framework for joint identity verification and pose alignment of partial fingerprint pairs, which utilizes a multi-task CNN-Transformer hybrid network and a pre-training task on enhancement.
	Latent Fingerprint Matching via Dense Minutia Descriptor Zhiyu Pan, Yongjie Duan, Xiongjun Guan, Jianjiang Feng, Jie Zhou International Joint Conference on Biometrics (IJCB), 2024 [Paper] [Code] Latent fingerprint matching is a daunting task, primarily due to the poor quality of latent fingerprints. In this study, we propose a deep-learning based dense minutia descriptor (DMD) for latent fingerprint matching.
	Phase-aggregated Dual-branch Network for Efficient Fingerprint Dense Registration Xiongjun Guan, Jianjiang Feng, Jie Zhou IEEE Transactions on Information Forensics and Security (T-IFS), 2024 [Paper] [Code] We propose a Phase-aggregated Dual-branch Registration Network to combine the strengths of traditional fingerprint dense registration methods and deep learning.
	Regression of Dense Distortion Field from a Single Fingerprint Image Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou IEEE Transactions on Information Forensics and Security (T-IFS), 2023 [Paper] [Code] We propose an end-to-end network that directly estimates a dense distortion field, instead of its low-dimensional representation, from a single fingerprint.
	Direct Regression of Distortion Field from a Single Fingerprint Image Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou International Joint Conference on Biometrics (IJCB), 2022 Oral Presentation [Paper] [Code] [Slide] We propose an end-to-end network that directly estimates a dense distortion field from a single fingerprint instead of its low-dimensional representation.
	Pose-Specific 3D Fingerprint Unfolding Xiongjun Guan, Jianjiang Feng, Jie Zhou Chinese Conference on Biometric Recognition (CCBR), 2021 Oral Presentation [Paper] [Slide] We propose a visualization and pose-specific unfolding method for 3D fingerprints, which can improve the compatibility between 3D and 2D fingerprints in recognition.

Honors and Awards

Comprehensive Excellence Award, Tsinghua University, 2022 & 2023 & 2024

Silver Award for Social Practice of Ph.D. Students, Tsinghua University, 2023

Excellent Graduates, Tsinghua University, 2021

Outstanding Graduates, Tsinghua University (Dept. of Automation), 2021

Academic Excellence Award, Tsinghua University, 2019 & 2020

1st Prize in the 35th China Regional College Students Physics Competition, 2018

Gold Medal in the 33rd Chinese Physics Olympiad (Top 100 in the country), 2016

Teaching

Teaching Assistant, Programming Fundamentals, 2024 Spring Semester

Teaching Assistant, Interdisciplinary Research and Practice: Image Processing, 2023 Fall Semester

Teaching Assistant, Basic of Information Theory, 2023 Spring Semester

Teaching Assistant, Digital Image Processing, 2022 Fall Semester

© Xiongjun Guan | Last updated: Aug. 13, 2025