News
2025-07: 1 paper on fingerprint indexing is submitted to T-BIOM.
2025-07: We achieved 2nd place in MER25 @ ACM MM (multi-modal affective computing).
2025-07: 1 paper on finger photo pose estimation is accepted by IJCB 2025 as Oral.
2025-06: 1 paper about our runner-up solution for the Ego4D EgoSchema Challenge @ CVPR 2025 is accepted by EgoVis @ CVPR 2025.
2025-06: We achieved 2nd place in CVRR @ CVPR 2025 (complex video reasoning & robustness evaluation).
2025-05: We achieved 2nd place in the Ego4D EgoSchema Challenge @ CVPR 2025 (very long-form video understanding & question-answering).
2025-05: 1 paper on fixed-length fingerprint descriptor is submitted to T-IFS.
2025-05: 1 paper on pose estimation for under-screen fingerprint sensor is submitted to T-IFS.
2025-05: 1 paper on finger pose interaction is submitted to T-MC.
2024-11: 1 paper on partial fingerprint is accepted by T-IFS.
2024-06: 1 paper on latent fingerprint descriptor is accepted by IJCB 2024 as Poster.
2024-05: 1 paper on fingerprint dense registration is accepted by T-IFS.
2023-07: 1 paper on fingerprint distortion rectification is accepted by T-IFS.
2022-08: 1 paper on fingerprint distortion rectification is accepted by IJCB 2022 as Oral.
2021-09: 1 paper on 3D fingerprint unfolding and visualization is accepted by CCBR 2021 as Oral.
2021-09: Joined the Intelligent Vision Group (i-VisionGroup) as a Ph.D. candidate supervised by Prof. Jianjiang Feng & Prof. Jie Zhou.
|
Research
* indicates equal contribution
|
Preprints
|
Minutiae-Anchored Local Dense Representation for Fingerprint Matching
Zhiyu Pan, Xiongjun Guan, Jianjiang Feng, Jie Zhou
[Paper]
[Code]
We propose DMD, a minutiae-anchored local dense
representation that captures both fine-grained ridge
textures and discriminative minutiae features in a spatially
structured manner. Specifically, descriptors are extracted
from local patches centered and oriented on each detected
minutia, forming a three-dimensional tensor in which two
dimensions represent spatial locations on the fingerprint
plane and the third encodes semantic features.
|
|
ZeroES: Zero-Shot Ensemble for Open-Vocabulary Video Emotion Recognition with Large Multimodal Models
2nd place in International Conference on Multimedia (MM) competition, 2025
Jun Xie *, Xiaohui Fan *, Zhenghao Zhang, Feng Chen, Hongzhu Yi, Yingjian Zhu, Xiongjun Guan, Xinming Wang, Yue Bi, Tao Zhang, Zhepeng Wang
[Paper]
[Challenge]
Emotion recognition has long grappled with the inherent
subjectivity and open-ended nature of human affect, where
predefined taxonomies falter against the vast, evolving
spectrum of emotional expression. We present ZeroES, a
Zero-Shot Ensemble framework that
redefines open-vocabulary video emotion recognition by
leveraging the raw capacity of large-scale vision-language
models (VLMs)
without task-specific optimization.
|
|
More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment
Jun Xie *, Yingjian Zhu *, Feng Chen, Zhenghao Zhang, Xiaohui Fan, Hongzhu Yi, Xinming Wang, Chen Yu, Yue Bi, Zhaoran Zhao, Xiongjun Guan (corresponding author), Zhepeng Wang
2nd place in International Conference on Multimedia (MM) competition, 2025
[Paper]
[Code]
[Challenge]
We propose a robust Mixture of Experts (MoE) emotion
recognition framework that integrates diverse modalities,
including Vision-Language Models and Action Unit
information. Using consensus-based pseudo-labeling and a
two-stage training process, we enhance label quality and
reduce bias through multi-expert voting and rule-based
re-ranking for human-aligned predictions.
|
|
Team of One: Cracking Complex Video QA with Model Synergy
Jun Xie *, Zhaoran Zhao *, Xiongjun Guan, Yingjian Zhu, Hongzhu Yi, Xinming Wang, Feng Chen, Zhepeng Wang
2nd place in Computer Vision and Pattern Recognition (CVPR) competition, 2025
[Paper]
[Challenge]
We propose a novel framework for open-ended video question
answering that enhances reasoning depth and robustness in
complex real-world scenarios, as benchmarked on the CVRR-ES
dataset.
|
|
Finger Pose Estimation for Under-screen Fingerprint Sensor
Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou
[Paper]
[Code]
We introduce a partial fingerprint pose estimation framework
that leverages the collaborative potential of Dual-modal
guidance from Ridge patches And Capacitive images to Optimize
the feature extraction, fusion and representation. Several
simple but effective strategies and mechanisms are introduced,
including knowledge transfer, MoE, and decoupled probability
distribution, to enhance the network's capacity for
information mining and interaction.
|
|
Fixed-Length Dense Fingerprint Representation
Zhiyu Pan, Xiongjun Guan, Jianjiang Feng, Jie Zhou
[Paper]
[Code]
In this work, we propose a fixed-length dense descriptor of
fingerprints and introduce FLARE, a fingerprint matching
framework that integrates the Fixed-Length dense descriptor
with pose-based Alignment and Robust Enhancement. This
fixed-length representation employs a three-dimensional dense
descriptor to effectively capture spatial relationships among
fingerprint ridge structures, enabling robust and locally
discriminative representations.
|
|
BiFingerPose: Bimodal Finger Pose Estimation for Touch Device Interaction
Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou
(under review)
In this paper, we estimate the 2D finger pose using a
multimodal network and map it to a standardized UV space,
followed by nearly lossless mapping to 3D space using simple
polynomial functions. We further highlight the applicability
and appeal of finger pose in enhancing interactive
experiences, and develop several prototypes to demonstrate
the potential for interaction.
|
|
Publications
|
Contactless Fingerprint Recognition Guided by 3D Finger Pose
Haoxiang Pei, Zhiyu Pan, Xiongjun Guan, Jianjiang Feng, Jie Zhou
International Joint Conference on Biometrics (IJCB), 2025
Oral Presentation
[Paper]
[Code]
We demonstrate that 3D pose information of contactless
fingerprints can be utilized to enhance the robustness and
performance of existing recognition systems by guiding the
acquisition process and constraining finger poses.
|
|
Four Eyes Are Better Than Two: Harnessing the Collaborative Potential of Large Models via Differentiated Thinking and Complementary Ensembles
Jun Xie *, Xiongjun Guan * (first student author), Yingjian Zhu, Zhaoran Zhao, Xinming Wang, Hongzhu Yi, Feng Chen, Zhepeng Wang
2nd place in Computer Vision and Pattern Recognition (CVPR) competition, 2025
[Paper]
[Challenge]
[Code]
We present the runner-up solution for the Ego4D EgoSchema
Challenge at CVPR 2025 (confirmed on May 20, 2025). Inspired
by the success of large models, we evaluate and leverage
leading accessible multimodal large models and adapt them to
video understanding tasks via few-shot learning and model
ensemble strategies.
|
|
Joint Identity Verification and Pose Alignment for Partial Fingerprints
Xiongjun Guan, Zhiyu Pan, Jianjiang Feng, Jie Zhou
IEEE Transactions on Information Forensics and Security (T-IFS), 2025
[Paper]
[Code]
We propose a novel framework for joint identity verification
and pose alignment of partial fingerprint pairs, which
utilizes a multi-task CNN-Transformer hybrid network and a
pre-training task on enhancement.
|
|
Latent Fingerprint Matching via Dense Minutia Descriptor
Zhiyu Pan, Yongjie Duan, Xiongjun Guan, Jianjiang Feng, Jie Zhou
International Joint Conference on Biometrics (IJCB), 2024
[Paper]
[Code]
Latent fingerprint matching is a daunting task, primarily
due to the poor quality of latent fingerprints. In this
study, we propose a deep-learning-based dense minutia
descriptor (DMD) for latent fingerprint matching.
|
|
Phase-aggregated Dual-branch Network for Efficient Fingerprint Dense Registration
Xiongjun Guan, Jianjiang Feng, Jie Zhou
IEEE Transactions on Information Forensics and Security (T-IFS), 2024
[Paper]
[Code]
We propose a Phase-aggregated Dual-branch Registration
Network to combine the strengths of traditional fingerprint
dense registration methods and deep learning.
|
|
Regression of Dense Distortion Field from a Single Fingerprint Image
Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou
IEEE Transactions on Information Forensics and Security (T-IFS), 2023
[Paper]
[Code]
We propose an end-to-end network that directly estimates a
dense distortion field, rather than its low-dimensional
representation, from a single fingerprint.
|
|
Direct Regression of Distortion Field from a Single Fingerprint Image
Xiongjun Guan, Yongjie Duan, Jianjiang Feng, Jie Zhou
International Joint Conference on Biometrics (IJCB), 2022
Oral Presentation
[Paper]
[Code]
[Slide]
We propose an end-to-end network that directly estimates a
dense distortion field from a single fingerprint, rather
than its low-dimensional representation.
|
|
Pose-Specific 3D Fingerprint Unfolding
Xiongjun Guan, Jianjiang Feng, Jie Zhou
Chinese Conference on Biometric Recognition (CCBR), 2021
Oral Presentation
[Paper]
[Slide]
We propose a visualization and pose-specific unfolding
method for 3D fingerprints, which can improve the
compatibility between 3D and 2D fingerprints in recognition.
|
|
Honors and Awards
Comprehensive Excellence Award, Tsinghua University, 2022 & 2023 & 2024
Silver Award for Social Practice of Ph.D. Students, Tsinghua University, 2023
Excellent Graduates, Tsinghua University, 2021
Outstanding Graduates, Tsinghua University (Dept. of Automation), 2021
Academic Excellence Award, Tsinghua University, 2019 & 2020
1st Prize in the 35th China Regional College Students Physics Competition, 2018
Gold Medal in the 33rd Chinese Physics Olympiad (Top 100 in the country), 2016
|
Teaching
Teaching Assistant, Programming Fundamentals, 2024 Spring Semester
Teaching Assistant, Interdisciplinary Research and Practice: Image Processing, 2023 Fall Semester
Teaching Assistant, Basic of Information Theory, 2023 Spring Semester
Teaching Assistant, Digital Image Processing, 2022 Fall Semester
|
|