<!DOCTYPE html>
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="chrome=1">
<meta name="viewport" content="width=device-width">
<link rel="stylesheet" href="./styles.css">
<title>Fei Pan @ Univ. of Michigan</title>
</head>
<body>
<div class='wrapper'>
<table>
<tbody><tr>
<td width="120">
<img src="./files/fei_bio.png" alt="Fei Pan">
</td>
<td>
<h2>Fei Pan</h2>
feipan [at] umich.edu<br>
<a href="./files/fei_cv.pdf">CV</a> |
<a href="https://scholar.google.com/citations?hl=en&user=VGE3DlYAAAAJ">Google Scholar</a>
</td>
</tr>
</tbody></table>
<p align="justify">I am a Research Fellow in EECS at the <a href="https://umich.edu/">University of Michigan</a>,
fortunate to work with Prof. <a href="https://scholar.google.com/citations?user=uqWkLzMAAAAJ">Stella X. Yu</a>.
My research lies in Computer Vision and Machine Learning.
I am interested in developing large-scale learning algorithms
for visual tasks with strong generalizability and robustness under minimal human supervision.
I obtained my Ph.D. degree in 2023 under the supervision of Prof. <a href="https://scholar.google.com/citations?user=XA8EOlEAAAAJ&hl=en">In So Kweon</a> at <a href="https://www.kaist.ac.kr/en/">KAIST</a>.
During my Ph.D., I received the <a href="https://www.qualcomm.com/research/university-relations/innovation-fellowship">Innovation Fellowship</a> from <a href="https://www.qualcomm.com/">Qualcomm</a> and
a Ph.D. scholarship from <a href="https://www.bosch.com/">BOSCH</a>. <br>
</p>
<h3 id="research_interests">Research Interests</h3>
<ul>
<li>Grouping and Segmentation</li>
<li>Large-Scale Vision &amp; Language Models</li>
<li>Adaptation &amp; Generalization of Deep Learning Models</li>
</ul>
<h3 id="publications">Publications</h3>
<!-- <h4 id="2023">2023</h4> -->
<div class="read-more-container">
<ul>
<!-- Every paper starts with <li> and ends with </li> -->
<!-- Paper boundary -->
<li>
<div class="container">
<p>
MoDA: Leveraging Motion Prior from Videos for Advancing Unsupervised Domain
Adaptation in Semantic Segmentation.<br>
<strong>Fei Pan</strong>, Xu Yin, Seokju Lee, Axi Niu, Sungeui Yoon, In So Kweon.<br>
IEEE/CVF Computer Vision and Pattern Recognition Conference Workshop (CVPRW), 2024. <a href="https://arxiv.org/pdf/2309.11711.pdf">[pdf]</a> <a href="https://github.com/feipanir/MoDA/tree/main">[code]</a> <br>
<i>Learning with Limited Labelled Data for Image and Video Understanding.</i><br>
<b style="color:red;">Best Paper Award</b> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Unsupervised domain adaptation (UDA) is an effective approach to handle the
lack of annotations in the target domain for the semantic segmentation task.
In this work, we consider a more practical UDA setting where the target domain
contains sequential frames of the unlabeled videos which are easy to collect
in practice. A recent study proposes self-supervised learning of object motion
from unlabeled videos with geometric constraints. We design a motion-guided domain
adaptive semantic segmentation framework (MoDA) that utilizes self-supervised object
motion to learn effective representations in the target domain. MoDA differs from
previous methods that use temporal consistency regularization for the target domain frames.
Instead, MoDA deals separately with the domain alignment on the foreground and
background categories using different strategies. Specifically, MoDA contains foreground
object discovery and foreground semantic mining to align the foreground domain gaps by
taking the instance-level guidance from the object motion.
Additionally, MoDA includes background adversarial training which contains a background
category-specific discriminator to handle the background domain gaps.
Experimental results on multiple benchmarks highlight the effectiveness of
MoDA against existing approaches in the domain adaptive image segmentation and
domain adaptive video segmentation. Moreover, MoDA is versatile and can be used in
conjunction with existing state-of-the-art approaches to further improve performance.
<br>
<br>
<b>Key Words:</b>
Unsupervised Domain Adaptation, Semantic Segmentation, Motion Understanding, Geometric Learning.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object.<br>
Chenshuang Zhang, <strong>Fei Pan</strong>, Junmo Kim, In So Kweon, Chengzhi Mao.<br>
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2024. <a href="https://arxiv.org/pdf/2403.18775.pdf">[pdf]</a> <a href="https://github.com/chenshuang-zhang/imagenet_d">[code]</a><br>
<b style="color:red;">Highlight Poster</b> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
We establish rigorous benchmarks for visual perception robustness.
Synthetic images such as ImageNet-C, ImageNet-9, and Stylized ImageNet provide specific types of
evaluation over synthetic corruptions, backgrounds, and textures,
yet those robustness benchmarks are restricted to specified variations and have low synthetic quality.
In this work, we introduce generative models as a data source for synthesizing hard images that
benchmark deep models' robustness.
Leveraging diffusion models, we are able to generate images with more diversified backgrounds,
textures, and materials than any prior work; we term this benchmark ImageNet-D.
Experimental results show that ImageNet-D causes a significant accuracy drop
across a range of vision models, from the standard ResNet visual classifier to the
latest foundation models like CLIP and MiniGPT-4, reducing their accuracy
by up to 64%. Our work suggests that diffusion models can be an effective source to test vision models.
<br>
<br>
<b>Key Words:</b>
Diffusion Models, Large-Scale Vision and Language Models, Robustness and Generalization.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Zero-shot Building Attribute Extraction from Large-Scale Vision and Language Models.<br>
<strong>Fei Pan</strong>, Sangryul Jeon, Brian Wang, Frank McKenna, Stella Yu.<br>
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2024. <a href="https://arxiv.org/pdf/2312.12479.pdf">[pdf]</a> <a href="https://github.com/BuildingInfoSys/zeroshot_attribute_extraction">[code]</a> <a href="https://drive.google.com/file/d/1t8vMpSuvm7KgLj0hKt0I6jxb6gzlYrep/view">[poster]</a><br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Modern building recognition methods, exemplified by the BRAILS framework,
utilize supervised learning to extract information from satellite and
street-view images for image classification and semantic segmentation tasks.
However, each task module requires human-annotated data,
hindering the scalability and robustness to regional variations and annotation imbalances.
In response, we propose a new zero-shot workflow for building attribute extraction
that utilizes large-scale vision and language models to mitigate reliance on external annotations.
The proposed workflow contains two key components: image-level captioning and
segment-level captioning for the building images based on the vocabularies
pertinent to structural and civil engineering.
These two components generate descriptive captions by computing feature
representations of the image and the vocabularies,
and facilitating a semantic match between the visual and textual representations.
Consequently, our framework offers a promising avenue to enhance AI-driven
captioning for building attribute extraction in the structural and
civil engineering domains, ultimately reducing reliance on human annotations
while bolstering performance and adaptability.
<br>
<br>
<b>Key Words:</b>
Zero-shot Learning, Building Attribute Extraction, Large-Scale Vision &amp; Language Models.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Masking-augmented Collaborative Domain Congregation for
Multi-target Domain Adaptation in Semantic Segmentation.<br>
<strong>Fei Pan</strong><sup>*</sup>, Dong He<sup>*</sup>, Xu Yin, Chenshuang Zhang, Munchurl Kim.<br>
IEEE Intelligent Vehicles Symposium (IV), 2024.<br>
<b style="color:red;">Best Paper Nominee</b> <br>
<span class="read-more-text">
<b>Abstract</b> <br>
This paper addresses the challenges in multi-target domain adaptive segmentation
which aims at learning a single model that adapts to multiple diverse target domains.
Existing methods show limited performance as they only consider differences in visual appearance (style)
while ignoring contextual variations among multiple target domains.
In contrast, we propose a novel approach termed Masking-augmented Collaborative Domain Congregation (MacDC)
to handle the style gap and the contextual gap together.
The proposed MacDC comprises two key parts: collaborative domain congregation (CDC) and multi-context masking consistency (MCMC).
Our CDC handles the style and contextual gaps among target domains by data mixing, which generates image-level and region-level
intermediate domains among target domains. To further strengthen contextual alignment,
our MCMC applies a masking-based self-supervised augmentation consistency that enforces the model's understanding of
diverse contexts together.
MacDC directly learns a single model for multi-target domain adaptation without requiring multiple network training and subsequent distillation.
Despite its simplicity, MacDC shows efficacy in mitigating the style and contextual gaps among multiple target domains and demonstrates
superior performance on multi-target domain adaptation for segmentation benchmarks compared to existing state-of-the-art approaches.
<br>
<br>
<b>Key Words:</b> Multi-target Domain Adaptation, Semantic Segmentation, Masking Consistency, Self-supervised Data Augmentation.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
CCTV-Calib: a Toolbox to Calibrate Surveillance Cameras Around the Globe.<br>
Francois Rameau, Jaesung Choe, <strong>Fei Pan</strong>, Seokju Lee, In So Kweon.<br>
Machine Vision and Applications, 2023. <a href="https://trebuchet.public.springernature.app/get_content/52aff0d9-9afd-4117-a037-d0e8e34fd66c?utm_source=rct_congratemailt&utm_medium=email&utm_campaign=nonoa_20231021&utm_content=10.1007/s00138-023-01476-1">[pdf]</a> <a href="https://github.com/rameau-fr/CCTV-Calib">[code]</a> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
In this paper, we propose CCTV-Calib, a user-friendly toolbox to calibrate
traffic cameras using satellite views.
Specifically, CCTV-Calib can estimate the intrinsic and extrinsic
parameters as well as the GPS location of one or multiple CCTV
cameras in a few clicks. Previous surveillance camera calibration
strategies rely on various assumptions on the camera parameters
(e.g., absence of radial distortion), location, or detected objects
in the scene. In contrast, our system is able to calibrate both
perspective and fisheye cameras without restrictive structural
or semantic assumptions. In fact, only a few correspondences
between an image and its satellite view are sufficient to accurately
calibrate a camera. This kind of camera geo-localization and
calibration via satellite imaging has so far attracted little attention.
As a result, most existing techniques naively rely on manually
clicked keypoint correspondences between the satellite view and
the CCTV image, leading to poor accuracy and repeatability. To
cope with these limitations and to ease the calibration process, we
propose an automated keypoints matching stage and a refinement
process improving the accuracy of the computed parameters. Our
toolbox has been qualitatively and quantitatively evaluated using
synthetic and real data from various traffic cameras around the
globe. We made these unique datasets freely available to the
community. Finally, in order to illustrate the relevance of our
calibration strategy, we demonstrate its applicability to 3D vehicle
geolocalization. Our novel calibration pipeline is integrated into an
easy-to-use GUI and is freely available at the following link:
https://github.com/rameau-fr/CCTV-Calib.
<br>
<br>
<b>Key Words:</b>
Camera Calibration, CCTV, Vehicle Geolocalization.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
ML-BPM: Multi-teacher Learning with Bidirectional Photometric Mixing for Open
Compound Domain Adaptation in Semantic Segmentation.<br>
<strong>Fei Pan</strong>, Sungsu Hur, Seokju Lee, Junsik Kim, In So Kweon.<br>
European Conference on Computer Vision (ECCV), 2022. <a href="https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136940228.pdf">[pdf]</a> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Open compound domain adaptation (OCDA) considers the target domain as the
compound of multiple unknown homogeneous subdomains.
The goal of OCDA is to minimize the domain gap between the labeled source domain
and the unlabeled compound target domain, which benefits the model generalization
to the unseen domains. Current OCDA methods for semantic segmentation adopt manual
domain separation and employ a single model to simultaneously adapt to all the
target subdomains. However, adapting to a target subdomain might hinder the model
from adapting to other dissimilar target subdomains, which leads to limited performance.
In this work, we introduce a multi-teacher framework with bidirectional photometric
mixing to separately adapt to every target subdomain. First, we present an automatic
domain separation to find the optimal number of subdomains. On this basis, we propose
a multi-teacher framework in which each teacher model uses bidirectional photometric
mixing to adapt to one target subdomain. Furthermore, we conduct an adaptive distillation
to learn a student model and apply consistency regularization to improve the student
generalization. Experimental results on benchmark datasets show the efficacy of the
proposed approach for both the compound domain and the open domains against existing
state-of-the-art approaches.
<br>
<br>
<b>Key Words:</b>
Domain Adaptation, Open Compound Domain Adaptation, Semantic Segmentation, Multi-teacher Distillation.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Labeling Where Adapting Fails: Cross-Domain Semantic Segmentation with Point
Supervision via Active Learning.<br>
<strong>Fei Pan</strong>, Francois Rameau, Junsik Kim, In So Kweon. <br>
arXiv, 2022. <a href="https://arxiv.org/pdf/2206.00181.pdf">[pdf]</a><br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Training models dedicated to semantic segmentation requires a large amount
of pixel-wise annotated data. Due to their costly nature, these annotations
might not be available for the task at hand. To alleviate this problem,
unsupervised domain adaptation approaches aim at aligning the feature
distributions between the labeled source and the unlabeled target data.
While these strategies lead to noticeable improvements, their effectiveness
remains limited. To guide the domain adaptation task more efficiently, previous
works attempted to include human interactions in this process in the form of
sparse single-pixel annotations in the target data. In this work, we propose a
new domain adaptation framework for semantic segmentation with annotated points
via active selection. First, we conduct an unsupervised domain adaptation of the
model; from this adaptation, we use an entropy-based uncertainty measurement for
target point selection. Finally, to minimize the domain gap, we propose a domain
adaptation framework utilizing these target points annotated by human annotators.
Experimental results on benchmark datasets show the effectiveness of our methods
against existing unsupervised domain adaptation approaches. The proposed pipeline
is generic and can be included as an extra module to existing domain adaptation strategies.
<br>
<br>
<b>Key Words:</b>
Active Learning, Unsupervised Domain Adaptation, Semantic Segmentation.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation.<br>
Seokju Lee, Francois Rameau, <strong>Fei Pan</strong>, In So Kweon.<br>
International Conference on Computer Vision (ICCV), 2021. <a href="https://openaccess.thecvf.com/content/ICCV2021/papers/Lee_Attentive_and_Contrastive_Learning_for_Joint_Depth_and_Motion_Field_ICCV_2021_paper.pdf">[pdf]</a> <a href="https://github.com/SeokjuLee/Insta-DM">[code]</a> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Estimating the motion of the camera together with the 3D
structure of the scene from a monocular vision system is a
complex task that often relies on the so-called scene rigidity
assumption. When observing a dynamic environment, this
assumption is violated, which leads to an ambiguity between
the ego-motion of the camera and the motion of the objects.
To solve this problem, we present a self-supervised learning
framework for 3D object motion field estimation from
monocular videos. Our contributions are two-fold. First, we
propose a two-stage projection pipeline to explicitly disentangle
the camera ego-motion and the object motions with
a dynamics attention module, called DAM. Specifically, we
design an integrated motion model that estimates the motion
of the camera and object in the first and second warping stages,
respectively, controlled by the attention module
through a shared motion encoder. Second, we propose an
object motion field estimation through contrastive sample
consensus, called CSAC, taking advantage of weak semantic
prior (bounding box from an object detector) and geometric
constraints (each object respects the rigid body motion
model). Experiments on KITTI, Cityscapes, and Waymo
Open Dataset demonstrate the relevance of our approach
and show that our method outperforms state-of-the-art algorithms
for the tasks of self-supervised monocular depth
estimation, object motion segmentation, monocular scene
flow estimation, and visual odometry.
<br>
<br>
<b>Key Words:</b>
Motion Field Estimation, Monocular Depth Prediction, Geometric Learning.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Two-phase Pseudo Label Densification for Self-training based Domain Adaptation.<br>
Inkyu Shin, Sanghyun Woo, <strong>Fei Pan</strong>, In So Kweon.<br>
European Conference on Computer Vision (ECCV), 2020. <a href="https://www.ecva.net/papers/eccv_2020/papers_ECCV/papers/123580528.pdf">[pdf]</a> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Recently, deep self-training approaches emerged as a powerful solution to
unsupervised domain adaptation. The self-training scheme involves
iterative processing of target data; it generates target pseudo labels
and retrains the network. However, since only the confident predictions
are taken as pseudo labels, existing self-training approaches inevitably
produce sparse pseudo labels in practice. We see this as critical because
the resulting insufficient training-signals lead to a suboptimal,
error-prone model. In order to tackle this problem, we propose a novel
Two-phase Pseudo Label Densification framework, referred to as TPLD.
In the first phase, we use sliding window voting to propagate the confident
predictions, utilizing intrinsic spatial-correlations in the images.
In the second phase, we perform a confidence-based easy-hard classification.
For the easy samples, we now employ their full pseudo labels.
For the hard ones, we instead adopt adversarial learning to enforce hard-to-easy
feature alignment. To ease the training process and avoid noisy predictions,
we introduce the bootstrapping mechanism to the original self-training loss.
We show the proposed TPLD can be easily integrated into existing self-training
based approaches and improves the performance significantly.
Combined with the recently proposed CRST self-training framework, we achieve
new state-of-the-art results on two standard UDA benchmarks.
<br>
<br>
<b>Key Words:</b>
Self-training, Domain Adaptation, Pseudo Label Correction.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Unsupervised Intra-domain Adaptation for Semantic Segmentation through Self-supervision.<br>
<strong>Fei Pan</strong>, Inkyu Shin, Francois Rameau, Seokju Lee, In So Kweon.<br>
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2020. <a href="https://openaccess.thecvf.com/content_CVPR_2020/papers/Pan_Unsupervised_Intra-Domain_Adaptation_for_Semantic_Segmentation_Through_Self-Supervision_CVPR_2020_paper.pdf">[pdf]</a> <a href="https://github.com/feipanir/IntraDA">[code]</a> <br>
<b style="color:red;">Oral Presentation</b><br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Convolutional neural network-based approaches have achieved remarkable progress
in semantic segmentation. However, these approaches heavily rely on annotated
data, which are labor-intensive to obtain. To cope with this limitation, automatically
annotated data generated from graphic engines are used to train segmentation
models. However, the models trained from synthetic data are difficult to transfer
to real images. To tackle this issue, previous works have considered directly
adapting models from the source data to the unlabeled target data (to reduce the
inter-domain gap). Nonetheless, these techniques do not consider the large
distribution gap among the target data itself (intra-domain gap).
In this work, we propose a two-step self-supervised domain adaptation approach
to minimize the inter-domain and intra-domain gaps together.
First, we conduct the inter-domain adaptation of the model;
from this adaptation, we separate the target domain into an easy and hard split
using an entropy-based ranking function. Finally, to decrease the intra-domain
gap, we propose to employ a self-supervised adaptation technique from the easy to
the hard split. Experimental results on numerous benchmark datasets highlight the
effectiveness of our method against existing state-of-the-art approaches.
The source code is available at https://github.com/feipanir/IntraDA.
<br>
<br>
<b>Key Words:</b>
Domain Adaptation, Adversarial Training, Semantic Segmentation, Self-supervised Learning.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Variational Prototyping-Encoder: One-shot Learning with Prototypical Images.<br>
Junsik Kim, Tae-hyun Oh, Seokju Lee, <strong>Fei Pan</strong>, In So Kweon.<br>
IEEE/CVF Computer Vision and Pattern Recognition Conference (CVPR), 2019. <a href="https://openaccess.thecvf.com/content_CVPR_2019/papers/Kim_Variational_Prototyping-Encoder_One-Shot_Learning_With_Prototypical_Images_CVPR_2019_paper.pdf">[pdf]</a> <a href="https://github.com/mibastro/VPE">[code]</a><br>
<span class="read-more-text">
<b>Abstract</b>
<br>
In daily life, graphic symbols, such as traffic signs and brand logos,
are ubiquitously utilized around us due to their intuitive expression beyond
language boundaries. We tackle an open-set graphic symbol recognition problem
by one-shot classification with prototypical images as a single training example
for each novel class. We take an approach to learn a generalizable embedding space
for novel tasks. We propose a new approach called variational prototyping-encoder (VPE)
that learns the image translation task from real-world input images to their corresponding
prototypical images as a meta-task. As a result, VPE learns image similarity as well as
prototypical concepts, which distinguishes it from widely used metric learning based approaches.
Our experiments with diverse datasets demonstrate that the proposed VPE performs favorably
against competing metric learning based one-shot methods. Also, our qualitative analyses
show that our meta-task induces an effective embedding space suitable for unseen data
representation.
<br>
<br>
<b>Key Words:</b>
One-Shot Learning, Prototypical Learning, Variational Auto-encoder.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
<!-- Paper boundary -->
<li>
<div class="container">
<p>
Driver Drowsiness Detection System Based on Feature Representation Learning Using Various Deep Networks.<br>
Sanghyuk Park, <strong>Fei Pan</strong>, Sunghun Kang, Chang D. Yoo.<br>
Asian Conference on Computer Vision Workshops (ACCVW), 2016. <a href="https://link.springer.com/chapter/10.1007/978-3-319-54526-4_12">[pdf]</a> <br>
<span class="read-more-text">
<b>Abstract</b>
<br>
Statistics have shown that 20% of all road accidents are fatigue-related,
and drowsiness detection is a car safety algorithm that can alert a snoozing driver
in hopes of preventing an accident.
This paper proposes a deep architecture referred to as deep drowsiness detection (DDD)
network for learning effective features and detecting drowsiness given an RGB input
video of a driver. The DDD network consists of three deep networks for attaining global
robustness to background and environmental variations and learning local facial
movements and head gestures important for reliable detection.
The outputs of the three networks are integrated and fed to a softmax classifier for
drowsiness detection. Experimental results show that DDD achieves 73.06% detection accuracy on
the NTHU-drowsy driver detection benchmark dataset.
<br>
<br>
<b>Key Words:</b>
Driver Drowsiness Detection, Representation Learning.
<br>
</span>
<span class="read-more-btn">Read More</span>
</p>
</div>
</li>
</ul>
</div>
<h3>Academic Service</h3>
<ul>
<li>Journal Review: TPAMI, CVIU, Neurocomputing, Pattern Recognition Letters.</li>
<li>Conference Review: CVPR, ICCV, ECCV, NeurIPS, AAAI. </li>
</ul>
<p align="center">
<small><i>The three fundamental problems of computer vision are correspondence, correspondence, and correspondence! -- Takeo Kanade</i></small><br>
</p>
</div>
<script src="script.js"></script>
</body>
</html>