Semantic Guided Latent Parts Embedding for Few-Shot Learning

Fengyuan Yang, Ruiping Wang, Xilin Chen

January 2023

Illustration of the motivation for our latent parts embedding.

Abstract

Few-shot learning (FSL) is a basic requirement for intelligent agents that learn in the open visual world. However, existing deep learning systems rely too heavily on large numbers of training samples, which makes it hard for them to learn new categories efficiently from a limited amount of training data. Two key challenges of FSL are insufficient comprehension and imperfect modeling of the few-shot novel class. To address insufficient visual comprehension, semantic knowledge, that is, information from other modalities, can help supplement the understanding of novel classes. Even so, most works still suffer from the second challenge, because the single global class prototype they adopt is extremely unstable and imperfect given the larger intra-class variation and harder inter-class discrimination in the FSL scenario. We therefore propose to represent each class by several distinct parts with the help of class semantic knowledge. Since parts can never be pre-defined for unknown novel classes, we embed them in a latent manner. Concretely, we train a generator that takes the class semantic knowledge as input and outputs several filters of class-specific semantic latent parts. By applying each part filter, our model can attend to the corresponding local regions containing each part. At the inference stage, classification is conducted by comparing the similarities between those parts. Experiments on several FSL benchmarks demonstrate the effectiveness of our proposed method and show its potential to go beyond class recognition to class understanding. Furthermore, we find that semantic knowledge is more helpful in the FSL task when it is more visualized and customized.
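The pipeline outlined in the abstract (semantics in, part filters out, part-wise attention, part-wise comparison) can be illustrated with a minimal sketch. This is not the authors' implementation. The linear generator, the softmax attention over locations, the attention-weighted pooling, the cosine similarity, and every function and variable name below are assumptions chosen only to make the idea concrete on toy data.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm > 0 else 0.0


def generate_part_filters(semantic, weights):
    """Hypothetical generator: a linear map from a class semantic vector
    to K part filters. weights[k][c] is a vector with len(semantic) entries."""
    return [[dot(row, semantic) for row in part] for part in weights]


def part_embedding(feature_map, part_filter):
    """Attend over spatial locations with one filter (softmax is an assumption),
    then pool the local features with the attention weights."""
    scores = [dot(part_filter, local) for local in feature_map]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    attention = [e / total for e in exps]
    channels = len(feature_map[0])
    return [sum(a * local[c] for a, local in zip(attention, feature_map))
            for c in range(channels)]


def class_score(query_map, support_map, filters):
    """Mean cosine similarity between matching parts of query and support."""
    sims = [cosine(part_embedding(query_map, f), part_embedding(support_map, f))
            for f in filters]
    return sum(sims) / len(sims)


def classify(query_map, supports, semantics, weights):
    """Score the query against every class using that class's own part filters."""
    scores = {}
    for name, support_map in supports.items():
        filters = generate_part_filters(semantics[name], weights)
        scores[name] = class_score(query_map, support_map, filters)
    return max(scores, key=scores.get), scores


# Toy data (all values are made up): 2 parts, 2 channels, 2 spatial locations.
weights = [[[1.0, 0.0], [0.0, 0.0]],
           [[0.0, 0.0], [0.0, 1.0]]]
semantics = {"cat": [1.0, 1.0], "dog": [1.0, 1.0]}
supports = {"cat": [[1.0, 0.0], [0.0, 1.0]],
            "dog": [[1.0, 1.0], [1.0, 1.0]]}
query = [[1.0, 0.0], [0.0, 1.0]]

prediction, scores = classify(query, supports, semantics, weights)
print(prediction, scores)
```

In this sketch each feature map is a flat list of per-location channel vectors, and one support image stands in for each class. A real system would use a learned backbone and a trained generator in place of the fixed toy weights.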

Type

Conference paper

Publication

In Winter Conference on Applications of Computer Vision

Fengyuan Yang

Ph.D. student in Computer Science

My research interests lie in Computer Vision. During my Ph.D., I have focused on Human Pose Estimation in world coordinates. Previously, during my Master’s studies, I explored the incorporation of semantic knowledge in Few-Shot Learning.