
Category Discovery: An Open-World Perspective

Zhenqi He, Yuanpei Liu, Kai Han


This repository serves as a supplementary resource for our survey paper on Category Discovery (CD) methods. It includes a comprehensive collection of key papers, frameworks, and approaches in the field of CD, summarizing the most recent advancements and techniques. The materials here aim to provide researchers with an accessible overview of current trends and methodologies in CD, along with references and additional insights to support further exploration.

We will continue to maintain and update this repository with new papers and resources as the field evolves. Contributions are welcome, and we encourage pull requests (PRs) to help expand and improve the content for the community.

Table of Contents

- Introduction
- Roadmap
- Category Discovery
  - Novel Category Discovery (NCD)
  - Generalized Category Discovery (GCD)
  - Continual Category Discovery (CCD)
  - On-the-fly Category Discovery (OCD)
  - Category Discovery with Domain Shift
  - Distribution-Agnostic Category Discovery (DA-CD)
  - Semantic Category Discovery (SCD)
  - Few-Shot Category Discovery (FS-CD)
  - Federated Category Discovery (FCD)

Introduction

*Figure: comparison of Category Discovery with semi-supervised learning, open-set recognition, and out-of-distribution detection.*

Category Discovery (CD) addresses the limitations of the closed-world assumption by embracing an open-world setting. As shown in the figure above, CD differs from semi-supervised learning and from open-set recognition (OSR) and out-of-distribution (OOD) detection by clustering unlabelled data that contains unseen categories. It is motivated by the observation that humans can discover unknown species by transferring existing knowledge of explored species. CD is highly applicable across various real-world scenarios. For example, in autonomous driving, vehicles must continuously detect and classify new objects, such as unfamiliar road signs or obstacles, beyond their initial training to ensure safe navigation. In retail, CD can automatically recognize newly introduced products in supermarkets without the need for manual labeling.

Roadmap

*Figure: roadmap of Category Discovery research.*
In recent years, CD has garnered increasing attention, leading to a proliferation of research exploring various methodologies and settings. It was initially introduced as Novel Category Discovery (NCD), which clusters unlabelled novel categories by leveraging knowledge from labelled base categories. This concept was later expanded into Generalized Category Discovery (GCD), which relaxed earlier constraints by assuming that the unlabelled data contains both novel and base categories, thereby more closely mirroring real-world scenarios. Further advancing the field, Han *et al.* proposed Semantic Category Discovery (SCD), aiming to assign semantic labels to unlabelled samples from an unconstrained vocabulary space. Additionally, CD has been applied to complex scenarios such as continual learning, where models learn incrementally over time, and federated learning, which trains models across decentralized devices while ensuring data privacy. CD methods have also been explored in challenging settings, including few-shot learning, where limited labelled data is available, and with imbalanced distributions and domain-shifted data, making CD more applicable to real-world problems.

Category Discovery


Novel Category Discovery (NCD)

The concept of NCD aims to transfer knowledge learned from base categories to cluster unlabelled unseen categories, motivated by the observation that a child can easily distinguish novel categories (e.g., birds and elephants) after learning to classify base categories (e.g., dogs and cats).

Formally, given a dataset $\mathcal{D} = \mathcal{D}_L \cup \mathcal{D}_U$, where the labelled portion is $\mathcal{D}_L = \{(\mathbf{x}_i, y_i)\}_{i=1}^M \subset \mathcal{X} \times \mathcal{Y}_L$ and the unlabelled portion is $\mathcal{D}_U = \{(\mathbf{x}_i, \hat{y}_i)\}_{i=1}^K \subset \mathcal{X} \times \mathcal{Y}_U$ (with the labels $\hat{y}_i$ being inaccessible during training), the objective of NCD is to leverage the discriminative information learned from the annotated data to cluster the unlabelled data.

This setting presumes that the label spaces of the labelled and unlabelled data are disjoint, i.e., $\mathcal{Y}_L \cap \mathcal{Y}_U = \varnothing$, implying $\mathcal{C}_N = \mathcal{Y}_U$, while also assuming a high degree of semantic similarity between the base and novel categories.
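To make the split concrete, here is a minimal sketch (toy class names and sizes, all hypothetical) of an NCD-style dataset in which the labelled and unlabelled label spaces are disjoint:

```python
import random

# Hypothetical toy label spaces: base (labelled) vs. novel (unlabelled) categories.
Y_L = {"dog", "cat", "horse"}   # base categories with annotations
Y_U = {"bird", "elephant"}      # novel categories; their labels are hidden during training
assert Y_L & Y_U == set()       # NCD assumption: disjoint label spaces

def make_split(n_labelled=6, n_unlabelled=4, seed=0):
    """Return D_L as (x, y) pairs and D_U as raw inputs (ground truth withheld)."""
    rng = random.Random(seed)
    D_L = [(f"img_{i}", rng.choice(sorted(Y_L))) for i in range(n_labelled)]
    # Ground-truth novel labels exist but are inaccessible to the learner.
    D_U_hidden = [(f"img_{i + n_labelled}", rng.choice(sorted(Y_U))) for i in range(n_unlabelled)]
    D_U = [x for x, _ in D_U_hidden]
    return D_L, D_U, D_U_hidden

D_L, D_U, _ = make_split()
print(len(D_L), "labelled samples;", len(D_U), "unlabelled samples to cluster")
```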

| Year | Method | Pub. | Backbone | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- |
| 2018 | KCL | ICLR | ResNet | Parametric Classifier | Over-estimate | Omniglot, ImageNet-1K, Office31 |
| 2019 | MCL | ICLR | ResNet, VGG, LeNet | Parametric Classifier | Over-estimate | Omniglot, CIFAR-10&100, ImageNet-1K, MNIST |
| | DTC | ICCV | ResNet, VGG | Soft Assignment | $k$-Means | Omniglot, CIFAR-10&100, ImageNet-1K, SVHN |
| 2020 | RS, RS+ | ICLR | ResNet | Parametric Classifier | Known | Omniglot, CIFAR-10&100, ImageNet-1K, SVHN |
| 2021 | Qing et al. | Neural Networks | ResNet | Parametric Classifier | Known | CIFAR-10&100, SVHN |
| | OpenMix | CVPR | ResNet, VGG | Parametric Classifier | Known | CIFAR-10&100, ImageNet-1K |
| | NCL | CVPR | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-1K |
| | JOINT | ICCV | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-1K |
| | UNO | ICCV | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-1K |
| | DualRS | ICCV | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100&1K, SSB |
| 2022 | SMI | ICASSP | VGG-16 | $k$-Means | Known | CIFAR-10&100, ImageNet-1K |
| | PSSCNNCD | T'CYB | N/A BKBH | $k$-Means | Progressive label propagation | Coil20, Yeast, MSRA25, PalmData25, Abalone, USPS, Letter, MNIST |
| | Li et al. | NeurIPSW | ResNet | $k$-Means | $k$-Means | CIFAR-100, ImageNet-1K |
| 2023 | ResTune | T'NNLS | ResNet | $k$-Means | Known | CIFAR-10&100, TinyImageNet |
| | SK-Hurt | TMLR | ResNet | $k$-Means | $k$-Means | CIFAR-100, ImageNet-1K |
| | IIC | CVPR | ResNet | Parametric Classifier | $k$-Means | CIFAR-10&100, ImageNet-1K |
| | NSCL | ICML | ResNet | $k$-Means | $k$-Means | CIFAR-100, ImageNet-1K |
| | CRKD | ICCV | ResNet, ViT | Parametric Classifier | Known | CIFAR-100, SSB |
| | Feng et al. | MICCAI | ResNet | Parametric Classifier | Known | ISIC2019 |
| 2024 | RAPL | CVPR | ResNet | $k$-Means | Known | SoyAgeing |
| | SCKD | ECCV | ResNet, ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, SSB |
| | APL | T'PAMI | ResNet | Parametric Classifier | Known | CIFAR-10&100, Omniglot, ImageNet-1K |
| PrePrint | Hasan et al. | ArXiv | ResNet | Parametric Classifier | $k$-Means | CIFAR-10&100 |

Generalized Category Discovery (GCD)

Extending the NCD paradigm, Generalized Category Discovery relaxes the disjointness assumption between the base and novel categories, thereby presenting a more challenging and realistic scenario. In GCD, the labelled and unlabelled datasets may share common categories, i.e., $\mathcal{Y}_L \cap \mathcal{Y}_U \neq \varnothing$, and the set of novel categories is defined as a subset of $\mathcal{Y}_U$ (i.e., $\mathcal{C}_N \subset \mathcal{Y}_U$). This general formulation is particularly pertinent to practical applications such as plant species discovery, where an existing database of known species is augmented with newly observed species, necessitating the clustering of both known and novel instances.

Notably, an equivalent formulation has been introduced by Cao et al. under the designation of Open-World Semi-Supervised Learning. In what follows, we refer to both formulations under the umbrella term Generalized Category Discovery.
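Analogously to the NCD sketch above, the following minimal illustration (hypothetical class names) shows a GCD-style split, where the unlabelled pool spans both base and novel classes and the novel set $\mathcal{C}_N$ is what remains to be discovered:

```python
import random

# Hypothetical toy label spaces for a GCD-style split.
Y_L = {"dog", "cat", "horse"}                   # base (labelled) categories
Y_U = {"dog", "cat", "horse", "bird", "whale"}  # unlabelled data spans base AND novel classes
assert Y_L & Y_U, "GCD: label spaces overlap"
C_N = Y_U - Y_L                                 # novel categories to be discovered
print("novel categories:", sorted(C_N))

rng = random.Random(0)
D_L = [(f"img_{i}", rng.choice(sorted(Y_L))) for i in range(6)]
# Unlabelled pool mixes instances of base and novel classes; all labels are withheld.
D_U = [f"img_{i + 6}" for i in range(8)]
print(len(D_L), "labelled samples;", len(D_U), "unlabelled samples covering", len(Y_U), "categories")
```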

| Year | Method | Pub. | Backbone | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- |
| 2022 | GCD | CVPR | ViT | Semi-$k$-Means | $k$-Means | CIFAR-10&100, ImageNet-100, SSB, Herb19 |
| | ORCA | CVPR | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, Single-Cell |
| | ComEx | CVPR | ResNet | Parametric Classifier | Known | CIFAR-10&100 |
| | OpenLDN | ECCV | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, TinyImageNet, Oxford Pets |
| | TRSSL | ECCV | ResNet | Parametric Classifier | $k$-Means | CIFAR-10&100, ImageNet-100, TinyImageNet, Oxford Pets, Scars, Aircraft |
| | NACH | NeurIPS | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100 |
| | XCon | BMVC | ViT | Semi-$k$-Means | $k$-Means | CIFAR-10&100, ImageNet-100, SSB, Oxford Pets |
| 2023 | OpenCon | TMLR | ResNet | Prototype-based | $k$-Means | CIFAR-10&100, ImageNet-100 |
| | PromptCAL | CVPR | ViT | Semi-$k$-Means | Known | CIFAR-10&100, ImageNet-100, SSB |
| | DCCL | CVPR | ViT | Infomap | Infomap | CIFAR-10&100, ImageNet-100, CUB, Scars, Oxford Pets |
| | OpenNCD | IJCAI | ResNet | Prototype-based | Prototype Grouping | CIFAR-10&100, ImageNet-100 |
| | SimGCD | ICCV | ViT | Parametric Classifier | $k$-Means | CIFAR-10&100, ImageNet-100, SSB, Herb19 |
| | GPC | ICCV | ViT | GMM | GMM | CIFAR-10&100, ImageNet-100, SSB |
| | PIM | ICCV | ViT | Parametric Classifier | $k$-Means | CIFAR-10&100, ImageNet-100, CUB, Scars, Herb19 |
| | TIDA | NeurIPS | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, TinyImageNet, Scars, Aircraft |
| | $\mu$GCD | NeurIPS | ResNet, ViT, ViT | $k$-Means | Known | Clevr-4 |
| | InfoSieve | NeurIPS | ViT | $k$-Means | $k$-Means | CIFAR-10&100, ImageNet-100, SSB, Oxford Pets, Herb19 |
| | SORL | NeurIPS | ResNet | $k$-Means | Known | CIFAR-10&100 |
| | Yang et al. | ICONIP | ViT | Louvain | Louvain | CIFAR-10&100, ImageNet-100, CUB, Scars, Herb19 |
| 2024 | AMEND | WACV | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, SSB, Herb19 |
| | GCA | WACV | ViT | Guided Cluster Aggregation | $k$-Means | CIFAR-10&100, ImageNet-100, SSB |
| | SPT-Net | ICLR | ViT, ViT | Parametric Classifier | $k$-Means | CIFAR-10&100, ImageNet-100, SSB |
| | LegoGCD | CVPR | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100&1K, SSB, Herb19 |
| | CMS | CVPR | ViT | Agglomerative Clustering | Agglomerative Clustering | CIFAR-100, ImageNet-100, SSB, Herb19 |
| | ActiveGCD | CVPR | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, SSB |
| | TextGCD | ECCV | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100&1K, SSB, Oxford Pets, Flowers102 |
| | LPS | IJCAI | ResNet | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100 |
| | Contextuality-GCD | ICIP | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100&1K, SSB, Herb19 |
| 2025 | MSGCD | Information Fusion | ViT | Parametric Classifier | Known | CIFAR-100, SSB |
| | CPT | IJCV | ViT | Similarity-Based | $k$-Means | CIFAR-10&100, ImageNet-100, CUB, Scars, Herb19 |
| | PAL-GCD | AAAI | ViT | Parametric Classifier | DBSCAN | CIFAR-100, ImageNet-100, SSB, Herb19 |
| | DebGCD | ICLR | ViT | Parametric Classifier | DBSCAN | CIFAR-10&100, ImageNet-100&1K, SSB, Herb19, Oxford Pets |
| | ProtoGCD | T'PAMI | ViT | Parametric Classifier | $k$-Means | CIFAR-10&100, ImageNet-100&1K, SSB, Herb19 |
| | MOS | CVPR | ViT | Parametric Classifier | Known | SSB, Oxford Pets |
| | GET | CVPR | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, SSB, Herb19 |
| | AptGCD | CVPR | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, SSB, Herb19 |
| | Dai et al. | CVPR | ViT | - | Known | SSB, Herb19 |
| | HypCD | CVPR | ViT | - | Known | CIFAR-10&100, ImageNet-100, SSB, Herb19 |
| PrePrint | CLIP-GCD | ArXiv | ViT | Semi-$k$-Means | $k$-Means | CIFAR-10&100, ImageNet-100&1K, SSB, Flowers102, DomainNet |
| | MCDL | ArXiv | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100&1K, CUB, SCars, Herb19 |
| | PNP | ArXiv | ViT | Infomap | Infomap | CIFAR-10&100, ImageNet-100&1K, SSB, Herb19 |
| | RPIM | ArXiv | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100&1K, CUB, Scars, Herb19 |
| | OpenGCD | ArXiv | ViT | Parametric Classifier | $k$-Means | CIFAR-10&100, CUB |
| | ConceptGCD | ArXiv | ViT, ViT | Parametric Classifier | $k$-Means | CIFAR-100, ImageNet-100&1K, SSB, Herb19 |
| | GET | ArXiv | ViT | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100, SSB, Herb19 |

Continual Category Discovery (CCD)

CCD places category discovery in a continual setting, in which new categories are identified sequentially while previously acquired knowledge is retained. CCD presents several distinct scenarios depending on the structure of the incoming data.

In the Class Incremental Scenario, the training set $\mathcal{D}_{\mathrm{train}}^t$ contains solely unlabelled instances from novel categories. In the Mixed Incremental Scenario, $\mathcal{D}_{\mathrm{train}}^t$ is composed exclusively of unlabelled data drawn from both novel and base categories. Finally, in the Semi-Supervised Mixed Incremental Scenario, $\mathcal{D}_{\mathrm{train}}^t$ comprises both labelled and unlabelled samples, which originate from the base as well as the novel categories.
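A minimal sketch (hypothetical helper and toy inputs) of how the stage-wise training set $\mathcal{D}_{\mathrm{train}}^t$ is composed under each scenario:

```python
def build_stage(scenario, labelled_pairs, base_unlabelled, novel_unlabelled):
    """Compose D_train^t for one incremental stage t (toy illustration)."""
    if scenario == "class_incremental":
        # only unlabelled instances of novel categories arrive
        return {"labelled": [], "unlabelled": novel_unlabelled}
    if scenario == "mixed_incremental":
        # unlabelled instances of both base and novel categories, no labels
        return {"labelled": [], "unlabelled": base_unlabelled + novel_unlabelled}
    if scenario == "semi_supervised_mixed_incremental":
        # labelled and unlabelled samples, drawn from base as well as novel categories
        return {"labelled": labelled_pairs, "unlabelled": base_unlabelled + novel_unlabelled}
    raise ValueError(f"unknown scenario: {scenario}")

stage = build_stage("mixed_incremental", [("x0", "dog")], ["x1", "x2"], ["x3"])
print(len(stage["unlabelled"]), "unlabelled samples in this stage")
```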

| Year | Method | Pub. | Backbone | Scenario | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2022 | NCDwF | ECCV | ResNet | Class Incremental | Parametric Classifier | Known | CIFAR-10/100, ImageNet-1K |
| | FRoST | ECCV | ResNet | Class Incremental | Parametric Classifier | Known | CIFAR-10/100, TinyImageNet |
| | GM | NeurIPS | ResNet | All | Parametric Classifier | Known | CIFAR-100, ImageNet-100, CUB |
| 2023 | PA-GCD | ICCV | ViT, ResNet | Mixed Incremental | Parametric Classifier | Affinity Propagation | CUB, MIT67, Stanford Dogs, Aircraft |
| | MetaGCD | ICCV | ViT | Mixed Incremental | $k$-Means | $k$-Means | CIFAR-10/100, TinyImageNet |
| | iGCD | ICCV | ResNet | Self-Supervised Mixed Incremental | Soft Nearest Neighbor | Density Peaks | CUB, Aircraft, CIFAR-100 |
| 2024 | Msc-iNCD | ICPR | ViT | Class Incremental | Parametric Classifier | Known | CIFAR-100, ImageNet-100/1K |
| | ADM | AAAI | ResNet | Class Incremental | Parametric Classifier | Known | CIFAR-10/100, TinyImageNet |
| | PromptCCD | ECCV | ViT | Mixed Incremental | GMM | GMP | CIFAR-100, ImageNet-100, TinyImageNet |
| | DEAN | ECCV | ViT | Mixed Incremental | Parametric Classifier | Affinity Propagation | CUB, Aircraft, CIFAR-100 |
| | CAMP | ECCV | ViT | Self-Supervised Mixed Incremental | Nearest Centroid Classifier | Known | CUB, Aircraft, SCars, DomainNet, CIFAR-100 |
| | Happy | NeurIPS | ViT | Mixed Incremental | Parametric Classifier | Silhouette Score | CIFAR-100, ImageNet-100, TinyImageNet, CUB |
| Preprint | FEA | ArXiv | ViT | Class Incremental | Parametric Classifier | Known | CIFAR-10/100, TinyImageNet |

On-the-fly Category Discovery (OCD)

OCD extends conventional category discovery to an inductive learning paradigm with streaming inference. The model trains on a labelled support set $D_S$ and must cluster an unlabelled query set $D_Q$, where $D_Q$ is unavailable during training and its samples arrive individually at test time.
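To illustrate the streaming constraint, the sketch below assigns each query a category code the moment it arrives, using a simple sign-hash of a stand-in feature vector. This mirrors the spirit of the hash-based assignments listed in the table below, but it is not the exact procedure of any specific method:

```python
import numpy as np

def extract_features(x, dim=8):
    """Stand-in for a trained encoder; replace with a real backbone."""
    rng = np.random.default_rng(abs(hash(x)) % (2**32))
    return rng.standard_normal(dim)

def on_the_fly_assign(stream):
    """Queries are seen one by one and must be labelled immediately, never jointly."""
    categories = {}                                           # discovered codes -> counts
    for x in stream:
        code = tuple((extract_features(x) > 0).astype(int))   # sign hash as category code
        categories[code] = categories.get(code, 0) + 1
        yield x, code

for x, code in on_the_fly_assign(["q1", "q2", "q3"]):
    print(x, "->", code)
```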

| Year | Method | Pub. | Backbone | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- |
| 2023 | SMILE | CVPR | ViT | Hash-based | Hash-coding | CIFAR-10&100, ImageNet-100, CUB, Scars, Herb19 |
| 2024 | PHE | NeurIPS | ViT | Hamming Ball-Based | Hamming Ball-Based | CUB, Scars, Oxford Pets, Food-101, iNaturalist |

Category Discovery with Domain Shift

This setting relaxes the conventional assumption that both labelled and unlabelled data are drawn from the same semantic domain. Formally, let $\mathcal{D}_L$ denote the labelled data, assumed to be exclusively drawn from the domain $\Omega_B$, and let $\mathcal{D}_U$ denote the unlabelled data, which may include samples originating from both $\Omega_B$ and an additional domain $\Omega_{N}$. The objective is to accurately classify images drawn from the combined domain $\Omega = \Omega_B \cup \Omega_{N}$, under the assumption that the novel domain is disjoint from the base domain (i.e., $\Omega_B \cap \Omega_{N} = \varnothing$). In practice, the novel domain $\Omega_{N}$ may encompass multiple subdomains.
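A minimal sketch (toy domain tags and data, all hypothetical) of the data assumptions in this setting:

```python
# Base vs. novel domains; the novel domain may itself contain several subdomains.
OMEGA_B = {"photo"}                 # base domain: all labelled data come from here
OMEGA_N = {"sketch", "clipart"}     # novel domain(s), unseen in the labelled set
assert OMEGA_B & OMEGA_N == set()   # assumption: novel domain disjoint from base

D_L = [("img_0", "dog", "photo"), ("img_1", "cat", "photo")]  # (x, y, domain)
D_U = ["img_2", "img_3", "img_4"]   # unlabelled; may mix base- and novel-domain samples

# Objective: classify images from the combined domain Omega = Omega_B union Omega_N.
OMEGA = OMEGA_B | OMEGA_N
print("evaluation covers domains:", sorted(OMEGA))
```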

| Year | Method | Pub. | Backbone | $\Omega_{\mathcal{U}}$ | Label Assignment | # Unlabelled categories | Dataset | $\mathcal{Y}_L \cap \mathcal{Y}_U$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2022 | Yu et al. | AAAI | ResNet | Single New Domain | Parametric Classifier | $k$-Means | Office, OfficeHome, VisDA | $\varnothing$ |
| | SCDA | ICME | ResNet | Multiple New Domains | Parametric Classifier | $k$-Means | Office, OfficeHome, DomainNet | $\varnothing$ |
| 2023 | SAN | ICCV | ResNet | Single New Domain | Parametric Classifier | N/A | Office, OfficeHome, VisDA, DomainNet | $\varnothing$ |
| 2024 | CDAD-Net | CVPRW | ViT | Single New Domain | Semi-$k$-Means | Elbow | OfficeHome, PACS, DomainNet, CIFAR-10&100, ImageNet-100 | $\neq \varnothing$ |
| 2025 | HiLo | ICLR | ViT | Multiple New Domains | Parametric Classifier | $k$-Means | DomainNet, SSB-C | $\neq \varnothing$ |
| PrePrint | Wang et al. | ArXiv | ViT | Single New Domain | Parametric Classifier | Known | CIFAR-10, OfficeHome, DomainNet | $\varnothing$ |

Distribution-Agnostic Category Discovery (DA-CD)

DA-CD eliminates the requirement for a balanced distribution imposed on both labelled and unlabelled data in conventional category discovery. Instead, it acknowledges that the data may follow a skewed distribution, such that for certain categories $\mathcal{Y}_i$ and $\mathcal{Y}_j$ within the set $\mathcal{Y}$ it holds that $\mathbb{P}_{\mathcal{Y}_x}(\mathcal{Y}_i) > \mathbb{P}_{\mathcal{Y}_x}(\mathcal{Y}_j)$. In this formulation, the set $\mathcal{Y}_x$ may refer to either the labelled categories $\mathcal{Y}_L$ or the unlabelled categories $\mathcal{Y}_U$.
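For illustration, here is a small sketch of such a skewed class prior with a hypothetical imbalance factor; actual DA-CD benchmarks define their own imbalance ratios:

```python
import numpy as np

def long_tailed_prior(num_classes=10, imbalance_factor=0.02):
    """Exponentially decaying class probabilities: head classes dominate the tail."""
    weights = imbalance_factor ** (np.arange(num_classes) / (num_classes - 1))
    return weights / weights.sum()

p = long_tailed_prior()
print(p.round(3))      # p[i] > p[j] for i < j: the class distribution is skewed
assert p[0] > p[-1]
```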

| Year | Method | Pub. | Backbone | Scenario | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2023 | NCDLR | TMLR | ViT | Long-tailed Distribution for $\mathcal{Y}_L$ & $\mathcal{Y}_U$ | Parametric Classifier | $k$-Means | CIFAR-10, ImageNet-100, Herb19, iNaturalist18 |
| | ImbaGCD | CVPRW | ResNet | Imbalanced Distribution for $\mathcal{Y}_U$ | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100 |
| | GCDLR | ICCVW | ResNet | Imbalanced Distribution for $\mathcal{Y}_U$ | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100 |
| | BYOP | CVPR | ResNet | Imbalanced Distribution for $\mathcal{Y}_U$ | Parametric Classifier | Known | CIFAR-10&100, TinyImageNet |
| | BaCon | NeurIPS | ViT | Long-tailed Distribution for $\mathcal{Y}_L$ & $\mathcal{Y}_U$ | $k$-Means | Known | CIFAR-10&100-LT, ImageNet-100-LT, Places-LT |
| 2024 | Fan et al. | CVPR | ViT | Long-tailed Distribution for $\mathcal{Y}_L$ & $\mathcal{Y}_U$ | $k$-Means | Spectral graph | BioMedical Datasets |

Semantic Category Discovery (SCD)

In contrast to NCD and GCD, which focus solely on grouping visually similar images without considering their semantic meaning, SCD extends these paradigms by also assigning a semantic label to each unlabelled instance. Specifically, SCD leverages an open vocabulary label space to achieve this goal. In this context, WordNet, comprising approximately 68,000 labels, is employed as a comprehensive and unconstrained vocabulary, facilitating the assignment of meaningful semantic labels.
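One simple way to realise such an assignment is sketched below, with a tiny stand-in vocabulary and random embeddings in place of the roughly 68k WordNet lemmas and a trained vision-language encoder: each image in a cluster votes for its nearest vocabulary words, and the cluster takes the majority. This is only an illustrative scheme, not the procedure of any specific SCD method:

```python
import numpy as np
from collections import Counter

VOCAB = ["dog", "cat", "sparrow", "elephant", "oak", "truck"]   # stand-in for WordNet lemmas
rng = np.random.default_rng(0)
word_emb = rng.standard_normal((len(VOCAB), 16))                # stand-in for text embeddings
word_emb /= np.linalg.norm(word_emb, axis=1, keepdims=True)

def semantic_label(cluster_image_emb, k=3):
    """Each image votes for its k nearest vocabulary words; the cluster takes the majority."""
    sims = cluster_image_emb @ word_emb.T            # cosine similarity (unit vectors)
    topk = np.argsort(-sims, axis=1)[:, :k]          # k best words per image
    votes = Counter(VOCAB[i] for i in topk.ravel())
    return votes.most_common(1)[0][0]

cluster = rng.standard_normal((5, 16))               # stand-in image embeddings of one cluster
cluster /= np.linalg.norm(cluster, axis=1, keepdims=True)
print("cluster named:", semantic_label(cluster))
```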

| Year | Method | Pub. | Backbone | Word Space | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024 | SCD | CVPRW | ViT | ~Open | $k$-Means + Top-$k$ Voting | Known | ImageNet-100&1K, SCars, CUB |
| | SNCD | AAAI | ResNet | $\mathcal{C}_{base} + \mathcal{C}_{novel}$ | Parametric Classifier | Known | CIFAR-10&100, ImageNet-100 |

Few-Shot Category Discovery (FS-CD)

FS-CD addresses the challenge of identifying novel classes when only a very limited amount of labelled data is available. This setting extends traditional category discovery by integrating the principles of few-shot learning. In particular, FS-CD adopts an $N$-way, $k$-shot framework in which the model must discriminate among $N$ distinct classes given only $k$ labelled examples per base class.

Chi et al. extend NCD to a few-shot setting by linking it to meta-learning, based on the shared assumption that base and novel categories possess high-level semantic features. By adapting meta-learning techniques such as Model-Agnostic Meta-Learning (MAML) and Prototypical Networks (ProtoNet), their approach shifts the focus from classification to clustering tasks, a critical adjustment for few-shot category discovery. A key innovation is the introduction of the Clustering-rule-aware Task Sampler, which ensures that training tasks adhere to consistent clustering rules, thereby enabling the model to generalize better to novel categories despite the limited labelled data. However, this method assumes that the number of novel categories is known in advance.
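For concreteness, a generic $N$-way, $k$-shot episode sampler is sketched below on toy data; it shows only the episode format FS-CD builds on, not the Clustering-rule-aware Task Sampler itself:

```python
import random

def sample_episode(data_by_class, n_way=3, k_shot=2, n_query=4, seed=0):
    """Sample one N-way, k-shot episode: labelled support pairs plus an unlabelled query pool."""
    rng = random.Random(seed)
    classes = rng.sample(sorted(data_by_class), n_way)
    support, query = [], []
    for c in classes:
        items = rng.sample(data_by_class[c], k_shot + n_query)
        support += [(x, c) for x in items[:k_shot]]   # k labelled shots per class
        query += items[k_shot:]                       # unlabelled pool to cluster
    rng.shuffle(query)
    return support, query

toy = {c: [f"{c}_{i}" for i in range(10)] for c in ["dog", "cat", "bird", "fish"]}
support, query = sample_episode(toy)
print(len(support), "support pairs;", len(query), "query images")
```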

Federated Category Discovery (FCD)

FCD extends Category Discovery to the federated learning setting, enabling decentralized, collaborative model training across clients while safeguarding data privacy.
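Below is a minimal FedAvg-style sketch (stand-in local update, toy clients) of the federated training loop such methods build on; the aggregation rule here is plain weighted parameter averaging, not the scheme of any particular FCD method:

```python
import numpy as np

def local_update(params, client_data, lr=0.1):
    """Stand-in for a local discovery step; returns locally updated parameters."""
    grad = np.random.default_rng(len(client_data)).standard_normal(params.shape)
    return params - lr * grad

def federated_round(global_params, clients):
    """One communication round: clients update locally, server averages the parameters."""
    updates = [local_update(global_params, data) for data in clients]
    sizes = np.array([len(data) for data in clients], dtype=float)
    weights = sizes / sizes.sum()                            # weight clients by data size
    return sum(w * u for w, u in zip(weights, updates))      # weighted parameter average

params = np.zeros(4)
clients = [["x"] * 30, ["x"] * 10, ["x"] * 60]               # raw data never leaves the client
for _ in range(3):
    params = federated_round(params, clients)
print(params.round(3))
```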

| Year | Method | Pub. | Backbone | Label Assignment | # Unlabelled categories | Dataset |
| --- | --- | --- | --- | --- | --- | --- |
| 2023 | FedoSSL | ICML | ResNet | Parametric Classifier | Known | CIFAR-10/100, CINIC-10 |
| 2024 | FedGCD | CVPR | ViT | GMM | Semi-FINCH | CIFAR-10/100, ImageNet-100, CUB, SCars, Pets |
| Preprint | GAL | ArXiv | ResNet&34 | Parametric Classifier | Potential Prototype Merge | CIFAR-100, TinyImageNet, ImageNet-100 |
