This project replicates and extends a deep learning model built on a contrastive learning framework for the task of image similarity. The underlying model learns meaningful visual representations through self-supervised learning, enabling accurate similarity measurements between image pairs. The implementation is based on PyTorch.
Fully implemented SimCLR framework from scratch (vectorized), including the NT-Xent loss.
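For reference, a generic, vectorized NT-Xent loss in PyTorch might look like the sketch below. This follows the standard formulation from [1]; the function name and batch layout are illustrative and not necessarily identical to the code in this repository.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent loss over M positive pairs.
    z1, z2: (M, D) projections of two augmented views of the same M images."""
    m = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2M, D), unit-norm rows
    sim = z @ z.t() / temperature                         # (2M, 2M) scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                     # mask self-similarity
    # The positive partner of sample i is i + M (and vice versa).
    targets = torch.cat([torch.arange(m, 2 * m), torch.arange(m)])
    return F.cross_entropy(sim, targets.to(sim.device))
```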
Comparison of two CNN backbones:
- Simple: 2 convolutional layers + 1 fully connected layer (LeNet-style), conv layers use kernel size 3, stride 2, padding 1.
- ResNet: Residual blocks with skip connections, batch normalization, and ReLU activations.
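For illustration, the Simple backbone described above might look roughly like the following sketch; the channel widths and the flattened feature size (for 32×32 inputs) are assumptions, not the repository's exact values.

```python
import torch.nn as nn

class SimpleEncoder(nn.Module):
    """LeNet-style encoder: 2 conv layers (kernel 3, stride 2, padding 1) + 1 fully connected layer."""
    def __init__(self, in_channels=3, repr_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),  # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),           # 16x16 -> 8x8
            nn.ReLU(),
        )
        self.fc = nn.Linear(64 * 8 * 8, repr_dim)

    def forward(self, x):
        h = self.features(x)
        return self.fc(h.flatten(start_dim=1))
```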
The output vector space is 128-dimensional (the projection head output, set by N_repr in the config).
| Model | Train Loss | Test Loss | Duration |
|---|---|---|---|
| Simple | 0.0350 | 0.0352 | 2h 50m |
| ResNet | 0.0166 | 0.0184 | 4h 18m |
Figure 1. Loss plots of the ResNet model on the train and validation sets during training.
Figure 2. The basis image sampled from the test set, used as the reference for similarity comparison.
Figure 3. Ranking of all test images by similarity to the basis image, showing the top-5 and bottom-5 most similar images.
- GPU: NVIDIA RTX 4060 Mobile
- CPU: Intel i5-12500H
- RAM: 40 GB DDR4
Before running the program:

1. Prepare CIFAR Data

   Download and extract the CIFAR-10 dataset into the following directory: `data/raw/cifar/`. The directory should contain files such as `data_batch_1`, `data_batch_2`, ..., `test_batch`, `batches.meta`, etc.

2. Adjust the Data Loader

   Configure the `load_data` method of the `Cifar` object in `src/data/dataset.py` to correctly parse your data layout and preprocessing preferences. The implementation of data loading is left intentionally flexible for users to define according to their needs (see the sketch below).
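For reference, the raw CIFAR-10 batch files are Python pickles, so a minimal loader could look like the sketch below. The exact signature and return format expected by the `Cifar` object are up to you; this example is an assumption, not the repository's implementation.

```python
import os
import pickle
import numpy as np

def load_data(data_dir="data/raw/cifar"):
    """Read the raw CIFAR-10 batch files into (N, 3, 32, 32) uint8 arrays.
    Splitting, normalization, and augmentation are left to the caller."""
    def read_batch(path):
        with open(path, "rb") as f:
            batch = pickle.load(f, encoding="bytes")
        images = batch[b"data"].reshape(-1, 3, 32, 32)
        labels = np.array(batch[b"labels"])
        return images, labels

    train_parts = [read_batch(os.path.join(data_dir, f"data_batch_{i}")) for i in range(1, 6)]
    x_train = np.concatenate([p[0] for p in train_parts])
    y_train = np.concatenate([p[1] for p in train_parts])
    x_test, y_test = read_batch(os.path.join(data_dir, "test_batch"))
    return (x_train, y_train), (x_test, y_test)
```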
All interactions with this project are done through the run.py script, which serves as the main entry point.
Configuration files are written in JSON and define all the necessary parameters for model setup and training.
Default config files are located in the template/ directory. Below is the structure and meaning of each field:
{
"model_architecture_cfg": {
"type": "ResNet", // Backbone encoder type (e.g., "ResNet")
"instance_prsd_shape": [3, 32, 32], // Shape of one input instance (C, H, W)
"N_repr": 128, // Dimensionality of the projection head output
"detailed": null // Reserved for detailed architecture overrides (optional)
},
"training_cfg": {
"rng_seed": 0, // Random seed for reproducibility
"max_epochs": 100, // Maximum number of training epochs
"data": "cifar", // Dataset identifier
"M_minibatch": 16, // Mini-batch size
"train_fraction": 0.6, // Fraction of data used for training
"subset_size": null, // Optional subset size (useful for debugging)
"temperature": 0.5, // Temperature for contrastive loss
"optimization": {
"type": "adam", // Optimizer type (e.g., "adam", "sgd")
"lr": 1e-3 // Learning rate
}
}
}
- You can add or override parameters as needed.
- `subset_size` can be used for partial-data training.
- `temperature` is critical for contrastive-loss performance tuning.
- The `"detailed"` field is reserved for custom architecture options (e.g., ResNet variants or layer-specific configs).
Refer to the default files under template/ (e.g., config-resnet.json, config-simple.json) as starting points.
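For example, a quick smoke-test configuration might shrink `max_epochs` and set `subset_size`; all values below are illustrative:

```json
{
  "model_architecture_cfg": {
    "type": "ResNet",
    "instance_prsd_shape": [3, 32, 32],
    "N_repr": 128,
    "detailed": null
  },
  "training_cfg": {
    "rng_seed": 0,
    "max_epochs": 2,
    "data": "cifar",
    "M_minibatch": 16,
    "train_fraction": 0.6,
    "subset_size": 512,
    "temperature": 0.5,
    "optimization": { "type": "adam", "lr": 1e-3 }
  }
}
```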
python run.py --device <cpu|gpu> --mode <train|pred> [additional args]
To train a model, you must provide either:
- a new configuration file path (`--config`), or
- a previous checkpoint directory path (`--checkpoint`) to resume from.
# Start training from a config file
python run.py --device gpu --mode train --config config-resnet.json
# Resume training from an existing checkpoint
python run.py --device gpu --mode train --checkpoint checkpoint/Dt20250719124327UTC0
During training, progress and evaluation results for each epoch are printed to standard output in a structured format. Below is an example and explanation of the key components:
[EPOCH 5/99 @ Dt20250719125139UTC0]
Iterative optimization state:
100% |==============================| S2249/2249 [46s<000ms; 020ms/it; L0.0677]
Epoch optimization completed; proceeding to model evaluation stage.
Train set evaluation state:
100% |==============================| S2249/2249 [36s<000ms; 016ms/it; L0.0598]
Validation set evaluation state:
100% |==============================| S1124/1124 [18s<000ms; 016ms/it; L0.1637]
[EPOCH CONCLUSIVE REPORT]
Epoch time: 01m:41s
Performance measurement time: 54s
+------+-------------+---------------+--------------+
| | val_v_opt | val_v_worst | val_v_init |
|------+-------------+---------------+--------------|
| loss | 2.88 % | -96.62 % | -96.62 % |
+------+-------------+---------------+--------------+
+------+-----------------+-----------------+-----------------+-----------+-----------+
| | est_train_min | est_train_max | est_train_avg | train | val |
|------+-----------------+-----------------+-----------------+-----------+-----------|
| loss | 0.0111888 | 0.448736 | 0.0763622 | 0.0823384 | 0.0834423 |
+------+-----------------+-----------------+-----------------+-----------+-----------+
In the epoch summary table, the training loss columns are as follows:
| Column | Description |
|---|---|
| `est_train_min` | The minimum loss observed during minibatch updates within the epoch. |
| `est_train_max` | The maximum loss observed during minibatch updates within the epoch. |
| `est_train_avg` | The average loss computed over all minibatch updates (estimated average) during the epoch. |
| `train` | The true average loss computed over the entire training dataset after the epoch ends. |
| `val` | The true average loss computed over the entire validation dataset after the epoch ends. |
- The `est_*` values reflect the loss as observed iteratively within minibatch steps during training.
- The `train` and `val` values are computed precisely over the full respective datasets after completing all minibatch iterations.
This distinction helps diagnose training dynamics by comparing instantaneous minibatch losses with full-epoch averages.
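A toy illustration of this distinction, using a plain regression model rather than the contrastive setup (all names and values are illustrative):

```python
import torch
import torch.nn as nn

# Minibatch losses are recorded while the weights change (est_*); the "train"
# value is recomputed in a fresh pass over the data with the final weights.
torch.manual_seed(0)
model = nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
data = [(torch.randn(8, 4), torch.randn(8, 1)) for _ in range(10)]

minibatch_losses = []
for x, y in data:
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    minibatch_losses.append(loss.item())          # source of the est_* statistics

est_train_avg = sum(minibatch_losses) / len(minibatch_losses)
with torch.no_grad():                             # full pass with frozen final weights
    train_loss = sum(nn.functional.mse_loss(model(x), y).item() for x, y in data) / len(data)
print(f"est_train_avg={est_train_avg:.4f}  train={train_loss:.4f}")
```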
- Performance measurement time:
  The duration (in seconds) spent measuring model performance on the training and validation datasets after the epoch completes.
- Validation metric comparisons (shown as percentages in the summary table):

| Metric | Description |
|---|---|
| `val_v_opt` | Percentage difference between the current validation metric (e.g., loss) and the best (optimal) validation metric recorded so far. |
| `val_v_worst` | Percentage difference between the current validation metric and the worst validation metric recorded so far. |
| `val_v_init` | Percentage difference between the current validation metric and the validation metric measured before training started (initial weights). |
These metrics help track how the model’s validation performance evolves relative to its best, worst, and initial states throughout training.
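A plausible way these percentages are derived is a relative difference against the respective reference value; the exact formula and sign convention are assumptions, not confirmed from the code:

```python
def pct_diff(current, reference):
    """Percentage difference of the current validation metric relative to a reference value."""
    return 100.0 * (current - reference) / reference

# Illustrative values only (roughly consistent with the example epoch above):
val_v_opt   = pct_diff(current=0.0834, reference=0.0811)  # vs best validation loss so far
val_v_worst = pct_diff(current=0.0834, reference=2.47)    # vs worst validation loss so far
val_v_init  = pct_diff(current=0.0834, reference=2.47)    # vs loss at initial (untrained) weights
```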
To run inference or estimate representations, provide:
- the config file used during training (`--config`), and
- a model weights file (`--tparams`)
python run.py --device gpu --mode pred --config config-resnet.json --tparams model_final.pt
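The resulting representations can then be used for similarity ranking, as in Figures 2 and 3. A minimal cosine-similarity sketch, assuming an (N, D) tensor of representations, is shown below; the function name and return format are illustrative and not part of run.py.

```python
import torch
import torch.nn.functional as F

def rank_by_similarity(embeddings, basis_index, k=5):
    """Rank all images by cosine similarity of their representations to a basis image.
    embeddings: (N, D) tensor of representations, e.g. produced by the pred mode."""
    z = F.normalize(embeddings, dim=1)
    sims = z @ z[basis_index]                      # cosine similarity to the basis image
    order = torch.argsort(sims, descending=True)   # most similar first; index 0 is the basis itself
    return order[1:k + 1], order[-k:]              # top-k and bottom-k indices

# Illustrative usage with random data:
fake_embeddings = torch.randn(1000, 128)
top5, bottom5 = rank_by_similarity(fake_embeddings, basis_index=0)
```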
You can pass debug parameters for development purposes:
- `limit2SmallSubsetofData`: Use a small subset of the dataset
- `clearExports`: Remove exported checkpoints after execution
python run.py --device cpu --mode train --config config-test.json --debug limit2SmallSubsetofData clearExports
The current methodology relies heavily on the work of [1].
[1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. "A Simple Framework for Contrastive Learning of Visual Representations". Proceedings of the 37th International Conference on Machine Learning (ICML), PMLR 119, 2020. Available at: https://arxiv.org/abs/2002.05709


