
Multi-Scale Spatial Pyramid Attention Mechanism

The official PyTorch implementation of "Multi-scale spatial pyramid attention mechanism for image recognition: An effective approach".

Method

Figure 1: The overall architecture of the proposed MSPA module.

Description. The MSPA module contains three core components: the HPC module, the SPR module, and the Softmax operation. The HPC module is designed to extract multi-scale spatial information. The SPR module is responsible for learning channel attention weights to build cross-dimension interaction. The Softmax operation is used to recalibrate the channel-wise attention weights and establish long-range channel dependencies.

MSPA. Details of the implementation, including the modules and the networks, can be found in the Cifar-100 and ImageNet folders of this repository. A minimal sketch of the module structure is given below.
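The following is a minimal, illustrative PyTorch sketch of the MSPA structure, not the authors' exact code: the class layout, the channel-group splitting, the 3x3 convolutions standing in for the HPC module, and the pooled 1x1 convolutions standing in for the SPR module are assumptions made for illustration. Please refer to the Cifar-100 and ImageNet folders for the official implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MSPA(nn.Module):
    """Illustrative sketch only; structure and component names are assumptions."""

    def __init__(self, channels, scales=4):
        super().__init__()
        assert channels % scales == 0
        self.scales = scales
        width = channels // scales
        # HPC-style branch: one 3x3 convolution per scale group to extract
        # multi-scale spatial information.
        self.convs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=3, padding=1, bias=False)
            for _ in range(scales)
        )
        # SPR-style branch: per-group channel attention from pooled statistics.
        self.fcs = nn.ModuleList(
            nn.Conv2d(width, width, kernel_size=1, bias=False)
            for _ in range(scales)
        )

    def forward(self, x):
        # Split the channels into `scales` groups and process each group separately.
        feats = [conv(f) for conv, f in zip(self.convs, torch.chunk(x, self.scales, dim=1))]
        # Channel attention per group, recalibrated with a Softmax across the groups
        # to model long-range channel dependencies.
        att = torch.stack(
            [fc(F.adaptive_avg_pool2d(f, 1)) for fc, f in zip(self.fcs, feats)], dim=1
        )
        att = torch.softmax(att, dim=1)
        out = [f * a for f, a in zip(feats, att.unbind(dim=1))]
        return torch.cat(out, dim=1)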

Our environments and toolkits

  • OS: Ubuntu 18.04.1
  • CUDA: 11.6
  • Python: 3.9.12
  • Toolkit: PyTorch 1.10
  • GPU: RTX A6000 (4x)
  • thop and ptflops (used to count Parameters and FLOPs; see the usage sketch after this list)
  • For generating GradCAM++ results, please follow the code in this repository
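As a usage sketch, Parameters and FLOPs can be measured with thop and ptflops roughly as follows. A torchvision ResNet-50 is used here only as a stand-in for an MSPANet model.

import torch
import torchvision.models as models
from thop import profile
from ptflops import get_model_complexity_info

model = models.resnet50()

# thop: multiply-accumulate operations (MACs) and parameter count for one 224x224 input.
macs, params = profile(model, inputs=(torch.randn(1, 3, 224, 224),))
print(f"thop:    {macs / 1e9:.2f} GMACs, {params / 1e6:.2f} M params")

# ptflops: the same complexity estimate computed from the input resolution.
macs, params = get_model_complexity_info(model, (3, 224, 224), as_strings=True,
                                          print_per_layer_stat=False)
print(f"ptflops: {macs}, {params}")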

How to incorporate the proposed MSPA module into ResNets

Figure 2: Comparison between the original bottleneck residual block (left) and the basic building block of the proposed MSPANet (right).
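As a hedged sketch of this building block (not the authors' exact design), the MSPA module from the earlier sketch can be dropped into a standard ResNet bottleneck in place of the 3x3 convolution stage, which is one plausible reading of Figure 2:

import torch.nn as nn

class MSPABottleneck(nn.Module):
    # Illustrative only: the exact placement of MSPA is an assumption, and
    # stride handling is omitted for brevity (this sketch covers stride-1 blocks).
    expansion = 4

    def __init__(self, in_planes, planes, downsample=None):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)
        # MSPA takes the place of the 3x3 convolution of the original bottleneck
        # (Figure 2, right); `MSPA` refers to the sketch shown earlier.
        self.mspa = MSPA(planes)
        self.bn2 = nn.BatchNorm2d(planes)
        self.conv3 = nn.Conv2d(planes, planes * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.mspa(out)))
        out = self.bn3(self.conv3(out))
        if self.downsample is not None:
            identity = self.downsample(x)
        return self.relu(out + identity)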

Overview of Results

Comparison of the performance of MSPANet-50 with the change of s and ω on CIFAR-100 classification

Figure 3: Comparison of the performance of MSPANet-50 with the change of s and ω on CIFAR-100 classification.

Comparison of training and validation curves on ImageNet-1K

Figure 4: Comparisons of training and validation curves on ImageNet-1K for ResNet, MSPANet-S, and MSPANet-B architectures of 50 and 101 layers, respectively.

Classification performance on CIFAR-100

Table 1: Comparisons of various attention methods on the CIFAR-100 test set in terms of network parameters (Parameters), floating-point operations (FLOPs), and Top-1 accuracy (Top-1 Acc), using ResNet-50, ResNeXt-29, and PreActResNet-164 as baselines, respectively.

Classification performance on ImageNet-1K

Table 2: Comparisons of efficiency (i.e., Parameters and FLOPs) and effectiveness (i.e., Top-1/Top-5 Acc) of various attention methods and different multi-scale representation architectures on the ImageNet-1K validation set.

Citation

If you find MSPA useful in your research, please consider citing:

@article{2024mspa,
	title={Multi-scale spatial pyramid attention mechanism for image recognition: An effective approach},
	author={Yu, Yang and Zhang, Yi and Cheng, Zeyu and Song, Zhe and Tang, Chengkai},
	journal={Engineering Applications of Artificial Intelligence},
	volume={133},
	pages={108261},
	year={2024},
	publisher={Elsevier}
}

Contact Information

If you have any suggestions or questions, you can leave a message here or contact us directly at yang_y9802@163.com. Thanks for your attention!
