Relevant works and papers for image classification, object detection, image captioning and Scene Understanding

FonyaBrandone/computer-vision-papers

Deep Learning, Computer Vision and Medical Imaging Papers

I review, track and document relevant papers on image classification, object detection, image captioning, image segmentation, generative models, vision-language models (VLMs), 3D vision and medical imaging, covering convolutional neural networks, deep neural networks and Transformer architectures.

Image Classification

  • ☑ CNN paper (LeNet) - Gradient-Based Learning Applied to Document Recognition (foundational CNN paper by Yann LeCun et al., 1998)

  • ☑ VGG paper - Very Deep Convolutional Networks for Large-Scale Image Recognition (by the Visual Geometry Group, University of Oxford)

  • ☑ ResNet paper - Deep Residual Learning for Image Recognition (by Microsoft Research)

  • ☑ Vision Transformer (ViT) paper - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

  • ☑ ConvNeXt paper - A ConvNet for the 2020s
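The ViT title refers to cutting an image into 16x16 patches that the Transformer treats as tokens ("words"). The sketch below shows only that patch-splitting step in plain Python. `patchify` and `num_patches` are illustrative names, not taken from the paper's code, and the learned linear projection, position embeddings and class token that follow in the real model are left out.

```python
def patchify(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened, non-overlapping patches.

    Returns N = (H // P) * (W // P) vectors of length P * P * C, in row-major
    patch order. In ViT each vector would next go through a linear projection.
    """
    h = len(image)
    w = len(image[0])
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # all C channel values of this pixel
            patches.append(flat)
    return patches


def num_patches(height, width, patch_size):
    # Number of tokens the image becomes (before adding a class token)
    return (height // patch_size) * (width // patch_size)
```

For a 224x224 RGB input and 16x16 patches, `num_patches(224, 224, 16)` gives 14 x 14 = 196 patches, each flattened to 16 x 16 x 3 = 768 values.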

Image Captioning

  • ☑ Show and Tell: A Neural Image Caption Generator (by Google)

  • ☑ Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Object Detection

  • ☑ R-CNN paper - Rich feature hierarchies for accurate object detection and semantic segmentation

  • ☑ YOLO paper - You Only Look Once: Unified, Real-Time Object Detection

Image Segmentation

  • ☑ Mask R-CNN (Instance Segmentation)

  • ☑ SAM paper - Segment Anything, by Meta AI (Segment Anything Model; Promptable Segmentation)

  • ☑ U-Net paper - U-Net: Convolutional Networks for Biomedical Image Segmentation (Semantic Segmentation, medical imaging)

Generative Models

  • ☑ Pixel Recurrent Neural Networks (PixelRNN), by DeepMind, 2016 - autoregressive generative model paper; explicit density approach with a tractable density, learned directly from training images

  • ☑ Auto-Encoding Variational Bayes, 2013 - Variational Autoencoder (VAE) paper; explicit density approach with an approximate density (a variational lower bound on the likelihood)

  • ☑ Generative Adversarial Nets, NeurIPS (then NIPS) 2014 - Generative Adversarial Networks (GAN) paper; implicit density approach

  • ☑ Denoising Diffusion Probabilistic Models (DDPM), 2020 - Diffusion models paper
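The density labels in the Generative Models list correspond to the standard training objective of each model family. The summary below uses generic notation rather than any single paper's: x is an image with pixels x_1 ... x_n in raster order, z is a latent variable, q_phi is the VAE encoder, G and D are the GAN generator and discriminator, and beta_t is the DDPM noise schedule.

```latex
% PixelRNN (autoregressive): explicit, tractable density via the chain rule over pixels
p_\theta(\mathbf{x}) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1})

% VAE: explicit but approximate density; training maximizes the evidence lower bound (ELBO)
\log p_\theta(\mathbf{x}) \ge \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big] - D_{\mathrm{KL}}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big)

% GAN: implicit density; G and D play a two-player minimax game
\min_G \max_D \; \mathbb{E}_{\mathbf{x} \sim p_{\mathrm{data}}}\big[\log D(\mathbf{x})\big] + \mathbb{E}_{\mathbf{z} \sim p(\mathbf{z})}\big[\log\big(1 - D(G(\mathbf{z}))\big)\big]

% DDPM: simplified noise-prediction objective, with \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
L_{\mathrm{simple}} = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}}\Big[\big\| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta\big(\sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\,\boldsymbol{\epsilon},\, t\big) \big\|^2\Big]
```

Only the PixelRNN likelihood can be evaluated exactly; the VAE optimizes a lower bound on it, and the GAN never writes down p(x) at all.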

Foundational Vision Language Models (VLMs)

  • ☑ CLIP paper: "Learning Transferable Visual Models From Natural Language Supervision", 2021

  • ☑ Flamingo paper: "Flamingo: A Visual Language Model for Few-Shot Learning", 2022

  • ☑ BLIP paper: "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation", 2022

VLMs for Medical Imaging

  • ☑ MedCLIP paper: "MedCLIP: Contrastive Learning from Unpaired Medical Images and Text" (extends CLIP-style pretraining by decoupling images and texts, so that unpaired images and reports, previously unusable, also become training data)

  • ☑ SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images (uses point prompts to guide 3D segmentation on an Alzheimer's dataset)

  • ☑ MedBLIP paper: "MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts"

  • ☑ Text3DSAM: Text-Guided 3D Medical Image Segmentation Using SAM-Inspired Architecture (CVPR 2025 challenge winner)
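Once images and texts are decoupled, any image can be compared with any report, and an unpaired report describing the same findings should not be treated as a negative. MedCLIP addresses this, roughly, by replacing CLIP's one-hot contrastive targets with soft targets derived from medical-label similarity. The plain-Python sketch below contrasts the two target types in a symmetric CLIP-style loss. It is a simplified illustration under assumptions, not MedCLIP's implementation: the multi-hot finding labels, all helper names, and the 0.07 temperature (CLIP's initial value) are chosen here for illustration.

```python
import math


def softmax(row):
    # Numerically stable softmax over one row of scores
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a, b):
    # Cosine similarity; returns 0.0 if either vector is all zeros
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def transpose(m):
    return [list(col) for col in zip(*m)]


def identity_targets(n):
    # CLIP-style hard targets: only the originally paired text is a positive
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def semantic_targets(query_labels, key_labels, temperature=1.0):
    # Soft targets from similarity of (hypothetical) multi-hot finding labels:
    # samples describing the same findings share probability mass
    sims = [[cosine(q, k) / temperature for k in key_labels] for q in query_labels]
    return [softmax(row) for row in sims]


def soft_cross_entropy(logits, targets):
    # Mean over rows of -sum_j target_ij * log softmax(logits_i)_j
    total = 0.0
    for row, tgt in zip(logits, targets):
        probs = softmax(row)
        total -= sum(t * math.log(p) for t, p in zip(tgt, probs) if t > 0)
    return total / len(logits)


def contrastive_loss(img_emb, txt_emb, targets_i2t, targets_t2i, temperature=0.07):
    # Symmetric image-to-text and text-to-image loss on scaled cosine similarities
    logits = [[cosine(u, v) / temperature for v in txt_emb] for u in img_emb]
    loss_i2t = soft_cross_entropy(logits, targets_i2t)
    loss_t2i = soft_cross_entropy(transpose(logits), targets_t2i)
    return (loss_i2t + loss_t2i) / 2
```

With `identity_targets` the loss reduces to the usual CLIP objective. With `semantic_targets`, two images and two reports that all share the same finding get a target of 0.5 per pair, so neither report is pushed away as a false negative.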

Other Visual Representation Learning Papers

  • ☑ DINOv3 (by Meta AI, August 2025)

  • ☑ Densely Connected Convolutional Networks (DenseNet paper, CVPR 2017)

Depth Estimation (3D Vision)

  • MiDaS: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

  • Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation

3D Reconstruction

  • Mesh R-CNN paper

  • Occupancy Networks: Learning 3D Reconstruction in Function Space

  • NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

Conferences and Venues

A summary of top conferences and journals for deep learning and medical imaging, with typical submission deadlines and conference dates.

  • NeurIPS

  • ICML

  • ICLR

  • CVPR

  • MICCAI (International Conference on Medical Image Computing and Computer Assisted Intervention)

  • Medical Imaging with Deep Learning (MIDL) Conference

  • ML4H

  • International Conference on Pattern Recognition (ICPR)

  • ICCV (IEEE/CVF International Conference on Computer Vision)

  • ECCV

  • AAAI

  • Neurocomputing (journal)

  • IEEE Transactions on Medical Imaging (TMI, journal)

| Conference | Acronym | Typical Submission Deadline | Typical Conference Date |
|---|---|---|---|
| Neural Information Processing Systems | NeurIPS | Early May (abstract) / Mid-May (full paper) | Early December |
| International Conference on Machine Learning | ICML | Late January / Early February | Late July |
| International Conference on Learning Representations | ICLR | Late September / Early October | Early May |
| Conference on Computer Vision and Pattern Recognition | CVPR | Mid-November | Mid-June |
| International Conference on Computer Vision | ICCV | Mid-March | Mid-October (odd years) |
| European Conference on Computer Vision | ECCV | Early March | Late October (even years) |
| AAAI Conference on Artificial Intelligence | AAAI | Early to mid-August | End of February |
| Medical Image Computing and Computer Assisted Intervention | MICCAI | Early March | Mid-October |
| Medical Imaging with Deep Learning | MIDL | Mid-February | Early July |
| International Symposium on Biomedical Imaging | ISBI | Mid-November | April / May |
| Machine Learning for Health (Symposium) | ML4H | Late August | Early December |

N.B.: Dates are based on historical patterns and shift from year to year.

More conference deadlines: https://github.com/khairulislam/ML-conferences?tab=readme-ov-file

Conference acceptance rates: https://github.com/lixin4ever/Conference-Acceptance-Rate
