DeepTeeth

Tooth disease classification using deep learning on the DENTEX dataset.

[Figure: sample images from each class]

Getting started

Install dependencies and run the data preprocessing scripts:

pip install -r requirements.txt
python split_data.py 
python augment_images.py 
python balance_training_data.py 

File structure after running the above scripts:

data (5311)
data_balanced_train (12704 - oversampled)
data_train (3686 - 70%)
data_validation (791 - 15%)
data_test (794 - 15%)

This should result in a 70/15/15 train/validation/test split into the data_train, data_validation, and data_test directories.
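A minimal sketch of how such a per-class 70/15/15 split could be produced (the filename parsing and class-subfolder layout are assumptions; split_data.py may do this differently):

import os, random, shutil
from collections import defaultdict

random.seed(42)
by_class = defaultdict(list)
for fname in os.listdir("data"):                 # files like train_Caries_12_xxx.png
    by_class[fname.split("_")[1]].append(fname)  # classname is the second field

for cls, files in by_class.items():
    random.shuffle(files)
    n = len(files)
    cuts = {
        "data_train": files[:int(0.70 * n)],
        "data_validation": files[int(0.70 * n):int(0.85 * n)],
        "data_test": files[int(0.85 * n):],
    }
    for split, names in cuts.items():
        os.makedirs(os.path.join(split, cls), exist_ok=True)
        for name in names:
            shutil.copy(os.path.join("data", name), os.path.join(split, cls, name))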

A version of the training set that has been balanced by oversampling the minority classes is labeled data_balanced_train. The class distribution in the balanced training set is shown below:

[Figure: balanced training set class distribution]
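A minimal sketch of oversampling by duplication, bringing every class up to the size of the largest one (hypothetical; balance_training_data.py may implement the balancing differently):

import os, shutil

src_root, dst_root = "data_train", "data_balanced_train"
counts = {cls: len(os.listdir(os.path.join(src_root, cls))) for cls in os.listdir(src_root)}
target = max(counts.values())                    # oversample every class up to the largest one

for cls in counts:
    files = sorted(os.listdir(os.path.join(src_root, cls)))
    os.makedirs(os.path.join(dst_root, cls), exist_ok=True)
    for i in range(target):
        name = files[i % len(files)]             # cycle through the class's images
        shutil.copy(os.path.join(src_root, cls, name),
                    os.path.join(dst_root, cls, f"{i}_{name}"))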

The test set class distribution is shown below:

[Figure: test set class distribution]

The super classes were created from the following mapping:

SUPER_CLASSES = {
    "Caries": ["Caries", "CariesTest"],
    "DeepCaries": ["DeepCaries", "Curettage"],
    "Impacted": ["Impacted"],
    "Lesion": ["PeriapicalLesion", "Lesion"],
    "RootCanal": ["RootCanal"],
    "Healthy": ["Intact"],
}
EXCLUDED_CLASSES = ["Extraction", "Fracture"]
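Inverting this mapping gives a raw-label to super-class lookup, for example (a small illustration, not code from the repo):

# Reverse lookup: raw DENTEX label -> super class.
RAW_TO_SUPER = {raw: super_cls
                for super_cls, raw_labels in SUPER_CLASSES.items()
                for raw in raw_labels}

def to_super_class(raw_label):
    if raw_label in EXCLUDED_CLASSES:
        return None                    # Extraction and Fracture are dropped
    return RAW_TO_SUPER[raw_label]

print(to_super_class("Curettage"))     # -> "DeepCaries"
print(to_super_class("Intact"))        # -> "Healthy"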

Dataset

The DENTEX dataset has 3 types of labeled training data, plus a set of unlabeled images:

  • quadrant only (693 images)
  • quadrant + enumeration (634 images)
  • quadrant + enumeration + diagnosis (1005 images)
  • unlabeled (1571 images)

For our classification task, we only care about the 3rd type: quadrant + enumeration + diagnosis (1005 images).

There are also validation (50 images) and test (250 images) sets.

The training and validation sets use the 4 labels:

  • Caries
  • Deep Caries
  • Periapical Lesion
  • Impacted Tooth

The test set, which is stored under the DENTEX/disease directory, uses 9 labels (which have been translated to English). With a bit of speculation, we can make use of some of these images by mapping them to the 4 labels we care about. Interestingly, this also gives us some samples of "healthy" teeth (labeled as "Intact"), which we may consider using as a 5th class.

  • Caries
    • Caries
  • Deep Caries
    • Curettage
  • Periapical Lesion
    • Lesion
  • Impacted Tooth
    • Impacted
  • Healthy (potential 5th class)
    • Intact
  • Other (potential 6th class, probably not useful)
    • RootCanal
    • Extraction
    • Fracture

Simple CNN Baseline

A lightweight CNN implementation for quick experimentation and baseline performance testing.

Architecture

  • 2 convolutional layers (16 → 32 channels)
  • 3 max pooling layers
  • 2 fully connected layers (64 hidden units)
  • ~1.3M parameters
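A minimal sketch of a network matching this description (kernel sizes, padding, and the exact pooling placement are assumptions; the actual simple_cnn.py may differ):

import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes=6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),   # conv 1: 3 -> 16 channels
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 160x256 -> 80x128
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # conv 2: 16 -> 32 channels
            nn.ReLU(),
            nn.MaxPool2d(2),                              # 80x128 -> 40x64
            nn.MaxPool2d(2),                              # 40x64 -> 20x32
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 20 * 32, 64),                  # most of the ~1.3M parameters live here
            nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = SimpleCNN()
print(sum(p.numel() for p in model.parameters()))         # roughly 1.3M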

Training

python simple_cnn.py

Configuration:

  • Input: 160x256 RGB images
  • Batch size: 32
  • Epochs: 5
  • Optimizer: Adam (lr=0.001)
  • Device: CUDA if available, otherwise CPU
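A hedged sketch of a training loop under this configuration (the dataset paths, transforms, and the SimpleCNN class from the architecture sketch above are assumptions):

import os
import torch
from torch import nn, optim
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

transform = transforms.Compose([
    transforms.Resize((160, 256)),       # assumed 160x256 (height x width)
    transforms.ToTensor(),
])
train_ds = datasets.ImageFolder("data_balanced_train", transform=transform)
val_ds = datasets.ImageFolder("data_validation", transform=transform)
train_loader = DataLoader(train_ds, batch_size=32, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=32)

model = SimpleCNN(num_classes=len(train_ds.classes)).to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.CrossEntropyLoss()

os.makedirs("models", exist_ok=True)
best_acc = 0.0
for epoch in range(5):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Validation accuracy; keep the checkpoint with the best score.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in val_loader:
            images, labels = images.to(device), labels.to(device)
            correct += (model(images).argmax(dim=1) == labels).sum().item()
            total += labels.size(0)
    acc = correct / total
    if acc > best_acc:
        best_acc = acc
        torch.save(model.state_dict(), "models/simple_cnn_best.pth")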

Output

  • Model: models/simple_cnn_best.pth (best validation accuracy)
  • Metrics: metrics/simple_cnn_results.txt (training history, test results)
  • Confusion Matrix: metrics/simple_cnn_confusion_matrix.png
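A minimal sketch of producing the confusion matrix output with scikit-learn (the evaluation loop itself is an assumption; test_loader would be a DataLoader over data_test built like val_loader above):

import os
import torch
from sklearn.metrics import ConfusionMatrixDisplay

os.makedirs("metrics", exist_ok=True)
model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for images, labels in test_loader:             # DataLoader over data_test (assumed)
        preds = model(images.to(device)).argmax(dim=1).cpu()
        y_true.extend(labels.tolist())
        y_pred.extend(preds.tolist())

classes = test_loader.dataset.classes              # class names from an ImageFolder dataset
disp = ConfusionMatrixDisplay.from_predictions(
    y_true, y_pred, labels=list(range(len(classes))), display_labels=classes)
disp.figure_.savefig("metrics/simple_cnn_confusion_matrix.png")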

EDA - Exploratory Data Analysis

The eda.py script performs some basic EDA and generates a pie chart of the class distribution in the training set. In our analysis, the two smallest classes (Fracture and Extraction) were excluded due to their very low representation in the dataset.

[Figure: pie chart of the core 4 class distribution in the training set]

Total images: 5311
Class distribution:
  Caries: 2290 (43.1%)
  Impacted: 865 (16.3%)
  CariesTest: 747 (14.1%)
  DeepCaries: 610 (11.5%)
  Curettage: 265 (5.0%)
  PeriapicalLesion: 167 (3.1%)
  RootCanal: 161 (3.0%)
  Intact: 91 (1.7%)
  Lesion: 75 (1.4%)
  Extraction: 29 (0.5%)
  Fracture: 11 (0.2%)

Source distribution:
  train: 3529 (66.4%)
  validation: 182 (3.4%)
  test: 1600 (30.1%)

All files in the data directory follow the format sourcetype_classname_idx_imagefilename.png, where sourcetype is one of "train", "validation", or "test".

The "source" name is how the data was organized in huggingface, but it does not necessarily represent how we will do our training/validation/testing splits.

DENTEX

File Structure:

├── README.md
├── data
│   ├── validation_PeriapicalLesion_1_val_14.png
│   └── validation_PeriapicalLesion_9_val_38.png
├── eda.py
├── load_test_data.py
└── load_train_data.py

Setup

Source: Dental Enumeration and Diagnosis on Panoramic X-rays Challenge DENTEX

Prerequisite: git-lfs installed.
Alternatively, download the zip files directly from the DENTEX dataset page or use the Hugging Face datasets library.

git clone https://huggingface.co/datasets/ibrahimhamamci/DENTEX
cd DENTEX
unzip training_data.zip
unzip validation_data.zip
unzip test_data.zip
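If git-lfs is not an option, one alternative (an assumption, not part of the repo's setup) is to pull the dataset snapshot with the huggingface_hub library and unzip from there:

from huggingface_hub import snapshot_download

# Downloads the dataset files (including the zip archives) into local_dir.
snapshot_download(repo_id="ibrahimhamamci/DENTEX",
                  repo_type="dataset",
                  local_dir="DENTEX")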

File Structure:

DENTEX
├── disease
│   ├── input
│   └── label
├── test_data.zip
├── training_data
│   ├── quadrant
│   ├── quadrant-enumeration-disease
│   ├── quadrant_enumeration
│   └── unlabelled
├── training_data.zip
├── validation_data
│   └── quadrant_enumeration_disease
├── validation_data.zip
└── validation_triple.json
