The goal of this project is to propose a method to automatically detect and classify standard planes (SPs) in liver ultrasound (US) videos. The operator - commonly a nurse - should detect the most informative images within each US video, to later provide them to physicians for diagnostic purposes. Such frames are known as standard planes and are identified by the presence of specific anatomical structures within the image. Given the nature of this imaging technique (highly noisy and dependent on device settings and on the manual skills of the operator) and the resulting difficulty of recognising anatomical structures (often not clearly visible even to expert physicians), the standard plane detection task is non-trivial and strongly operator-dependent. Nonetheless, one aspect that seems to aid expert users is the temporal evolution of the data within the performed motion scan (combined with some prior background knowledge of human anatomy). Our aim is hence to develop a deep learning pipeline for the automatic classification of SPs from single frames and from sequences of frames within US videos.
We start with a 2D approach based on a 2D CNN architecture named SonoNet [1], which achieved state-of-the-art results on the fetal US standard plane detection task. As a first approach to exploiting temporal information, we then propose a 3D CNN model that uses both spatial and temporal information on a short timescale. Specifically, we implemented a 3D extension of the SonoNet architecture. Extending convolutions to the third (temporal) dimension should help the network resolve ambiguous situations in which parts of the anatomical structures are not clearly visible (or partly occluded) within a single frame but appear in nearby frames. Based on [2], we also implemented a SonoNet(2+1)D model. It is a 3D version of SonoNet2D in which each 2D convolution layer is replaced with a SpatioTemporal block, consisting of a 2D convolution layer followed by a 1D convolution layer. In this way, we obtain a model that is comparable to SonoNet2D in terms of trainable parameters, but with twice as many non-linear operations as the 3D model, potentially leading to the best results.
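As a rough illustration of this decomposition, the following is a minimal PyTorch sketch of a (2+1)D SpatioTemporal block. The class name, the choice of intermediate channels, and the single ReLU after each step are assumptions made for illustration; the actual block used in this project is defined in sononet3d/models.py.

```python
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    """Illustrative (2+1)D block: a 3D convolution decomposed into a
    2D (spatial) convolution followed by a 1D (temporal) convolution,
    with a non-linearity after each of the two steps."""

    def __init__(self, in_channels, out_channels, mid_channels=None):
        super().__init__()
        # Intermediate channel count is an assumption; in [2] it is chosen
        # so that the (2+1)D block matches the parameter count of the
        # corresponding full 3D convolution.
        mid_channels = mid_channels or out_channels
        # Spatial convolution: 1 x 3 x 3 kernel over (T, H, W) volumes.
        self.spatial = nn.Conv3d(in_channels, mid_channels,
                                 kernel_size=(1, 3, 3), padding=(0, 1, 1))
        # Temporal convolution: 3 x 1 x 1 kernel along the time axis.
        self.temporal = nn.Conv3d(mid_channels, out_channels,
                                  kernel_size=(3, 1, 1), padding=(1, 0, 0))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # x has shape (batch, channels, clip_len, height, width).
        x = self.relu(self.spatial(x))
        return self.relu(self.temporal(x))
```

Splitting the convolution in two is what yields the doubled number of non-linearities mentioned above: each (2+1)D block applies two activations where a single 3D convolution would apply one.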
This folder contains the main scripts for defining and training 2D-SonoNet architectures, as well as 3D-SonoNet and (2+1)D-SonoNet extensions. See the "usage" note at the beginning of each of them.
- sononet2d-traintest.py: trains and tests the 2D SonoNet-16/32/64 model.
- sononet2d-traintest_3d_comparable.py: trains and evaluates the 2D SonoNet-16/32/64 model using the same dataset as the 3D models for direct comparison.
- sononet3d-traintest.py: trains and tests the 3D SonoNet-16/32/64 model.
- temporal_test.py: loads a test video and visualises the predictions of different models for temporal comparison.
- 2d_vs_3d.py: computes per-video accuracy on the test set for both 2D and 3D models, and reports the average accuracy.
These scripts use code from the following Python packages:
utils: This folder contains Python files with many general-purpose utility functions.
- augments.py: defines data augmentation methods for US images.
- datareader.py: defines a class for loading either the 2D or the 3D version of our dataset.
- datasplit.py: defines functions for splitting the dataset into training and validation sets. Note that the splitting logic is tailored to our specific scenario, so using your own custom split method is recommended.
- iterators.py: defines basic training and testing loops for a single epoch.
- runner.py: defines train and test functions.
- visualize.py: defines a useful function for plotting a confusion matrix and saving it as a PNG image.
_IMPORTANT:_ change the methods split3d_train_validation and split2d_train_validation according to your data and needs.
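As a reference for writing your own split, the sketch below shows one possible video-level strategy. The function name and arguments are hypothetical; the actual logic lives in utils/datasplit.py and is tailored to our specific scenario.

```python
import random

def split_train_validation(video_names, val_fraction=0.2, seed=21):
    """Hypothetical video-level split: whole videos are assigned to either
    the training or the validation set, so frames of the same video never
    end up in both sets."""
    names = sorted(video_names)
    random.Random(seed).shuffle(names)
    n_val = max(1, int(len(names) * val_fraction))
    val_videos = set(names[:n_val])
    train_videos = set(names[n_val:])
    return train_videos, val_videos
```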
sononet2d: This folder contains the 2D implementation of the SonoNet-16/32/64 model.
- models.py: defines the SonoNet2D class. The number of features in the hidden layers of the network can be set by choosing among 3 configurations (16, 32, and 64). The network can be used in "classification" mode (the adaptation layer provides the output) or in "feature extraction" mode (no adaptation layer is defined, and the output is the set of features from the last convolutional layer). The latter is obtained by setting the features_only parameter to True (useful to check on which image parts the network is focusing its attention). Finally, by setting the train_classifier_only parameter to True, it is possible to freeze learning in all convolutional layers (only the adaptation layer will be trained). A hedged usage sketch is given after this list.
- remap-weights.py: converts SonoNet weights (downloaded from the reference repository) to make them compatible with our implementation of the model.
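The snippet below is a minimal usage sketch of the modes described above. The exact constructor signature and the expected input size are assumptions; check models.py for the actual interface.

```python
import torch
from sononet2d.models import SonoNet2D  # import path is an assumption

# Classification mode: the adaptation layer produces the class scores.
model = SonoNet2D(num_features=32)           # constructor arguments are assumptions
x = torch.randn(1, 1, 224, 288)              # (batch, channels, height, width); size is illustrative
logits = model(x)

# Feature-extraction mode: no adaptation layer, output of the last conv block.
feat_model = SonoNet2D(num_features=32, features_only=True)
features = feat_model(x)

# Train only the adaptation layer, freezing all convolutional layers.
cls_only = SonoNet2D(num_features=32, train_classifier_only=True)
```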
sononet3d: This folder contains the 3D and (2+1)D extensions of the standard SonoNet-16/32/64 model implementation.
- models.py: defines the SonoNet3D and SonoNet(2+1)D classes. In SonoNet3D, all 2D convolutional and pooling layers are changed to their 3D extensions. In the (2+1)D model, instead, the 3D convolutional layers are replaced by a SpatioTemporal block, where the standard convolution is decomposed into a 2D convolution followed by a 1D convolution. As in the 2D case, the number of features in the hidden layers of the network can be set by choosing among 3 configurations (16, 32, and 64).
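For orientation, this is a hedged sketch of how a 3D model might be instantiated and what input shape it consumes. The import path and constructor arguments are assumptions; only the 5D clip layout is standard for 3D convolutions in PyTorch.

```python
import torch
from sononet3d.models import SonoNet3D  # import path is an assumption

model = SonoNet3D(num_features=32)  # 16, 32 or 64 initial features

# 3D models consume short clips: (batch, channels, clip_len, height, width).
clip = torch.randn(1, 1, 10, 224, 288)  # clip_len=10 as in the example commands; spatial size is illustrative
logits = model(clip)
```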
logs / weights4sononet2d / FetalDB: pretrained weights of all SonoNet configurations (16, 32, and 64 initial features) trained on the FetalDB dataset. Each configuration has its own folder (SonoNet-16, SonoNet-32, and SonoNet-64) where the weights are stored in the "ckpt_best_loss.pth" file. These files were obtained from the ones denoted as "old", which are provided in this repository (same weights, but not directly compatible with our model definition).
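To reuse these weights outside the training scripts, loading a checkpoint might look like the following. This sketch assumes ckpt_best_loss.pth stores a plain state_dict and assembles the path from the folder structure described above; adapt it to the checkpoint's actual content.

```python
import torch
from sononet2d.models import SonoNet2D  # import path is an assumption

# Configuration must match the checkpoint (here: SonoNet-32).
model = SonoNet2D(num_features=32)
state_dict = torch.load('logs/weights4sononet2d/FetalDB/SonoNet-32/ckpt_best_loss.pth',
                        map_location='cpu')
model.load_state_dict(state_dict)
model.eval()
```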
sononet2d-traintest.py:
python sononet2d-traintest.py -data_dir 'path_to_data' -log_dir 'logs/sononet2d' -gpu 0 -num_features 32 -batch_size 128 -lr 0.00001 -max_num_epochs 200 -patience 10 -lr_sched_patience 4 -weight_decay 0.0001 -seed 21 --sampler --augmentation

sononet2d-traintest_3d_comparable.py:
python sononet2d-traintest_3d_comparable.py -data_dir 'path_to_data' -log_dir 'logs/sononet2d_3d_comparable' -gpu 0 -num_features 32 -clip_len 10 -batch_size 128 -lr 0.00001 -max_num_epochs 200 -patience 10 -lr_sched_patience 4 -weight_decay 0.0001 -seed 21 --sampler --augmentation

sononet3d-traintest.py:
python sononet3d-traintest.py -data_dir 'path_to_data' -log_dir 'logs/sononet3d' -gpu 0 -num_features 32 -clip_len 10 -batch_size 128 -lr 0.00001 -max_num_epochs 200 -patience 10 -lr_sched_patience 4 -weight_decay 0.0001 -seed 21 --sampler --augmentation
If you want to use the SonoNet (2+1)D version, add the argument --modify_3d to the command line.
temporal_test.py:
python temporal_test.py -data_dir 'path_to_data' -log_dir 'logs/temporal_test' -model_dir_2d 'logs/sononet2d_3d_comparable' -model_dir_3d 'logs/sononet3d' -model_dir_2_1d 'logs/sononet_2_1d' -gpu 0 -num_features 32 -clip_len 10
2d_vs_3d.py:
python 2d_vs_3d.py -data_dir 'path_to_data' -log_dir 'logs/2d_vs_3d' -model_dir_2d 'logs/sononet2d_3d_comparable' -model_dir_3d 'logs/sononet3d' -model_dir_2d1d 'logs/sononet_2_1d' -gpu 0 -num_features 32 -clip_len 10 -batch_size 128
To run the experiments correctly, the dataset directory must follow the structure below:
data/
│
├── classes.json
│
├── train/
│ ├── labels/
│ │ ├── <video_name_1>/
│ │ │ ├── <video_name_1>_<frame_idx>.txt
│ │ │ ├── <video_name_1>_<frame_idx>.txt
│ │ │ └── ...
│ │ └── <video_name_2>/
│ │ └── ...
│ │
│ └── videos/
│ ├── <video_name_1>/
│ │ ├── <video_name_1>_<frame_idx>.<ext>
│ │ ├── <video_name_1>_<frame_idx>.<ext>
│ │ └── ...
│ └── <video_name_2>/
│ └── ...
│
└── test/
├── labels/
│ ├── <video_name_1>/
│ │ ├── <video_name_1>_<frame_idx>.txt
│ │ ├── <video_name_1>_<frame_idx>.txt
│ │ └── ...
│ └── <video_name_2>/
│ └── ...
│
└── videos/
├── <video_name_1>/
│ ├── <video_name_1>_<frame_idx>.<ext>
│ ├── <video_name_1>_<frame_idx>.<ext>
│ └── ...
└── <video_name_2>/
└── ...
- classes.json: this file must be located directly inside the data/ directory. It contains a dictionary where:
  - each key is a class name (string),
  - each value is a unique integer ID.
- labels/: contains one subfolder per video, named exactly as the video. Each subfolder includes one .txt file per frame. File naming format: <video_name>_<frame_idx>.txt
- videos/: contains one subfolder per video, using the same video name as in labels/. Each subfolder includes all frames of the video. The file naming format matches the labels: <video_name>_<frame_idx>.png
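For illustration, the snippet below writes a classes.json with the expected structure. The class names used here are placeholders, not the actual classes of our dataset.

```python
import json

# Hypothetical class-name-to-ID mapping; replace with your own classes.
classes = {
    "background": 0,
    "plane_A": 1,
    "plane_B": 2,
}

# classes.json must sit directly inside the data/ directory.
with open('data/classes.json', 'w') as f:
    json.dump(classes, f, indent=4)
```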
[1] Baumgartner, C. F., Kamnitsas, K., Matthew, J., Fletcher, T. P., Smith, S., Koch, L. M., Kainz, B., & Rueckert, D. (2017). SonoNet: Real-time detection and localisation of fetal standard scan planes in freehand ultrasound. IEEE Transactions on Medical Imaging, 36(11), 2204-2215. [link]
[2] Tran, D., Wang, H., Torresani, L., Ray, J., LeCun, Y., & Paluri, M. (2018). A closer look at spatiotemporal convolutions for action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 6450-6459). [link]