Description
Our repository is rather large/bloated, sitting at around 1.6GB, which means it can take users a while to clone.
I found a neat solution for listing "blobs" by size (see here). Running it shows that a bunch of notebooks and output files have been committed to branches at some point in time.
A sample of .topostats and .ipynb files >=3MB...
❱ git rev-list --objects --all --missing=print |
git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
sed -n 's/^blob //p' |
sort --numeric-sort --key=2 |
cut -c 1-12,41- |
$(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest |
grep -v tests |
grep -E "topostats$|ipynb$"
...
8668:11353:2dfa202bcab1 3.0MiB perovskite/perovskite_unet.ipynb
8671:11357:6546e9b30ae7 3.2MiB perovskite/perovskite_unet.ipynb
8672:11358:f2925984aac5 3.3MiB Bradley/bradley.ipynb
8673:11359:32a80761a13b 3.3MiB perovskite/perovskite_unet.ipynb
8674:11360:03fe3952c8ed 3.3MiB Flattening development/flattening.ipynb
8675:11363:1f2f67a87de2 3.4MiB topostats/networks/Segmentation.ipynb
8676:11364:9744d14d2e30 3.4MiB hariborings/cas9_analysis_again.ipynb
8677:11365:ed95e27c17fa 3.4MiB Flattening development/flattening.ipynb
8684:11372:75f9ba9f146a 3.5MiB perovskite/perovskite_unet.ipynb
8703:11391:1ac615bdf141 3.5MiB Bradley/bradley.ipynb
8704:11392:5137313f596c 3.5MiB perovskite-grain-analysis.ipynb
8705:11410:269c2456b696 3.8MiB topostats/networks/Segmentation.ipynb
8706:11411:ba7757181286 4.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00061.topostats
8707:11412:f1160d593147 4.1MiB Bradley/bradley.ipynb
8708:11413:fb80c3b1c804 4.2MiB topostats/networks/Node_Coordination.ipynb
8709:11414:915f2847090d 4.4MiB perovskite/perovskite_unet.ipynb
8710:11418:1ab92445346b 4.6MiB perovskite/perovskite_unet.ipynb
8711:11420:a02cd06936ef 4.8MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00086.topostats
8712:11421:a09ebea4fe7c 4.8MiB perovskite/perovskite_unet.ipynb
8713:11422:98cc2850b224 4.8MiB perovskite/perovskite_unet.ipynb
8715:11428:2e51f15fb317 5.0MiB Bradley/bradley.ipynb
8716:11429:0ed5e5ff96b2 5.0MiB Bradley/bradley.ipynb
8720:11433:0586fbca9334 5.2MiB Bradley/bradley.ipynb
8728:11441:7f289265a5b9 5.6MiB Bradley/bradley.ipynb
8729:11442:6401131c7e63 5.6MiB perovskite/perovskite_unet.ipynb
8731:11445:9d74f73a6776 5.8MiB Bradley/sylvia_blur_edge_detection.ipynb
8732:11446:ef5320fa25a3 5.9MiB topostats/networks/Segmentation.ipynb
8733:11447:b8a48e6c49e5 5.9MiB topostats/networks/Blur_Edge_Detection.ipynb
8734:11448:04daa067f7fc 5.9MiB topostats/networks/Segmentation.ipynb
8735:11449:12cd8312dbf7 5.9MiB Bradley/Blur_Edge_Detection.ipynb
8736:11450:4df149fd246f 5.9MiB topostats/networks/Blur_Edge_Detection.ipynb
8737:11451:3744c72887da 5.9MiB topostats/networks/Segmentation.ipynb
8738:11452:49c3ff967b24 6.0MiB topostats/networks/Blur_Edge_Detection.ipynb
8739:11455:52db75ac8b8b 6.1MiB topostats/networks/ARCHIVE_Blur_Edge_Detection.ipynb
8740:11456:a7d89d37b98d 6.2MiB segmentation/haribonet_multiclass.ipynb
8744:11461:c30f5f00055f 6.2MiB Bradley/sylvia_blur_edge_detection.ipynb
8745:11462:f894920e22b5 6.3MiB Bradley/sylvia_blur_edge_detection.ipynb
8746:11463:029931c2c96d 6.3MiB Bradley/Segmentation.ipynb
8747:11464:365b3a98a819 6.4MiB topostats/networks/bradley.ipynb
8748:11465:72566bfe0d7c 6.4MiB Bradley/Node_Coordination.ipynb
8749:11466:70e35547a8d9 6.5MiB Bradley/sylvia_blur_edge_detection.ipynb
8751:11468:e54ddb7fede2 6.5MiB Bradley/sylvia_blur_edge_detection.ipynb
8752:11469:64b6bb1ecb2c 6.5MiB Bradley/sylvia_blur_edge_detection.ipynb
8753:11470:916cc4d0fc88 6.5MiB Bradley/sylvia_blur_edge_detection.ipynb
8754:11471:7fa4b43d548a 6.6MiB Bradley/sylvia_blur_edge_detection.ipynb
8755:11472:c36f23e28eec 6.6MiB perovskite/segmentation.ipynb
8761:11478:a50d57e808d5 6.7MiB Bradley/Node_Coordination.ipynb
8764:11481:10bb2d420a30 6.8MiB Bradley/Blur_Edge_Detection.ipynb
8766:11483:5f2ece6038d8 7.1MiB Bradley/Blur_Edge_Detection.ipynb
8767:11485:c82c40355ea4 7.1MiB Bradley/Blur_Edge_Detection.ipynb
8768:11486:683ca9a23604 7.2MiB perovskite/segmentation.ipynb
8769:11487:eabefe47a5ba 7.5MiB Bradley/sylvia_blur_edge_detection.ipynb
8770:11488:74132e2c6d8d 7.5MiB Bradley/sylvia_blur_edge_detection.ipynb
8771:11491:613ecdec117e 7.6MiB perovskite/segmentation.ipynb
8772:11492:34e6020ac98e 7.6MiB Bradley/sylvia_blur_edge_detection.ipynb
8773:11495:e374517d3034 7.6MiB Bradley/sylvia_blur_edge_detection.ipynb
8774:11496:52024c5527ce 7.7MiB perovskite/segmentation.ipynb
8775:11497:b1838b9c5f31 7.7MiB Flattening development/flattening.ipynb
8776:11498:a235e36d3156 7.7MiB Flattening development/flattening.ipynb
8777:11501:d66aeb5400df 7.7MiB topostats/networks/bradley_blur_edge_detection.ipynb
8778:11502:0a118c15262b 7.7MiB Bradley/sylvia_blur_edge_detection.ipynb
8779:11504:21a52b225ecd 7.9MiB perovskite/segmentation.ipynb
8780:11505:e25946102f6d 7.9MiB perovskite/perovskite_unet.ipynb
8781:11507:539eae2585d7 8.0MiB perovskite/segmentation.ipynb
8782:11508:8e2707f37cdc 8.0MiB perovskite/perovskite_unet.ipynb
8785:11511:2fdb12b75f42 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_waterblank.0_00000.topostats
8786:11512:72019a153d6c 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00072.topostats
8787:11513:7a1e8a61be6b 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00100.topostats
8788:11514:c6e9f0c688cb 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_2xkclshelterinbufferblank.0_00002.topostats
8789:11515:dcac6c6ed2ea 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_nickelblank.0_00001.topostats
8790:11516:bfd3638d07cb 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00094.topostats
8791:11517:0e2d55560bc6 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00028.topostats
8792:11518:25b07d14b619 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00034.topostats
8793:11519:297cb82a449d 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00103.topostats
8794:11520:2d361dd04b7a 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00092.topostats
8795:11521:3c9e5c643a7a 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00026.topostats
8796:11522:43fed85635cc 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00052.topostats
8797:11523:4b37b0975821 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00080.topostats
8798:11524:4c3f1ac695f5 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00085.topostats
8799:11525:56dec9d169ef 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00091.topostats
8800:11526:5b3e5f5771ff 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00083.topostats
8801:11527:674e01501cd7 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00057.topostats
8802:11528:6ab6a9388c90 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00017.topostats
8803:11529:6dacc6f898b8 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00090.topostats
8804:11530:7620fb8341b5 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00101.topostats
8805:11531:7d90e889f8c7 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00047.topostats
8806:11532:8878978cd976 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00099.topostats
8807:11533:a3cb172b1dc0 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00043.topostats
8808:11534:a62b494dc066 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00046.topostats
8809:11535:c2b556df56cd 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00035.topostats
8810:11536:d239c4e10abb 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00051.topostats
8811:11537:d31e570e8712 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00054.topostats
8812:11538:da5c5b002a2f 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00073.topostats
8813:11539:f0cf8c13badd 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00023.topostats
8814:11540:233a50789add 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00097.topostats
8815:11541:3000e6c2b559 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00068.topostats
8816:11542:34d441609866 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00077.topostats
8817:11543:6ba667c8e1e5 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00037.topostats
8818:11544:74fd70ff10a3 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00015.topostats
8819:11545:cd5437b2e6b9 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00004.topostats
8820:11546:e43b1cfb2408 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00098.topostats
8821:11547:ea826d0b336e 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00087.topostats
8822:11548:6e8ab0d74858 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00041.topostats
8823:11549:255d8c10f3ef 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00065.topostats
8824:11550:53703ed33127 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00019.topostats
8825:11551:1a587ac598b4 8.0MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00045.topostats
8826:11552:03ab1831ccc9 8.1MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00010.topostats
8827:11555:b5a7387f22fc 8.2MiB concat/processed/20230526_puc19_tube1_24hr_mg.0_00003.topostats
8828:11556:3b57c10bf43a 8.2MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00031.topostats
8829:11557:7af42e23f2f0 8.2MiB perovskite/plotting.ipynb
8830:11558:c166f182e1f8 8.2MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF2_25nM_5nM_piCOz_TEL12_ni_eph.0_00074.topostats
8831:11559:0d8af26f331f 8.2MiB Bradley/Blur_Edge_Detection.ipynb
8832:11560:81020fa4dddb 8.3MiB 20240611_TRF1_TRF2/laura_output/processed/20240611_TRF1_25nM_5nM_piCOz_TEL12_ni_eph.0_00012.topostats
8833:11563:c87044691dcc 8.9MiB topostats/networks/bradley_blur_edge_detection.ipynb
8834:11566:b14b8bd66fb8 9.1MiB perovskite/segmentation.ipynb
8835:11568:896ebf213a1c 12MiB hariborings/trace_ring.ipynb
8836:11570:c100278f479e 15MiB Flattening development/flattening.ipynb
8838:11576:01c2cbd73194 18MiB Flattening development/flattening.ipynb
8839:11580:35247e47e029 21MiB hariborings/dna_only_analysis.ipynb
8841:11582:0158023779f6 24MiB atst/processed/cat.topostats
8842:11583:7d3aedf69b43 24MiB atst/processed/minicircle.topostats
8843:11587:b2148b752538 28MiB topostats/simulations/Structures_2_AFM_new.ipynb
8844:11593:343ad60083e4 34MiB Protein_test_data/minicircle/processed/minicircle.topostats
8846:11597:14af9a487a6c 40MiB concat/processed/20230526_puc19_tube1_24hr_mg.0_00002.topostats
8847:11600:2aa020c56ce2 61MiB segmentation/minicircles_single_class.ipynb
github/git-sizer (which computes various size metrics for a Git repository, flagging those that might cause
problems) also gives insight and metrics.
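The pipeline above lists every blob and leaves the size cut-off to the reader. A minimal sketch of a variant that filters on size directly is shown below. The function name `list_large_blobs` and the `MIN_BYTES` parameter are illustrative assumptions and not part of the original command; the output is raw byte counts rather than the human-readable sizes that `numfmt` produces.

```shell
#!/bin/sh
# Sketch: list blobs at or above a size threshold, largest last.
# MIN_BYTES is an assumed parameter (3 MiB by default), not from the original command.
MIN_BYTES="${MIN_BYTES:-3145728}"

list_large_blobs() {
    git rev-list --objects --all --missing=print |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |
        awk -v min="$1" '$1 == "blob" && $3 >= min {
            # Strip the first three fields so paths containing spaces survive intact
            path = $0
            sub(/^[^ ]+ [^ ]+ [^ ]+ /, "", path)
            printf "%s %s %s\n", substr($2, 1, 12), $3, path
        }' |
        sort -k2,2n
}

# Only run when inside a Git repository
if git rev-parse --git-dir >/dev/null 2>&1; then
    list_large_blobs "$MIN_BYTES"
fi
```

Each output line is the abbreviated object name, the size in bytes, and the path. Piping the result through `grep -v tests` as in the original keeps test fixtures out of the listing.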
Actions
Clean the repository
We should remove all of these files from the repository to reduce the size and make cloning faster.
- Unused branches should be deleted (from GitHub)
- Remove all .ipynb that are not in the notebooks directory.
- Remove all .topostats/.spm/.gwy/.ibw/ etc.
The BFG Repo-Cleaner can be used to excise files from the repository.
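A rough sketch of what a BFG run could look like is given below. It is not a tested procedure for this repository: `BFG_JAR` and `REPO_URL` are placeholders, and the script only prints the commands unless `DRY_RUN=0` is set. As far as I know, `--delete-files` matches on file name rather than path, so the "all .ipynb outside the notebooks directory" rule cannot be expressed with it alone. BFG also leaves the contents of the latest commit untouched by default, so unwanted files need to be removed from the current branch tips first.

```shell
#!/bin/sh
# Sketch only: this rewrites history, so work on a fresh mirror clone.
# BFG_JAR and REPO_URL are placeholders, not values taken from this repository.
BFG_JAR="${BFG_JAR:-bfg.jar}"
REPO_URL="${REPO_URL:-git@example.com:org/repo.git}"
DRY_RUN="${DRY_RUN:-1}"   # default: print the commands instead of running them

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 1. Take a bare mirror of the repository
run git clone --mirror "$REPO_URL" repo.git

# 2. Remove matching files from every commit except the protected latest one
run java -jar "$BFG_JAR" --delete-files '*.{topostats,spm,gwy,ibw}' repo.git

# 3. Expire the reflog and garbage-collect so the blobs are really dropped
run git -C repo.git reflog expire --expire=now --all
run git -C repo.git gc --prune=now --aggressive
```

Once the cleaned history is pushed, every contributor would need a fresh clone, since old clones still carry the large blobs and can reintroduce them.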
Ignore Notebooks
The repository does use notebooks (formerly Jupyter .ipynb, now Marimo notebooks, which are plain
.py files). However, it was never intended to be used for storing work-in-progress in this manner.
To that end, I think we should explicitly block Jupyter notebooks from being included in the repository by adding
.ipynb to .gitignore.
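To illustrate the effect of such a rule, the sketch below sets up a throwaway repository with a single `*.ipynb` pattern and asks Git which paths it would ignore. The file names are hypothetical and the paths do not need to exist for `git check-ignore` to evaluate them.

```shell
#!/bin/sh
# Demonstrate a "*.ipynb" ignore rule in a throwaway repository.
demo="$(mktemp -d)"
git init -q "$demo"
printf '%s\n' '*.ipynb' > "$demo/.gitignore"

# check-ignore exits 0 when the path is ignored and 1 when it is not
ignored() { git -C "$demo" check-ignore -q "$1"; }

# Hypothetical paths, used purely for illustration
ignored scratch/analysis.ipynb && echo "scratch/analysis.ipynb: ignored"
ignored notebooks/example.py   || echo "notebooks/example.py: not ignored"
```

Because the pattern contains no slash, it applies at every directory level, while the Marimo `.py` notebooks are unaffected.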
Ignore AFM images
Similarly, we have a number of .topostats files and other raw AFM images (.gwy/.spm/etc.) that have been included in the
history at some point. These too should be added to .gitignore so they are not included by accident in the
future. Any new file formats that are supported should be added as well.
NB - It is possible to explicitly include files that have been ignored, e.g. when expanding the test suite.
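A minimal sketch of both points, using a throwaway repository: the AFM extensions named above are ignored, and one ignored file is then included explicitly with `git add -f`. The file and directory names are hypothetical.

```shell
#!/bin/sh
# Throwaway repository showing ignore rules for AFM files and an explicit override.
demo="$(mktemp -d)"
git init -q "$demo"
cd "$demo" || exit 1

# Extensions taken from the issue text; extend as new formats are supported
printf '%s\n' '*.topostats' '*.spm' '*.gwy' '*.ibw' > .gitignore

mkdir -p tests/resources
: > tests/resources/minicircle.spm   # hypothetical test fixture
: > scan.spm                         # hypothetical stray image

git add .                                   # ignored files are skipped
git add -f tests/resources/minicircle.spm   # -f overrides the ignore rule

git ls-files
```

The listing should contain only `.gitignore` and the forced fixture. An alternative, if many fixtures are expected, would be a negated pattern such as `!tests/resources/*.spm` placed after the general rules in `.gitignore`.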