PangeAI · tornikeo · Jul 26, 2024
diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml
@@ -16,7 +16,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.9", "3.10", "3.11"]
+        python-version: ["3.9"] #, "3.10", "3.11"]
 
     steps:
     - uses: actions/checkout@v3

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,4 +1,9 @@
 repos:
+  - repo: https://github.com/PyCQA/autoflake
+    rev: v2.2.1
+    hooks:
+    -   id: autoflake
+        args: [--remove-all-unused-imports, --in-place]
   - repo: https://github.com/nbQA-dev/nbQA
     rev: 0.11.0  # Use the latest version
     hooks:
@@ -17,4 +22,5 @@ repos:
   - repo: https://github.com/PyCQA/flake8
     rev: 7.0.0
     hooks:
-    -   id: flake8
+    -   id: flake8
+        args: ["--ignore=E501,W503"]
diff --git a/README.md b/README.md
@@ -26,30 +26,33 @@
 </tr>
 </table>
 
-Calculate similarity between large number of mass spectra using a GPU. SimMS aims to provide very fast replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms/).
-`
+Calculate the similarity between a large number of mass spectra using a GPU. SimMS aims to provide very fast replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms).
+
 <div style='text-align:center'>
 
   ![img](./assets/perf_speedup.svg)
 
 </div>
 
-
 # How SimMS works, in a nutshell
 
 ![alt text](assets/visual_guide.png)
 
-Comparing large sets of mass spectra can be done in parallel, since scores can be calculated independent of the other scores. By leveraging a large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096 x 4096 similarity matrix in a fraction of a second. By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU memory. For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
+Comparing large sets of mass spectra can be done in parallel since scores can be calculated independently of each other. 
+By leveraging a large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096x4096
+ similarity matrix in a fraction of a second. 
+By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU's memory. 
+For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
 
 # Quickstart
 
 ## Hardware
 
-Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by numba can be used. We tested a number of GPUs:
+Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by Numba can be used. We tested a number of GPUs:
 
-- RTX 2070, used on local machine
+- RTX 2070, used on a local machine
 - T4 GPU, offered for free on Colab
-- RTX4090 GPU, offered on vast.ai
+- RTX 4090 GPU, offered on vast.ai
 - Any A100 GPU, offered on vast.ai
 
 The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` docker [image](https://hub.docker.com/layers/pytorch/pytorch/2.2.1-cuda12.1-cudnn8-devel/images/sha256-42204bca460bb77cbd524577618e1723ad474e5d77cc51f94037fffbc2c88c6f?context=explore) was used for development and testing. 
@@ -84,21 +87,6 @@ scores = calculate_scores(
 scores.scores_by_query(queries[42], 'CudaCosineGreedy_score', sort=True)
 ```
 
-## Use as a CLI
-
-```sh
-pangea-simms --references library.mgf --queries queries.mgf --output_file scores.pickle \
-                    --tolerance 0.01 \
-                    --mz_power 1 \
-                    --intensity_power 1 \
-                    --batch_size 512 \
-                    --n_max_peaks 512 \
-                    --match_limit 1024 \
-                    --array_type numpy \
-                    --sparse_threshold 0.5 \
-                    --method CudaCosineGreedy
-```
-
 # Supported similarity functions
 
 - `CudaModifiedCosine`, equivalent to [ModifiedCosine](https://matchms.readthedocs.io/en/latest/api/matchms.similarity.ModifiedCosine.html)
@@ -134,15 +122,15 @@ pip install git+https://github.com/PangeAI/simms
 
 The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` has nearly everything you need. Once inside, do:
 
-```
+```sh
 pip install git+https://github.com/PangeAI/simms
 ```
 
 ## Run on vast.ai
 
 Use [this template](https://cloud.vast.ai/?ref_id=51575&template_id=f45f6048db515291bda978a34e908d09) as a starting point, once inside, simply do:
 
-```
+```sh
 pip install git+https://github.com/PangeAI/simms
 ```