Closed
27 commits
3399b8a
Update ReadME, fix matchms removing add_losses
tornikeo Jul 26, 2024
5c703b2
Update the colab tutorial
tornikeo Jul 26, 2024
0759225
Add flake8
tornikeo Jul 26, 2024
4d5557b
Add a visual guide figure for SimMS
tornikeo Jul 26, 2024
d02fc5e
Fix ReadME type
tornikeo Jul 26, 2024
c1c607d
Include BLINK comparison, fix ReadME
tornikeo Oct 24, 2024
74779cd
Update BLINK benchmark
tornikeo Dec 2, 2024
d8d07a5
Update citation
tornikeo Dec 14, 2024
42c8db9
Loosen numba requirements in favor of matchms
tornikeo Dec 14, 2024
c513561
Re-add HF spaces demo
tornikeo Dec 15, 2024
6438cb5
Fix incorrectly set up test for FP comparsion
tornikeo Dec 15, 2024
57dee09
Fix typo in readme
tornikeo Dec 23, 2024
4babe67
Respect NUMBA SIM env var in CPU tests
tornikeo Dec 25, 2024
53bbf03
Change speed with comparisons/s
tornikeo Jan 10, 2025
b364bb1
Remove non-vital dependencies
tornikeo Jan 15, 2025
b8f194d
Don't mention experimental CLI in the readme
tornikeo Jan 15, 2025
4d6e8d0
Add one doctest for cosine greedy
tornikeo Jan 16, 2025
ee46d71
Add doctests
tornikeo Jan 16, 2025
2f431f8
Fix tests picking up data dirs
tornikeo Jan 16, 2025
bd20e91
Rely on matchms for numba dependency
tornikeo Jan 16, 2025
c79f4b7
Merge branch 'main' into development
tornikeo Jan 16, 2025
d5c440b
Add CudaFingerPrint doctest
tornikeo Jan 16, 2025
9343e21
Make figure visuals more consistent
tornikeo Jan 16, 2025
6a571a6
Update pyporoject.toml to best practices
tornikeo Jan 16, 2025
aae53ce
Tiny notebook fixup
tornikeo Jan 16, 2025
374c3a7
Remove pooch from dependencies
tornikeo Jan 17, 2025
b0a6d95
Bump version
tornikeo Jan 17, 2025
2 changes: 1 addition & 1 deletion .github/workflows/python-package.yml
@@ -16,7 +16,7 @@ jobs:
strategy:
fail-fast: false
matrix:
python-version: ["3.9", "3.10", "3.11"]
python-version: ["3.9"] #, "3.10", "3.11"]

steps:
- uses: actions/checkout@v3
8 changes: 7 additions & 1 deletion .pre-commit-config.yaml
@@ -1,4 +1,9 @@
repos:
- repo: https://github.com/PyCQA/autoflake
rev: v2.2.1
hooks:
- id: autoflake
args: [--remove-all-unused-imports, --in-place]
- repo: https://github.com/nbQA-dev/nbQA
rev: 0.11.0 # Use the latest version
hooks:
@@ -17,4 +22,5 @@ repos:
- repo: https://github.com/PyCQA/flake8
rev: 7.0.0
hooks:
- id: flake8
- id: flake8
args: ["--ignore=E501,W503"]
36 changes: 12 additions & 24 deletions README.md
@@ -26,30 +26,33 @@
</tr>
</table>

Calculate similarity between large number of mass spectra using a GPU. SimMS aims to provide very fast replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms/).
`
Calculate the similarity between a large number of mass spectra using a GPU. SimMS aims to provide very fast replacements for commonly used similarity functions in [matchms](https://github.com/matchms/matchms).

<div style='text-align:center'>

![img](./assets/perf_speedup.svg)

</div>


# How SimMS works, in a nutshell

![alt text](assets/visual_guide.png)

Comparing large sets of mass spectra can be done in parallel, since scores can be calculated independent of the other scores. By leveraging a large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096 x 4096 similarity matrix in a fraction of a second. By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU memory. For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
Comparing large sets of mass spectra can be done in parallel since scores can be calculated independently of each other.
By leveraging a large number of threads in a GPU, we created a GPU program (kernel) that calculates a 4096x4096
similarity matrix in a fraction of a second.
By iteratively calculating similarities for batches of spectra, SimMS can quickly process datasets much larger than the GPU's memory.
For details, visit the [preprint](https://www.biorxiv.org/content/biorxiv/early/2024/07/25/2024.07.24.605006.full.pdf).
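
The batching idea described above can be illustrated with a minimal CPU-only sketch. It is not the SimMS kernel: it uses plain cosine similarity on fixed-length vectors as a stand-in for the peak-matching scores, and the names `cosine` and `batched_scores` are hypothetical helpers introduced only for this illustration.

```python
import math


def cosine(u, v):
    """Plain cosine similarity; a stand-in for the real spectrum score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def batched_scores(references, queries, batch_size):
    """Fill the full score matrix one (batch_size x batch_size) tile at a time."""
    scores = [[0.0] * len(queries) for _ in references]
    for i0 in range(0, len(references), batch_size):
        for j0 in range(0, len(queries), batch_size):
            ref_batch = references[i0:i0 + batch_size]
            qry_batch = queries[j0:j0 + batch_size]
            # Every entry depends only on its own pair of inputs, so the work
            # inside a tile could be handed to independent threads.
            for di, ref in enumerate(ref_batch):
                for dj, qry in enumerate(qry_batch):
                    scores[i0 + di][j0 + dj] = cosine(ref, qry)
    return scores


references = [[1.0, 0.0, 2.0], [0.0, 3.0, 1.0], [2.0, 2.0, 0.0]]
queries = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
tiled = batched_scores(references, queries, batch_size=2)
```

Because only one tile of inputs is needed at a time, the tile size can be chosen to fit the available memory without changing the resulting matrix.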

# Quickstart

## Hardware

Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by numba can be used. We tested a number of GPUs:
Any GPU [supported](https://numba.pydata.org/numba-doc/dev/cuda/overview.html#requirements) by Numba can be used. We tested a number of GPUs:

- RTX 2070, used on local machine
- RTX 2070, used on a local machine
- T4 GPU, offered for free on Colab
- RTX4090 GPU, offered on vast.ai
- RTX 4090 GPU, offered on vast.ai
- Any A100 GPU, offered on vast.ai

The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` docker [image](https://hub.docker.com/layers/pytorch/pytorch/2.2.1-cuda12.1-cudnn8-devel/images/sha256-42204bca460bb77cbd524577618e1723ad474e5d77cc51f94037fffbc2c88c6f?context=explore) was used for development and testing.
@@ -84,21 +87,6 @@ scores = calculate_scores(
scores.scores_by_query(queries[42], 'CudaCosineGreedy_score', sort=True)
```

## Use as a CLI

```sh
pangea-simms --references library.mgf --queries queries.mgf --output_file scores.pickle \
--tolerance 0.01 \
--mz_power 1 \
--intensity_power 1 \
--batch_size 512 \
--n_max_peaks 512 \
--match_limit 1024 \
--array_type numpy \
--sparse_threshold 0.5 \
--method CudaCosineGreedy
```

# Supported similarity functions

- `CudaModifiedCosine`, equivalent to [ModifiedCosine](https://matchms.readthedocs.io/en/latest/api/matchms.similarity.ModifiedCosine.html)
@@ -134,15 +122,15 @@ pip install git+https://github.com/PangeAI/simms

The `pytorch/pytorch:2.2.1-cuda12.1-cudnn8-devel` has nearly everything you need. Once inside, do:

```
```sh
pip install git+https://github.com/PangeAI/simms
```

## Run on vast.ai

Use [this template](https://cloud.vast.ai/?ref_id=51575&template_id=f45f6048db515291bda978a34e908d09) as a starting point, once inside, simply do:

```
```sh
pip install git+https://github.com/PangeAI/simms
```
