performance improvements for spot location refinement code #96

ellenemerson · 2025-09-06T00:35:29Z

Also pinning some packages in requirements.txt so that it installs locally. Let me know if you think any of these are too restrictive!

rossbar

These seem like very nice improvements @ellenemerson! The operations look equivalent to me, but it would be a good idea to test on a pipeline with real data to confirm you get the same result.

I left one comment re: defaultdict, which is really just a "hey, check out this cool thing" comment and not at all important to the changes at hand.

Re: dependencies - that seems fine to me; the dependency situation is only going to get worse and worse as time goes on so finding the set of pins that keeps things working for you locally is 👍 for me. Fixing the problem outright is a bigger project!

rossbar · 2025-09-08T21:31:11Z

deepcell_spots/applications/polaris.py

+            # Pre-create a dictionary for fast lookups
+            lookup_dict = {}
+            for idx, row in df_results.iterrows():
+                key = (row['batch_id'], row['x'], row['y'])
+                if key not in lookup_dict:
+                    lookup_dict[key] = []
+                lookup_dict[key].append(idx)


This shouldn't affect performance at all, but one cool utility that I'm constantly using for situations like this is defaultdict. It lets you skip the "if key not in lookup_dict" check, because the default value for a missing key can be an empty list (or set, or another container). So this could be something like the following (with the import moved to a more appropriate location):

Suggested change

-            # Pre-create a dictionary for fast lookups
-            lookup_dict = {}
-            for idx, row in df_results.iterrows():
-                key = (row['batch_id'], row['x'], row['y'])
-                if key not in lookup_dict:
-                    lookup_dict[key] = []
-                lookup_dict[key].append(idx)
+            # Pre-create a dictionary for fast lookups
+            from collections import defaultdict
+            lookup_dict = defaultdict(list)
+            for idx, row in df_results.iterrows():
+                lookup_dict[(row['batch_id'], row['x'], row['y'])].append(idx)
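
For anyone who wants to convince themselves that the two versions are equivalent before running a real pipeline, here is a minimal standalone sketch. It does not use pandas: the `rows` list is a hypothetical stand-in for what `df_results.iterrows()` yields, and the two helper function names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical stand-in for df_results.iterrows(): (index, row) pairs.
rows = [
    (0, {"batch_id": 0, "x": 10, "y": 12}),
    (1, {"batch_id": 0, "x": 10, "y": 12}),  # same spot as index 0
    (2, {"batch_id": 1, "x": 3, "y": 7}),
]


def build_with_dict(rows):
    """Original approach: explicit membership check before appending."""
    lookup = {}
    for idx, row in rows:
        key = (row["batch_id"], row["x"], row["y"])
        if key not in lookup:
            lookup[key] = []
        lookup[key].append(idx)
    return lookup


def build_with_defaultdict(rows):
    """Suggested approach: missing keys start out as an empty list."""
    lookup = defaultdict(list)
    for idx, row in rows:
        lookup[(row["batch_id"], row["x"], row["y"])].append(idx)
    return lookup


manual = build_with_dict(rows)
auto = build_with_defaultdict(rows)

# Both approaches should group the same indices under the same keys.
assert manual == dict(auto)
```

One thing to watch for when reading from the result: indexing a defaultdict with a missing key (`auto[key]`) silently inserts an empty list, so `key in auto` or `auto.get(key)` is the safer way to do lookups afterwards.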

performance improvements for spot location refinement code; pinning some packages in requirements.txt

77e4b0f

ellenemerson requested review from elaubsch and rossbar September 6, 2025 00:35

rossbar approved these changes Sep 8, 2025

ellenemerson removed the request for review from elaubsch September 8, 2025 22:00

updating lookup dict to use collections' defaultdict

1ba3a2f

ellenemerson merged commit 319b300 into master Sep 8, 2025
5 of 9 checks passed

ellenemerson deleted the optimize-spots-refinement branch September 8, 2025 23:30
