

@ellenemerson
Contributor

Also pinning some packages in requirements.txt so that it installs locally. Let me know if you think any of these pins are too restrictive!

Contributor

@rossbar left a comment


These seem like very nice improvements, @ellenemerson! These operations look equivalent to me, but of course it'd be a good idea to test on a pipeline with real data to make sure you get the same result!
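One way to run that comparison is sketched below. It assumes that both the old and new code paths produce a pandas DataFrame of spots. The helper name assert_same_result and the toy values are hypothetical and do not come from this PR. Rows are sorted before comparing, so a pure reordering is not flagged as a difference.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def assert_same_result(old: pd.DataFrame, new: pd.DataFrame, sort_cols: list[str]) -> None:
    """Raise AssertionError if two pipeline outputs differ (hypothetical helper).

    Rows are sorted by `sort_cols` and the index is reset first, so a change
    in row order alone is not reported as a difference.
    """
    old_sorted = old.sort_values(sort_cols).reset_index(drop=True)
    new_sorted = new.sort_values(sort_cols).reset_index(drop=True)
    # Column order must match; float columns are compared with a small tolerance.
    assert_frame_equal(old_sorted, new_sorted, check_exact=False)


# Toy data standing in for real pipeline output (made-up values).
old_out = pd.DataFrame({"batch_id": [0, 0, 1], "x": [3, 1, 2], "y": [4, 5, 6]})
new_out = pd.DataFrame({"batch_id": [0, 1, 0], "x": [1, 2, 3], "y": [5, 6, 4]})
assert_same_result(old_out, new_out, ["batch_id", "x", "y"])  # passes: same rows, different order
```

With real data, old_out and new_out would be the outputs of the pre- and post-change pipeline on the same input.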

I left one comment re: defaultdict, which is really just a "hey, check out this cool thing" comment and not at all important to the changes at hand.

Re: dependencies - that seems fine to me; the dependency situation is only going to get worse and worse as time goes on so finding the set of pins that keeps things working for you locally is 👍 for me. Fixing the problem outright is a bigger project!

Comment on lines 451 to 457
# Pre-create a dictionary for fast lookups
lookup_dict = {}
for idx, row in df_results.iterrows():
    key = (row['batch_id'], row['x'], row['y'])
    if key not in lookup_dict:
        lookup_dict[key] = []
    lookup_dict[key].append(idx)
Contributor


This shouldn't affect performance at all, but one cool utility that I'm constantly using for situations like this is defaultdict. It lets you skip the "if key not in lookup_dict" check, because the default value for a missing key can be an empty list (or a set, or another container, etc.). So this could be something like the following (with the import moved to a more appropriate location):

Suggested change
# Pre-create a dictionary for fast lookups
from collections import defaultdict
lookup_dict = defaultdict(list)
for idx, row in df_results.iterrows():
    lookup_dict[(row['batch_id'], row['x'], row['y'])].append(idx)
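The following small, self-contained check confirms that the two loops build the same mapping. It uses a toy df_results with made-up values, because the real frame comes from the pipeline and isn't shown in this thread.

```python
from collections import defaultdict

import pandas as pd

# Toy stand-in for df_results (made-up values); rows 0 and 2 share a key.
df_results = pd.DataFrame(
    {"batch_id": [0, 0, 0, 1], "x": [10, 20, 10, 10], "y": [5, 5, 5, 5]}
)

# Original pattern: create the list explicitly the first time a key appears.
manual = {}
for idx, row in df_results.iterrows():
    key = (row["batch_id"], row["x"], row["y"])
    if key not in manual:
        manual[key] = []
    manual[key].append(idx)

# defaultdict(list) calls list() to create an empty list for any missing key.
lookup_dict = defaultdict(list)
for idx, row in df_results.iterrows():
    lookup_dict[(row["batch_id"], row["x"], row["y"])].append(idx)

# .get() reads without inserting; indexing a missing key would add an empty list.
print(lookup_dict.get((0, 10, 5)))
print(dict(lookup_dict) == manual)
```

There is one behavioral difference to keep in mind. Indexing a defaultdict with a key it doesn't have, for example lookup_dict[key], silently inserts that key with an empty list. For a read-only lookup that should not modify the dict, .get(key) or "key in lookup_dict" avoids this.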

@ellenemerson removed the request for review from elaubsch on September 8, 2025, 22:00
@ellenemerson merged commit 319b300 into master on Sep 8, 2025
5 of 9 checks passed
@ellenemerson deleted the optimize-spots-refinement branch on September 8, 2025, 23:30

3 participants