Improve performance by reducing redundant dictionary fetches. #336
Conversation
- if iteration is not None:
-     self[key]['iteration'] = iteration
+ entry['geno'] = geno
+ entry['iteration'] = iteration
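The change quoted above can be sketched as a before/after toy example (the function names and the plain-dict setup are hypothetical, not the actual package code):

```python
def insert_before(d, key, geno, iteration):
    # three separate lookups on d, each one hashing `key`
    d[key] = {}
    d[key]['geno'] = geno
    d[key]['iteration'] = iteration

def insert_after(d, key, geno, iteration):
    # a single lookup; the entry dict is then mutated directly
    entry = d.setdefault(key, {})
    entry['geno'] = geno
    entry['iteration'] = iteration
```

For a fresh key both variants produce the same entry; the second just pays the key hash once instead of three times.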
This doesn't look correct to me, as it doesn't check for None first?
Shortly before this we already perform the check and set it to last_iteration + 0.5 if necessary, so to my understanding it is impossible that it will be None at this point. Maybe the right way to go would be to place an assert plus a comment to point this out?
But removing the if statement was really just a cleanup: I thought it makes things clearer if we know that we will always have the iteration variable. It doesn't have any significant performance benefits.
if iteration is not None:
    if iteration > self.last_iteration:
        self.last_solution_index = 0
    self.last_iteration = iteration
else:
    iteration = self.last_iteration + 0.5  # a hack to get a somewhat reasonable value
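The assert-plus-comment suggestion could look roughly like the toy class below; this is a hypothetical stand-in, not the actual `_CMASolutionDict_functional` API:

```python
class SolutionDictSketch(dict):
    """Toy stand-in for the real solution dict; attribute names are assumed."""

    def __init__(self):
        super().__init__()
        self.last_iteration = 0
        self.last_solution_index = 0

    def insert(self, key, geno=None, iteration=None):
        if iteration is not None:
            if iteration > self.last_iteration:
                self.last_solution_index = 0
            self.last_iteration = iteration
        else:
            iteration = self.last_iteration + 0.5  # hack for a somewhat reasonable value
        # the branch above always sets `iteration`, so it cannot be None here
        assert iteration is not None
        entry = self.setdefault(key, {})  # single fetch, reused below
        entry['geno'] = geno
        entry['iteration'] = iteration
```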
- # TODO: insert takes 30% of the overall CPU time, mostly in def key()
- # with about 15% of the overall CPU time
  def insert(self, key, geno=None, iteration=None, fitness=None,
The comment was removed, but why? How do we know insert is not expensive anymore?
So before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.
I think for my use case the two changes that I made reduced the running time of my whole program by around 30%. Before the change, the insert function took a big part of the overall runtime, which is not the case anymore, and it doesn't seem to me that improving it further is worth the effort.
I tried to make a specific test, but the numbers in the end seemed to vary with the type of key etc., and I'm not sure how to reproduce the exact numbers of the comment.
Sorry that I can't give a better answer. If there is any test that I should do, I'm happy to run it and write the numbers here.
So before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.
You are assuming that fetching is the expensive part, which, I guess, you only did in the first place because you saw the comment you deleted?
Maybe make a comment on the todo instead of removing it, as you seem not to know for sure whether your changes actually addressed it.
No, not really. I plotted a call graph which guided me to the key function, and from there back to the insert function. I only realized afterwards that the comment was about this problem. I don't have the original files anymore, but I just ran a small benchmark with both versions again.
And here the modified version (call-graph plots not shown):
Three points to consider about the plot:
- I don't know how much overhead the profiling itself added.
- I pruned everything that is smaller than 2% of overall runtime, so not all vertices and edges are shown, but the numbers themselves are unaffected.
- The original version had a total runtime of 1.477 s, the modified version 1.313 s. For correctness we have to multiply all numbers in the graph of the modified run by a factor of 0.89, because all numbers are relative to the total runtime.

In the first version, getitem was called three times for each insert, each time resulting in a call to the key function. With the modification, the calls to getitem are reduced to zero. There is still a call to the key function through setitem in both versions (even though the edge is not displayed in the original graph because the connection was below 2%).
The time that the insert function took is reduced to 1/3 of what it used to be ((3.47% / 9.23%) * 0.89).
The total speedup seems to be much smaller than I originally thought (only 10% in this test). I have to confess that I didn't do extensive tests on the whole program, I just ran it again, so maybe there was some background task in the first run, or anything else making me believe that the change improved more than it actually did. Still, I would see this comment as addressed.
Does this all make sense or am I overlooking something? I think you can also just edit my pull request if you have a specific comment in mind that you would like to have placed there.
Cool, how do you do these graphs?
The time that the insert function took is reduced to 1/3 of what it used to be ((3.47% / 9.23%) * 0.89).
OK, true. On the other hand, first case: key in utils takes a cumulated 12.84%. Second case: 8.47%. This matches the overall reduction in the number of calls to key. Doesn't that mean that reducing the number of calls to key was only moderately effective overall? I would say so, which means the part of the comment that says key is expensive remains valid?
Still, I would see this comment as addressed.
I don't know whether it is addressed, in particular because we don't know the setup under which the original comment was made (I suspect with a large population size and moderate to large dimension). Also, the comment suggests that (only) half of the time is spent in key, which suggests that the modification should speed up the function by less than a factor of two? In such a case, where I don't quite know, I won't see it as addressed until I have better confirmation.
Don't get me wrong, I think the code modification is solid and justified; there is nothing wrong with that.
I used cProfile with gprof2dot.
python -m cProfile -o output.prof benchmark.py
python -m gprof2dot -f pstats --root "benchmark:17:run_benchmark" -n 2 -e 2 output.prof > output.dot
Let me know if you have any questions.
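For reference, the same kind of inspection can be done with the stdlib `pstats` module alone, without rendering a graph. The workload function below is a made-up stand-in, not the actual benchmark script:

```python
import cProfile
import io
import pstats

def work():
    # stand-in workload: many dictionary inserts keyed by strings
    d = {}
    for i in range(1000):
        d[str(i)] = {'iteration': i}
    return d

profiler = cProfile.Profile()
profiler.enable()
work()
profiler.disable()

# print the top 5 entries sorted by cumulative time into a string
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(5)
report = stream.getvalue()
```

The resulting report lists per-function call counts and cumulative times, which is the same data gprof2dot visualizes as a graph.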
Everything you write sounds correct to me. I'm just not sure what we could really write instead. I think that 3.5% of overall runtime for the insert function is probably just fine. Leaving the comment as it is might send someone else searching for something that they just can't find. I don't really know what I can do, on top of what I did so far, to find out whether it is resolved, as I can't reproduce the original numbers (of course, I'm more a user of this library than a developer :D). But the question is a bit whether someone else in the future will be able to do that. If not, then the comment might stay forever.
As said, if you have anything in mind for that comment, I'm happy to edit it that way.
The key function is potentially something to work on, but I didn't see any easy fix that doesn't require bigger structural changes.
Just leave the original comment and add a note below it that the insert function was changed to (try to) address this by removing redundant dictionary accesses, with a date when this was done (like Jan 2026).
Thanks for the pull request! Can you check the above comments?

Could you squash your commits into a single one? We don't need to see the fixes as separate commits!
Force-pushed from 2171bc7 to e95c88b.
cma/evolution_strategy.py
Outdated
if (iteration % 10) < 1:
    self.truncate(300, iteration - 3)
elif value is not None:
    iteration_tmp = value.get('iteration')
I was under the impression that dictionary access is generally very quick in Python, am I wrong?
You are correct. From a performance perspective the benefit should be negligible as long as the hash function is cheap (I think hash values for strings are cached).
I usually see this as a safe way of making sure that the map can't change in the meantime, and that we don't spend unnecessary computation time if the hash function is expensive.
But it doesn't really fit into the scheme of what the commit is about, so should I remove this change?
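The hashing point can be illustrated with a toy key class that counts its own `__hash__` calls. Everything below is a hypothetical illustration, not code from this package: CPython does cache string hashes, but a custom key object like this one is hashed on every lookup, which is exactly the cost the single-fetch pattern avoids.

```python
class CountingKey:
    """Toy key that records how often its hash is computed."""

    def __init__(self, value):
        self.value = value
        self.hash_calls = 0

    def __hash__(self):
        self.hash_calls += 1
        return hash(self.value)

    def __eq__(self, other):
        return isinstance(other, CountingKey) and self.value == other.value

# one insertion plus two item lookups: the key is hashed three times
k = CountingKey('x')
d = {k: {}}
d[k]['geno'] = [1.0]
d[k]['iteration'] = 1
three_lookups = k.hash_calls

# one insertion plus a single lookup whose result is reused
k2 = CountingKey('x')
d2 = {k2: {}}
entry = d2[k2]
entry['geno'] = [1.0]
entry['iteration'] = 1
one_lookup = k2.hash_calls
```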
Yes, thanks, I think it's better to keep the original version mainly for readability and also for consistency.
Ok, I changed this back to the original version.
-     iteration = value['iteration']
- except:
-     pass
+ iteration = value.get('iteration', iteration)
What is wrong with the original code here?
Throwing an exception is quite expensive, and there is no need for it here. Even if this is not a problem right now, it can unexpectedly become one if we run into this path more often. Also, we don't check which exception was thrown, so it can be anything. If the user pressed Ctrl+C, or any other interrupt came in at this line, the program would silently ignore it, and in addition start to show undefined / unexpected behavior from this point on, because the iteration variable was not updated even though it should have been.
Finally, I think the second version is much more expressive, as it basically says: take 'iteration' from value, with the old iteration variable as default.
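A minimal sketch of the hazard described above. The dict subclass is a contrived stand-in for "an interrupt arrives exactly at this lookup"; the function names are hypothetical:

```python
def fetch_bare_except(value, iteration):
    try:
        iteration = value['iteration']
    except:  # bare except: catches *everything*, including KeyboardInterrupt
        pass
    return iteration

def fetch_get(value, iteration):
    # one lookup with a fallback; no exception machinery involved
    return value.get('iteration', iteration)

class InterruptingDict(dict):
    """Toy dict whose item lookup raises, simulating Ctrl+C at that line."""

    def __getitem__(self, key):
        raise KeyboardInterrupt

# the interrupt is silently swallowed and the stale value is returned
swallowed = fetch_bare_except(InterruptingDict(), 7)
```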
True, the exception should be caught more tightly and not in such a lazy manner. Otherwise I am not sure about your arguments: "Easier to Ask Forgiveness than Permission (EAFP) is the recommended Python practice over checking conditions in advance (LBYL, Look Before You Leap)." Doesn't really matter here.
Hmm, Look Before You Leap would be something like
if 'iteration' in value:
    iteration = value['iteration']
and yes, I agree, there are good reasons to avoid this type of code (for example because concurrency can create unexpected behavior). But the get function is exactly there so that we don't have to surround every fetch from a dictionary with a try-except block. I also agree with preferring EAFP over LBYL, but this isn't really a choice between those two idioms, and preferring EAFP over LBYL doesn't mean that we should use EAFP for its own sake.
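The three idioms under discussion, side by side as a sketch (the variable names mirror the snippet, the values are made up):

```python
value = {'iteration': 5}
iteration = 0

# LBYL: check for the key before accessing it
if 'iteration' in value:
    iteration = value['iteration']

# EAFP: attempt the access, catching only the specific failure
try:
    iteration = value['iteration']
except KeyError:
    pass

# dict.get: a single lookup with a fallback, avoiding both patterns
iteration = value.get('iteration', iteration)
```

All three leave `iteration` at 5 here; they differ only in what happens when the key is missing and in how many lookups and exception paths are involved.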
I think the main reason and use case for the get function is the ability to write val = d.get(k, default), not to write val = d.get(k, val). The latter code has an execution path which assigns val = val, which kinda just doesn't smell right. I guess the problem here is that there is no perfect coding solution for this, even though it's such a basic scenario.
Force-pushed from 4d33563 to 7e88661.
cma/utilities/utils.py
Outdated
  else:
      if key in self.data_with_same_key:
-         self.data_with_same_key[key] += [self.data[key]]
+         self.data_with_same_key[key].append(self.data[key])
Isn't the only difference that we do not create a new list? I suspect writing [a] is incredibly cheap? You would need to change a lot of code in this package if you insisted that this is crucial.
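A minimal sketch of the difference, for illustration (the plain dict here is a stand-in for the real attribute):

```python
base = {'k': [1]}

a = list(base['k'])
a += [2]        # builds a temporary one-element list [2], then extends in place

b = list(base['k'])
b.append(2)     # appends directly, no temporary list
```

Both lines mutate the list in place and end with identical contents; the `+=` form just allocates and discards a one-element list on the way.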
Ok, changed this one back, too.
Force-pushed from 7e88661 to 8caaea3.
Thanks for the PR and the discussion!

Thank you, too!

Inserting elements into _CMASolutionDict_functional caused read amplification because the same entry was fetched multiple times from the dictionary. Since the custom fetch function typically requires hashing the key, this could result in significant overhead.