
Conversation

@PaulStahr (Contributor)

Inserting elements into _CMASolutionDict_functional caused read amplification because the same entry was fetched multiple times from the dictionary. Since the custom fetch function typically requires hashing the key, this could result in significant overhead.

    -    if iteration is not None:
    -        self[key]['iteration'] = iteration
    +    entry['geno'] = geno
    +    entry['iteration'] = iteration
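
For illustration, the pattern behind the change, with a plain dict d standing in for the solution dictionary (the names here are made up; in the package the lookup goes through a custom key function, which is what makes repeated fetches expensive):

    d = {}
    key = 'x0'

    # Before: every bracket access hashes the key and does a fresh lookup.
    d[key] = {}
    d[key]['geno'] = [0.1, 0.2]
    d[key]['iteration'] = 1

    # After: fetch the entry once and write through the local reference,
    # so the key is hashed a single time.
    entry = d.setdefault(key, {})
    entry['geno'] = [0.1, 0.2]
    entry['iteration'] = 1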
Contributor

This doesn't look correct to me, as it doesn't check for None first?

Contributor Author

Shortly before this point we already perform that check and define iteration as last_iteration + 0.5 if necessary, so to my understanding it cannot be None here. Maybe the right way to go would be to place an assert plus a comment to point this out?

Removing the if statement was really just a cleanup: I thought it makes things clearer if we know that the iteration variable is always set. It doesn't bring any significant performance benefit.

    if iteration is not None:
        if iteration > self.last_iteration:
            self.last_solution_index = 0
        self.last_iteration = iteration
    else:
        iteration = self.last_iteration + 0.5  # a hack to get a somewhat reasonable value
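
For illustration, the assert variant could look like this (a sketch, not the actual patch):

    # iteration was set above, either by the caller or via the
    # last_iteration + 0.5 fallback, so it cannot be None here.
    assert iteration is not None
    entry['geno'] = geno
    entry['iteration'] = iteration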


    -    # TODO: insert takes 30% of the overall CPU time, mostly in def key()
    -    # with about 15% of the overall CPU time
         def insert(self, key, geno=None, iteration=None, fitness=None,
Contributor

The comment was removed, but why? How do we know insert is not expensive anymore?

@PaulStahr (Contributor Author) commented Jan 16, 2026

Before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.

For my use case, the two changes I made reduced the running time of my whole program by around 30%. Before the change, the insert function took a large share of the overall runtime; that is no longer the case, and improving it further doesn't seem worth the effort to me.

I tried to create a specific test, but in the end the numbers seemed to vary with the type of key etc., and I'm not sure how to reproduce the exact numbers from the comment.

Sorry that I can't give a better answer. If there is any test I should run, I'm happy to do so and post the numbers here.

Contributor

> Before the change, the key function was called every time we fetched the element from the map, which happened three times instead of only once. So the expensive part should go down to 1/3 of what it was before. So much for the theory.

You are assuming that fetching is the expensive part, which, I guess, you only did in the first place because you saw the comment you deleted?

Maybe add a comment to the TODO instead of removing it, as you seem not to know for sure whether your changes actually addressed it.

@PaulStahr (Contributor Author) commented Jan 17, 2026

No, not really. I plotted a call graph, which guided me to the key function and from there back to the insert function. I only realized afterwards that the comment was about this problem. I don't have the original files anymore, but I just ran a small benchmark with both versions again.

Here the original version:

[call graph: original]

And here the modified version:

[call graph: modified]

Three points to consider about the plots:

1. I don't know how much overhead the profiling itself added.
2. I pruned everything smaller than 2% of the overall runtime, so not all vertices and edges are shown, but the numbers themselves are unaffected.
3. The original version had a total runtime of 1.477 s, the modified version 1.313 s. To compare, all numbers in the graph of the modified run have to be multiplied by a factor of 0.89, because they are relative to the total runtime.

In the first version, __getitem__ was called three times for each insert, each time resulting in a call to the key function. With the modification, the __getitem__ calls are reduced to zero. There is still one call to the key function through __setitem__ in both versions (even though the edge is not displayed in the original graph, because the connection was below 2%).

The time that the insert function takes was reduced to about 1/3 of what it used to be ((3.47% / 9.23%) * 0.89 ≈ 0.33).

The total speedup seems to be much smaller than I originally thought (only 10% in this test). I have to confess that I didn't do extensive tests on the whole program; I just ran it again, so maybe a background task in the first run or something else made me believe that the change improved more than it actually did. Still, I would see this comment as addressed.

Does this all make sense, or am I overlooking something? You can also just edit my pull request if you have a specific comment in mind that you would like placed there.

@nikohansen (Contributor) commented Jan 17, 2026

Cool, how do you produce these graphs?

> The time that the insert function takes was reduced to about 1/3 of what it used to be ((3.47% / 9.23%) * 0.89 ≈ 0.33).

OK, true. On the other hand: in the first case, key in utils takes a cumulated 12.84%; in the second case, 8.47%. This matches the overall reduction in the number of calls to key. Doesn't that mean that reducing the number of calls to key was only moderately effective overall? I would say so, which means the part of the comment that says key is expensive remains valid?

> Still, I would see this comment as addressed.

I don't know whether it is addressed, in particular because we don't know the setup under which the original comment was made (I suspect a large population size and moderate to large dimension). Also, the comment suggests that (only) half of the time is spent in key, which suggests that the modification should speed up the function by less than a factor of two? In such a case, where I don't quite know, I won't see it as addressed until I have better confirmation.

Don't get me wrong, I think the code modification is solid and justified; there is nothing wrong with that.

@PaulStahr (Contributor Author) commented Jan 19, 2026

I used cProfile with gprof2dot.

python -m cProfile -o output.prof benchmark.py
python -m gprof2dot -f pstats --root "benchmark:17:run_benchmark" -n 2 -e 2 output.prof > output.dot
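
The resulting output.dot can then be rendered with Graphviz, e.g.:

dot -Tpng output.dot -o output.png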

Let me know if you have any questions.

Everything you write sounds correct to me. I'm just not sure what we could really write instead. I think that 3.5% of the overall runtime for the insert function is probably just fine. Leaving the comment as it is might send someone else searching for something they just can't find. I don't really know what I could do, on top of what I did so far, to find out whether it is resolved, since I can't reproduce the original numbers (of course, I'm more a user of this library than a developer :D). But the question is a bit whether anyone else in the future would be able to do that either. If not, the comment might stay forever.

As said, if you have anything in mind for that comment, I'm happy to edit it that way.

The key function is potentially something to work on, but I didn't see any easy fix that doesn't require bigger structural changes.

Contributor

Just leave the original comment and add a note below it that the insert function was changed to (try to) address this by removing redundant dictionary accesses, with the date when this was done (like Jan 2026).

@nikohansen (Contributor)

Thanks for the pull request! Can you check the above comments?

@nikohansen (Contributor) commented Jan 20, 2026

Could you squash your commits into a single one? We don't need to see the fixes as separate commits!

    if (iteration % 10) < 1:
        self.truncate(300, iteration - 3)
    elif value is not None:
        iteration_tmp = value.get('iteration')
@nikohansen (Contributor) commented Jan 20, 2026

I was under the impression that dictionary access is generally very quick in Python; am I wrong?

@PaulStahr (Contributor Author) commented Jan 20, 2026

You are correct. From a performance perspective the benefit should be negligible as long as the hash function is cheap (I think hash values for strings are cached). I usually see this as a safe way of making sure that the map can't change in the meantime, and that we don't spend unnecessary computation time if the hash function is expensive; a small sketch of that case follows below. But it doesn't really fit the scope of this commit, so should I remove this change?
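
To illustrate the expensive-hash case (SlowKey is a made-up class, not anything from the package):

    class SlowKey:
        # Made-up key type whose hash is recomputed on every lookup.
        def __init__(self, data):
            self.data = list(data)
        def __hash__(self):
            return hash(tuple(self.data))  # O(len(data)) per call
        def __eq__(self, other):
            return self.data == other.data

    d = {SlowKey([1, 2, 3]): {}}
    k = SlowKey([1, 2, 3])

    entry = d[k]             # hashes k once
    entry['geno'] = [0.5]    # no further hashing of k
    entry['iteration'] = 2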

Contributor

Yes, thanks, I think it's better to keep the original version mainly for readability and also for consistency.

Contributor Author

Ok, I changed this back to the original version.

    -    try:
    -        iteration = value['iteration']
    -    except:
    -        pass
    +    iteration = value.get('iteration', iteration)
Contributor

What is wrong with the original code here?

@PaulStahr (Contributor Author) commented Jan 20, 2026

Throwing an exception is quite expensive, and there is no need for it here. Even if this is not a problem right now, it can unexpectedly become one if we run into this path more often. Also, we don't check which exception was thrown, so it can be anything: if the user presses Ctrl+C, or any other interrupt comes in at this line, the program would silently ignore it and in addition start to show undefined / unexpected behavior from this point on, because the iteration variable was not updated even though it should have been.

Finally, I think the second version is much more expressive, as it basically says: take 'iteration' from value, with the old iteration variable as the default.

Contributor

True, the exception should be caught more tightly and not in this lazy manner. Otherwise, I am not sure about your arguments: "Easier to Ask Forgiveness than Permission (EAFP) is the recommended Python practice over checking conditions in advance (LBYL, Look Before You Leap)." Doesn't really matter here.

@PaulStahr (Contributor Author) commented Jan 21, 2026

Hmm, Look Before You Leap would be something like:

if 'iteration' in value:
    iteration = value['iteration']

and yes, I agree, there are good reasons to avoid this type of code (for example, because concurrency can create unexpected behavior). But the get function exists exactly so that we don't have to surround every dictionary fetch with a try-except block. I also agree with preferring EAFP over LBYL, but this isn't really a choice between those two idioms, and preferring EAFP over LBYL doesn't mean we should use EAFP for its own sake.
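
For comparison, the three styles side by side (a self-contained sketch with made-up values):

    value = {'geno': [0.1]}
    iteration = 7  # previous value, used as the fallback

    # EAFP, done tightly: catch only the expected exception.
    try:
        iteration = value['iteration']
    except KeyError:
        pass

    # LBYL: two lookups, and racy if another thread mutates value.
    if 'iteration' in value:
        iteration = value['iteration']

    # dict.get: a single lookup with an explicit default.
    iteration = value.get('iteration', iteration)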

@nikohansen (Contributor) commented Jan 21, 2026

I think the main reason and use case for the get function is the ability to write val = d.get(k, default), not val = d.get(k, val). The latter has an execution path which assigns val = val, which kinda just doesn't smell right. I guess the problem here is that there is no perfect coding solution for this, even though it's such a basic scenario.

@PaulStahr force-pushed the development branch 3 times, most recently from 4d33563 to 7e88661 on January 21, 2026 00:47.
    else:
        if key in self.data_with_same_key:
    -        self.data_with_same_key[key] += [self.data[key]]
    +        self.data_with_same_key[key].append(self.data[key])
@nikohansen (Contributor) commented Jan 21, 2026

Isn't the only difference that we do not create a new list? I suspect writing [a] is incredibly cheap? You would need to change a lot of code in this package if you insisted on this being crucial.
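
For what it's worth, the difference is easy to measure with timeit (absolute numbers vary by machine and Python version):

    import timeit
    print(timeit.timeit("lst += [1]", setup="lst = []", number=10**6))
    print(timeit.timeit("lst.append(1)", setup="lst = []", number=10**6))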

Contributor Author

Ok, changed this one back, too.

@nikohansen merged commit 2eb5afb into CMA-ES:development on Jan 21, 2026.
7 checks passed.
@nikohansen (Contributor)

Thanks for the PR and the discussion!

@PaulStahr (Contributor Author)

Thank you, too!
