Skip to content

Conversation

@vineetbansal
Copy link
Collaborator

@vineetbansal vineetbansal commented Oct 6, 2025

Summary

Major changes:

The atol I chose is totally arbitrary and I'm not sure if it's realistic. Happy to discuss more and introduce changes to this PR!

Some other unrelated minor changes seem to have been made by my pre-commit hook.

Checklist

  • Google format doc strings added.
  • Code linted with ruff. (For guidance in fixing rule violates, see rule list)
  • Type annotations included. Check with mypy.
  • Tests added for new features/fixes.
  • I have run the tests locally and they passed.

Tip: Install pre-commit hooks to auto-check types and linting before every commit:

pip install -U pre-commit
pre-commit install

@vineetbansal vineetbansal marked this pull request as draft October 6, 2025 17:34
@vineetbansal vineetbansal marked this pull request as ready for review October 6, 2025 17:44
@codecov
Copy link

codecov bot commented Oct 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.89%. Comparing base (59523f4) to head (2dc42fc).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
+ Coverage   84.78%   84.89%   +0.10%     
==========================================
  Files           9        9              
  Lines        1466     1476      +10     
  Branches      255      257       +2     
==========================================
+ Hits         1243     1253      +10     
  Misses        193      193              
  Partials       30       30              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @vineetbansal ! This is a solid start and I think as-is, it should functionally solve the problem. I've made some comments throughout b/c I think there might be a "cleaner" way to accomplish it. Let me know if you have any questions.

@vineetbansal
Copy link
Collaborator Author

Thanks - that makes sense, and seems like it will work. So it looks like you're suggesting:

a. Save the original amounts per element before equilibrate, say as orig_el_amts.
b. Iterate through key/values in solution.get_el_amt_dict.
- b.i) Get the element's amount using get_el_amt_dict. If its greater than the value in orig_el_amts, it means we "lost" some accounting for that element. In this case:
- b.ii) Query solution.get_components_by_element to get all components for that element and add them back in (getting their amounts from self._stored_comp which are keyed by species, and stored before equilibration).

However, I have a few observations/suggestions if we go this route:

  1. The get_el_amt_dict method currently returns something like:
{'H(0.0)': 1.5823232582233561e-34, 'H(1.0)': 110.72283247280436, 'O(-2.0)': 55.361416234580375, 'O(0.0)': 2.016833383930518e-24}

To ask it the question - give the amount of H, instead of writing the parsing logic again, it would be better to have it return a 2-level dict instead:

{'H': {0.0: 1.5823232582233561e-34, 1.0: 110.72283247280436}, 'O': {-2.0: 55.361416234580375, 0.0: 2.016833383930518e-24}}

The get_el_amt_dict method's only consumer is _setup_ppsol, which immediately parses out the parenthesized valence anyway. So it avoids that caller logic as well.

  1. The same reasoning applies to solution.get_components_by_element. Instead of returning:
{'H(0.0)': ['H2(aq)'], 'H(1.0)': ['H2O(aq)', 'H[+1]', 'OH[-1]'], 'O(-2.0)': ['H2O(aq)', 'OH[-1]'], 'O(0.0)': ['O2(aq)']}

it could return

{'H': {0.0: ['H2(aq)'], 1.0: ['H2O(aq)', 'H[+1]', 'OH[-1]']}, 'O': {-2.0: ['H2O(aq)', 'OH[-1]'], 0.0: ['O2(aq)']}}

All of this method's users also immediately try to parse out the parenthesis anyway, so it will save them some work.

  1. Instead of iterating through all elements like what I think you're saying, isn't it better to just iterate through only the elements in the missing species, since we have them anyway:
missing_species = set(self._stored_comp.keys()) - {standardize_formula(s) for s in self.ppsol.species}

In this I'm assuming that the number of possible missing species after going through phreeqc are few, while the total number of unique elements in the original solution are high. We can always make sure that we don't handle the same element twice by maintaining a set of seen elements.

Putting all of this together, the code looks like (assume nested=True does what I suggest in 1/2 above). This is minus the comments since this post is getting long already):

  if self.ppsol is not None:
      self.ppsol.forget()
  self._setup_ppsol(solution)

  # store the original solvent mass
  orig_solvent_moles = solution.components[solution.solvent]
  orig_el_dict = solution.get_el_amt_dict(nested=True)

  solution.components = FormulaDict({})
  for s, mol in self.ppsol.species_moles.items():
      solution.components[s] = mol

  all_missing_species = set(self._stored_comp.keys()) - {standardize_formula(s) for s in self.ppsol.species}
  if len(all_missing_species) > 0:
      logger.warning(
          f"After equilibration, the amounts of species {all_missing_species} were not modified "
          "by PHREEQC. These species are likely absent from its database."
      )

      new_el_dict = solution.get_el_amt_dict(nested=True)

      handled_elements = set()
      for missing_species in all_missing_species:
          elements = solution.get_property(missing_species, "elements")
          for el in elements:
              if el in handled_elements:
                  continue
              orig_el_amount = sum([orig_el_dict[el][k] for k in orig_el_dict[el]])
              new_el_amount = sum([new_el_dict[el][k] for k in new_el_dict[el]]) if el in new_el_dict else 0
              if orig_el_amount - new_el_amount > 1e-16:
                  solution.components[missing_species] = self._stored_comp[missing_species]
                  handled_elements.add((el))

This version does pass the newly introduced tests as well.

@rkingsbury
Copy link
Member

Thanks - that makes sense, and seems like it will work. So it looks like you're suggesting:

a. Save the original amounts per element before equilibrate, say as orig_el_amts.
b. Iterate through key/values in solution.get_el_amt_dict.

  • b.i) Get the element's amount using get_el_amt_dict. If its greater than the value in orig_el_amts, it means we "lost" some accounting for that element. In this case:
  • b.ii) Query solution.get_components_by_element to get all components for that element and add them back in (getting their amounts from self._stored_comp which are keyed by species, and stored before equilibration).

Yes, exactly. (I assume when you said "If its greater" you meant "if its less" - although we should probably just confirm they are equal via. np.isclose or something).

Query solution.get_components_by_element to get all components for that element and add them back in (getting their amounts from self._stored_comp which are keyed by species, and stored before equilibration).

Yes, or alternatively, I think we can skip the query step and just restore all the missing_species, getting their amounts from stored_comp.

Elements to iterate over

Instead of iterating through all elements like what I think you're saying, isn't it better to just iterate through only the elements in the missing species, since we have them anyway:

In this I'm assuming that the number of possible missing species after going through phreeqc are few, while the total number of unique elements in the original solution are high. We can always make sure that we don't handle the same element twice by maintaining a set of seen elements.

This is an excellent point! Although the advantage of checking every element is that it provides additional assurance that mass is conserved (really solving a different problem than the one at issue here - but just in case something strange happened in the wrapper).

I suppose would could add a first level check of the total solution mass (so save orig_mass = solution.mass) which would be faster to compute. If that matches before and after equilibrating, we can skip everything else. But if it does not match, then we examine missing species and their associated elements as you suggest.

get_el_amt_dict, etc.

Thanks for all the thoughtful observations about get_el_amt_dict and get_components_by_element. I also want to point out that get_total_amount is another option here. In this method, you can EITHER specify an element with an oxidation state (Na(1.0)) OR a pure element (Na). If you do the latter, you get the total concentration, already summed over every oxidation state.

But I think you have identified a nice opportunity for optimization here. I like your proposed solution of adding a nested kwarg to get_el_amt_dict, which we can default to False for the time being (to preserve existing behavior) but set True internally for efficiency.

If we do that, I think you can streamline some code within get_total_amount and _setup_ppsol as you said.

Can you please open a separate PR for the changes to these methods? Let's merge those first and then build on them here.

@vineetbansal
Copy link
Collaborator Author

Thanks @rkingsbury - I've opened PR #284 with the changes to the methods. I'll make the necessary tweaks to this one once that one is merged.

Copy link
Member

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good @vineetbansal ! One small change to the log message please.

@vineetbansal
Copy link
Collaborator Author

@rkingsbury - hopefully this is what you had in mind all along - in retrospect there's very little code added but it took me a while to get there!

Copy link
Member

@rkingsbury rkingsbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @vineetbansal ! On my final review I found one small (possible) efficiency improvement and would like one tweak to two of the unit tests. Please check those items and then I'm ready to merge!

(Actually, if you could also please add an item to the CHANGELOG I would appreciate it). I will typically edit the text a bit before a release, but having the placeholders there is helpful to make sure I remember all the contributions!

@vineetbansal
Copy link
Collaborator Author

vineetbansal commented Oct 16, 2025

@rkingsbury - thanks - your suggestions make sense. I'll remember to edit the CHANGELOG in future PRs. Feel free to change my initial stab at the CHANGELOG entry in case I'm missing something.

@rkingsbury rkingsbury dismissed their stale review October 16, 2025 16:20

All items resolved!

@rkingsbury rkingsbury added the fix Bug Fixes label Oct 16, 2025
@rkingsbury rkingsbury changed the title Tackling issue 229 Fix behavior of PHREEQC equilibrate when composition contains pure elements Oct 16, 2025
@rkingsbury rkingsbury merged commit 56b71d6 into KingsburyLab:main Oct 16, 2025
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

fix Bug Fixes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants