Fix behavior of PHREEQC equilibrate when composition contains pure elements #282

vineetbansal · 2025-10-06T17:33:05Z

Summary

Major changes:

Issue Equilibrate concentration & species wrong using PHREEQC in pyEQL when specified with pure elements concentration #229:
- Added 3 tests as per our discussion there (2 failing, 1 passing).
- Code changes that fix the behavior, hopefully correctly.
- Removed xfails to show that the introduced tests do pass with the change.

The atol I chose is totally arbitrary and I'm not sure if it's realistic. Happy to discuss more and introduce changes to this PR!

Some other unrelated minor changes seem to have been made by my pre-commit hook.

Checklist

Google format doc strings added.
Code linted with ruff. (For guidance in fixing rule violates, see rule list)
Type annotations included. Check with mypy.
Tests added for new features/fixes.
I have run the tests locally and they passed.

Tip: Install pre-commit hooks to auto-check types and linting before every commit:

pip install -U pre-commit
pre-commit install

codecov · 2025-10-07T17:06:37Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 84.89%. Comparing base (59523f4) to head (2dc42fc).
⚠️ Report is 11 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #282      +/-   ##
==========================================
+ Coverage   84.78%   84.89%   +0.10%     
==========================================
  Files           9        9              
  Lines        1466     1476      +10     
  Branches      255      257       +2     
==========================================
+ Hits         1243     1253      +10     
  Misses        193      193              
  Partials       30       30

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

rkingsbury

Thank you @vineetbansal ! This is a solid start and I think as-is, it should functionally solve the problem. I've made some comments throughout b/c I think there might be a "cleaner" way to accomplish it. Let me know if you have any questions.

tests/test_solution.py

src/pyEQL/engines.py

src/pyEQL/solution.py

vineetbansal · 2025-10-09T20:23:23Z

Thanks - that makes sense, and seems like it will work. So it looks like you're suggesting:

a. Save the original amounts per element before equilibrate, say as orig_el_amts.
b. Iterate through key/values in solution.get_el_amt_dict.
- b.i) Get the element's amount using get_el_amt_dict. If its greater than the value in orig_el_amts, it means we "lost" some accounting for that element. In this case:
- b.ii) Query solution.get_components_by_element to get all components for that element and add them back in (getting their amounts from self._stored_comp which are keyed by species, and stored before equilibration).

However, I have a few observations/suggestions if we go this route:

The get_el_amt_dict method currently returns something like:

{'H(0.0)': 1.5823232582233561e-34, 'H(1.0)': 110.72283247280436, 'O(-2.0)': 55.361416234580375, 'O(0.0)': 2.016833383930518e-24}

To ask it the question - give the amount of H, instead of writing the parsing logic again, it would be better to have it return a 2-level dict instead:

{'H': {0.0: 1.5823232582233561e-34, 1.0: 110.72283247280436}, 'O': {-2.0: 55.361416234580375, 0.0: 2.016833383930518e-24}}

The get_el_amt_dict method's only consumer is _setup_ppsol, which immediately parses out the parenthesized valence anyway. So it avoids that caller logic as well.

The same reasoning applies to solution.get_components_by_element. Instead of returning:

{'H(0.0)': ['H2(aq)'], 'H(1.0)': ['H2O(aq)', 'H[+1]', 'OH[-1]'], 'O(-2.0)': ['H2O(aq)', 'OH[-1]'], 'O(0.0)': ['O2(aq)']}

it could return

{'H': {0.0: ['H2(aq)'], 1.0: ['H2O(aq)', 'H[+1]', 'OH[-1]']}, 'O': {-2.0: ['H2O(aq)', 'OH[-1]'], 0.0: ['O2(aq)']}}

All of this method's users also immediately try to parse out the parenthesis anyway, so it will save them some work.

Instead of iterating through all elements like what I think you're saying, isn't it better to just iterate through only the elements in the missing species, since we have them anyway:

missing_species = set(self._stored_comp.keys()) - {standardize_formula(s) for s in self.ppsol.species}

In this I'm assuming that the number of possible missing species after going through phreeqc are few, while the total number of unique elements in the original solution are high. We can always make sure that we don't handle the same element twice by maintaining a set of seen elements.

Putting all of this together, the code looks like (assume nested=True does what I suggest in 1/2 above). This is minus the comments since this post is getting long already):

  if self.ppsol is not None:
      self.ppsol.forget()
  self._setup_ppsol(solution)

  # store the original solvent mass
  orig_solvent_moles = solution.components[solution.solvent]
  orig_el_dict = solution.get_el_amt_dict(nested=True)

  solution.components = FormulaDict({})
  for s, mol in self.ppsol.species_moles.items():
      solution.components[s] = mol

  all_missing_species = set(self._stored_comp.keys()) - {standardize_formula(s) for s in self.ppsol.species}
  if len(all_missing_species) > 0:
      logger.warning(
          f"After equilibration, the amounts of species {all_missing_species} were not modified "
          "by PHREEQC. These species are likely absent from its database."
      )

      new_el_dict = solution.get_el_amt_dict(nested=True)

      handled_elements = set()
      for missing_species in all_missing_species:
          elements = solution.get_property(missing_species, "elements")
          for el in elements:
              if el in handled_elements:
                  continue
              orig_el_amount = sum([orig_el_dict[el][k] for k in orig_el_dict[el]])
              new_el_amount = sum([new_el_dict[el][k] for k in new_el_dict[el]]) if el in new_el_dict else 0
              if orig_el_amount - new_el_amount > 1e-16:
                  solution.components[missing_species] = self._stored_comp[missing_species]
                  handled_elements.add((el))

This version does pass the newly introduced tests as well.

rkingsbury · 2025-10-10T13:53:27Z

Thanks - that makes sense, and seems like it will work. So it looks like you're suggesting:

a. Save the original amounts per element before equilibrate, say as orig_el_amts.
b. Iterate through key/values in solution.get_el_amt_dict.

b.i) Get the element's amount using get_el_amt_dict. If its greater than the value in orig_el_amts, it means we "lost" some accounting for that element. In this case:

b.ii) Query solution.get_components_by_element to get all components for that element and add them back in (getting their amounts from self._stored_comp which are keyed by species, and stored before equilibration).

Yes, exactly. (I assume when you said "If its greater" you meant "if its less" - although we should probably just confirm they are equal via. np.isclose or something).

Query solution.get_components_by_element to get all components for that element and add them back in (getting their amounts from self._stored_comp which are keyed by species, and stored before equilibration).

Yes, or alternatively, I think we can skip the query step and just restore all the missing_species, getting their amounts from stored_comp.

Elements to iterate over

Instead of iterating through all elements like what I think you're saying, isn't it better to just iterate through only the elements in the missing species, since we have them anyway:

In this I'm assuming that the number of possible missing species after going through phreeqc are few, while the total number of unique elements in the original solution are high. We can always make sure that we don't handle the same element twice by maintaining a set of seen elements.

This is an excellent point! Although the advantage of checking every element is that it provides additional assurance that mass is conserved (really solving a different problem than the one at issue here - but just in case something strange happened in the wrapper).

I suppose would could add a first level check of the total solution mass (so save orig_mass = solution.mass) which would be faster to compute. If that matches before and after equilibrating, we can skip everything else. But if it does not match, then we examine missing species and their associated elements as you suggest.

`get_el_amt_dict`, etc.

Thanks for all the thoughtful observations about get_el_amt_dict and get_components_by_element. I also want to point out that get_total_amount is another option here. In this method, you can EITHER specify an element with an oxidation state (Na(1.0)) OR a pure element (Na). If you do the latter, you get the total concentration, already summed over every oxidation state.

But I think you have identified a nice opportunity for optimization here. I like your proposed solution of adding a nested kwarg to get_el_amt_dict, which we can default to False for the time being (to preserve existing behavior) but set True internally for efficiency.

If we do that, I think you can streamline some code within get_total_amount and _setup_ppsol as you said.

Can you please open a separate PR for the changes to these methods? Let's merge those first and then build on them here.

vineetbansal · 2025-10-13T14:17:38Z

Thanks @rkingsbury - I've opened PR #284 with the changes to the methods. I'll make the necessary tweaks to this one once that one is merged.

rkingsbury

Looking good @vineetbansal ! One small change to the log message please.

src/pyEQL/engines.py

vineetbansal · 2025-10-15T21:11:20Z

@rkingsbury - hopefully this is what you had in mind all along - in retrospect there's very little code added but it took me a while to get there!

rkingsbury

Thanks @vineetbansal ! On my final review I found one small (possible) efficiency improvement and would like one tweak to two of the unit tests. Please check those items and then I'm ready to merge!

(Actually, if you could also please add an item to the CHANGELOG I would appreciate it). I will typically edit the text a bit before a release, but having the placeholders there is helpful to make sure I remember all the contributions!

tests/test_solution.py

src/pyEQL/engines.py

vineetbansal · 2025-10-16T14:17:11Z

@rkingsbury - thanks - your suggestions make sense. I'll remember to edit the CHANGELOG in future PRs. Feel free to change my initial stab at the CHANGELOG entry in case I'm missing something.

All items resolved!

added tests before tackling issue 229

502fe07

vineetbansal marked this pull request as draft October 6, 2025 17:34

vineetbansal added 2 commits October 6, 2025 13:41

(possible) fix for issue 229

8c0b6c4

removed xfails introduced before code change

8202096

vineetbansal marked this pull request as ready for review October 6, 2025 17:44

rkingsbury previously requested changes Oct 7, 2025

View reviewed changes

tests/test_solution.py Show resolved Hide resolved

src/pyEQL/engines.py Show resolved Hide resolved

src/pyEQL/engines.py Outdated Show resolved Hide resolved

src/pyEQL/solution.py Outdated Show resolved Hide resolved

src/pyEQL/solution.py Show resolved Hide resolved

vineetbansal added 3 commits October 15, 2025 09:47

merged main

79d5e72

changes based on discussion on issue 229

c80618f

minor bug fix with added code

e65dd20

rkingsbury reviewed Oct 15, 2025

View reviewed changes

src/pyEQL/engines.py Outdated Show resolved Hide resolved

vineetbansal added 2 commits October 15, 2025 16:03

verbiage change in log

622b400

test fix after verbiage change in log

c13cefc

rkingsbury reviewed Oct 15, 2025

View reviewed changes

tests/test_solution.py Show resolved Hide resolved

src/pyEQL/engines.py Outdated Show resolved Hide resolved

an optimization; additional assert in new tests; changelog entry

2dc42fc

rkingsbury added the fix Bug Fixes label Oct 16, 2025

rkingsbury changed the title ~~Tackling issue 229~~ Fix behavior of PHREEQC equilibrate when composition contains pure elements Oct 16, 2025

rkingsbury merged commit 56b71d6 into KingsburyLab:main Oct 16, 2025
16 of 17 checks passed

vineetbansal mentioned this pull request Oct 21, 2025

Feature: Enable solid-liquid equilibrium via phreeqpython #186

Closed

rkingsbury mentioned this pull request Nov 5, 2025

Equilibrate concentration & species wrong using PHREEQC in pyEQL when specified with pure elements concentration #229

Closed

Fix behavior of PHREEQC equilibrate when composition contains pure elements #282

Fix behavior of PHREEQC equilibrate when composition contains pure elements #282

Uh oh!

Conversation

vineetbansal commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Checklist

Uh oh!

codecov bot commented Oct 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

rkingsbury left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

vineetbansal commented Oct 9, 2025

Uh oh!

rkingsbury commented Oct 10, 2025

Elements to iterate over

get_el_amt_dict, etc.

Uh oh!

vineetbansal commented Oct 13, 2025

Uh oh!

rkingsbury left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

vineetbansal commented Oct 15, 2025

Uh oh!

rkingsbury left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

vineetbansal commented Oct 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vineetbansal commented Oct 6, 2025 •

edited

Loading

codecov bot commented Oct 7, 2025 •

edited

Loading

`get_el_amt_dict`, etc.

vineetbansal commented Oct 16, 2025 •

edited

Loading