Conversation

@ugognw
Contributor

@ugognw ugognw commented Jun 19, 2025

Summary

Major changes:

Todos

  • validate data

  • scrutinize dataset format

  • finalize benchmarking statistics format

  • remove redundant tests

  • add more granular unit tests

  • optimize number of test cases/time

  • fix PhreeqcEOS benchmarking

  • add docs under Electrolyte Modeling Engines

  • remove logic related to _get_activity_coefficient_pitzer?

  • convert % concentration in datasets to mol/L in load_dataset

  • Update changelog

  • Google format doc strings added.

  • Code linted with ruff. (For guidance in fixing rule violations, see the rule list.)

  • Type annotations included. Check with mypy.

  • Tests added for new features/fixes.

  • I have run the tests locally and they passed.

Happy to hear any feedback on the current direction!

@codecov

codecov bot commented Jun 19, 2025

Codecov Report

Attention: Patch coverage is 90.55118% with 12 lines in your changes missing coverage. Please review.

Project coverage is 80.01%. Comparing base (b361c6a) to head (9e21eb5).
Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/pyEQL/benchmark.py 90.55% 6 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #252      +/-   ##
==========================================
- Coverage   81.79%   80.01%   -1.79%     
==========================================
  Files           9       10       +1     
  Lines        1494     1621     +127     
  Branches      257      282      +25     
==========================================
+ Hits         1222     1297      +75     
- Misses        226      277      +51     
- Partials       46       47       +1     

☔ View full report in Codecov by Sentry.

@rkingsbury
Member

Wow, thank you for the effort on this @ugognw! It looks well thought out. I'm traveling this week and part of next week but will dedicate some time to reviewing in detail as soon as I can.

@ugognw ugognw force-pushed the benchmarking-suite branch from c1da2ed to 5735dea Compare June 22, 2025 19:46
@ugognw
Contributor Author

ugognw commented Jun 23, 2025

Thanks! I'd like to think so! Sounds good. I look forward to hearing your feedback!

ugognw and others added 26 commits June 22, 2025 23:53
Signed-off-by: Ugochukwu Nwosu <ugn@sfu.ca>
Replace ReferenceData with _BenchmarkEntry. This named tuple now separates solution
and solute properties which enables the separate retrieval of each type of property.
An analogous _get_solution_property method has been implemented.

_get_reference is renamed to _get_dataset as the data structure returned by the method
defines a general solution/solute property data set (reference data + Solution object
which can be used for benchmarking).

benchmark_engine functionality moved to report_results for simplicity. If more fine-tuned
control over data set partitioning or result reporting is desired, this can be implemented
quite readily by undoing this change.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Solution creation is now handled when reference files are read.

Benchmarking is parametrized by engine.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Although EOS subclasses define get_activity_coefficient to return the
activity coefficients of individual solutes, one can only measure the
mean activity coefficient experimentally. As such, to benchmark against
experimental data, we must use the definition of the mean activity
coefficient in terms of the activity coefficients of the individual ions.
Alternatively, this function could be implemented as a property
(Solution.mean_activity) or a method (Solution.get_mean_activity).

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
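The combination rule described above is the standard definition of the mean activity coefficient. A minimal sketch (function name and signature are illustrative, not pyEQL's API):

```python
# Mean activity coefficient from single-ion activity coefficients:
# gamma_pm = (gamma_+**nu_+ * gamma_-**nu_-) ** (1 / (nu_+ + nu_-))
def mean_activity_coefficient(gamma_cation: float, nu_cation: int,
                              gamma_anion: float, nu_anion: int) -> float:
    """Geometric-mean combination of single-ion activity coefficients."""
    nu_total = nu_cation + nu_anion
    return (gamma_cation ** nu_cation * gamma_anion ** nu_anion) ** (1 / nu_total)

# For a 1:1 salt (e.g. NaCl) with equal ion coefficients, the mean
# activity coefficient equals either single-ion value (~0.8 here).
gamma_pm = mean_activity_coefficient(0.8, 1, 0.8, 1)
```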
Publicize functions/classes for processing input/output data models

Return values from main function (renamed to benchmark_engine for clarity) and pass
engine and source as arguments.

Store results instead of printing to console.

Parse Solution dictionary into Solution object when reading dataset from file.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Passing a string corresponding to an internal source name retrieves the Path to
the data file (distributed with package). All files are loaded in the same way.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
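The internal-name lookup might be sketched as follows; the registry contents, file names, and the `package` default are illustrative assumptions, not the actual pyEQL layout:

```python
from importlib import resources

def resolve_source(source: str, package: str = "pyEQL"):
    """Map an internal source name to a packaged data file.

    Any string not found in the (assumed) internal registry is treated
    as an external file path and returned unchanged.
    """
    internal = {"CRC": "crc_conductivity.json"}  # hypothetical registry
    if source in internal:
        # resources.files resolves data shipped inside the package.
        return resources.files(package) / "data" / internal[source]
    return source
```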
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
...to list[tuple[str, Quantity]]

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
The new RMSE is more robust in the case that property values are Python
primitives.

Property getters are now called to retrieve properties. This fixes a bug
where the getter was returned instead of the value.

Variables are renamed (from property) to avoid shadowing a Python built-in.

Enumerating solution data (in get_dataset) yields the right number of items
to unpack.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
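An RMSE that is robust to plain Python numbers can be sketched like this (not the actual pyEQL implementation); using `** 0.5` rather than `math.sqrt` keeps the arithmetic duck-typed:

```python
def rmse(calculated, reference):
    """Root-mean-square error over paired values (plain numbers)."""
    if len(calculated) != len(reference):
        raise ValueError("calculated and reference must be the same length")
    # Element-wise squared errors, then the mean, then the square root.
    squared = [(c - r) ** 2 for c, r in zip(calculated, reference)]
    return (sum(squared) / len(squared)) ** 0.5
```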
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Improve docstrings

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Salt takes no extra kwargs. reduce takes no kwargs.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
ugognw added 23 commits June 22, 2025 23:53
Users should be able to select which properties as well as which solutions
(compositions, thermodynamic state) to benchmark against. This will also
enable speed-ups in testing.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Implementing BenchmarkEntry.solute_properties/solution_properties in this way
will enable speed-ups in testing when calculating error statistics between
calculated and reference datasets (list[BenchmarkEntry]) as data lookup from a
dictionary by key has O(1) time complexity (one will have the name of the property
when calculating error) while finding a specific item in a list has O(N) time
complexity.

Accordingly, load_dataset now expects solute/solution properties within data files
to be structured as dictionaries instead of property-value 2-tuples and accepts solutions,
solute_properties, and solution_properties as keyword-only arguments. Whether data for a
solution is included is determined by its components, molar concentration, temperature,
and pressure.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
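The complexity argument above can be illustrated directly (the property names are placeholders):

```python
# Property data as (name, value) 2-tuples versus a dict keyed by name.
as_tuples = [("dielectric_constant", 78.3), ("conductivity", 12.6)]
as_dict = dict(as_tuples)

# O(N): linear scan through the list for a named property.
value_scan = next(v for k, v in as_tuples if k == "conductivity")

# O(1): direct hash lookup by property name.
value_hash = as_dict["conductivity"]

assert value_scan == value_hash
```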
Encapsulating the logic for creating dictionary keys in a function and creating
a type alias for its type should minimize the refactoring required if the
format is changed and appease mypy. This also makes comparing Solutions
for benchmarking purposes considerably simpler.

load_dataset now returns a dictionary with keys indicating the Solution for which
the data applies as generated by _create_solution_key.

Internal data sources are now opened more reliably using resources.files.

Generating the list of SolutionKeys corresponding to the `solutions` parameter in advance
simplifies the logic required to check for the Solution in `solutions`.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
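One way such a key could look, assuming it is built from the solute composition and thermodynamic state (the alias shape, function name, and string-valued amounts are all illustrative assumptions):

```python
# Hypothetical SolutionKey: a hashable tuple so two Solutions with the
# same composition, temperature, and pressure compare equal regardless
# of which engine produced them.
SolutionKey = tuple[tuple[tuple[str, str], ...], str, str]

def _create_solution_key(components: dict[str, str],
                         temperature: str, pressure: str) -> SolutionKey:
    # Sort components so key equality is independent of insertion order.
    return (tuple(sorted(components.items())), temperature, pressure)

k1 = _create_solution_key({"Na+": "1 mol/L", "Cl-": "1 mol/L"}, "298.15 K", "1 atm")
k2 = _create_solution_key({"Cl-": "1 mol/L", "Na+": "1 mol/L"}, "298.15 K", "1 atm")
assert k1 == k2  # order-independent and usable as a dict key
```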
Using the same data structures for engine and reference data is desired because
it will separate the logic for retrieving properties and calculating stats and
simplify the retrieval of corresponding items in calculate_stats.

This implementation ensures that only one Solution object is created for each
Solution across all benchmark datasets and that each BenchmarkEntry object is
associated with a unique Solution. In this way, a single dataset can be generated
for the engine, which can then be benchmarked against each reference dataset.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Accepting two identical data structures generalizes calculate_stats for
all combinations of sources of data (e.g., reference-reference, engine-engine,
engine-reference). An additional benefit is that now, from the signature, it
is clearer to users how to organize the two sets of data to be compared.
Previously, engine data was implicitly generated using the Solution objects in
each BenchmarkEntry. As mentioned previously, this now removes the logic for
retrieving solute and solution properties.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
FormulaDict was initially used in order to automatically standardize how solutes
appear in solution compositions and solute property lists. However, in addition
to standardizing ion formulas, FormulaDict also sorts its internal copy of the
dictionary by its values. This led to an error in the previous implementation
because the values are dictionaries, and dictionaries don't implement the required
methods for comparison with '<' and '>'.

Instead, we manually call standardize_formula in the public functions where
malformed formulas are likely: when loading datasets and when
creating a BenchmarkEntry from a Solution.

Bugs: initialize property data as dictionaries in create_entry; fix indices and
data type in _create_engine_dataset; initialize data pair lists in calculate_stats.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
The concentration for Solution objects that are created in _create_engine_dataset
is determined from the SolutionKey. Without units, the amounts are not interpreted
as concentrations and the Solution does not have the correct composition.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
The three load_dataset unit tests are just outlines and are expected to fail.
The explicit test cases for test_should_benchmark_all_engines are intended to
eventually overlap with those in test_conductivity and eventually replace them.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
The number of moles of solvent depends on the engine, but in the capacity
that SolutionKey will be used, two Solutions with the same solute
concentrations but differing engines (and thus differing moles of solvent)
should be considered identical if they are at the same state (temperature
and pressure).

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Updated module docstring variable for consistency.

Require both _create_engine_dataset arguments.

For now, only require that the metric argument to calculate_stats can handle
Quantity arguments, so that we can ensure that units are handled correctly.
Note that the property values in BenchmarkEntry.solute_data and
BenchmarkEntry.solution_data are expected to be Quantity objects.

Also, the additional conditional statements in calculate_stats handle the
edge cases where the return values of _get_solute_property and
_get_solution_property are None and where no data in `calculated`
corresponds to `reference`. Because this may be of interest to the user, the
event is logged at the INFO level.

Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
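The skip-and-log behavior described above might look roughly like this (function and variable names are assumptions, not the actual calculate_stats internals):

```python
import logging

logger = logging.getLogger(__name__)

def pair_values(reference: dict, calculated: dict) -> list[tuple]:
    """Collect (calculated, reference) pairs, skipping missing data.

    Entries with no calculated counterpart are logged at INFO so the
    user can see which reference points were dropped from the stats.
    """
    pairs = []
    for name, ref_value in reference.items():
        calc_value = calculated.get(name)
        if calc_value is None:
            logger.info("No calculated value for %r; skipping", name)
            continue
        pairs.append((calc_value, ref_value))
    return pairs
```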
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
Signed-off-by: Ugochukwu Nwosu <ugognw@gmail.com>
@ugognw ugognw force-pushed the benchmarking-suite branch from fb3df4c to 82e06df Compare June 23, 2025 06:53
@rkingsbury
Member


Update: I will do my best to look at this next week. I have some proposals due at the end of this week that will keep me busy until then. Thanks for your patience!

