TopoStats classes internally and for writing/reading HDF5 #1151
SylviaWhittle left a comment
This is looking very good!
I've checked this out locally and just followed the threads of where data goes, and it looks very nice.
One thing is to maybe move things out of / not add more to `utils.py`, given that IIRC we are wanting to eventually eliminate it? Perhaps a `grain_handling.py` or something?
I'll have a further look next week with Laura but looking great so far 👍
Oops, meant to merely comment, not approve, sorry!
Now have
- [x] `LoadScans`
- [ ] `Filters`
- [ ] `Grains`
- [ ] `GrainStats`
- [ ] `DisorderedTracing`
- [ ] `NodeStats`
- [ ] `OrderedTracing`
- [ ] `Splining`
Switches `Filters()` class over to using `TopoStats` class objects as input. Tests directly on `Filters()` are updated, but integration tests (i.e. of how this impacts on `run_modules.py` and `processing.py`) have _not_ been included in this commit as they also require updating the other classes (`Grains` / `DisorderedTracing` / `NodeStats` / `OrderedTracing` / `Splining`)
The `Grains` class now works with `TopoStats` classes, however...because `GrainCrops` was used in `TopoStats` and this work meant `TopoStats` was used by `Grains`, we introduced a circular dependency which Python, reasonably, complains about. The solution has been to move the class definitions to their own module, `topostats.classes`, but that wasn't without some issues since there are static methods of the `Grains` class that were used _within_ `GrainCrop`. For now these have been moved to the `utils` module and I've started writing tests for them (as they didn't appear to have any). As a consequence this commit has a lot of things moving around which _will_ make it a pain to review, but hopefully this will be worth it. For now the whole test suite does _not_ pass all tests because the integration tests where the pipeline is run end-to-end fail. No attempt has been made to correct this yet because ultimately we would like to simply update the `TopoStats` objects and pass them around, and that will only be addressed once each processing step/class has been refactored to work with these. Subsequent modules should be a little easier to refactor now that the circular dependencies have been broken.
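The restructuring described above amounts to moving the shared classes into a leaf module that imports from neither consumer, so both can import from it without a cycle. A minimal sketch follows; the module layout in the comments is an assumption (only `topostats.classes` is named in the PR), and the classes are cut-down stand-ins:

```python
from dataclasses import dataclass, field

# Hypothetical layout after breaking the cycle:
#
#   topostats/classes.py  ->  defines GrainCrop, ImageGrainCrops, TopoStats
#   topostats/grains.py   ->  from topostats.classes import GrainCrop
#   topostats/filters.py  ->  from topostats.classes import TopoStats
#
# classes.py imports from neither grains.py nor filters.py, so module
# loading no longer recurses.


@dataclass
class GrainCrop:
    """Minimal stand-in for the shared grain-crop container."""

    image: list = field(default_factory=list)
    padding: int = 0


@dataclass
class TopoStats:
    """Minimal stand-in for the top-level container holding crops."""

    filename: str = ""
    grain_crops: list[GrainCrop] = field(default_factory=list)


ts = TopoStats(filename="minicircle.spm", grain_crops=[GrainCrop(padding=2)])
```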
Switches `GrainStats` to take the `TopoStats` object as an argument, extracting the `ImageGrainCrops.GrainCropDirection.crops` (be that `above` or `below`) and calculating the statistics from the returned dictionary. Tests are updated and pass for this module alone; integration tests still fail and will be addressed after all modules are updated.
I messed up correcting a merge conflict when rebasing so am putting the required `log_topostats_version()` back in and will add this commit to `.git-blame-ignore-revs`
- Fixes `processing.run_filters()` and tests to use the TopoStats class.
- Adds revision to ignore commit that fixed a bodged rebase
- Fixes some typos in docstrings of class definitions
- Fixes typo in `TRACING_RESOURCES` for disordered tracing
- Implements a regression test for `processing.run_disordered_tracing()`.
- Checks results are attributes of `GrainCrop` for `minicircle_small`.
Moves closer towards using `TopoStats` class throughout the `processing` module.

- Passes `topostats_object: TopoStats` into the various `run_<stage>` functions.
- Switches all logging to use the attributes of this class.
- Introduces [pytest-profiling](https://pypi.org/project/pytest-profiling/) as a test dependency so we can profile tests. Introduced because `nodestats` was taking a looooong time to run and it's because of long calls to `networkx` that are required to get edges/angles.
- Adds `catenane_topostats` and `minicircle_small_topostats` fixtures used in `test_run_nodestats()`.
- Tests `run_nodestats`, another step in the right direction of modularising and adding entry points. Note that the `catenane` image has 41 nodes, which is one of the reasons tests take so long!
- Corrects assertions in `test_run_grains()` to be made against `topostats_object` attributes rather than pulling out and assigning to `imagegraincrops`.
- Rounds out the `Nodes` class with documentation and attributes.
- Switches to assessing whether disordered tracing worked by comparing the shape of the dataframe to `(0, 0)`, which is the shape of an empty dataframe. Previously this test was done against `if disordered_trace_grainstats is not None`, but as the following shows, an empty `pd.DataFrame()` is still "something", so it can't be used for truthiness as is normally the case in Python and the test wasn't doing what was expected.

```python
pd.DataFrame() is None      # False
pd.DataFrame() is not None  # True
```

It is worth noting that there are some Warnings raised; these were noticed when testing for equality of Nodestats and I've not got the time to investigate these fully. Comments have been left in place so we can address them in the future and I'll make an issue for these too.
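To make the emptiness check explicit, here is a short sketch (plain pandas, nothing TopoStats-specific) of why identity tests pass for empty DataFrames and which checks work instead:

```python
import pandas as pd

df = pd.DataFrame()

# An empty DataFrame is still an object, so identity checks always pass
print(df is not None)  # True

# Truthiness is deliberately ambiguous for DataFrames and raises instead
try:
    bool(df)
except ValueError:
    print("truthiness raises ValueError")

# Robust emptiness checks: .empty, or comparing .shape against (0, 0)
print(df.empty)            # True
print(df.shape == (0, 0))  # True
```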
- introduces `ordered_trace` as an attribute to `GrainCrop` class.
- corrects test of equality for skeleton attribute of `GrainCrop`.
- introduce `OrderedTrace` class with attributes for...
- `ordered_trace_data`
- `n_molecules`
- `tracing_stats`
- `grain_mol_stats` - a dictionary of `Molecule`
- `pixel_to_nm_scaling`
- `images`
- `error`
- custom `__eq__` method that checks dictionary of images for equality
- introduce `Molecule` class with attributes
- `circular`
- `topology`
- `ordered_coords`
- `heights`
- `distances`
- test for `processing.run_ordered_tracing()` along with two `.topostats` files in `tests/resources/tracing/ordered_tracing/{catenane,minicircle}_post_nodestats.`
- updates `save_topostats_file()` to work with `TopoStats` object
- remove errant `print()` from `TopoStats` class
- Switches `save_topostats_file` to work with classes
Required because loading `.topostats` objects from HDF5 via AFMReader returns dictionaries. This is OK, and I think for now we should not change this as it makes AFMReader very general and of use to others, but internally, as we switch to `TopoStats` classes for all the processing, each entry point that loads a `.topostats` file requires a `TopoStats` object, so we _have_ to convert these on loading.
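The conversion step can be sketched as follows. This is a minimal illustration assuming a dataclass-style `TopoStats`, and it ignores the nested grain-crop structures the real `dict_to_topostats()` (added in e731084) has to rebuild:

```python
from dataclasses import dataclass


@dataclass
class TopoStats:
    """Minimal stand-in; the real attribute set is assumed, not exact."""

    filename: str = ""
    pixel_to_nm_scaling: float = 1.0
    image: object = None


def dict_to_topostats(data: dict) -> TopoStats:
    """Convert the plain dictionary AFMReader returns into a TopoStats object.

    Only keys matching known attributes are consumed; anything else is
    ignored so older/newer `.topostats` files still load.
    """
    known = set(TopoStats.__dataclass_fields__)
    return TopoStats(**{k: v for k, v in data.items() if k in known})


loaded = {"filename": "minicircle", "pixel_to_nm_scaling": 0.5, "extra": 1}
ts = dict_to_topostats(loaded)
```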
- `padding` should be `int()` but was being read as `np.float64()`.
- mistakenly always tried to set `crop["skeleton"]` even if it's not present (in which case it should be `None`).
Add `_to_dict()` methods to each of the following classes...

- `MatchedBranch`
- `Molecule`
- `Node`
- `OrderedTrace`

...and ensures these are written to HDF5. Adds dummy objects to `tests/conftest.py` and tests the methods work via `tests/test_classes.py`. Currently the types of many of these are _wrong_ because I don't know what they actually represent; that doesn't really matter for the testing though, which uses dictionary comprehension and handles any type. Key is that the `GrainCrop.grain_crop_to_dict()` method now works with all of the additional attributes so we can write the full `TopoStats` object to HDF5, which is required for on-going test development of the remaining `OrderedTrace`, `Splining` and `Curvature` so we can write intermediary `.topostats` objects which we can load for tests (instead of running the whole processing pipeline from the start). This is however also **vital** to the additional entry-points (aka "swiss-army knife") work so we can write `.topostats` objects with all of the data up to a given point and load it in the future (previous commit e731084 added the necessary `dict_to_topostats()` function for converting the HDF5-based dictionaries to `TopoStats` objects).
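A minimal sketch of the `_to_dict()` pattern, using a stand-in `Molecule` with assumed attribute types (the real classes would recurse into nested objects' own `_to_dict()` methods before HDF5 writing):

```python
from dataclasses import dataclass


@dataclass
class Molecule:
    """Stand-in for the real Molecule class; attribute types are assumed."""

    circular: bool = False
    topology: str = ""

    def _to_dict(self) -> dict:
        """Return attributes as a plain dict ready for HDF5 writing."""
        # vars() keeps whatever types the attributes hold; nested classes
        # would call their own _to_dict() here in the real code
        return dict(vars(self))


mol = Molecule(circular=True, topology="0_1")
```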
Successes...

- Don't attempt to order traces that do not have a disordered trace
- `OrderedTrace` class with attributes and methods
- `MatchedBranch` class

Very messy at the moment, some thoughts...

- noticing a number of places where vectorisation could be used instead of loops, and some nesting that seems redundant.
- Dictionaries aren't currently mapped to the classes and their structure; many attributes are themselves dictionaries.
- 2025-10-09 - Currently need to get ordered_branches passed around correctly; they are meant to be attributes of `MatchedBranch`.
- `tests/resources/tracing/ordered_tracing/catenane_post_nodestats.topostats` is currently 304.4MB which is too big; need to do something about this. It has been renamed for now to `catenane_post_nodestats_20251013.topostats` because of a conflict when rebasing.
Closes #1220 (and possibly others but I can't find them at the moment!) TopoStats' modular design, which is being improved in current refactoring, means that it should be easy to extend the analysis pipelines by developing other packages such as [AFMSlicer](https://github.com/AFM-SPM/AFMSlicer), where work is under way. One of the things that will be important is to allow developers of such packages, and in turn users, to generate sample configuration files which they can change as they desire. Rather than have the same code duplicated across packages we can use the `io.write_config_with_comments()` function from TopoStats to load a `<pkg_name>/default_config.yaml` from a package and write that to disk, which is what this Pull Request achieves. I've included an early version of `docs/advanced/extending.md` to document how to develop extension packages. It _will_ change dramatically as this takes shape, since this is new territory for me, but I felt it important to document what I'm doing now so that I can expand and improve on it as things change and lessons are learnt. **NB** This branch will deliberately target `ns-rse/1102-switching-to-TopoStats-class` as that will be the basis on which other packages are built.
Successes...

- Don't attempt to order traces that do not have a disordered trace
- `OrderedTrace` class with attributes and methods
- `MatchedBranch` class

Very messy at the moment, some thoughts...

- noticing a number of places where vectorisation could be used instead of loops, and some nesting that seems redundant. This won't be addressed in this PR but should be addressed in the future.
- Dictionaries aren't currently mapped to the classes and their structure; many attributes are themselves dictionaries.
- 2025-10-09 - Currently need to get ordered_branches passed around correctly; they are meant to be attributes of `MatchedBranch`.
- `tests/resources/tracing/ordered_tracing/catenane_post_nodestats.topostats` is currently 304.4MB which is too big; need to do something about this. It has been renamed for now to `catenane_post_nodestats_20251013.topostats` because of a conflict when rebasing.

Working on making it so we can pickle objects (have added `__getstate__` and `__setstate__` to all classes, see next commit).
- adds `thresholds` and `threshold_method` properties to `GrainCrop` class
- adds `config` and `full_mask_tensor` properties to `TopoStats` class
- updates tests in light of these changes
- corrects minor typo in `default_config.yaml`

The main thing that this adds, though, is `__getstate__`/`__setstate__` methods for each of the classes. The reason for doing so is that classes which have `@property` objects associated with them can't be pickled, so they need explicit conversion to dictionaries. See...

- [here](https://stackoverflow.com/a/1939384/1444043)
- [Handling stateful objects](https://docs.python.org/3/library/pickle.html#pickle-state)

Unfortunately this still fails...

```python
from pathlib import Path
import pickle as pkl

from topostats.classes import TopoStats

OUTDIR = Path.cwd()
OUTFILE = OUTDIR / "empty.topostats"
empty_topostats = TopoStats(img_path=None)

with OUTFILE.open(mode="wb") as f:
    pkl.dump(empty_topostats, f)
```

```
TypeError                                 Traceback (most recent call last)
Cell In[905], line 2
      1 with OUTFILE.open(mode="wb") as f:
----> 2     pkl.dump(empty_topostats, f)

TypeError: cannot pickle 'property' object
```

```python
empty_topostats.__getstate__()
{'_image_grain_crops': <property at 0x7fb40c81e0c0>,
 '_filename': <property at 0x7fb40c81d170>,
 '_pixel_to_nm_scaling': <property at 0x7fb40c81f880>,
 '_img_path': PosixPath('/home/neil/work/git/hub/AFM-SPM/TopoStats/tmp'),
 '_image': <property at 0x7fb39ce731a0>,
 '_image_original': <property at 0x7fb39ce71e40>,
 '_full_mask_tensor': <property at 0x7fb39ce72980>,
 '_topostats_version': <property at 0x7fb39ce71d00>,
 '_config': <property at 0x7fb39ce72020>}
```

Everything is _still_ a `property`. This dummy example works fine though...

```python
from dataclasses import dataclass


@dataclass
class dummy:
    var1: int | None = None
    var2: float | None = None
    var3: str | None = None
    var4: list[int] | None = None
    var5: dict[str, str] | None = None

    def __getstate__(self):
        # also tried returning {"_var1": self._var1, ..., "_var5": self._var5}
        state = self.__dict__.copy()
        return state

    def __setstate__(self, state):
        # also tried setting self._var1 = state["_var1"] etc. explicitly
        self.__dict__.update(state)

    @property
    def var1(self) -> int:
        """
        Getter for the ``var1`` attribute.

        Returns
        -------
        int
            Returns the value of ``var1``.
        """
        return self._var1

    @var1.setter
    def var1(self, value: int) -> None:
        """
        Setter for the ``var1`` attribute.

        Parameters
        ----------
        value : int
            Value to set for ``var1``.
        """
        self._var1 = value

    # var2 ... var5 have @property getters/setters identical in form to var1
    # (the original listed each in full; elided here)


OUTFILE = OUTDIR / "empty.dummy"
empty_dummy = dummy()
with OUTFILE.open(mode="wb") as f:
    pkl.dump(empty_dummy, f)
```

...no error, and I don't understand where I/we have gone wrong?!?! I'm somewhat inclined to move away from `@dataclass` and using `@property` to provide the setter/getter design pattern and instead use plain classes with attributes.
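A likely culprit (an inference from the symptom above, not verified against the actual `TopoStats` classes) is the known `@dataclass`/`@property` interaction: a property defined with the same name as an annotated field becomes the field's *default value*, so constructing with no arguments passes the `property` object itself through the setter and stores it in the backing attribute, which is exactly what the `__getstate__()` output shows:

```python
from dataclasses import dataclass


@dataclass
class Gotcha:
    x: int  # annotated field

    # Defining a property of the same name rebinds the class attribute
    # `x` to the property object, which @dataclass then treats as the
    # field's default value.
    @property
    def x(self) -> int:
        return self._x

    @x.setter
    def x(self, value: int) -> None:
        self._x = value


# No TypeError: the property object acts as the default, flows through
# the setter, and ends up stored in `_x`...
g = Gotcha()
print(type(g._x))  # <class 'property'>

# ...whereas passing a real value behaves as expected.
print(Gotcha(5)._x)  # 5
```

If this is what is happening, every field that was never set explicitly (everything except `_img_path` in the example above) holds a `property` object, which pickle rightly refuses to serialise.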
feature: write YAML configuration files from other packages
Moves to [Pydantic dataclasses](https://docs.pydantic.dev/latest/concepts/dataclasses/) for stricter data validation. This means we can pickle `TopoStats` objects, which is useful because in the test suite we don't want to run the whole pipeline when we want to test e.g. `Nodestats`. As a consequence we now have pickles which are loaded as `pytest.fixtures` (from `tests/conftest.py`) rather than lines of code within tests themselves that save and modify `.npy`/`.pkl` files. There are therefore three sets of pickles...

- `minicircle_small`
- `catenanes`
- `rep_int`

...at different stages...

- `_post_grainstats`
- `_post_disordered_tracing`
- `_post_nodestats`

...and we will develop additional fixtures for...

- `_post_ordered_tracing`
- `_post_curvature`
- `_post_splining` (optional, not required at the moment as no subsequent processing is done after this)

A slight disconnect might arise from how these pickles were created; at the moment it is code in a `.py` file on @ns-rse's computer. @ns-rse will look at adding this as an additional script in the repository, but as more work is required it's not included at the moment. This now allows me to finish off re-factoring and writing the integration test for `ordered-tracing`.
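The pickle round-trip these fixtures rely on can be sketched as follows; the helper and file names are illustrative, not the actual `conftest.py` code, and a `SimpleNamespace` stands in for a pickled `TopoStats` instance:

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace


def save_stage(obj, path: Path) -> None:
    """Write a processing-stage snapshot to disk, e.g. after grainstats."""
    with path.open("wb") as f:
        pickle.dump(obj, f)


def load_stage(path: Path):
    """Load a snapshot; in conftest.py this would back a pytest fixture."""
    with path.open("rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "minicircle_small_post_grainstats.pkl"
    save_stage(
        SimpleNamespace(filename="minicircle_small", stage="post_grainstats"),
        snapshot,
    )
    restored = load_stage(snapshot)
```

Tests then start from `restored` rather than re-running `Filters`, `Grains` and `GrainStats` from scratch.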
feature(classes): Pydantic classes and pickling
I started rebasing but didn't fancy the hell of going through repeated merge conflicts so opted to merge instead. To do so I first created a branch from `ns-rse/1102-switching-to-TopoStats-class` called `ns-rse/1102-test-merging-main`, switched to it and...tested merging `main`. I resolved all the conflicts only once. I then renamed branches locally (but not on `origin`)...

- `ns-rse/1102-switching-to-TopoStats-class` > `ns-rse/1102-switching-to-TopoStats-class-2025-10-29`
- `ns-rse/1102-test-merging-main` > `ns-rse/1102-switching-to-TopoStats-class`

This meant I could then push the local `ns-rse/1102-switching-to-TopoStats-class`, which had `main` merged in, to `origin`. I actually think this is what GitHub does in the background when you resolve merge conflicts but am not 100% sure. I would do this again and see now why some people advocate for `git merge` over `git rebase`. The only thing I'd do differently is to instead simply make a backup branch of the one I want to merge `main` into so that I don't have to bother with renaming (but still have a backup in case everything went tits-up!).
I'm revisiting Nodestats and have some questions about the various nested dictionaries I was wondering if anyone might be able to help with.

Matched and Unmatched Branches
My question here stems from the fact that the top-level key
Answers after chatting with @SylviaWhittle
Also discussed... Having classes with attributes isn't too dissimilar to dictionaries with key/value pairs, but with the added benefit
I have now added

For example we currently have...

```python
class Filters:
    def __init__(
        self,
        topostats_object: TopoStats,
        row_alignment_quantile: float = 0.5,
        threshold_method: str = "otsu",
        otsu_threshold_multiplier: float = 1.7,
        threshold_std_dev: dict | None = None,
        threshold_absolute: dict | None = None,
        gaussian_size: float | None = None,
        gaussian_mode: str = "nearest",
        remove_scars: dict | None = None,
    ):
        self.topostats_object = topostats_object
        self.image = topostats_object.image_original
        self.filename = topostats_object.filename
        self.pixel_to_nm_scaling = topostats_object.pixel_to_nm_scaling
        self.gaussian_size = gaussian_size
        self.gaussian_mode = gaussian_mode
        self.row_alignment_quantile = row_alignment_quantile
        self.threshold_method = threshold_method
        self.otsu_threshold_multiplier = otsu_threshold_multiplier
        # Convert to lists since the thresholding function expects lists of thresholds but
        # we don't want to use more than one value for the filters step.
        if threshold_std_dev is None:
            threshold_std_dev = {"above": 1.0, "below": 1.0}
        self.threshold_std_dev = {
            "above": [threshold_std_dev["above"]],
            "below": [threshold_std_dev["below"]],
        }
        if threshold_absolute is None:
            threshold_absolute = {"above": 1.0, "below": 10.0}
        self.threshold_absolute = {
            "above": [threshold_absolute["above"]],
            "below": [threshold_absolute["below"]],
        }
        self.remove_scars_config = remove_scars
        self.images = {
            "pixels": self.image,
            "initial_median_flatten": None,
            "initial_tilt_removal": None,
            "initial_quadratic_removal": None,
            "initial_scar_removal": None,
            "initial_zero_average_background": None,
            "masked_median_flatten": None,
            "masked_tilt_removal": None,
            "masked_quadratic_removal": None,
            "secondary_scar_removal": None,
            "scar_mask": None,
            "mask": None,
            "final_zero_average_background": None,
            "gaussian_filtered": None,
        }
        self.thresholds = None
        self.medians = {"rows": None, "cols": None}
        self.results = {
            "diff": None,
            "median_row_height": None,
            "x_gradient": None,
            "y_gradient": None,
            "threshold": None,
        }
```

This would become...

**EDIT 2025-11-10**: On reflection I think it could be foolish to remove the class options up-front. Instead we should populate them from the `config` held on the `TopoStats` object when options are not explicitly provided...

```python
class Filters:
    def __init__(
        self,
        topostats_object: TopoStats,
        row_alignment_quantile: float | None = None,
        threshold_method: str | None = None,
        otsu_threshold_multiplier: float | None = None,
        threshold_std_dev: dict | None = None,
        threshold_absolute: dict | None = None,
        gaussian_size: float | None = None,
        gaussian_mode: str | None = None,
        remove_scars: dict | None = None,
    ):
        self.topostats_object = topostats_object
        filter_config = self.topostats_object.config["filter"]
        self.image = topostats_object.image_original
        self.filename = topostats_object.filename
        self.pixel_to_nm_scaling = topostats_object.pixel_to_nm_scaling
        self.gaussian_size = filter_config["gaussian_size"] if gaussian_size is None else gaussian_size
        self.gaussian_mode = filter_config["gaussian_mode"] if gaussian_mode is None else gaussian_mode
        self.row_alignment_quantile = (
            filter_config["row_alignment_quantile"] if row_alignment_quantile is None else row_alignment_quantile
        )
        self.threshold_method = filter_config["threshold_method"] if threshold_method is None else threshold_method
        self.otsu_threshold_multiplier = (
            filter_config["otsu_threshold_multiplier"]
            if otsu_threshold_multiplier is None
            else otsu_threshold_multiplier
        )
        # Convert to lists since the thresholding function expects lists of thresholds but
        # we don't want to use more than one value for the filters step.
        if threshold_std_dev is None:
            threshold_std_dev = filter_config["threshold_std_dev"]
        self.threshold_std_dev = {
            "above": [threshold_std_dev["above"]],
            "below": [threshold_std_dev["below"]],
        }
        if threshold_absolute is None:
            threshold_absolute = filter_config["threshold_absolute"]
        self.threshold_absolute = {
            "above": [threshold_absolute["above"]],
            "below": [threshold_absolute["below"]],
        }
        self.remove_scars_config = filter_config["remove_scars"] if remove_scars is None else remove_scars
        self.images = {
            "pixels": self.image,
            "initial_median_flatten": None,
            "initial_tilt_removal": None,
            "initial_quadratic_removal": None,
            "initial_scar_removal": None,
            "initial_zero_average_background": None,
            "masked_median_flatten": None,
            "masked_tilt_removal": None,
            "masked_quadratic_removal": None,
            "secondary_scar_removal": None,
            "scar_mask": None,
            "mask": None,
            "final_zero_average_background": None,
            "gaussian_filtered": None,
        }
        self.thresholds = None
        self.medians = {"rows": None, "cols": None}
        self.results = {
            "diff": None,
            "median_row_height": None,
            "x_gradient": None,
            "y_gradient": None,
            "threshold": None,
        }
```
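The repeated `filter_config[...] if arg is None else arg` pattern above can be distilled into a small helper; this is a hypothetical refactor for illustration, not existing TopoStats code:

```python
def from_config(filter_config: dict, key: str, override=None):
    """Return the explicit override if given, otherwise the configured value.

    Mirrors the fallback pattern in the proposed __init__: explicit
    arguments win, the loaded configuration fills the gaps.
    """
    return filter_config[key] if override is None else override


filter_config = {"gaussian_size": 1.0, "gaussian_mode": "nearest"}

gaussian_size = from_config(filter_config, "gaussian_size")          # 1.0 from config
gaussian_mode = from_config(filter_config, "gaussian_mode", "wrap")  # "wrap" overrides
```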
Copying this from Slack chat so it's recorded...

**2025-12-05**

I've been making some decent progress on my refactoring this week and have been working on making sure all the plots are correctly generated at each stage. However, I've got some questions about the plots for disordered tracing and would appreciate some feedback/thoughts.

**Nodestats**

On the
Astute readers will notice that these are essentially identical images and do not have nodes, convolved skeletons nor node centres plotted on them, suggesting we have problems (on the

**Questions**
I would expect the individual plots of grains to be of greater interest, as the spatial relationship of grains within an image is not something I have heard people being interested in (perhaps a minor exception is excluding grains that touch the edge of images, but that is because they can't be properly analysed). Ordered traces also suffer from the same affliction: the whole-image plot is missing skeletons of the backbone.

**Proposed Solution**

Rather than spend time trying to fix this I am inclined to remove the generation of these plots at this stage in the refactoring and say adieu to these plots.
Would this cause problems for anyone?

**2025-12-10**

Further, I've discovered the "whole image" plots made on the

These are the three plots produced with...

```yaml
plotting:
  image_set:
    - ordered_tracing
```
...and if you zoom in on the grains it is just possible to make out a skeleton. Thus for the time being I'll make sure these are generated but still wonder what the utility of them actually is given they are barely legible, and suspect the cropped plots are the ones that are more useful. It is also worth noting that the
Closes #1102
Closes #1143
This PR (draft for now) is the logical extension of the `GrainCrops`, `GrainCropDirections` and `ImageGrainCrops` introduced by @SylviaWhittle in #1022 and switches to using the `TopoStats` class @ns-rse introduced in #1145 for handling images and the derived datasets (arrays), such that the unit of interest is individual grains.

It is at the moment far from complete, as the checklist below shows, but because of the large amount of changes and reorganisation I was keen to share it in stages. The tests for each commit pass (thanks `pytest-testmon` 😀) but until all steps are complete the integration tests (`tests_processing.py` and `tests_run_modules.py`) won't pass; I'm working on them as I go through each class.

It's perhaps worth reading the commit messages for the individual commits for a little more information on the re-organisation that has been done so far.
- `LoadScans`
- `Filters`
- `Grains`
- `GrainStats`
- `DisorderedTracing`
- `NodeStats`
- `OrderedTracing`
- `Splining`
- `Curvature`

Of note...
Shared Methods
Some methods from `Grains` were used as `@staticmethod`s from `GrainCrops` and have been moved out to `utils`. I've set up some skeleton tests for these but they fail (I get 2x5 arrays back when I would have expected 5x5 arrays from flattening 5x5x3 arrays). I couldn't see any existing tests for these.

Documentation
I intend to document the class structures and in turn the HDF5 format these are written to.
Syrupy
I've closed #1143 as the key test which used `.pkl`s and required manual updating has been addressed. I think we could still switch all tests to use `syrupy` (see #1152).

AFMReader
These changes will also require modifications to AFMReader and perhaps moving `LoadScans` over, but I'm wary of introducing a circular dependency and have already discussed this with @SylviaWhittle. We felt that perhaps AFMReader should only load files and return dictionaries. Re-constructing these to `TopoStats` / `ImageGrainCrops` / `GrainCropsDirection` / `GrainCrop` should be the domain of TopoStats. I don't think this should be a problem for the Napari plug-in as it could import and use whatever it needs from either.