@angus-g
Contributor

@angus-g angus-g commented Apr 1, 2025

In the tripole region, we use the make_topo_gen routine. This was allocating two arrays of 20,000,000 elements each to hold topography for the patch. The target grid is divided into blocks of 100x25 points, and both large arrays were being completely zeroed for every point of every block. This meant that performance was almost completely dominated by memset.

Isolating to just 10 of these blocks (but including NetCDF input and output), I saw 98.2% of the CPU time spent in memset, taking over 2 minutes to process. With this patch, it takes about 5 seconds to process the same blocks, where most of that time is spent in KD tree generation and NetCDF input/output. An entire invocation of gen_topo from GEBCO to a 1/10 full-globe grid went down from around 2 hours to 6 minutes.
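
For readers who have not opened the diff, the pattern described above can be illustrated with a minimal sketch. This is not the actual patch: the names `buf`, `n_used`, and the per-point sample count are hypothetical, and only the 20,000,000-element buffer and the 100x25 block size come from the description. The idea is that a scratch buffer which is refilled for every point does not need to be cleared in full each time, because only the entries written for the current point are ever read.

```fortran
program scratch_buffer_sketch
  implicit none
  integer, parameter :: nmax = 20000000      ! buffer size quoted in the description
  integer, parameter :: npoints = 100 * 25   ! points in one block
  real, allocatable :: buf(:)
  integer :: ip, i, n_used
  real :: total

  allocate(buf(nmax))
  total = 0.0

  do ip = 1, npoints
    ! Slow pattern: "buf = 0.0" here would touch all nmax elements
    ! for every single point, which is what shows up as memset.

    ! Hypothetical number of source samples that fall on this point.
    n_used = 1 + mod(ip, 50)

    ! Cheaper pattern: overwrite only the entries that will be read.
    do i = 1, n_used
      buf(i) = real(i)
    end do

    ! Everything downstream reads buf(1:n_used) only, so stale values
    ! beyond n_used can never affect the result.
    total = total + sum(buf(1:n_used))
  end do

  if (total <= 0.0) error stop "unexpected total"
  print *, "accumulated total:", total
end program scratch_buffer_sketch
```

With this arrangement the per-point cost scales with the number of samples actually used rather than with the size of the buffer.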

The other performance win would be to use a selection algorithm (e.g. from the Fortran stdlib) rather than a sorting algorithm for lines like

            call quicksort(t_s_all(im)%topo, frst, lst)
            topo_all_med_out(im, jm) = t_s_all(im)%topo((npts(im, jm)+1)/2)

I haven't introduced that because of the extra dependency, but it shaves the runtime down to 4:45 under the same conditions as the tests above (saving an extra minute or so). All told, with compiler optimisations, this PR, and selection instead of sorting, the 1/10 topography generation is down from 2 hours to 200 seconds.
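
To make the selection idea concrete, here is a minimal sketch of a median lookup that avoids a full sort. It is a hand-written quickselect standing in for a library routine, so it carries no extra dependency; the module and routine names are hypothetical and this is not code from the repository. It uses the same median index, `(n+1)/2`, as the lines quoted above.

```fortran
module select_sketch
  implicit none
contains
  ! Partially reorder a(:) in place so that a(k) holds the k-th smallest
  ! value (Hoare-style quickselect). Average cost is O(n), compared with
  ! O(n log n) for a full sort.
  subroutine quickselect(a, k)
    real, intent(inout) :: a(:)
    integer, intent(in) :: k
    integer :: lo, hi, i, j
    real :: pivot, tmp

    lo = 1
    hi = size(a)
    do while (lo < hi)
      pivot = a((lo + hi) / 2)
      i = lo
      j = hi
      do
        do while (a(i) < pivot)
          i = i + 1
        end do
        do while (a(j) > pivot)
          j = j - 1
        end do
        if (i <= j) then
          tmp = a(i)
          a(i) = a(j)
          a(j) = tmp
          i = i + 1
          j = j - 1
        end if
        if (i > j) exit
      end do
      ! Keep only the side of the partition that contains index k.
      if (k <= j) then
        hi = j
      else if (k >= i) then
        lo = i
      else
        exit   ! a(k) equals the pivot and is already in its final place
      end if
    end do
  end subroutine quickselect
end module select_sketch

program median_demo
  use select_sketch
  implicit none
  real :: topo(5)
  integer :: k

  topo = [5.0, 1.0, 4.0, 2.0, 3.0]
  k = (size(topo) + 1) / 2      ! same median index as in the quoted lines
  call quickselect(topo, k)
  if (topo(k) /= 3.0) error stop "median mismatch"
  print *, "median:", topo(k)
end program median_demo
```

Only the element at index `k` is guaranteed to be in its sorted position afterwards, which is all the median lookup needs.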

@angus-g angus-g changed the title from "Drastically improve efficiency of tripole generation" to "Drastically improve efficiency of topography generation in tripole region" Apr 1, 2025
@aekiss
Contributor

aekiss commented Apr 1, 2025

Awesome, thanks @angus-g! It looks like this will produce bitwise identical results to the original, right?

@angus-g
Contributor Author

angus-g commented Apr 2, 2025

It should do, yes.

@micaeljtoliveira
Contributor

@angus-g Great work! I suspected the code could be made faster, but never imagined it would be something this simple.

@micaeljtoliveira micaeljtoliveira self-requested a review April 2, 2025 22:56
@micaeljtoliveira
Contributor

@angus-g Regarding the use of the Fortran stdlib, I think this would be fine. The more projects use stdlib the better ;) So feel free to create a PR with those changes.

@micaeljtoliveira micaeljtoliveira merged commit a29d07a into COSIMA:main Apr 2, 2025
4 checks passed