Drastically improve efficiency of topography generation in tripole region #49
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
In the tripole, we use the make_topo_gen routine. This was allocating two arrays with 20,000,000 elements to hold topography for the patch. The target grid is divided into blocks of 100x25 points, and for each point in these blocks, for every block, both of the large arrays were being completely zeroed. This meant that performance was almost completely dominated by memset.
Isolating to just 10 of these blocks (but including NetCDF input and output), I saw 98.2% of the CPU time spent in memset, taking over 2 minutes to process. With this patch, it takes about 5 seconds to process the same blocks, where most of that time is spent in KD tree generation and NetCDF input/output. An entire invocation of gen_topo from GEBCO to a 1/10 full-globe grid went down from around 2 hours to 6 minutes.
The other performance win would be to use a selection (e.g. from fortran stdlib) rather than sorting algorithm for the lines like
I haven't introduced that because of the extra dependency, but it shaves the runtime down to 4:45 with the same conditions as the tests above (an extra minute or so). All told, with compiler optimisations, this PR, and using selection instead of sort, the 1/10 topography generation is down from 2 hours to 200 seconds.