Increase default PnetCDF header size to avoid I/O hangs #1386
Conversation
This reduces the likelihood of header reallocation as metadata grows. Header reallocation is extremely expensive and can significantly degrade parallel I/O performance.
(Sorry to jump into the discussion.) You can also set an argument for this when ending define mode; see the sketch below. FYI: when adding new data objects to an existing file causes the file header section to grow, PnetCDF must move the data section to a higher file offset, which can be expensive, especially when the existing file is large. The application program most likely just ran very slowly rather than hanging.
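A minimal sketch of that approach, assuming the argument meant here is the h_minfree parameter of PnetCDF's ncmpi__enddef and using hypothetical values:

```c
#include <mpi.h>
#include <pnetcdf.h>

/* Hedged sketch: leave define mode while asking PnetCDF to keep at least
 * 128 KB of free space in the header section (hypothetical value).
 * Error checking is omitted for brevity. */
static int end_define_with_header_room(int ncid)
{
    MPI_Offset h_minfree = 131072; /* minimum free bytes to leave at the end of the header */
    MPI_Offset v_align   = 0;      /* remaining tuning knobs left at 0 in this sketch */
    MPI_Offset v_minfree = 0;
    MPI_Offset r_align   = 0;

    return ncmpi__enddef(ncid, h_minfree, v_align, v_minfree, r_align);
}
```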
@wkliao Thanks very much for the suggestion to consider this. If we were to open an existing file for writing and call it, how would PnetCDF handle the header space already in the file?
@wkliao This issue only surfaced when running MPAS through MPAS-JEDI under very specific conditions, which made it difficult to track down. Is there a check you’re aware of that we could have put in place to make this easier to catch?
Let me use two PnetCDF terminologies to help explain: the header size (the number of bytes the file header currently occupies) and the header extent (the number of bytes reserved for the header before the data section begins). Subtracting the two gives you the free space available in the header section. In this case, PnetCDF will check whether the file extent of the existing file is aligned with the requested alignment size. Two PnetCDF APIs can be used to query these two quantities; a hedged sketch follows below. If the free space is sufficiently large, no expensive data movement should be needed when define mode ends.
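A minimal sketch of such a query, assuming the two inquiry APIs meant are ncmpi_inq_header_size and ncmpi_inq_header_extent:

```c
#include <stdio.h>
#include <mpi.h>
#include <pnetcdf.h>

/* Sketch: report how much free space remains in an open file's header.
 * "ncid" is assumed to refer to a file already opened with ncmpi_open.
 * Error checking is omitted for brevity. */
static void report_header_free_space(int ncid)
{
    MPI_Offset size = 0, extent = 0;

    ncmpi_inq_header_size(ncid, &size);     /* bytes the header currently uses */
    ncmpi_inq_header_extent(ncid, &extent); /* bytes reserved before the data section */

    printf("header free space: %lld bytes\n", (long long)(extent - size));
}
```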
    int io_group;
    MPI_Comm io_file_comm;
    MPI_Comm io_group_comm;
    MPI_Info info = MPI_INFO_NULL;
I'm not sure that there's any benefit to declaring info in this scope and using it in all calls to ncmpi_open. I'd suggest again that we declare this variable in the block beginning on line 323 (325 in this PR) and apply info only when creating new files.
My reasoning here was to ensure that the info variable is always in a valid state (which I think is important when possible) and to improve code consistency, so that all calls to ncmpi_open use the same MPI_Info variable. This also makes it easier to pass a non-null MPI_Info object to other ncmpi_open calls in the future, if needed. That said, I don’t feel strongly about this change and am happy to revert it if you prefer the previous approach to calling ncmpi_open.
Since our intent is only to pass a hint to ncmpi_create, and not to change the semantics of the calls to ncmpi_open for writing an existing file or reading a file, I think it would be much cleaner and clearer if we only use info within the scope of the block containing the call to ncmpi_create.
If in future we decide to pass a non-null info to either of the ncmpi_open calls, we can decide at that time how to handle the MPI_Info instance.
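For illustration, a minimal sketch of the block-scoped approach described above, assuming the header padding is requested through PnetCDF's nc_header_align_size hint (the exact hint name and value used in this PR are assumptions here):

```c
#include <mpi.h>
#include <pnetcdf.h>

/* Hypothetical helper illustrating the block-scoped MPI_Info pattern:
 * the info object exists only around ncmpi_create, and existing files
 * are opened with MPI_INFO_NULL so their semantics are unchanged. */
static int create_with_header_padding(MPI_Comm comm, const char *path, int *ncidp)
{
    MPI_Info info = MPI_INFO_NULL;
    int err;

    MPI_Info_create(&info);
    /* Assumed hint: ask PnetCDF to reserve ~128 KB for the file header so
     * later metadata growth does not force the data section to move. */
    MPI_Info_set(info, "nc_header_align_size", "131072");

    err = ncmpi_create(comm, path, NC_CLOBBER | NC_64BIT_DATA, info, ncidp);

    MPI_Info_free(&info); /* the hint has been read by this point */
    return err;
}

static int open_existing_for_write(MPI_Comm comm, const char *path, int *ncidp)
{
    /* No hint needed here; the header padding only matters at creation. */
    return ncmpi_open(comm, path, NC_WRITE, MPI_INFO_NULL, ncidp);
}
```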
This PR increases the default Parallel NetCDF (PnetCDF) header size to 128 KB to reduce the likelihood of header reallocation during MPAS I/O. In certain situations, when MPAS overwrites existing string attributes or variables with larger values, the NetCDF header can grow beyond its preallocated padding, requiring PnetCDF to reallocate the header during ncmpi_enddef, which can lead to an I/O hang. This behavior was identified as the root cause of the hang reported in MPAS-Workflow issue #384. By increasing the default header size, this PR decreases the likelihood that header reallocation is required when string attributes or variables are overwritten with larger values. Preliminary testing of the calculation that previously triggered the hang indicates that this change resolves the issue without impacting calculation results or I/O performance.

Fixes #1385
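For illustration, a standalone sketch (hypothetical file and attribute names; error checking omitted) of the growth scenario described above: reopening a file and overwriting a global string attribute with a longer value can push the header past its padding, forcing PnetCDF to relocate the data section when define mode ends.

```c
#include <string.h>
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int ncid;
    const char *short_val = "v1";
    const char *long_val  = "a-much-longer-replacement-value-that-enlarges-the-header";

    MPI_Init(&argc, &argv);

    /* Create a file with a small global attribute (hypothetical names). */
    ncmpi_create(MPI_COMM_WORLD, "demo.nc", NC_CLOBBER | NC_64BIT_DATA,
                 MPI_INFO_NULL, &ncid);
    ncmpi_put_att_text(ncid, NC_GLOBAL, "history", strlen(short_val), short_val);
    ncmpi_enddef(ncid);
    ncmpi_close(ncid);

    /* Reopen and overwrite the attribute with a larger value. If the header
     * has no spare room, PnetCDF must grow it and move the data section when
     * ncmpi_enddef is called, which is the expensive step the larger default
     * header padding in this PR is meant to avoid. */
    ncmpi_open(MPI_COMM_WORLD, "demo.nc", NC_WRITE, MPI_INFO_NULL, &ncid);
    ncmpi_redef(ncid);
    ncmpi_put_att_text(ncid, NC_GLOBAL, "history", strlen(long_val), long_val);
    ncmpi_enddef(ncid);
    ncmpi_close(ncid);

    MPI_Finalize();
    return 0;
}
```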