Enable parallel netCDF dataset reads #912

@davidhassell

By introducing two new netCDF backends:

  • For netCDF-4: h5netcdf with the pyfive backend
  • For netCDF-3: scipy.io.netcdf_file

we can enable parallel reading of netCDF datasets.
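As a rough illustration of what parallel reading could look like for the netCDF-3 case, here is a minimal sketch using `scipy.io.netcdf_file` with a thread pool. It is not the planned implementation: the helper names (`write_example`, `read_chunk`, `parallel_read`), the variable name `temp`, and the chunking scheme are all hypothetical, and each thread opens its own file handle rather than sharing one.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

import numpy as np
from scipy.io import netcdf_file


def write_example(path, n=1000):
    """Write a small netCDF-3 file holding one 1-d variable, "temp"."""
    with netcdf_file(path, "w") as f:
        f.createDimension("x", n)
        var = f.createVariable("temp", "f8", ("x",))
        var[:] = np.arange(n, dtype="f8")


def read_chunk(path, start, stop):
    """Open the file independently and return a copy of one slice."""
    # mmap=False, plus an explicit copy, so that the returned array
    # does not depend on the file once it has been closed
    with netcdf_file(path, "r", mmap=False) as f:
        return f.variables["temp"][start:stop].copy()


def parallel_read(path, n, chunk=250, workers=4):
    """Read the variable in chunks across threads, then reassemble."""
    bounds = [(i, min(i + chunk, n)) for i in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of the chunk bounds
        parts = pool.map(lambda b: read_chunk(path, *b), bounds)
        return np.concatenate(list(parts))
```

For example, after `write_example(path, 1000)`, calling `parallel_read(path, 1000)` returns the full array assembled from four independently read chunks. In practice the chunked reads would presumably be scheduled by Dask rather than by an explicit thread pool.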

h5netcdf with the pyfive backend should become the new default read engine, replacing the current default of h5netcdf with the h5py backend, which is not thread-safe.

With pyfive available, we can also improve the performance of active storage reductions by passing existing pyfive Dataset objects to activestorage.Active, rather than a dataset URL and variable name.

A PR will follow, but it requires NCAS-CMS/cfdm#375 to be resolved first.

Labels: active storage, dataset read, enhancement, performance
