Storing results and memory errors

With #299 we get even better support for running on HPC. However, the current way of storing results does not scale to a very large number of experiments or to high-dimensional outcomes. At present, results are stored as a collection of CSV files wrapped in a tarball. The main advantage is that the results are easy to extract and open with any text editor or even Excel, which also makes the format convenient across platforms and languages. However, it breaks down for large outputs because saving them runs into memory errors.

A short-term solution is to change `save_results`. It currently builds up the entire tarball in memory before flushing it to disk. A more memory-efficient approach is to create a directory on disk, write each CSV file to it, and then turn the entire directory into a tarball. Some memory profiling is likely needed to see how much of a difference this makes.
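The directory-first approach could look roughly like the sketch below. It uses only the standard library and is not the current `save_results` code: the function name `save_results_via_directory`, the file name `experiments.csv`, and the input layout (experiments as a non-empty list of dicts, outcomes as a dict mapping a name to an iterable of rows) are assumptions made for illustration.

```python
import csv
import os
import tarfile
import tempfile


def save_results_via_directory(experiments, outcomes, file_name):
    """Write every table to its own CSV on disk, then tar the directory.

    Hypothetical input layout, chosen for this sketch only:
    experiments -- non-empty list of dicts, one dict per experiment
    outcomes    -- dict mapping an outcome name to an iterable of rows
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        # All experiments go into a single CSV with a header row.
        exp_path = os.path.join(tmp_dir, "experiments.csv")
        with open(exp_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(experiments[0].keys()))
            writer.writeheader()
            writer.writerows(experiments)

        # One CSV per outcome, written row by row.
        for name, rows in outcomes.items():
            out_path = os.path.join(tmp_dir, f"{name}.csv")
            with open(out_path, "w", newline="") as fh:
                csv.writer(fh).writerows(rows)

        # Add the files to the archive from disk, so the full tarball
        # is never assembled in memory.
        with tarfile.open(file_name, "w:gz") as archive:
            for entry in sorted(os.listdir(tmp_dir)):
                archive.add(os.path.join(tmp_dir, entry), arcname=entry)
```

Because `outcomes` may hold generators, the rows themselves do not need to be materialized up front. Whether this noticeably lowers peak memory in practice would still have to be confirmed by profiling.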

A longer-term solution is to add other storage backends in which results are flushed to disk as they come in. This avoids building up the very large results dataset in memory. The basic machinery for this is already in place through the callback keyword argument passed to `perform_experiments`. However, it probably requires a minor rethink of how serialization is handled for all classes of outcomes (_i.e._, `to_disk` and `from_disk`). Depending on the chosen storage solution, a slightly different serialization will be required.
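As a rough illustration of flushing results while they come in, here is a minimal sketch of a callback object that appends each finished experiment to CSV files on disk. The class name `StreamingCsvCallback` and the call signature `callback(experiment, outcomes)` are assumptions made for this example. The interface that `perform_experiments` actually expects from a callback may differ, and the per-outcome serialization (`to_disk` / `from_disk`) is left out entirely.

```python
import csv
import os


class StreamingCsvCallback:
    """Hypothetical callback that appends each finished experiment to disk.

    Assumes it is called as callback(experiment, outcomes), where
    experiment is a dict of parameter values and outcomes maps an
    outcome name to a flat sequence of values.
    """

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)
        self._handles = {}
        self._writers = {}

    def _writer(self, name):
        # Open each CSV lazily and keep it open in append mode.
        if name not in self._writers:
            path = os.path.join(self.directory, f"{name}.csv")
            handle = open(path, "a", newline="")
            self._handles[name] = handle
            self._writers[name] = csv.writer(handle)
        return self._writers[name]

    def __call__(self, experiment, outcomes):
        # Write the header once, before the first experiment row.
        if "experiments" not in self._writers:
            self._writer("experiments").writerow(list(experiment.keys()))
        self._writer("experiments").writerow(list(experiment.values()))

        # Append one row per outcome; nothing accumulates in memory.
        for name, values in outcomes.items():
            self._writer(name).writerow(list(values))

    def close(self):
        # Flush and release all open files.
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
        self._writers.clear()
```

With this design, memory use stays roughly constant in the number of experiments, at the cost of keeping one open file handle per outcome. A different backend would only need to replace the writer logic, which is where the storage-specific serialization would come in.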

Storing results and memory errors #304
