chunk up fiboa datasets #168

@geospatial-jeff

Description

Fiboa datasets are currently published as .parquet files of widely varying sizes: some are hundreds of MB, others many GB. This makes further processing difficult because each file requires a different amount of memory to process. Chunking the data into multiple files of similar size could help here.

Some of the larger fiboa datasets (e.g. Japan) take ~120-150 GB of memory to process, which becomes quite expensive and unwieldy. It probably makes sense to roll out this change along with the move to a dedicated source repo - fiboa/data#51
