Open
Labels
good first issue (Good for newcomers)
Description
This should extract precipitation information from the GRIB files and produce:
- an index file for finding neighboring grid points. This file should be in Feather format and should map each H3 index at resolutions 6 through 9 to a list of the grid points that fall within that cell. Each entry in the list should be the resolution 15 H3 index of a grid point.
- a daily set of data files in Feather format that contain hourly precipitation information for multiple grid points. Data should be allocated to data files by sorting by grid point H3 index and by time.
- a metadata file that records which files contain which grid points.
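To make the shape of the index file concrete, here is a minimal sketch of the mapping from coarse H3 cells to grid points. It is an illustration under stated assumptions, not a proposed implementation: the names `h3_parent` and `build_neighbor_index` are hypothetical, and the parent computation is done directly on the bits of the H3 index so the sketch has no dependencies. In practice the parent function of the `h3` library would be the natural choice. The input is assumed to be the resolution 15 H3 indexes of the grid points, already derived from the GRIB coordinates.

```python
from collections import defaultdict

H3_MAX_RES = 15
RES_OFFSET = 52               # the resolution occupies bits 52-55 of an H3 index
RES_MASK = 0xF << RES_OFFSET


def h3_parent(cell: str, res: int) -> str:
    """Return the ancestor of `cell` at resolution `res` (hypothetical helper).

    Stand-in for the h3 library's parent function: lower the resolution field
    and set every finer digit to 7, the "unused" digit value.
    """
    h = int(cell, 16)
    child_res = (h & RES_MASK) >> RES_OFFSET
    if not 0 <= res <= child_res:
        raise ValueError(f"resolution {res} is finer than the cell's own ({child_res})")
    h = (h & ~RES_MASK) | (res << RES_OFFSET)
    for r in range(res + 1, child_res + 1):
        h |= 0b111 << ((H3_MAX_RES - r) * 3)  # each digit is 3 bits wide
    return format(h, "x")


def build_neighbor_index(grid_cells, resolutions=range(6, 10)):
    """Map each H3 cell at resolutions 6 through 9 to the grid points inside it."""
    index = defaultdict(list)
    for cell in sorted(set(grid_cells)):
        for res in resolutions:
            index[h3_parent(cell, res)].append(cell)
    return dict(index)
```

The resulting dictionary could then be flattened into two columns (coarse index, list of grid points) and written with `pyarrow.feather.write_feather`. Whether one file should hold all four resolutions or one file per resolution is left open here.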
Questions:
- How many grid points should be assigned to each file to achieve desired retrieval times for 100 days of data for a single point?
- Should different grid points be partitioned by row group to improve read times?
- How can data integrity be verified?
- Can the metadata be replaced with a deterministic mapping from nearest grid point to file name (something like a hash of the grid point index modulo the number of files)?
- How can we best maintain a single index file that merges all observed grid points?
- Should we be merging many days of data into single data files?
Links:
https://arrow.apache.org/docs/python/
https://github.com/agstack/weather-server/tree/main/experiments/s2-geohash