
Write data extractor task for airflow #4

@tdunning

Description


This task should extract precipitation information from the GRIB files and produce:

  • an index file for finding neighboring grid points. This file should be in feather format and should map H3 indexes at resolutions 6 ... 9 to a list of the grid points within each index cell. The list should contain the resolution-15 H3 index of each grid point.
  • a daily set of data files in feather format that contain hourly precipitation information for multiple grid points. Data should be allocated to files by sorting on grid point H3 index and then on time.
  • a metadata file that records which grid points each data file contains

Questions:

  • How many grid points should be assigned to each file to achieve the desired retrieval time for 100 days of data for a single point?
  • Should different grid points be partitioned by row group to improve read times?
  • How can data integrity be verified?
  • Can the metadata be replaced with a deterministic mapping from nearest grid point to file name (something like mod of a hash)?
  • How can we best maintain a single index file that merges all observed grid points?
  • Should we be merging many days of data into single data files?
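The deterministic-mapping question above could be sketched as follows, assuming grid points are identified by their resolution-15 H3 cell id; the shard count and file-name pattern are hypothetical, not decided.

```python
import hashlib

N_FILES = 256  # hypothetical shard count; the real value depends on retrieval-time targets

def file_for_cell(h3_cell: str, n_files: int = N_FILES) -> str:
    """Deterministically map a grid point's H3 cell id to a data-file name.

    Uses hashlib rather than Python's built-in hash(), which is salted
    per process and therefore not stable across runs.
    """
    digest = hashlib.sha256(h3_cell.encode("ascii")).digest()
    shard = int.from_bytes(digest[:4], "big") % n_files
    return f"precip-{shard:04d}.feather"

# The same cell always maps to the same file, so no metadata lookup is needed.
assert file_for_cell("8fa8100000001") == file_for_cell("8fa8100000001")
```

One caveat: unlike allocation by sorted H3 index, hashing scatters spatially adjacent grid points across files, so reading a neighborhood of points may touch many files.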

Links:

https://arrow.apache.org/docs/python/
https://github.com/agstack/weather-server/tree/main/experiments/s2-geohash
