Open
Description
It would be very useful to have the dataset download files also provided in JSON Lines format (one self-contained record per line), so that they can be split for ingestion by a distributed cluster-computing system such as Spark. In the current format, each file has to be loaded into memory in its entirety before it can be ingested.
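To illustrate the request, here is a minimal sketch of the conversion. It assumes each download file is a single top-level JSON array of records, which may not match the actual layout. The function name `json_array_to_jsonl` and the file paths are hypothetical.

```python
import json
from pathlib import Path


def json_array_to_jsonl(src: Path, dst: Path) -> int:
    """Rewrite a file holding one top-level JSON array as JSON Lines.

    Returns the number of records written.
    """
    # Assumption: the source file is a single JSON array of records.
    # This one-off conversion still loads the whole file into memory.
    with src.open(encoding="utf-8") as f:
        records = json.load(f)

    with dst.open("w", encoding="utf-8") as out:
        for record in records:
            # json.dumps (without indent) escapes newlines inside strings,
            # so each record stays on exactly one line.
            out.write(json.dumps(record) + "\n")
    return len(records)
```

Once a file is in this form, PySpark's `spark.read.json(path)` reads it as JSON Lines by default and can split the input across workers, whereas a single multi-line JSON document needs the `multiLine` option and is parsed as a whole.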
beckyconning