Closed
Description
This issue proposes that we (1) load datasets as pandas DataFrames by default and (2) rewrite the dataset loader to work with pandas by default, converting to NumPy only if the user requests a NumPy array.
The reasons for this proposal are:
- pandas is much more stable than it was a few years ago when we started this project, and it can now also handle strings properly (see Proposal: Use pandas str type for str datasets #1107).
- pandas can properly encode categorical columns, which can make it easier for projects building on OpenML-Python to handle these categories.
- We will use Parquet in the background to store files anyway, and we have to interface with it through pandas.
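To make the proposal concrete, here is a minimal sketch of a pandas-first loader that converts to NumPy only on request. The function `load_dataset`, its `as_numpy` parameter, and the toy data are hypothetical and are not part of the OpenML-Python API; the in-memory DataFrame stands in for whatever the real loader would read from storage.

```python
import numpy as np
import pandas as pd


def load_dataset(as_numpy: bool = False):
    """Hypothetical loader: pandas by default, NumPy only on request."""
    # Toy stand-in for data read from storage (not real OpenML data).
    df = pd.DataFrame(
        {
            "color": pd.Categorical(
                ["red", "blue", "red"], categories=["blue", "red"]
            ),
            "size": [1.0, 2.5, 3.0],
        }
    )
    if not as_numpy:
        # Default path: the categorical dtype and its categories are preserved.
        return df

    # NumPy path: replace each categorical column with its integer codes,
    # then convert the whole frame to a float array.
    encoded = df.copy()
    for name in encoded.select_dtypes(include="category").columns:
        encoded[name] = encoded[name].cat.codes
    return encoded.to_numpy(dtype=float)


frame = load_dataset()
array = load_dataset(as_numpy=True)
print(frame.dtypes)
print(array)
```

With the DataFrame, downstream projects can read the category labels directly from `frame["color"].cat.categories`; the NumPy array only carries the integer codes, so the mapping back to labels would have to be tracked separately.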
PGijsbers and LennartPurucker