Automatically load dataset as pandas

This issue proposes that we (1) return datasets as pandas DataFrames by default and (2) rewrite the dataset loader so that it works with pandas internally and converts to a numpy array only if the user requests one.

The reasons for this proposal are:
1. pandas is much more stable than it was a few years ago when we started this project, and it can now also handle strings properly (see #1107).
2. pandas can properly encode categorical columns, which can make it easier for projects building on OpenML-Python to handle these categories.
3. We will use parquet in the background to store files anyway, and that format has to be interfaced with pandas.
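
To make the proposal concrete, here is a minimal sketch of the "pandas first, numpy on request" idea. It is not the OpenML-Python implementation: the helper name `to_requested_format` and its `dataset_format` argument are hypothetical, and the sketch assumes every non-categorical column is already numeric.

```python
import numpy as np
import pandas as pd


def to_requested_format(frame: pd.DataFrame, dataset_format: str = "dataframe"):
    """Hypothetical helper: keep the DataFrame, or convert it to a float array."""
    if dataset_format == "dataframe":
        # Default path: hand the DataFrame back untouched.
        return frame
    if dataset_format == "array":
        encoded = frame.copy()
        for name in encoded.columns:
            if isinstance(encoded[name].dtype, pd.CategoricalDtype):
                # Replace each category with its integer code.
                # pandas uses -1 for missing values, which we map to NaN.
                codes = encoded[name].cat.codes.astype("float64")
                encoded[name] = codes.where(codes >= 0, np.nan)
        # Assumption: all remaining columns are numeric.
        return encoded.to_numpy(dtype="float64")
    raise ValueError(f"Unknown dataset_format: {dataset_format!r}")


# Small illustrative frame with one numeric and one categorical column.
frame = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 6.3],
        "species": pd.Categorical(["setosa", "setosa", "virginica"]),
    }
)

as_frame = to_requested_format(frame)           # pandas by default
as_array = to_requested_format(frame, "array")  # numpy only on request
```

Keeping the DataFrame as the internal representation means the categorical information survives until the user explicitly asks for an array, which is the point of reason 2 above.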

Automatically load dataset as pandas #1251

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Automatically load dataset as pandas #1251

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions