
tests: Unify LGBM test datasets #128

@AlbertoEAF

Description

Issue

Originally posted by @AndreFCruz in #116 (comment):

It was generated with the following IPython notebook. Should we commit this?
generate-fairgbm-sensitive-attribute.zip


Request

To test FairGBM, the test datasets were modified. The new column generated by André should be documented in https://raw.githubusercontent.com/feedzai/feedzai-openml-java/master/openml-lightgbm/lightgbm-provider/src/test/resources/test_data/stats.org, the file that describes how each feature was generated.

Also, when @shenggwang introduced tests for the explanations/contributions, he added a whole new set of test datasets derived from the initial ones I had created. As a result, we now have two similar but different sets of test data.
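Before merging the two sets, it may help to see exactly how far they have diverged. The following is only a minimal sketch, assuming both datasets are plain CSV files with a header row; the function names are hypothetical and not part of the repo. It reports shared and exclusive columns plus the row count of each file.

```python
import csv
from typing import Iterable


def summarize(lines: Iterable[str]) -> tuple[list[str], int]:
    """Return the header and the number of non-blank data rows of a CSV stream."""
    reader = csv.reader(lines)
    header = next(reader)
    n_rows = sum(1 for row in reader if row)  # blank lines parse as []
    return header, n_rows


def compare_datasets(a: Iterable[str], b: Iterable[str]) -> dict:
    """Report which columns two CSV test datasets share and how many rows each has."""
    header_a, rows_a = summarize(a)
    header_b, rows_b = summarize(b)
    return {
        "shared_columns": [c for c in header_a if c in header_b],
        "only_in_a": [c for c in header_a if c not in header_b],
        "only_in_b": [c for c in header_b if c not in header_a],
        "rows_a": rows_a,
        "rows_b": rows_b,
    }
```

Open file handles (opened with `newline=""`) can be passed directly, since a file object iterates over its lines; which two files to compare under `test_data` is left to whoever picks up the issue.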

I suggest unifying all of these test resources to avoid redundant test payloads in the repo. Sheng documented how to generate the new test sets in Python, with code that can be re-run in the future, and André also used Python code to generate the updated test set. Given that, I suggest we:

1. Remove the older datasets I generated (described in stats.org), since they were built with Excel formulas rather than code.
2. Regenerate Sheng's datasets with André's new Python code.
3. Refactor the tests to use the new test sets.
4. Document the updated Python code in the README.
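The key property of a code-based generator is that it reproduces the same files on every run. The sketch below illustrates that idea only. It is not the contents of André's notebook, which are not shown here. The column name `sensitive_group`, the group values, the uniform random assignment and the seed are all hypothetical placeholders. A seeded, local random generator appends a group column to each row, so regenerating a committed test file yields byte-identical output.

```python
import csv
import random
from typing import Iterable, Sequence, TextIO


def add_sensitive_column(
    lines: Iterable[str],
    column: str = "sensitive_group",  # hypothetical column name
    groups: Sequence[str] = ("A", "B"),  # hypothetical group values
    seed: int = 42,  # hypothetical seed; any fixed value gives reproducibility
) -> list[list[str]]:
    """Append a randomly assigned group column to every row of a CSV dataset.

    A fixed seed makes the output identical on every run, so committed test
    files can be regenerated and verified at any time.
    """
    rng = random.Random(seed)  # local RNG: leaves the global random state alone
    reader = csv.reader(lines)
    header = next(reader)
    rows = [header + [column]]
    for row in reader:
        if row:  # skip blank lines
            rows.append(row + [rng.choice(groups)])
    return rows


def write_csv(rows: list[list[str]], out: TextIO) -> None:
    """Write rows with Unix line endings so regenerated files diff cleanly."""
    csv.writer(out, lineterminator="\n").writerows(rows)
```

Whatever the real generation logic is, recording the seed and the exact column semantics in the README or stats.org would let anyone regenerate the datasets and confirm they match the committed files.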
