Test L0, L1, L2, and dropout regularization approaches #2

@eccuraa

As per issue #391 in -us-data.

This story can be closed once we are comfortable with the value of each L0 parameter; they were set hastily during the initial integration. There is a risk of overfitting to targets that will later serve as holdouts, but the algorithm it competes against was also tuned on the full set, so the two should be on even footing. The idea is to get the L0 approach "working well" without explicit tuning, and then run holdout validation to get an (imperfect) estimate of out-of-sample performance.

Optionally, grab some ideas from the L0 paper:

It might be worth asking an AI whether our current code faithfully implements this method.
Section 4 gives some initial parameter values. In one case, the L0 penalty weight lambda is set to 0.001/N, where N is the number of records. I believe that is close to what the code does now, if N is taken to be the number of targets.
Evaluate our dropout approach and how well it plays with L0. I suspect the two will work against each other and we will want to remove dropout, but I don't know for sure.
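
As a concrete reference for the 0.001/N rule mentioned above, here is a minimal sketch. It assumes N is the number of targets (the guess made in this issue, not something confirmed against the code), and the function name `l0_lambda` and the target count are hypothetical.

```python
def l0_lambda(n_targets: int, base: float = 0.001) -> float:
    """Scale the L0 penalty weight as base / N.

    Assumption: N is the number of calibration targets. The paper's
    example uses the number of records instead.
    """
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    return base / n_targets


# Hypothetical target count, for illustration only.
lam = l0_lambda(2000)
```

Writing the rule as a function makes it easy to compare against whatever constant is currently hard-coded.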

The paper describes a way to apply L2 regularization (closer to our dropout's goals) to only the non-zero parameters. How easy would that be to add?
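
To gauge the effort, here is a rough sketch of a combined penalty. It assumes the paper in question uses hard concrete gates with the stretch limits and temperature shown below (commonly cited defaults, not values read from our code), and it uses a simplified reading in which each squared weight is scaled by the probability that its gate is open. The names `prob_nonzero` and `combined_penalty` are hypothetical.

```python
import math

# Assumed hard concrete constants; check these against our implementation.
GAMMA, ZETA, BETA = -0.1, 1.1, 2.0 / 3.0


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def prob_nonzero(log_alpha: float, beta: float = BETA) -> float:
    """Probability that a hard concrete gate is non-zero."""
    return sigmoid(log_alpha - beta * math.log(-GAMMA / ZETA))


def combined_penalty(weights, log_alphas, lam_l0, lam_l2):
    """Expected L0 penalty plus an L2 penalty weighted by gate-open probability."""
    probs = [prob_nonzero(a) for a in log_alphas]
    l0_term = sum(probs)
    l2_term = sum(0.5 * w * w * p for w, p in zip(weights, probs))
    return lam_l0 * l0_term + lam_l2 * l2_term
```

Under this reading, a weight whose gate is almost surely closed contributes almost nothing to the L2 term, which is the "only on the non-zero parameters" behavior. If the gate probabilities are already computed for the L0 term, the addition is a few lines.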

Though not mentioned in the paper, "annealing" starts the temperature parameter at a higher value and lowers it in later epochs. It might be worth investigating this option.
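
One possible shape for such a schedule is a geometric decay, sketched below. The start and end temperatures are placeholders chosen for illustration, and `annealed_temperature` is a hypothetical helper rather than anything in our code.

```python
def annealed_temperature(epoch: int, total_epochs: int,
                         start: float = 2.0, end: float = 0.5) -> float:
    """Interpolate geometrically from `start` (epoch 0) to `end` (last epoch)."""
    if total_epochs <= 1:
        return end
    # Clamp progress to [0, 1] so out-of-range epochs stay within bounds.
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return start * (end / start) ** frac


# Temperature to use at each epoch of a 400-epoch run.
schedule = [annealed_temperature(e, 400) for e in range(400)]
```

A geometric decay spends proportionally more epochs at lower temperatures than a linear one; either could be tried.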

I noticed that the algorithm is still improving at 400 epochs, and I am curious how different the result would be if it ran longer. What if we decreased the learning rate at some point during training?
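
A step decay is probably the simplest version to try. The sketch below is framework-independent; the base rate, milestones, and decay factor are placeholders, and `step_lr` is a hypothetical helper.

```python
def step_lr(epoch: int, base_lr: float = 0.1,
            milestones=(400, 800), factor: float = 0.1) -> float:
    """Multiply the learning rate by `factor` at each milestone epoch reached."""
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * factor ** drops


# Learning rate per epoch for a hypothetical 1200-epoch run.
rates = [step_lr(e) for e in range(1200)]
```

If training runs through an optimizer library, its built-in schedulers would likely be a better fit than a hand-rolled function.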

How sensitive is the algorithm to leaving out a set of targets? Do we need to set lambda dynamically?
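
One way to probe both questions is to hold out each fold of targets in turn and recompute lambda from the targets that remain. This sketch only builds the plan; the fit and scoring steps are left out, the 0.001/N rule is assumed with N as the number of kept targets, and `holdout_plan` is a hypothetical name.

```python
import random


def holdout_plan(n_targets: int, k: int = 5, seed: int = 0, base: float = 0.001):
    """Return a list of (kept, held_out, lam) tuples, one per fold."""
    idx = list(range(n_targets))
    random.Random(seed).shuffle(idx)
    plan = []
    for i in range(k):
        held_out = sorted(idx[i::k])          # every k-th shuffled index
        held = set(held_out)
        kept = [t for t in range(n_targets) if t not in held]
        lam = base / len(kept)                # lambda set from remaining targets
        plan.append((kept, held_out, lam))
    return plan


for kept, held_out, lam in holdout_plan(10, k=5):
    # Fit on `kept` with penalty weight `lam`, then score on `held_out`.
    pass
```

Comparing results with a fixed lambda against the recomputed one would show whether setting it dynamically matters in practice.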
