Confusion about MoE fusion #3

Description

@mistycheney

Thanks for producing this interesting paper. I'm a bit confused about the MoE part and hope you can clarify it.

In the paper, Eq (6) is the objective being optimized, and it involves KL(sum_i expert_i || prior). If each expert_i is Gaussian, how do you compute the KL between the resulting mixture of Gaussians and the prior? I don't think this has a closed form.
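For context, the workaround I had in mind is a plain Monte Carlo estimate of that KL, roughly like the sketch below (this is my own illustration, not code from this repo; the function name, shapes, and weights argument are made up):

```python
# My own illustrative sketch (not from this repo): a Monte Carlo estimate of
# KL( sum_i w_i N(mu_i, diag(sigma_i^2)) || N(0, I) ), which has no closed
# form when the first argument is a mixture of Gaussians.
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

def mc_kl_mixture_vs_prior(mus, logvars, weights, n_samples=10):
    # mus, logvars: [batch, num_experts, latent_dim]; weights: [num_experts]
    components = Independent(Normal(mus, (0.5 * logvars).exp()), 1)
    mixture = MixtureSameFamily(Categorical(probs=weights), components)
    z = mixture.sample((n_samples,))            # [n_samples, batch, latent_dim]
    prior = Independent(Normal(torch.zeros_like(z), torch.ones_like(z)), 1)
    # E_q[ log q(z) - log p(z) ], estimated with n_samples draws per datapoint
    return (mixture.log_prob(z) - prior.log_prob(z)).mean(dim=0)   # [batch]
```

(I realise .sample() is not reparameterised here, so this only illustrates the estimator itself, not how gradients would flow through it, which is part of what I'm asking about.)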

I tried to find the answer in the code and narrowed it down to moe_fusion and mixture_component_selection. They seem to perform some sort of sampling. Is this the same as the importance sampling in MMVAE (Shi et al., 2019)?
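And for mixture_component_selection, this is roughly the behaviour I think I'm reading, written out in my own words (the function name, shapes, and the divisibility assumption are all mine, not the repo's):

```python
# My paraphrase of what I think the selection step does (not the repo's code):
# split the batch into num_experts equal chunks and let chunk i take its
# (mu, logvar) from expert i, so the batch as a whole acts like a stratified
# sample from the uniform mixture over experts.
import torch

def select_mixture_components(mus, logvars):
    # mus, logvars: [num_experts, batch, latent_dim]; assumes batch % num_experts == 0
    num_experts, batch, _ = mus.shape
    chunk = batch // num_experts
    idx = [slice(i * chunk, (i + 1) * chunk) for i in range(num_experts)]
    mu_sel = torch.cat([mus[i, idx[i]] for i in range(num_experts)], dim=0)
    lv_sel = torch.cat([logvars[i, idx[i]] for i in range(num_experts)], dim=0)
    return mu_sel, lv_sel  # per-datapoint Gaussian params, one expert per chunk
```

If that reading is correct, each datapoint's KL would be computed against a single selected expert rather than against the full mixture density, hence my question about whether this matches MMVAE's estimator or is a different approximation.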

Any clarification would be much appreciated. Thank you.
