Thanks for producing this interesting paper. I'm confused about the MoE part and hope you can clarify something.
In the paper, Eq. (6) is the objective to be optimized, which involves KL(sum_i expert_i || prior). If each expert_i is Gaussian, how do you compute the KL between a mixture of Gaussians and the prior? I don't think this has a closed form.
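
To make my question concrete, here is the kind of naive Monte-Carlo estimator I would have expected to see if the mixture KL is approximated by sampling (all names here are mine, not from this repo):

```python
import math
import torch
from torch.distributions import Normal

# Hypothetical sketch: a plain Monte-Carlo estimate of
# KL( (1/M) * sum_i q_i || p ), which has no closed form
# when the q_i are Gaussian experts.
def mc_kl_mixture_vs_prior(mus, logvars, n_samples=64):
    # mus, logvars: lists of per-expert mean / log-variance tensors.
    experts = [Normal(mu, (0.5 * lv).exp()) for mu, lv in zip(mus, logvars)]
    prior = Normal(torch.zeros_like(mus[0]), torch.ones_like(mus[0]))
    M = len(experts)
    estimates = []
    for _ in range(n_samples):
        # Sample from the mixture: pick an expert uniformly at random,
        # then sample from that expert.
        z = experts[torch.randint(M, (1,)).item()].rsample()
        # log q(z) under the full uniform mixture.
        log_q = torch.logsumexp(
            torch.stack([e.log_prob(z) for e in experts]), dim=0
        ) - math.log(M)
        estimates.append((log_q - prior.log_prob(z)).sum())
    return torch.stack(estimates).mean()
```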
I tried to find the answer in the code and traced it to `moe_fusion` and `mixture_component_selection`. These seem to perform some sort of sampling (see my reading below). Is this the same as the importance sampling in MMVAE (Shi et al., 2019)?
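
To spell out my current reading of `mixture_component_selection`, here is a hypothetical paraphrase (shapes and names are my guesses, not the repo's actual API): each element of the batch draws its latent from exactly one expert, with the batch split evenly across experts, which looks to me like stratified sampling from the uniform mixture rather than an importance-weighted estimate.

```python
import torch

# My guess at what mixture_component_selection does, not the repo's code:
# split the batch evenly across experts so each latent sample comes from
# exactly one expert of the uniform mixture.
def component_selection_sketch(mus, logvars):
    # mus, logvars: (num_experts, batch_size, latent_dim)
    num_experts, batch_size, _ = mus.shape
    # Assign the k-th equal-sized batch slice to expert k.
    expert_idx = (torch.arange(batch_size) * num_experts) // batch_size
    batch_idx = torch.arange(batch_size)
    return mus[expert_idx, batch_idx], logvars[expert_idx, batch_idx]
```

If that reading is right, is this equivalent to the stratified/importance-sampling estimator in MMVAE, or something different?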
Any clarification would be much appreciated. Thank you.