Thanks for producing this interesting paper. I'm confused about the MoE part and hope you can clarify something.
In the paper, Eq. (6) is the objective to be optimized, which involves KL(sum_i expert_i || prior). If each expert_i is Gaussian, how do you compute the KL between a mixture of Gaussians and the prior? I don't think this has a closed form.
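
To make my question concrete, here is the kind of naive Monte-Carlo estimator I would have expected to see if the mixture KL is approximated by sampling (all names here are mine, not from this repo):

```python
import math
import torch
from torch.distributions import Normal

# Hypothetical sketch: a plain Monte-Carlo estimate of
# KL( (1/M) * sum_i q_i || p ), which has no closed form
# when the q_i are Gaussian experts.
def mc_kl_mixture_vs_prior(mus, logvars, n_samples=64):
    # mus, logvars: lists of per-expert mean / log-variance tensors.
    experts = [Normal(mu, (0.5 * lv).exp()) for mu, lv in zip(mus, logvars)]
    prior = Normal(torch.zeros_like(mus[0]), torch.ones_like(mus[0]))
    M = len(experts)
    estimates = []
    for _ in range(n_samples):
        # Sample from the mixture: pick an expert uniformly at random,
        # then sample from that expert.
        z = experts[torch.randint(M, (1,)).item()].rsample()
        # log q(z) under the full uniform mixture.
        log_q = torch.logsumexp(
            torch.stack([e.log_prob(z) for e in experts]), dim=0
        ) - math.log(M)
        estimates.append((log_q - prior.log_prob(z)).sum())
    return torch.stack(estimates).mean()
```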
I tried to find the answer in the code and traced it to `moe_fusion` and `mixture_component_selection`. These seem to perform some sort of sampling (see my reading below). Is this the same as the importance sampling in MMVAE (Shi et al., 2019)?
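
To spell out my current reading of `mixture_component_selection`, here is a hypothetical paraphrase (shapes and names are my guesses, not the repo's actual API): each element of the batch draws its latent from exactly one expert, with the batch split evenly across experts, which looks to me like stratified sampling from the uniform mixture rather than an importance-weighted estimate.

```python
import torch

# My guess at what mixture_component_selection does, not the repo's code:
# split the batch evenly across experts so each latent sample comes from
# exactly one expert of the uniform mixture.
def component_selection_sketch(mus, logvars):
    # mus, logvars: (num_experts, batch_size, latent_dim)
    num_experts, batch_size, _ = mus.shape
    # Assign the k-th equal-sized batch slice to expert k.
    expert_idx = (torch.arange(batch_size) * num_experts) // batch_size
    batch_idx = torch.arange(batch_size)
    return mus[expert_idx, batch_idx], logvars[expert_idx, batch_idx]
```

If that reading is right, is this equivalent to the stratified/importance-sampling estimator in MMVAE, or something different?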
Any clarification would be much appreciated. Thank you.