Mixture of Experts (MoE) Visualization

Mixture of Experts (MoE) is a deep learning technique where a model is composed of multiple sub-models ("experts"), each specializing in a different part of the input space. A "gating network" decides which experts process a given input, so the model can have very large capacity while keeping inference cost low, since only a few experts run per input (sparsity).
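
The sketch below illustrates the idea with a minimal top-k MoE layer in PyTorch. It is illustrative only, not the implementation behind this visualization; the class name SimpleMoE and the parameters num_experts and top_k are assumptions chosen for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    """Minimal sparse MoE layer: a gating network routes each input to top-k experts."""
    def __init__(self, dim, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward sub-network.
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The gating network scores every expert for each input.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):                       # x: (batch, dim)
        scores = self.gate(x)                   # (batch, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # normalize over the selected experts only
        out = torch.zeros_like(x)
        # Only the top-k experts run for each input (sparsity).
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
        return out

# Usage: route a batch of 8 vectors of width 16 through the layer.
layer = SimpleMoE(dim=16)
y = layer(torch.randn(8, 16))
print(y.shape)  # torch.Size([8, 16])
```

With top_k=2 out of 4 experts, each input only pays the compute cost of two experts even though the full layer holds the parameters of all four; this is the capacity/compute trade-off the gating network enables.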

Diagram: Input → Gating Network → Experts