Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]
Mixture-of-Experts (MoE) architectures offer a promising solution by sparsely activating specific parts of the model, which reduces inference overhead. However, ...