Structured-Then-Unstructured Pruning for Scalable MoE Pruning [Paper Reflection]
Mixture-of-Experts (MoE) architectures offer a promising solution by sparsely activating specific parts of the model, reducing the inference overhead. However, ...