How do mixture-of-experts layers affect transformer models?
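Mixture-of-experts (MoE) layers are being adopted in more and more LLMs, and they are reported to improve model quality without a proportional increase in compute per token. The experts and the router that selects them still have to be trained along with the rest of the model.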
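
For concreteness, here is a minimal sketch of what a mixture-of-experts layer with top-k routing is usually understood to do for a single token. It is an illustration under stated assumptions, not any particular model's implementation: the names `moe_layer`, `gate_weights`, and `experts` are hypothetical, the router is assumed to be a plain linear map, and the experts are toy scaling functions standing in for the feed-forward blocks of a real transformer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_layer(token, gate_weights, experts, k=2):
    """Route one token vector through its top-k experts (illustrative only).

    token:        list of floats, the token's hidden vector
    gate_weights: one weight vector per expert (assumed linear router)
    experts:      list of callables, each mapping a vector to a vector
    """
    # Router logits: dot product of the token with each expert's gate vector.
    logits = [sum(w * x for w, x in zip(gw, token)) for gw in gate_weights]
    # Keep the k highest-scoring experts; the others are never evaluated,
    # which is why per-token compute stays small as experts are added.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    # Renormalize the gate over the selected experts only.
    probs = softmax([logits[i] for i in top])
    # Output is the gate-weighted sum of the selected experts' outputs.
    out = [0.0] * len(token)
    for p, i in zip(probs, top):
        y = experts[i](token)
        out = [o + p * v for o, v in zip(out, y)]
    return out, top


# Toy setup: four "experts" that just scale the input by 1, 2, 3, 4.
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
output, chosen = moe_layer([2.0, 1.0], gate_weights, experts, k=2)
print(chosen, output)
```

In this toy run only two of the four experts are evaluated for the token. That sparsity is the usual argument for the approach: total parameter count grows with the number of experts, while the work done per token grows only with `k`.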
