Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity"
Updated Nov 11, 2024 - Python
Implementation of mixture-of-models approaches for different tasks.