Commit
minor consistency edit
martinjaggi committed Nov 29, 2023
1 parent 56b8866 · commit 62de0b2
Showing 3 changed files with 7 additions and 5 deletions.
1 change: 1 addition & 0 deletions AUTHORS
@@ -11,5 +11,6 @@ Kyle Matoba, Idiap Research Institute and EPFL
 Amirkeivan Mohtashami, EPFL
 Matteo Pagliardini, EPFL
 Francesco Salvi,
+Xingyao Wang

2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ If you use this software please cite it:
 Francesco Salvi and
 Antoine Bosselut and
 Martin Jaggi},
-title = {epfLLM Megatron-LM},
+title = {epfLLM Megatron-LLM},
 year = 2023,
 url = {https://github.com/epfLLM/Megatron-LLM}
 }
9 changes: 5 additions & 4 deletions docs/index.rst
@@ -8,14 +8,14 @@ Our repository is a modification of the `original Megatron-LM codebase <https://
 
 Added key features include:
 
-- `LLaMa <https://arxiv.org/abs/2302.13971>`_, `LLaMa 2 <https://arxiv.org/abs/2307.09288>`_, `Falcon <https://huggingface.co/tiiuae>`_, `Code Llama <https://together.ai/blog/llama-2-7b-32k>`_ `Mistral https://arxiv.org/abs/2310.06825`_ support.
-- support training of large models (70B Llama 2, 65B Llama 1, 34B Code Llama, and 40B Falcon) on commodity hardware on multiple nodes
+- architectures supported: `LLaMa <https://arxiv.org/abs/2302.13971>`_, `LLaMa 2 <https://arxiv.org/abs/2307.09288>`_, `Falcon <https://huggingface.co/tiiuae>`_, `Code Llama <https://together.ai/blog/llama-2-7b-32k>`_ and `Mistral https://arxiv.org/abs/2310.06825`_.
+- support training of large models (70B Llama 2, 65B Llama 1, 34B Code Llama, 40B Falcon and Mistral) on commodity hardware on multiple nodes
 - 3-way parallelism: tensor parallel, pipeline parallel and data parallel training (inherited from Megatron)
 - full pretraining, finetuning and instruct tuning support
 - Support for special tokens & tokenizers
 - grouped-query attention (GQA) and multi-query attention (MQA)
 - Rotary Position Embeddings (RoPE), RMS layer norm, Lima dropout
-- `ROPE scaling <https://together.ai/blog/llama-2-7b-32k>`_ for longer attention context support
+- `RoPE scaling <https://together.ai/blog/llama-2-7b-32k>`_ for longer attention context support
 - FlashAttention 2
 - BF16 / FP16 training
 - WandB integration
@@ -61,6 +61,7 @@ If you use this software please cite it:
 Andreas Köpf and
 Kyle Matoba and
 Amirkeivan Mohtashami and
+Xingyao Wang and
 Olivia Simin Fan and
 Axel Marmet and
 Deniz Bayazit and
@@ -69,7 +70,7 @@ If you use this software please cite it:
 Francesco Salvi and
 Antoine Bosselut and
 Martin Jaggi},
-title = {epfLLM Megatron-LM},
+title = {epfLLM Megatron-LLM},
 year = 2023,
 url = {https://github.com/epfLLM/Megatron-LLM}
 }
