Commit
minor consistency edit
martinjaggi committed Nov 29, 2023
1 parent 56b8866 · commit 62de0b2
Showing 3 changed files with 7 additions and 5 deletions.
1 change: 1 addition & 0 deletions AUTHORS
@@ -11,5 +11,6 @@ Kyle Matoba, Idiap Research Institute and EPFL
 Amirkeivan Mohtashami, EPFL
 Matteo Pagliardini, EPFL
 Francesco Salvi,
+Xingyao Wang

2 changes: 1 addition & 1 deletion README.md
@@ -60,7 +60,7 @@ If you use this software please cite it:
 Francesco Salvi and
 Antoine Bosselut and
 Martin Jaggi},
-title = {epfLLM Megatron-LM},
+title = {epfLLM Megatron-LLM},
 year = 2023,
 url = {https://github.com/epfLLM/Megatron-LLM}
 }
9 changes: 5 additions & 4 deletions docs/index.rst
@@ -8,14 +8,14 @@ Our repository is a modification of the `original Megatron-LM codebase <https://
 
 Added key features include:
 
-- `LLaMa <https://arxiv.org/abs/2302.13971>`_, `LLaMa 2 <https://arxiv.org/abs/2307.09288>`_, `Falcon <https://huggingface.co/tiiuae>`_, `Code Llama <https://together.ai/blog/llama-2-7b-32k>`_ `Mistral https://arxiv.org/abs/2310.06825`_ support.
-- support training of large models (70B Llama 2, 65B Llama 1, 34B Code Llama, and 40B Falcon) on commodity hardware on multiple nodes
+- architectures supported: `LLaMa <https://arxiv.org/abs/2302.13971>`_, `LLaMa 2 <https://arxiv.org/abs/2307.09288>`_, `Falcon <https://huggingface.co/tiiuae>`_, `Code Llama <https://together.ai/blog/llama-2-7b-32k>`_ and `Mistral https://arxiv.org/abs/2310.06825`_.
+- support training of large models (70B Llama 2, 65B Llama 1, 34B Code Llama, 40B Falcon and Mistral) on commodity hardware on multiple nodes
 - 3-way parallelism: tensor parallel, pipeline parallel and data parallel training (inherited from Megatron)
 - full pretraining, finetuning and instruct tuning support
 - Support for special tokens & tokenizers
 - grouped-query attention (GQA) and multi-query attention (MQA)
 - Rotary Position Embeddings (RoPE), RMS layer norm, Lima dropout
-- `ROPE scaling <https://together.ai/blog/llama-2-7b-32k>`_ for longer attention context support
+- `RoPE scaling <https://together.ai/blog/llama-2-7b-32k>`_ for longer attention context support
 - FlashAttention 2
 - BF16 / FP16 training
 - WandB integration
@@ -61,6 +61,7 @@ If you use this software please cite it:
 Andreas Köpf and
 Kyle Matoba and
 Amirkeivan Mohtashami and
+Xingyao Wang and
 Olivia Simin Fan and
 Axel Marmet and
 Deniz Bayazit and
@@ -69,7 +70,7 @@ If you use this software please cite it:
 Francesco Salvi and
 Antoine Bosselut and
 Martin Jaggi},
-title = {epfLLM Megatron-LM},
+title = {epfLLM Megatron-LLM},
 year = 2023,
 url = {https://github.com/epfLLM/Megatron-LLM}
 }
