The Transformer introduced a new approach to sequence processing through the attention mechanism, revolutionizing traditional sequential processing methods. Following its success, many studies building on the Transformer have been conducted. However, most of them use the Transformer as-is and explore additional advancements on top of it, leaving relatively few studies that compare natural language processing performance across structural changes to the Transformer model itself.
To address this gap, this repo focuses on the structure of the Transformer and implements three models: the Standard Transformer, the Recurrent Transformer, and the Evolved Transformer. Each model is evaluated on three natural language generation tasks: Neural Machine Translation, Dialogue Generation, and Text Summarization.
| Standard Transformer | Recurrent Transformer | Evolved Transformer |
|---|---|---|
| The most basic Transformer architecture, introduced in the Attention Is All You Need paper | The Transformer variant with recursively weight-shared layers, introduced in the Universal Transformers paper | The Transformer architecture found via evolutionary neural architecture search, introduced in The Evolved Transformer paper |
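To make the structural contrast concrete, below is a minimal PyTorch sketch, illustrative only and not the repo's actual implementation, of the key difference between the standard and recurrent variants: a stack of independent layers versus a single weight-shared layer applied repeatedly. The Evolved Transformer's searched cell structure is more involved and is omitted here. The hyperparameters follow the setup table below.

```python
import torch
import torch.nn as nn

def make_layer():
    # One encoder layer; dimensions mirror the setup table below
    return nn.TransformerEncoderLayer(
        d_model=256, nhead=8, dim_feedforward=512, batch_first=True
    )

class StandardEncoder(nn.Module):
    """Standard Transformer: n_layers independent layers, each with its own weights."""
    def __init__(self, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList([make_layer() for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class RecurrentEncoder(nn.Module):
    """Recurrent (Universal) Transformer: one shared layer applied n_steps times."""
    def __init__(self, n_steps=6):
        super().__init__()
        self.shared_layer = make_layer()
        self.n_steps = n_steps

    def forward(self, x):
        for _ in range(self.n_steps):
            x = self.shared_layer(x)
        return x

x = torch.randn(32, 50, 256)        # (batch, sequence, embedding)
print(StandardEncoder()(x).shape)   # torch.Size([32, 50, 256])
print(RecurrentEncoder()(x).shape)  # torch.Size([32, 50, 256])
```

Note that the recurrent variant has roughly one layer's worth of parameters regardless of how many steps it runs, while the standard stack grows linearly with depth.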
| Data Setup | Model Setup | Training Setup |
|---|---|---|
| Machine Translation: WMT14 En-De | Embedding Dimension: 256 | Epochs: 10 |
| Dialogue Generation: Daily Dialogue | Hidden Dimension: 256 | Batch Size: 32 |
| Text Summarization: Daily Mail | PFF Dimension: 512 | Learning Rate: 5e-4 |
| Train Data Volume: 100,000 | N Heads: 8 | iters_to_accumulate: 4 |
| Valid Data Volume: 1,000 | N Layers: 6 | Gradient Clip Max Norm: 1 |
| Vocab Size: 15,000 | N Cells: 3 | Apply AMP: True |
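For reference, the setup above maps naturally onto a single config object. The sketch below is hypothetical, with field names chosen to mirror the table; the repo may organize its configuration differently.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # --- Model setup ---
    emb_dim: int = 256
    hidden_dim: int = 256
    pff_dim: int = 512               # position-wise feed-forward dimension
    n_heads: int = 8
    n_layers: int = 6
    n_cells: int = 3                 # presumably the Evolved Transformer's cell count
    # --- Training setup ---
    n_epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 5e-4
    iters_to_accumulate: int = 4     # gradient accumulation steps
    clip_max_norm: float = 1.0
    use_amp: bool = True             # automatic mixed precision
    # --- Data setup ---
    train_volume: int = 100_000
    valid_volume: int = 1_000
    vocab_size: int = 15_000
```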
| Model | Translation | Dialogue Generation | Summarization |
|---|---|---|---|
| Standard Transformer | - | - | - |
| Recurrent Transformer | - | - | - |
| Evolved Transformer | - | - | - |
Clone the git repo into your environment:

```
git clone https://github.com/moon23k/Transformer_Variants.git
```
Set up the datasets and tokenizer via the setup.py file:

```
python3 setup.py -task ['all', 'translation', 'dialogue', 'summarization']
```
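For example, to prepare the data and tokenizer for every task at once:

```
python3 setup.py -task all
```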
The actual training, testing, and inference are done by running the run.py file:

```
python3 run.py -task ['translation', 'dialogue', 'summarization']
               -mode ['train', 'test', 'inference']
               -model ['standard', 'recurrent', 'evolved']
               -search ['greedy', 'beam']
```
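For example, to train the Standard Transformer on the translation task and then evaluate it with beam search (the -search option typically only affects decoding, so it may be omitted during training):

```
python3 run.py -task translation -mode train -model standard
python3 run.py -task translation -mode test -model standard -search beam
```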