
Transformer Architectures Comparison in Natural Language Generation Tasks


moon23k/Transformer_Variants


Transformer Variants

The Transformer introduced a new approach to sequence processing through the attention mechanism, revolutionizing traditional sequential data processing methods. Following its success, many studies based on the Transformer have been conducted. However, most of them use the Transformer as-is and explore additional advancements on top of it, so relatively few studies compare natural language processing performance across structural changes to the Transformer model itself.

To address this gap, this repo focuses on the structure of the Transformer and implements three Transformer models: Standard Transformer, Recurrent Transformer, and Evolved Transformer. Each model is evaluated on three natural language generation tasks: Neural Machine Translation, Dialogue Generation, and Text Summarization.



Model Architectures

Standard Transformer: The most basic Transformer model architecture, introduced in the Attention Is All You Need paper.
Recurrent Transformer: The recursive, layer-connected Transformer model structure introduced in the Universal Transformers paper.
Evolved Transformer: The advanced Transformer model structure introduced in The Evolved Transformer paper.
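
The structural difference between the standard and recurrent variants can be illustrated with a toy sketch. This is not the repo's implementation: `ToyLayer` is a hypothetical stand-in for a full Transformer layer, reduced to a single scalar weight so the example stays dependency-free. It only shows that a standard stack applies N layers with separate parameters, while a recurrent (Universal Transformer style) stack applies one shared layer N times.

```python
import random


class ToyLayer:
    """Hypothetical stand-in for a Transformer layer: one scalar weight plus a residual update."""

    def __init__(self, rng):
        self.weight = rng.uniform(-0.1, 0.1)

    def __call__(self, x):
        # Residual-style update: x + f(x), with f reduced to a scalar multiply
        return [v + self.weight * v for v in x]


def standard_stack(x, layers):
    """Apply N distinct layers in sequence; each layer owns its parameters."""
    for layer in layers:
        x = layer(x)
    return x


def recurrent_stack(x, layer, n_steps):
    """Apply one shared layer n_steps times; parameters are reused at every depth."""
    for _ in range(n_steps):
        x = layer(x)
    return x


rng = random.Random(0)
n_layers = 6  # matches "N Layers: 6" in the experimental setup

standard_layers = [ToyLayer(rng) for _ in range(n_layers)]
shared_layer = ToyLayer(rng)

x = [1.0, 2.0, 3.0]
y_standard = standard_stack(x, standard_layers)
y_recurrent = recurrent_stack(x, shared_layer, n_layers)

# Number of distinct parameter sets used by each variant
n_param_sets_standard = len({id(layer) for layer in standard_layers})
n_param_sets_recurrent = 1

print(n_param_sets_standard, n_param_sets_recurrent)
```

With the same depth, the recurrent variant holds one set of layer parameters instead of six, which is the main trade-off the comparison explores.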



Experimental Setups

Data Setup                            Model Setup                 Training Setup
Machine Translation: WMT14 En-De      Embedding Dimension: 256    Epochs: 10
Dialogue Generation: Daily Dialogue   Hidden Dimension: 256       Batch Size: 32
Text Summarization: Daily Mail        PFF Dimension: 512          Learning Rate: 5e-4
Train Data Volume: 100,000            N Heads: 512                iters_to_accumulate: 4
Valid Data Volume: 1,000              N Layers: 6                 Gradient Clip Max Norm: 1
Vocab Size: 15,000                    N Cells: 3                  Apply AMP: True
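
The training numbers above imply a few derived quantities, sketched below. The sketch assumes that `iters_to_accumulate` is the number of gradient accumulation steps, that the training data volume applies to each task, and that a leftover partial accumulation at the end of an epoch still triggers an optimizer step. `TrainingSetup` is a hypothetical name used only for this illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingSetup:
    """Hypothetical container for the values listed in the setup table."""

    train_examples: int = 100_000
    batch_size: int = 32
    iters_to_accumulate: int = 4
    epochs: int = 10

    @property
    def effective_batch_size(self) -> int:
        # Examples that contribute to each optimizer step
        return self.batch_size * self.iters_to_accumulate

    @property
    def batches_per_epoch(self) -> int:
        return math.ceil(self.train_examples / self.batch_size)

    @property
    def optimizer_steps_per_epoch(self) -> int:
        # Assumption: a partial accumulation at epoch end is still applied
        return math.ceil(self.batches_per_epoch / self.iters_to_accumulate)


setup = TrainingSetup()
print(setup.effective_batch_size)       # 128
print(setup.batches_per_epoch)          # 3125
print(setup.optimizer_steps_per_epoch)  # 782
```

Under these assumptions, each optimizer step sees 128 examples, so the learning rate of 5e-4 should be read against that effective batch size and not against 32.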



Result

Model                   Translation   Dialogue Generation   Summarization
Standard Transformer    -             -                     -
Recurrent Transformer   -             -                     -
Evolved Transformer     -             -                     -



How to Use

Clone the repository into your environment

git clone https://github.com/moon23k/Transformer_Variants.git


Set up the datasets and tokenizer with setup.py, choosing one value from the bracketed list

python3 setup.py -task ['all', 'translation', 'dialogue', 'summarization']


Run a task with run.py, choosing one value from the bracketed list for each option

python3 run.py -task ['translation', 'dialogue', 'summarization']
               -mode ['train', 'test', 'inference']
               -model ['standard', 'recurrent', 'evolved']
               -search ['greedy', 'beam']
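
The bracketed lists above are choices, not literal arguments. As an illustration, the sketch below shows one way such flags could be validated with `argparse`. It is hypothetical and not taken from the repo's run.py: `build_parser`, the required/optional split, and the `greedy` default for `-search` are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options listed for run.py."""
    parser = argparse.ArgumentParser(prog="run.py")
    parser.add_argument("-task", required=True,
                        choices=["translation", "dialogue", "summarization"])
    parser.add_argument("-mode", required=True,
                        choices=["train", "test", "inference"])
    parser.add_argument("-model", required=True,
                        choices=["standard", "recurrent", "evolved"])
    # Assumption: greedy decoding is the default when -search is omitted
    parser.add_argument("-search", default="greedy",
                        choices=["greedy", "beam"])
    return parser


# Equivalent to: python3 run.py -task translation -mode train -model evolved -search beam
args = build_parser().parse_args(
    ["-task", "translation", "-mode", "train", "-model", "evolved", "-search", "beam"]
)
print(args.task, args.mode, args.model, args.search)
```

Passing a value outside the listed choices makes `argparse` exit with a usage error, which catches typos such as `-model evolve` before any training starts.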



Reference

Attention Is All You Need
Universal Transformers
The Evolved Transformer
