The Transformer introduced a new approach to sequence processing through the attention mechanism, revolutionizing traditional sequential processing methods. Following its success, many studies building on the Transformer have been conducted. However, most of them use the Transformer as-is and explore additional advancements on top of it, leaving relatively few studies that compare natural language processing performance across structural changes to the Transformer model itself.
To address this gap, this repo focuses on the structure of the Transformer and implements three models: the Standard Transformer, the Recurrent Transformer, and the Evolved Transformer. Each model is evaluated on three natural language generation tasks: Neural Machine Translation, Dialogue Generation, and Text Summarization.
| Standard Transformer | Recurrent Transformer | Evolved Transformer |
|---|---|---|
| The most basic Transformer architecture, introduced in the Attention Is All You Need paper | The Transformer variant with recursively weight-shared layers, introduced in the Universal Transformers paper | The Transformer architecture found via evolutionary neural architecture search, introduced in The Evolved Transformer paper |
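To make the structural contrast concrete, below is a minimal PyTorch sketch, illustrative only and not the repo's actual implementation, of the key difference between the standard and recurrent variants: a stack of independent layers versus a single weight-shared layer applied repeatedly. The Evolved Transformer's searched cell structure is more involved and is omitted here. The hyperparameters follow the setup table below.

```python
import torch
import torch.nn as nn

def make_layer():
    # One encoder layer; dimensions mirror the setup table below
    return nn.TransformerEncoderLayer(
        d_model=256, nhead=8, dim_feedforward=512, batch_first=True
    )

class StandardEncoder(nn.Module):
    """Standard Transformer: n_layers independent layers, each with its own weights."""
    def __init__(self, n_layers=6):
        super().__init__()
        self.layers = nn.ModuleList([make_layer() for _ in range(n_layers)])

    def forward(self, x):
        for layer in self.layers:
            x = layer(x)
        return x

class RecurrentEncoder(nn.Module):
    """Recurrent (Universal) Transformer: one shared layer applied n_steps times."""
    def __init__(self, n_steps=6):
        super().__init__()
        self.shared_layer = make_layer()
        self.n_steps = n_steps

    def forward(self, x):
        for _ in range(self.n_steps):
            x = self.shared_layer(x)
        return x

x = torch.randn(32, 50, 256)        # (batch, sequence, embedding)
print(StandardEncoder()(x).shape)   # torch.Size([32, 50, 256])
print(RecurrentEncoder()(x).shape)  # torch.Size([32, 50, 256])
```

Note that the recurrent variant has roughly one layer's worth of parameters regardless of how many steps it runs, while the standard stack grows linearly with depth.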
| Data Setup | Model Setup | Training Setup |
|---|---|---|
| Machine Translation: WMT14 En-De | Embedding Dimension: 256 | Epochs: 10 |
| Dialogue Generation: Daily Dialogue | Hidden Dimension: 256 | Batch Size: 32 |
| Text Summarization: Daily Mail | PFF Dimension: 512 | Learning Rate: 5e-4 |
| Train Data Volume: 100,000 | N Heads: 8 | iters_to_accumulate: 4 |
| Valid Data Volume: 1,000 | N Layers: 6 | Gradient Clip Max Norm: 1 |
| Vocab Size: 15,000 | N Cells: 3 | Apply AMP: True |
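For reference, the setup above maps naturally onto a single config object. The sketch below is hypothetical, with field names chosen to mirror the table; the repo may organize its configuration differently.

```python
from dataclasses import dataclass

@dataclass
class Config:
    # --- Model setup ---
    emb_dim: int = 256
    hidden_dim: int = 256
    pff_dim: int = 512               # position-wise feed-forward dimension
    n_heads: int = 8
    n_layers: int = 6
    n_cells: int = 3                 # presumably the Evolved Transformer's cell count
    # --- Training setup ---
    n_epochs: int = 10
    batch_size: int = 32
    learning_rate: float = 5e-4
    iters_to_accumulate: int = 4     # gradient accumulation steps
    clip_max_norm: float = 1.0
    use_amp: bool = True             # automatic mixed precision
    # --- Data setup ---
    train_volume: int = 100_000
    valid_volume: int = 1_000
    vocab_size: int = 15_000
```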
| Model | Translation | Dialogue Generation | Summarization |
|---|---|---|---|
| Standard Transformer | - | - | - |
| Recurrent Transformer | - | - | - |
| Evolved Transformer | - | - | - |
Clone the git repo into your environment:

```
git clone https://github.com/moon23k/Transformer_Variants.git
```
Set up the datasets and tokenizer via the setup.py file:

```
python3 setup.py -task ['all', 'translation', 'dialogue', 'summarization']
```
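For example, to prepare the data and tokenizer for every task at once:

```
python3 setup.py -task all
```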
The actual training, testing, and inference are done by running the run.py file:

```
python3 run.py -task ['translation', 'dialogue', 'summarization']
               -mode ['train', 'test', 'inference']
               -model ['standard', 'recurrent', 'evolved']
               -search ['greedy', 'beam']
```
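For example, to train the Standard Transformer on the translation task and then evaluate it with beam search (the -search option typically only affects decoding, so it may be omitted during training):

```
python3 run.py -task translation -mode train -model standard
python3 run.py -task translation -mode test -model standard -search beam
```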