We have implemented the following variants of the multi-head attention mechanism:
Causal Self-Attention is the vanilla multi-head masked self-attention layer with a projection at the end. It employs the scaled dot-product as the scoring function:
$$\text{Attention}(Q, K, V) = \text{softmax}\!\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$$

Where:
- $Q$, $K$, and $V$ are the query, key, and value matrices.
- $d_k$ is the dimensionality of the key vectors.
This mechanism computes a block_size × block_size attention matrix, which makes the computation quadratic in the sequence length.
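A minimal PyTorch sketch of such a layer is shown below. The hyperparameter names (`n_embd`, `n_head`, `block_size`) and the module/attribute names are illustrative assumptions, not necessarily the exact implementation in this repository:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class CausalSelfAttention(nn.Module):
    """Vanilla multi-head masked self-attention with an output projection (sketch)."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        # query, key, value projections for all heads at once
        self.query = nn.Linear(n_embd, n_embd)
        self.key = nn.Linear(n_embd, n_embd)
        self.value = nn.Linear(n_embd, n_embd)
        # output projection
        self.proj = nn.Linear(n_embd, n_embd)
        # causal mask: ones on and below the diagonal
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x):
        B, T, C = x.size()
        hs = C // self.n_head
        # project and split into heads: (B, n_head, T, hs)
        q = self.query(x).view(B, T, self.n_head, hs).transpose(1, 2)
        k = self.key(x).view(B, T, self.n_head, hs).transpose(1, 2)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)
        # scaled dot-product scores: (B, n_head, T, T) -- quadratic in T
        att = (q @ k.transpose(-2, -1)) / math.sqrt(hs)
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(att, dim=-1)
        y = att @ v                                   # (B, n_head, T, hs)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```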
Synthesizer Self-Attention is a recent alternative to causal self-attention that removes the need for pairwise dot-product operations. Instead, it directly computes the block_size × block_size matrix of attention scores:
$$B = \sigma(X W_1 + b_1)\, W_2 + b_2$$

Where:
- $W_1$ and $W_2$ are learnable weight matrices.
- $b_1$ and $b_2$ are biases.
- $\sigma$ is a non-linear activation function.
Synthesizer Self-Attention thus removes the pairwise scaled dot-product from the scoring step, reducing its computational cost and offering an efficient alternative for long sequences.
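Below is a minimal sketch of this dense-synthesizer variant, assuming ReLU for $\sigma$, a per-head application of $W_2$, and the same causal mask as above; the class and parameter names are illustrative, not necessarily the repository's implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SynthesizerSelfAttention(nn.Module):
    """Dense Synthesizer sketch: scores are predicted from each token alone,
    with no query-key dot products."""

    def __init__(self, n_embd, n_head, block_size):
        super().__init__()
        assert n_embd % n_head == 0
        self.n_head = n_head
        hs = n_embd // n_head
        # B = sigma(X W_1 + b_1) W_2 + b_2, with W_2 applied per head
        self.w1 = nn.Linear(n_embd, n_embd)        # X W_1 + b_1 (all heads at once)
        self.w2 = nn.Linear(hs, block_size)        # ... W_2 + b_2 -> one score per position
        self.value = nn.Linear(n_embd, n_embd)
        self.proj = nn.Linear(n_embd, n_embd)
        self.register_buffer(
            "mask",
            torch.tril(torch.ones(block_size, block_size)).view(1, 1, block_size, block_size),
        )

    def forward(self, x):
        B, T, C = x.size()
        hs = C // self.n_head
        # per-token hidden representation, split into heads: (B, n_head, T, hs)
        h = F.relu(self.w1(x)).view(B, T, self.n_head, hs).transpose(1, 2)
        # synthesized scores: (B, n_head, T, block_size), cropped to the current length T
        scores = self.w2(h)[:, :, :, :T]
        scores = scores.masked_fill(self.mask[:, :, :T, :T] == 0, float("-inf"))
        att = F.softmax(scores, dim=-1)
        v = self.value(x).view(B, T, self.n_head, hs).transpose(1, 2)
        y = att @ v                                 # (B, n_head, T, hs)
        y = y.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(y)
```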
- Synthesizer: Rethinking Self-Attention in Transformer Models