AmesFormer: A graph transformer neural network for state-of-the-art mutagenicity prediction.
- We achieve state-of-the-art mutagenicity prediction on a standardised Ames dataset.
- We provide a large, clean open-source dataset of Ames mutagenicity.
- `data/` is empty; users should place their own dataset here, or use our provided `combined.csv` dataset.
- `data_cleaning/` contains the code used to create `combined.csv` from the individual raw datasets, along with code to generate some of our results figures.
- Our Ames dataset, excluding the proprietary "Honma" components, is available in `data_cleaning/Combined_2s_as_0s_publication.csv`.
- `gnn-tools/` is our custom Rust library for calculating the shortest path distance (SPD) used by the edge and spatial encoding modules (a conceptual Python sketch of this computation follows the list).
- `graphormer/` contains the AmesFormer model, in essence a full reimplementation of Graphormer in PyTorch Geometric.
- `hparams/` contains the hyperparameter configurations we used in our experiments. The one used for our final model, for which results are reported, is `hparams/best_32_1_5e4.toml`.
- `pretrained_models/` contains the saved model checkpoints for our final AmesFormer model.
- `tests/` contains unit tests for attention, encodings, etc.
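The SPD matrix consumed by the spatial encoding can be viewed as an all-pairs breadth-first search over the molecular graph. Below is a minimal Python sketch of that computation, assuming a PyTorch Geometric style `edge_index` tensor; the function name and signature are illustrative and are not the actual `gnn-tools` API (which is implemented in Rust for speed).

```python
# Conceptual sketch only: gnn-tools performs this computation in Rust.
from collections import deque

import torch


def shortest_path_distances(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """All-pairs shortest path distances via BFS from each node; -1 marks unreachable pairs."""
    # Build an adjacency list from a PyG-style edge_index of shape [2, num_edges].
    adjacency = [[] for _ in range(num_nodes)]
    for src, dst in edge_index.t().tolist():
        adjacency[src].append(dst)

    spd = torch.full((num_nodes, num_nodes), -1, dtype=torch.long)
    for start in range(num_nodes):
        spd[start, start] = 0
        frontier = deque([start])
        while frontier:
            node = frontier.popleft()
            for neighbour in adjacency[node]:
                if spd[start, neighbour] == -1:  # not yet visited from `start`
                    spd[start, neighbour] = spd[start, node] + 1
                    frontier.append(neighbour)
    return spd
```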
This repository includes some tools that are built in Rust and expose Python bindings via Maturin. Both Rust and Maturin must be installed in order to build from source.
Installation is simplest with Poetry. Run `poetry lock --no-update` to gather the required information and populate caches, then `poetry install` to furnish a virtual environment.

Run `poetry run inference --dataset Combined --name AmesFormer-Pro` to run inference with our best model, or `poetry run train --dataset Combined` to begin training the model. See `poetry run train --help` for options.
Visualization is provided with `tensorboard`. To see local readouts during training, first install `tensorboard` on your system. We recommend installing it via pipx: `pipx install tensorboard`.

You can start a local tensorboard server with `tensorboard --logdir=<logdir>`, where `<logdir>` is the path to the tensorboard logs. By default, these are created in the `runs` folder.
AmesFormer implements the following Graphormer-style components:

- Centrality Encoding
- Spatial Encoding
- Edge Encoding
- Multi-Head Self-Attention
- VNODE global attention
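To show how these pieces fit together, here is a minimal, self-contained PyTorch sketch of a Graphormer-style attention layer. Module names, shapes, and hyperparameters are illustrative assumptions and do not reflect the exact AmesFormer implementation; the edge encoding bias is indicated in a comment but omitted for brevity.

```python
import torch
import torch.nn as nn


class GraphormerStyleAttention(nn.Module):
    """Illustrative Graphormer-style attention layer (not the AmesFormer source)."""

    def __init__(self, hidden_dim: int, num_heads: int, max_degree: int, max_spd: int):
        super().__init__()
        assert hidden_dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_dim // num_heads
        self.qkv = nn.Linear(hidden_dim, 3 * hidden_dim)
        self.out = nn.Linear(hidden_dim, hidden_dim)
        # Centrality encoding: learned embedding of node degree, added to node features.
        self.degree_embedding = nn.Embedding(max_degree + 1, hidden_dim)
        # Spatial encoding: learned per-head scalar bias indexed by shortest path distance.
        # One extra index is reserved for the VNODE, which attends to every node globally.
        self.spatial_bias = nn.Embedding(max_spd + 2, num_heads)

    def forward(self, x: torch.Tensor, degree: torch.Tensor, spd: torch.Tensor) -> torch.Tensor:
        # x:      [num_nodes, hidden_dim] node features (index 0 is the VNODE)
        # degree: [num_nodes] node degrees, clamped to max_degree
        # spd:    [num_nodes, num_nodes] SPDs, using the reserved index for VNODE pairs
        x = x + self.degree_embedding(degree)                      # centrality encoding
        n = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        logits = q @ k.transpose(-2, -1) / self.head_dim ** 0.5    # multi-head self-attention
        logits = logits + self.spatial_bias(spd).permute(2, 0, 1)  # spatial encoding bias
        # An edge encoding would add one more bias term derived from edge features
        # along each shortest path; it is omitted here for brevity.
        attn = logits.softmax(dim=-1)
        out = (attn @ v).transpose(0, 1).reshape(n, -1)
        return self.out(out)
```

In Graphormer-style models, several such layers are stacked and the VNODE's final representation typically serves as the graph-level readout for the classification head.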