Database introduction
Jin Su edited this page Dec 27, 2024 · 2 revisions
Database | Description | Number of proteins | Reference link |
---|---|---|---|
Swiss-Prot | Human-reviewed protein sequence database | 500K | https://www.uniprot.org/uniprotkb?query=reviewed:true |
UniRef50 | Generated by clustering UniProt proteins at 50% sequence identity | 50M | https://www.uniprot.org/help/uniref |
Uncharacterized | All proteins labeled "Uncharacterized" on the UniProt website | 30M | https://www.uniprot.org/uniprotkb?query=Uncharacterized |
OMG_prot50 | Created by clustering the Open MetaGenomic dataset (OMG) at 50% sequence identity | 200M | https://huggingface.co/datasets/tattabio/OMG_prot50 |
PDB | Database of three-dimensional protein structures | 700K (each chain in a structure is extracted and counted as one protein) | https://www.rcsb.org/ |
GOPC | Global ocean microbiome protein catalog sequences | 2B | https://db.cngb.org/maya/datasets/MDB0000002 |
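
The protein counts in the table are per-sequence counts; for PDB, each chain counts as its own protein. As a rough illustration of how such a count could be checked, here is a minimal sketch that assumes the sequences are available as a FASTA file with one record per protein (or per chain). The function name `count_fasta_records` and the toy records are hypothetical and are not part of any of the databases' own tooling.

```python
import io
from typing import Iterable


def count_fasta_records(lines: Iterable[str]) -> int:
    """Count sequence records by counting header lines that start with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


# Hypothetical toy data: one structure with two chains,
# each chain written as its own FASTA record.
example = io.StringIO(">1ABC_A\nMKTAYIAK\n>1ABC_B\nGSHMLEDP\n")
print(count_fasta_records(example))  # prints 2
```

Because the function accepts any iterable of lines, an open file handle can be passed directly, so a large file is streamed rather than loaded into memory.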