
Database introduction

Jin Su edited this page Dec 27, 2024 · 2 revisions
| Database | Introduction | Number of proteins | Reference link |
| --- | --- | --- | --- |
| Swiss-Prot | Human-reviewed protein sequence database | 500K | https://www.uniprot.org/uniprotkb?query=reviewed:true |
| UniRef50 | Generated by clustering UniProt proteins at 50% sequence identity | 50M | https://www.uniprot.org/help/uniref |
| Uncharacterized | All proteins labeled as "Uncharacterized" on the UniProt website | 30M | https://www.uniprot.org/uniprotkb?query=Uncharacterized |
| OMG_prot50 | Created by clustering the Open MetaGenomic dataset (OMG) at 50% sequence identity | 200M | https://huggingface.co/datasets/tattabio/OMG_prot50 |
| PDB | Database of three-dimensional structural data for proteins | 700K (each chain in a structure was extracted and counted as one protein) | https://www.rcsb.org/ |
| GOPC | Global Ocean Microbiome Protein Catalog sequences | 2B | https://db.cngb.org/maya/datasets/MDB0000002 |
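
The PDB count above treats every chain in a structure as one protein. This page does not describe how the chains were extracted, so the following is only a minimal sketch of one way to do per-chain counting, assuming legacy PDB-format text in which the chain identifier sits in column 22 of each `ATOM` record. The names `count_chains` and `make_atom_line` are hypothetical helpers, not part of this project.

```python
def make_atom_line(serial: int, name: str, res: str, chain: str) -> str:
    """Build a simplified ATOM record prefix (coordinates omitted).

    Hypothetical helper for the demo only; it pads the fields so the
    chain identifier lands in column 22, as in the legacy PDB format.
    """
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}"


def count_chains(pdb_text: str) -> int:
    """Count distinct chain IDs among ATOM records of the first model."""
    chains = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # stop after the first model so multi-model entries are not over-counted
        if line.startswith("ATOM") and len(line) > 21:
            chains.add(line[21])  # column 22 (index 21) is the chain identifier
    return len(chains)


# Toy input: two chains (A, B) in model 1; model 2 is ignored.
example = "\n".join([
    "MODEL        1",
    make_atom_line(1, "N", "MET", "A"),
    make_atom_line(2, "CA", "MET", "A"),
    make_atom_line(3, "N", "GLY", "B"),
    "ENDMDL",
    "MODEL        2",
    make_atom_line(1, "N", "MET", "C"),
    "ENDMDL",
])

n_chains = count_chains(example)
print(n_chains)
```

Note that `ATOM` records can also describe nucleic acid chains, so a real pipeline would need an extra filter (for example, on residue names) before counting each chain as a protein.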