Skip to content

Latest commit

 

History

History
182 lines (114 loc) · 4.66 KB

README.rst

File metadata and controls

182 lines (114 loc) · 4.66 KB

pdbsearch

travis coveralls pypi version commit

pdbsearch is a Python library for searching for PDB structures using the RCSB web services.

Example

>>> import pdbsearch
>>> codes = pdbsearch.search(limit=5, ligand_name="CU")
>>> codes
['3HW7', '2WKO', '2WOF', '2WOH', '2WO0']

Installing

pip

pdbsearch can be installed using pip (you may need to use pip3):

$ pip install pdbsearch

If you get permission errors, try using sudo:

$ sudo pip install pdbsearch

Development

The repository for pdbsearch, containing the most recent iteration, can be found here. To clone the pdbsearch repository directly from there, use:

$ git clone git://github.com/samirelanduk/pdbsearch.git

Requirements

pdbsearch requires requests.

Testing

To test a local version of pdbsearch, cd to the pdbsearch directory and run:

$ python -m unittest discover tests

You can opt to only run unit tests or integration tests:

$ python -m unittest discover tests.unit $ python -m unittest discover tests.integration

Overview

pdbsearch is a Python library for searching for PDB structures using the RCSB web services.

Returning all PDB Codes

You can get all PDB codes without any particular search expression like so:

>>> import pdbsearch
>>> codes = pdbsearch.search(limit=None)
>>> len(codes)
174994

This will take a few seconds, and requires downloading a rather large JSON object over the network. Generally it is better to paginate the results:

>>> first_ten_codes = pdbsearch.search(limit=10)
>>> second_ten_codes = pdbsearch.search(start=10, limit=10)
>>> third_ten_codes = pdbsearch.search(start=20, limit=10)

You can sort the results by any of the terms at https://search.rcsb.org/structure-search-attributes.html:

>>> most_recent_codes = pdbsearch.search(sort="rcsb_accession_info.deposit_date")
>>> earliest_codes = pdbsearch.search(sort="-rcsb_accession_info.deposit_date")

As these are somewhat cumbersome, some of them have a shorthand:

>>> pdbsearch.search(limit=5, sort="code")
['9XIM', '9XIA', '9WGA', '9RUB', '9RSA']
>>> pdbsearch.search(limit=5, sort="-resolution")
['3NIR', '5D8V', '1EJG', '3P4J', '5NW3']

You can sort by multiple criteria:

>>> pdbsearch.search(limit=5, sort=["-atoms", "released"])
['1ANP', '6UOU', '6UOW', '1Q7O', '6QTF']

Search Criteria

You can search by passing keywords to the search function:

>>> pdbsearch.search(limit=5, ligand_name="ZN")
['3HW7', '3I7I', '3I7G', '2WFX', '2WGT']

You can modify the operator used with double underscores:

>>> pdbsearch.search(limit=5, ligand_name__in=["ZN", "CU"])
['3HW7', '3I7I', '3I7G', '2WFX', '2WGT']
>>> pdbsearch.search(limit=5, resolution__lt=2)
['3HW3', '3I83', '3HVS', '3HW4', '3HW5']
>>> pdbsearch.search(limit=5, atoms__within=[200, 300])
['2WH9', '2WPY', '395D', '396D', '2X8Q']

These are some shorthands, but you can search by any of the terms in the above linked list by replacing the dot with a double underscore:

>>> pdbsearch.search(limit=5, citation__rcsb_authors="Sula, A.")
['4CAH', '4CAI', '4X8A', '4X88', '4X89']

If you use more than one term, they will be combined with AND operators:

>>> pdbsearch.search(limit=5, ligand_name="ZN", atoms__within=[200, 300])
['3WUP', '3ZNF', '2YTA', '2YTB', '2YSV']

Changelog

Release 0.4.0

24 Jul 2022

  • Updated library for v2 of the RCSB search API.

Release 0.3.0

29 May 2021

  • Added search criteria.
  • Added AND chaining for search criteria.

Release 0.2.0

25 April 2021

  • Added ability to sort results.
  • Created shorthand system for common sort criteria.

Release 0.1.0

2 March 2021

  • Started library.
  • Added ability to fetch all PDB codes.
  • Basic pagination.