doi2bibtex is a small Python package that can be used to resolve DOIs (and other identifiers) into a BibTeX entry and format them according to a customizable set of rules (see below for a full list of features).
Most features of doi2bibtex are available in other tools.
For example, you can chain together doi2bib with bibtool or bibtex-tidy and recover most of the functionality in this package (and some of these tools are actually used under the hood).
If you use a reference manager like Zotero or Mendeley, you can also resolve papers based on an identifier and later export the entries to a .bib file.
The motivation for doi2bibtex was rather personal and came from two facts: 1. I have a rather strong opinion on how I want my .bib files to look, and 2. I work at the intersection of astrophysics and machine learning, meaning that I often need the NASA/ADS bibcodes for the adsurl field, but I can’t rely solely on ADS to retrieve BibTeX entries because I also frequently cite papers that are not indexed by ADS.
At some point, I got tired of the ever-growing mess of shell scripts and bash commands that I used to achieve this, and decided to rewrite everything as a single package that would be easier to maintain and extend.
Follow these instructions to get started with doi2bibtex:
You can simply pip-install the package using:
pip install doi2bibtex
Alternatively, you can also clone the repository and install the package locally:
git clone https://github.com/timothygebhard/doi2bibtex.git
cd doi2bibtex
pip install .
Note
If you do not want to use ADS, you can disable this feature (which is enabled by default) by setting resolve_adsurl: false in your ~/.doi2bibtex/config.yaml file.
If you want to use the ads backend to resolve the adsurl field, you need to create an ADS account (if you do not already have one) and set up an API token to be able to query ADS. You can do this in two different ways:
- Set the environment variable ADS_TOKEN to your API key: export ADS_TOKEN="your-token"
  Ideally, you should add this line to your .bashrc or .zshrc file.
- Create a file ~/.doi2bibtex/ads_token and put your API key in there.
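The two options above can be illustrated with a small lookup routine. This is only a sketch of the idea, not the code doi2bibtex itself uses: the function name `find_ads_token`, its `home` parameter, and the order of precedence (environment variable first, then the file) are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Optional


def find_ads_token(home: Optional[Path] = None) -> Optional[str]:
    """Return an ADS API token, or None if neither option is set up.

    Hypothetical helper; the precedence used here is an assumption.
    """
    # Option 1: the ADS_TOKEN environment variable
    token = os.environ.get("ADS_TOKEN", "").strip()
    if token:
        return token

    # Option 2: the ~/.doi2bibtex/ads_token file
    token_file = (home or Path.home()) / ".doi2bibtex" / "ads_token"
    if token_file.is_file():
        # An empty file counts as "no token"
        return token_file.read_text(encoding="utf-8").strip() or None

    return None
```

Calling `find_ads_token()` without arguments checks the current user's home directory; a result of `None` means that neither of the two options has been configured yet.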
Once installed, using the package is as simple as running the d2b command in your terminal:
d2b <doi-or-arxiv_id>
You can also add the --plain flag to output only the BibTeX entry without any fancy formatting. This can be useful if you, for example, want to pipe the output of the d2b command to another program.
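As a rough illustration of consuming the --plain output from another program, the sketch below calls d2b from Python and appends the result to a .bib file. The helper names (`build_command`, `fetch_bibtex`, `append_to_bibfile`) are hypothetical, and placing --plain before the identifier is an assumption; only the d2b command and the --plain flag come from the description above.

```python
import subprocess
from typing import List


def build_command(identifier: str) -> List[str]:
    """Build the d2b call (flag placement is an assumption)."""
    return ["d2b", "--plain", identifier]


def fetch_bibtex(identifier: str) -> str:
    """Run d2b and return whatever it prints to stdout."""
    result = subprocess.run(
        build_command(identifier),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


def append_to_bibfile(entry: str, path: str) -> None:
    """Append one entry to a .bib file, followed by a blank line."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(entry.rstrip() + "\n\n")


# Usage (requires d2b to be installed and network access):
# append_to_bibfile(fetch_bibtex("<doi-or-arxiv_id>"), "references.bib")
```

Passing the command as a list avoids shell quoting issues with identifiers that contain special characters.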
A lot of the features of doi2bibtex can be configured via a ~/.doi2bibtex/config.yaml file. Here is an overview of all the supported options (with the default values):
abbreviate_journal_names: true # Convert journal names to LaTeX macros (e.g., "\apj" instead of "The Astrophysical Journal")
citekey_delimiter: '_' # Delimiter between the author name and the year of publication
convert_latex_chars: true # Convert LaTeX-encoded characters in author names to Unicode
convert_month_to_number: true # Convert month names to numbers (e.g., "1" instead of "jan")
crossmatch_with_dblp: false # [EXPERIMENTAL] Try to crossmatch the paper with DBLP to add venue information to `addendum` (for ML conference papers)
fix_arxiv_entrytype: true # Convert arXiv entries to `@article`, set `journal` to "arXiv preprints", and drop the `eprinttype` field
format_author_names: true # Convert author names to the "{Lastname}, Firstname" format
generate_citekey: true # Create a citekey based on the first author and year of publication
limit_authors: 1000 # Limit the number of authors in the BibTeX entry
pygments_theme: 'dracula' # Pygments theme used for syntax highlighting in the terminal
remove_fields: # Remove undesired fields (e.g., keywords) from the BibTeX entry
  all: ['abstract'] # Remove the `abstract` from all entries, regardless of entrytype
  article: ['publisher'] # Remove the `publisher` field from @article entries
remove_url_if_doi: true # Remove the `url` field if it is redundant with the `doi` field
resolve_adsurl: true # Query ADS to resolve the `adsurl` field, requires API token
update_arxiv_if_doi: true # Update arXiv entries with DOI information, if available ("related DOI")
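If you prefer to create the configuration file from a script, something like the following sketch could work. It assumes that a config file listing only some options is accepted and that the remaining options keep their defaults, which is not confirmed above; the helper name `write_config` and the chosen override values are purely illustrative.

```python
from pathlib import Path
from typing import Optional

# Only the options to change; the key names are taken from the overview above.
CONFIG_TEXT = """\
resolve_adsurl: false
limit_authors: 10
"""


def write_config(home: Optional[Path] = None, overwrite: bool = False) -> Path:
    """Write ~/.doi2bibtex/config.yaml and return its path (hypothetical helper)."""
    config_file = (home or Path.home()) / ".doi2bibtex" / "config.yaml"
    config_file.parent.mkdir(parents=True, exist_ok=True)
    if config_file.exists() and not overwrite:
        # Leave an existing configuration untouched
        return config_file
    config_file.write_text(CONFIG_TEXT, encoding="utf-8")
    return config_file
```

By default the helper never overwrites an existing file, so it is safe to run more than once.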
Besides the eponymous ability of resolving DOIs (and other identifiers) to BibTeX entries, this package offers a lot more features for post-processing the entries. Here are some highlights:
- Automatically resolve the adsurl field required by some astrophysics journals (requires an API token for ADS)
- Cross-match entries (in particular: arXiv preprints) with dblp.org to retrieve the venue information for conference papers from machine learning (e.g., "ICLR 2021"). Note: This feature is still experimental because querying dblp is somewhat fickle.
- Convert LaTeX-encoded characters in author names to Unicode, for example, Müller instead of M{\"u}ller
- Author names can automatically be converted to the {Lastname}, Firstname format
- You can limit the number of authors in the BibTeX entry
- Create a citekey based on the first author and year of publication. The author name is automatically made ASCII-compatible: for example, Đà Nẵng et al. (2023) becomes DaNang_2023.
- Journal names can automatically be abbreviated according to the common LaTeX macros (e.g., \apj instead of The Astrophysical Journal)
- Undesired fields (e.g., keywords) can be removed from the BibTeX entry (customizable for each entrytype; e.g., remove the publisher for articles, but keep it for books)
- Easy to extend / modify: Feel free to fork this repository and adjust things to your own needs!
Contributions in the form of pull requests are always welcome! Otherwise, you can of course also help the development by creating issues for bugs that you have encountered, or for new features that you would like to see implemented.
This project is published under a BSD 3-Clause license; see the LICENSE file for details.