dbparser is an rOpenSci peer-reviewed R
package that parses and integrates major pharmacological databases into
standardized, analysis-ready R objects called dvobjects (drugverse
objects).
Pharmacological databases use incompatible formats and structures,
forcing researchers to write custom parsing scripts — a process that
consumes 60–80% of analysis time. dbparser eliminates this bottleneck
with unified parsing functions, chainable merge operations, and a
consistent output structure that enables reproducible, cross-database
analyses.
With recent updates, dbparser has evolved into an integration
engine, allowing you to merge mechanistic data (DrugBank) with
real-world phenotypic data (OnSIDES) and drug-drug interaction risks
(TWOSIDES).
```r
# From CRAN (stable)
install.packages("dbparser")

# From GitHub (development)
# install.packages("pak")
pak::pak("ropensci/dbparser")
```

DrugBank is a comprehensive database containing detailed drug, pharmacological, and target information. As both a bioinformatics and a cheminformatics resource, DrugBank combines detailed drug data (chemical, pharmacological, pharmaceutical) with comprehensive drug target information (sequence, structure, pathway). More information is available on the DrugBank website.
- Parser: parseDrugBank()
- Input: Full XML database (downloading it requires a free DrugBank account, and account approval may take a couple of days)
- Tested versions: 5.1.0 through 5.1.12
- Alternative: Use dbdataset for pre-parsed data without downloading the XML (GitHub only, because it exceeds the CRAN size limit)
- Tutorial: DrugBank Parsing Vignette

If you find errors with any DrugBank version, please submit an issue on the dbparser GitHub repository.
OnSIDES provides adverse drug events extracted from thousands of FDA drug labels using machine learning.
- Parser: parseOnSIDES()
- Input: Directory containing the OnSIDES CSV files
TWOSIDES provides data on adverse events arising when two drugs are taken together.
- Parser: parseTWOSIDES()
- Input: TWOSIDES.csv.gz file
```r
library(dbparser)

# Parse DrugBank (full XML export)
drugbank_db <- parseDrugBank("data/drugbank.xml")

# Parse OnSIDES (directory of CSV files)
onsides_db <- parseOnSIDES("data/onsides/")

# Parse TWOSIDES (gzipped CSV file)
twosides_db <- parseTWOSIDES("data/TWOSIDES.csv.gz")
```

The power of dbparser lies in its ability to chain parsers and mergers together. Here is how you can build an integrated pharmacovigilance dataset:
```r
library(dbparser)
library(dplyr) # provides the %>% pipe

# 1. Parse the raw databases
drugbank_db <- parseDrugBank("data/drugbank.xml")
onsides_db <- parseOnSIDES("data/onsides/")
twosides_db <- parseTWOSIDES("data/TWOSIDES.csv.gz")

# 2. Build the integrated knowledge graph
# DrugBank serves as the hub; each merge adds one database
final_db <- drugbank_db %>%
  merge_drugbank_onsides(onsides_db) %>%
  merge_drugbank_twosides(twosides_db)

# 3. Analyze results
head(final_db$integrated_data$drug_drug_interactions)
```

For a detailed case study, see the Integrated Pharmacovigilance Vignette.
A dvobject is a unified, compressed format for pharmacological data: an
R list object that preserves complex relational hierarchies while
enabling consistent access patterns.
For a single database (e.g., DrugBank):
- drugs: list of data frames containing drug information (synonyms, classifications, etc.) — the only mandatory component
- salts: data frame of drug salt information
- products: data frame of commercially available drug products worldwide
- references: data frame of articles, links, and textbooks about drugs or CETT data
- cett: list of data frames containing targets, enzymes, carriers, and transporters information
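To make the single-database layout concrete, here is a minimal base-R sketch of a dvobject-shaped list built from made-up toy data. It is not produced by dbparser: the sub-table names (general_information, synonyms) and columns (drugbank_id, name, synonym, salt_name) are assumptions chosen for illustration, and real parser output contains many more tables and columns. It shows the mandatory drugs component, one optional component, and the uniform `$` access path.

```r
# Toy object shaped like a single-database dvobject.
# Hypothetical data: table and column names are illustrative assumptions,
# not the exact names produced by parseDrugBank().
toy_dvobject <- list(
  # drugs: the only mandatory component, itself a list of data frames
  drugs = list(
    general_information = data.frame(
      drugbank_id = c("DB00001", "DB00002"),
      name = c("Drug A", "Drug B"),
      stringsAsFactors = FALSE
    ),
    synonyms = data.frame(
      drugbank_id = c("DB00001", "DB00001", "DB00002"),
      synonym = c("A-1", "A-2", "B-1"),
      stringsAsFactors = FALSE
    )
  ),
  # salts: an optional component stored as a single data frame
  salts = data.frame(
    drugbank_id = "DB00002",
    salt_name = "Drug B sodium",
    stringsAsFactors = FALSE
  )
)

# Every table is reached through the same `$` path
drug_info <- toy_dvobject$drugs$general_information
synonym_counts <- table(toy_dvobject$drugs$synonyms$drugbank_id)
print(synonym_counts)

# Optional components (here: products) may be missing, so check first
if (is.null(toy_dvobject$products)) {
  message("No products table in this object")
}
```

Because only drugs is mandatory, code that consumes a dvobject is safer when it checks optional components for NULL before indexing into them.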
For a merged database (Integrated Pharmacovigilance):
When databases are merged using merge_drugbank_onsides or
merge_drugbank_twosides, the dvobject becomes a nested structure:
- drugbank: The mechanistic hub
- onsides: Side-effect data (from FDA labels)
- twosides: Drug-drug interaction data
- integrated_data: Enriched tables bridging databases (e.g., linking DrugBank IDs to OnSIDES adverse events)
- metadata: Detailed provenance for all contained datasets
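The merged layout can be sketched the same way. The following base-R example is a conceptual illustration only, not how merge_drugbank_onsides() works internally: it joins two made-up tables on a hypothetical shared name column using base R's merge() (a plain inner join) and stores the result under integrated_data next to the source components. The table names (adverse_events, drug_adverse_events) and the join key are assumptions; the real merge functions may map drugs through other identifiers.

```r
# Conceptual sketch with made-up data; not dbparser's internal merge logic.
drugbank_drugs <- data.frame(
  drugbank_id = c("DB00001", "DB00002"),
  name = c("Drug A", "Drug B"),
  stringsAsFactors = FALSE
)
onsides_events <- data.frame(
  name = c("Drug A", "Drug A", "Drug C"),
  adverse_event = c("Nausea", "Headache", "Rash"),
  stringsAsFactors = FALSE
)

# Inner join on the (hypothetical) shared key: "Drug C" has no DrugBank
# match and "Drug B" has no events, so both are absent from the bridge table
drug_events <- merge(drugbank_drugs, onsides_events, by = "name")

# Nested layout mirroring the merged dvobject components
toy_merged <- list(
  drugbank = list(drugs = list(general_information = drugbank_drugs)),
  onsides = list(adverse_events = onsides_events),
  twosides = list(),
  integrated_data = list(drug_adverse_events = drug_events),
  metadata = list(sources = c("DrugBank", "OnSIDES")) # hypothetical provenance
)

head(toy_merged$integrated_data$drug_adverse_events)
```

An inner join keeps only drugs present in both sources; a left join (merge(..., all.x = TRUE)) would instead keep every DrugBank drug, with NA where no adverse event was found.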
dbparser has enabled 10+ peer-reviewed publications in leading
journals:
| Domain | Journal | Reference |
|---|---|---|
| Alzheimer’s Drug Repurposing | Scientific Reports (Nature Portfolio) | Parolo et al. (2023) |
| COVID-19 Therapeutics | Pharmaceutics | Pérez-Moraga et al. (2021) |
| Pan-Cancer Biomarkers | Briefings in Bioinformatics | Mercatelli et al. (2022) |
| Pathway Modeling | Computer Methods and Programs in Biomedicine | Hammoud et al. (2025) |
| Clinical Trial Analysis | Frontiers in Pharmacology | Namiot et al. (2023) |
📊 50,000+ CRAN downloads | Featured in the CRAN Epidemiology Task View
For the full list, see our JOSS paper.
| Package | Description | Links |
|---|---|---|
| dbdataset | Pre-parsed DrugBank datasets ready for analysis | GitHub |
| covid19dbcand | COVID-19 drug candidate datasets | GitHub |
| periscope2 | Shiny framework for interactive dashboards | CRAN |
If you use dbparser in published research, please cite our JOSS paper:
Ali et al., (2026). dbparser: An R Package for Parsing and Integrating
Pharmacological Databases. Journal of Open Source Software, 11(118),
9950, https://doi.org/10.21105/joss.09950
You can also retrieve the citation from within R:

```r
citation("dbparser")
```

If you find dbparser useful, consider ⭐ starring the GitHub
repository and sharing it with colleagues.
For custom database integrations, enterprise support, training, or
deployment assistance, contact Interstellar Consultation Services,
which maintains dbparser.
We welcome contributions! Please review our Contributing Guide.
Please note that the dbparser project is released with a Contributor
Code of Conduct. By contributing to this project, you agree to abide
by its terms.

