Panoply is a method to assess possible gene or pathway targets for a single sample given genomic information from DNA and RNA. We provide this vignette to demonstrate how to set up Drug-Gene Data for prioritizing drugs for cancer patients based on genomic data.
We curated a set of high-confidence cancer-related genes and used the curl command-line interface to download drug-gene interactions for anti-neoplastic cancer drugs on those genes. Our gene set was too large to query at once, so we ran the command on smaller chunks of genes and pasted the results together. The example below shows the command for six well-known cancer genes, with a post-download step that pipes the result through python's json.tool to format the JSON.
# http://dgidb.genome.wustl.edu/api
# curl http://dgidb.genome.wustl.edu/api/v1/interactions.json?drug_types=antineoplastic\&genes=TP53,HER2,ESR1,ATM,BRCA1,BRCA2 \
#   | python -mjson.tool
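The chunking described above could be sketched in R as follows. This is only an illustration, not the script we ran: the helper names `chunk_genes` and `build_dgi_url` and the chunk size of 4 are hypothetical, and the code only builds the query URLs without downloading anything.

```r
# Hypothetical helper: split a gene vector into consecutive chunks of at most chunk_size.
chunk_genes <- function(genes, chunk_size) {
    split(genes, ceiling(seq_along(genes) / chunk_size))
}

# Hypothetical helper: build one query URL for a chunk of genes,
# following the URL pattern of the curl example.
build_dgi_url <- function(genes,
                          base = "http://dgidb.genome.wustl.edu/api/v1/interactions.json") {
    paste0(base, "?drug_types=antineoplastic&genes=", paste(genes, collapse = ","))
}

genes <- c("TP53", "HER2", "ESR1", "ATM", "BRCA1", "BRCA2")
chunks <- chunk_genes(genes, chunk_size = 4)  # chunk size chosen only for illustration
urls <- vapply(chunks, build_dgi_url, character(1))
print(urls)
```

Each URL can then be passed to curl in turn, and the per-chunk results combined afterwards.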
Next we use the RJSONIO package to load the JSON file into R. We show a small example of the values for the drug-gene interactions downloaded from DGI.
library(RJSONIO)
file1 <- "dgiAntiNeo1.json"
dgiAntiNeo1 <- fromJSON(paste(readLines(file1), collapse = ""))
#      Gene     Drug           interactionType
# [1,] 'PRKCA'  'ELLAGIC ACID' 'inhibitor,competitive'
# [2,] 'PRKCA'  'BRYOSTATIN-1' 'n/a'
# [3,] 'PRKCA'  'SOPHORETIN'   'inhibitor'
# [4,] 'PRKCA'  'ENZASTAURIN'  'inhibitor'
# [5,] 'PRKCA'  'MIDOSTAURIN'  'inhibitor'
# [6,] 'PRKCA'  'AFFINITAC'    'antisense oligonucleotide'
# [7,] 'PRKCA'  'TAMOXIFEN'    'n/a'
# [8,] 'NOTCH1' 'RO4929097'    'inhibitor'
# [9,] 'NOTCH1' 'RO4929097'    'other/unknown'
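A parsed JSON list can be flattened into a Gene/Drug/interactionType table along the following lines. This is a minimal sketch under stated assumptions: the field names `matchedTerms`, `geneName`, `interactions`, `drugName`, and `interactionType` are assumed here for illustration, the `mock` object stands in for the real download, and `flatten_dgi` is a hypothetical helper. The object returned by `fromJSON` may be shaped differently.

```r
# Mock object standing in for the parsed JSON; the field names are assumptions.
mock <- list(matchedTerms = list(
    list(geneName = "PRKCA", interactions = list(
        list(drugName = "ENZASTAURIN", interactionType = "inhibitor"),
        list(drugName = "TAMOXIFEN", interactionType = "n/a"))),
    list(geneName = "NOTCH1", interactions = list(
        list(drugName = "RO4929097", interactionType = "inhibitor")))))

# Hypothetical helper: one row per gene-drug interaction.
flatten_dgi <- function(x) {
    rows <- lapply(x[["matchedTerms"]], function(term) {
        do.call(rbind, lapply(term[["interactions"]], function(int) {
            data.frame(Gene = term[["geneName"]],
                       Drug = int[["drugName"]],
                       interactionType = int[["interactionType"]],
                       stringsAsFactors = FALSE)
        }))
    })
    do.call(rbind, rows)
}

dgidb <- flatten_dgi(mock)
print(dgidb)
```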
We also included drug-gene targets from DrugBank. The steps included a web download and converting gene ids into gene symbols. A snippet of this data appears as follows:
# Drug_name    Drug_ID  Target uniprot                          GeneID
# Afatinib     DB08916  P00533; P04626; Q15303; P08183; Q9UNQ0  EGFR;ERBB2;ERBB4;ABCB1;ABCG2
# Aflibercept  DB08885  P15692; P49763; P49765                  VEGFA;PGF;VEGFB
# Anastrozole  DB01217  P11511; P05177; P11712; P08684          CYP19A1;CYP1A2;CYP2C9;CYP3A4
# Azacitidine  DB00928  P26358; P32320                          DNMT1;CDA
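The id-to-symbol conversion can be illustrated with a small lookup vector. This is a sketch only, not the conversion we actually ran: the `lookup` vector is hand-built from the Afatinib row of the snippet, and `to_symbols` is a hypothetical helper. A full conversion would need a complete mapping resource.

```r
# Hand-built lookup taken from the Afatinib row of the snippet (illustration only).
lookup <- c(P00533 = "EGFR", P04626 = "ERBB2", Q15303 = "ERBB4",
            P08183 = "ABCB1", Q9UNQ0 = "ABCG2")

# Hypothetical helper: map a ";"-separated id string to a ";"-separated symbol string.
to_symbols <- function(id_string, lookup) {
    ids <- trimws(strsplit(id_string, ";")[[1]])
    symbols <- unname(lookup[ids])
    symbols <- symbols[!is.na(symbols)]  # drop ids missing from the lookup
    paste(symbols, collapse = ";")
}

afatinib_genes <- to_symbols("P00533; P04626; Q15303; P08183; Q9UNQ0", lookup)
print(afatinib_genes)
```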
We show the steps needed to prepare both sources so they can be combined into one common data frame in R. First, for DGI, we fix the column names and add the Source name. For DrugBank, we pull apart the gene ids and expand the data.frame to one row per drug-gene pair.
## fix up dgi source for combining
dgidb$Source <- "DGIdb"
names(dgidb) <- gsub("interactionType", "type", names(dgidb))
## fix up drugbank for combining
dbank$DRUG <- casefold(dbank$Drug_name, upper = TRUE)
## unique drug names across both sources, dropping names with brackets, braces, or parentheses
udrugs.dgi <- unique(c(dgidb$Drug, dbank$DRUG))
udrugs.dgi <- udrugs.dgi[!(grepl("\\[", udrugs.dgi) | grepl("\\{", udrugs.dgi) |
    grepl("\\(", udrugs.dgi))]
## split the ";"-separated gene symbols, then expand to one row per drug-gene pair
glist <- strsplit(dbank$GeneID, split = ";")
dbankfix <- data.frame(Drug = NULL, Gene = NULL, type = NULL, Source = NULL)
for (k in seq_len(nrow(dbank))) {
    if (length(glist[[k]]) > 0) {
        dbankfix <- rbind.data.frame(dbankfix, data.frame(Drug = dbank$DRUG[k], Gene = glist[[k]],
            type = "n/a", Source = dbank[k, "Annotation From"]))
    }
}
## combine both sources (dbankfix is the expanded DrugBank data built in the loop)
drugdbPan <- rbind.data.frame(dgidb, dbankfix)
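The one-row-per-pair expansion can also be written without a loop. The following is a minimal sketch on a mock `dbank` data frame, not the code used to build the packaged data: the `Source` column and the `EXAMPLEDRUG` row (a drug with no listed genes) are assumptions added for illustration.

```r
# Mock DrugBank-style input; the Source column and EXAMPLEDRUG row are illustrative.
dbank <- data.frame(DRUG = c("AFLIBERCEPT", "AZACITIDINE", "EXAMPLEDRUG"),
                    GeneID = c("VEGFA;PGF;VEGFB", "DNMT1;CDA", ""),
                    Source = "DrugBank",
                    stringsAsFactors = FALSE)

glist <- strsplit(dbank$GeneID, split = ";")
n_genes <- lengths(glist)  # genes per drug; 0 for a drug with no genes

# Repeat each drug once per gene; drugs with zero genes drop out automatically.
dbankfix <- data.frame(Drug = rep(dbank$DRUG, n_genes),
                       Gene = unlist(glist),
                       type = "n/a",
                       Source = rep(dbank$Source, n_genes),
                       stringsAsFactors = FALSE)
print(dbankfix)
```

Building the data frame in one call avoids growing it row by row with `rbind`, which gets slow on large inputs.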
Using the pre-made dataset described above, drugdbPan, we display the first 20 rows.
data(drugdbPan)
kable(head(drugdbPan, 20))
| | Gene | Drug | type | Source |
|---|---|---|---|---|
| 1 | PRKCA | ELLAGIC ACID | inhibitor,competitive | DGIdb |
| 2 | PRKCA | BRYOSTATIN-1 | n/a | DGIdb |
| 3 | PRKCA | SOPHORETIN | inhibitor | DGIdb |
| 4 | PRKCA | ENZASTAURIN | inhibitor | DGIdb |
| 5 | PRKCA | MIDOSTAURIN | inhibitor | DGIdb |
| 6 | PRKCA | AFFINITAC | antisense oligonucleotide | DGIdb |
| 7 | PRKCA | TAMOXIFEN | n/a | DGIdb |
| 8 | NOTCH1 | RO4929097 | inhibitor | DGIdb |
| 10 | APH1A | UNII-DRL23N424R | n/a | DGIdb |
| 11 | APH1B | UNII-DRL23N424R | n/a | DGIdb |
| 12 | MAPK11 | REGORAFENIB | inhibitor | DGIdb |
| 14 | MAPK14 | LY2228820 | n/a | DGIdb |
| 15 | BIRC3 | LCL161 | antagonist | DGIdb |
| 16 | BIRC3 | AT-406 | antagonist | DGIdb |
| 17 | BIRC2 | AT-406 | antagonist | DGIdb |
| 18 | BIRC2 | LCL161 | antagonist | DGIdb |
| 19 | BIRC2 | BIRINAPANT | n/a | DGIdb |
| 20 | NFKB1 | THALIDOMIDE | n/a | DGIdb |
| 21 | NFKB1 | BARDOXOLONE | n/a | DGIdb |
| 22 | NFKB1 | BORTEZOMIB | n/a | DGIdb |
annoDrugs <- annotateDrugs(drugdbPan)
drug.gs <- annoDrugs[[1]]
drug.adj <- annoDrugs[[2]]
hist(sapply(drug.gs, length), main = "Drug set length")
hist(rowSums(drug.adj), main = "Drug targets (genes) via adjacency")
hist(colSums(drug.adj), main = "Gene targets (from drugs) via adjacency")
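To make the two structures plotted above concrete, the following sketch builds per-drug gene sets and a drug-by-gene adjacency matrix from a few rows of the table. It is an illustration under assumptions and not the implementation of `annotateDrugs`: the names `pairs`, `gs`, and `adj` are hypothetical, and the orientation (drugs in rows, genes in columns) is inferred from the histogram titles.

```r
# A few drug-gene pairs taken from rows 15-19 of the table.
pairs <- data.frame(Gene = c("BIRC3", "BIRC3", "BIRC2", "BIRC2", "BIRC2"),
                    Drug = c("LCL161", "AT-406", "AT-406", "LCL161", "BIRINAPANT"),
                    stringsAsFactors = FALSE)

# Gene sets: one character vector of target genes per drug.
gs <- split(pairs$Gene, pairs$Drug)

# Adjacency: rows are drugs, columns are genes, 1 where a pair is present.
adj <- (unclass(table(pairs$Drug, pairs$Gene)) > 0) * 1L

print(sapply(gs, length))  # genes per drug, from the gene sets
print(rowSums(adj))        # genes per drug, from the adjacency matrix
print(colSums(adj))        # drugs per gene, from the adjacency matrix
```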
Show the R session information.
sessionInfo()
R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS release 6.9 (Final)
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.2.1
LAPACK: /usr/lib64/atlas/liblapack.so.3.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=C
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] grid parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] knitr_1.20 panoply_0.98 RColorBrewer_1.1-2 randomForest_4.6-12 Rgraphviz_2.22.0
[6] graph_1.56.0 BiocGenerics_0.24.0 circlize_0.4.2 gage_2.28.0 MASS_7.3-47
loaded via a namespace (and not attached):
[1] Rcpp_0.12.15 highr_0.6 formatR_1.5 pillar_1.1.0 compiler_3.4.2
[6] XVector_0.18.0 tools_3.4.2 zlibbioc_1.24.0 digest_0.6.12 bit_1.1-12
[11] evaluate_0.10.1 RSQLite_2.0 memoise_1.1.0 tibble_1.4.2 png_0.1-7
[16] rlang_0.1.6 DBI_0.8 httr_1.3.1 stringr_1.3.0 Biostrings_2.46.0
[21] S4Vectors_0.16.0 GlobalOptions_0.0.12 IRanges_2.12.0 stats4_3.4.2 bit64_0.9-7
[26] Biobase_2.38.0 R6_2.2.2 AnnotationDbi_1.40.0 blob_1.1.0 magrittr_1.5
[31] KEGGREST_1.18.0 shape_1.4.3 colorspace_1.3-2 stringi_1.1.7