This repository contains the code needed to reproduce the following figures from the paper by Brown et al., "A Histone Methylation–MAPK Signaling Axis Drives Durable Epithelial–Mesenchymal Transition in Hypoxic Pancreatic Cancer" (https://doi.org/10.1158/0008-5472.CAN-22-2945):
- Figure 1
- Figure 4A-B
- Supplementary Figures S1, S2, S3, S4, and S10
- Supplementary Figure S5C
- Supplementary Figure S9A
- Supplementary Figure S13B
The code was written using R 4.1.2 and should run with R ≥ 3.6, although compatibility with versions other than 4.1.2 is not guaranteed. The code is provided as R notebooks, which can be run out-of-the-box using RStudio. R is available from the R Project website, and RStudio is available from the Posit website.
If you encounter issues running analyses with newly installed packages, the package versions originally used to run each analysis can be found in the `sessionInfo.txt` file(s) present in the directory for each data set (e.g., `Bulk omics analyses/CPTAC PDAC Discovery Study/sessionInfo_CPTAC_proteomics-phosphoproteomics.txt`).
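To see how your installed packages differ from those originally used, one option is to record your own session information and compare it by eye with the provided file. This is a minimal sketch using base R only; the output file name `sessionInfo_current.txt` is a hypothetical choice for illustration and is not a file in this repository.

```r
# Record the current R version and loaded packages to a text file.
# "sessionInfo_current.txt" is a hypothetical name chosen for illustration.
out_file <- file.path(tempdir(), "sessionInfo_current.txt")
writeLines(capture.output(sessionInfo()), out_file)

# Look up the installed version of a single package
# ("stats" ships with R and is used here only as an example).
packageVersion("stats")
```

The resulting file can then be compared against the `sessionInfo` file provided for the relevant data set.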
CPTAC PDAC Discovery Study proteomic and transcriptomic data (Cao et al., 2021) were generated by the Clinical Proteomic Tumor Analysis Consortium (NCI/NIH). Clinical information and tumor histology data were obtained from supplemental data of the same publication, and the processed proteomics data were downloaded from the [LinkedOmics data portal](https://www.linkedomics.org/login.php#dataSource). CPTAC data are also available through the Proteomic Data Commons. To maximize the number of proteins retained for consensus clustering of the pcEMT signature, imputation was performed on the CPTAC global proteomics gene-level data for all proteins with non-missing values in at least 50% of samples using the DreamAI algorithm (https://github.com/WangLab-MSSM/DreamAI), which was designed specifically for proteomics data. DreamAI was used with default settings.
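The 50% completeness filter applied before imputation can be illustrated as follows. This is a sketch on a made-up toy matrix, assuming proteins are in rows and samples in columns; the object names are hypothetical, and the actual filtering and DreamAI call are in `CPTAC-PDAC_imputations.Rmd`, which may differ in detail.

```r
# Toy matrix: proteins in rows, samples in columns (values are made up).
prot <- matrix(
  c(1.2,  NA, 0.8, 1.1,
     NA,  NA,  NA, 0.5,
    0.3, 0.4,  NA,  NA),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3"), paste0("S", 1:4))
)

# Fraction of samples with an observed (non-missing) value for each protein.
frac_observed <- rowMeans(!is.na(prot))

# Keep proteins observed in at least 50% of samples.
prot_filtered <- prot[frac_observed >= 0.5, , drop = FALSE]
rownames(prot_filtered)
```

Here P1 (observed in 3 of 4 samples) and P3 (2 of 4) are retained, while P2 (1 of 4) is dropped. The filtered matrix is what would then be passed to the imputation step.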
TCGA PAAD RNA-seq gene expression data (data set ID: TCGA.PAAD.sampleMap/HiSeqV2; version: 2017-10-13) were downloaded from the UCSC Xena Browser as log2(RSEM+1) normalized counts and were converted first to transcripts per million (TPM) and finally to log2(TPM+1) for all analyses unless otherwise noted. Curated TCGA PAAD phenotype and survival data were also downloaded from UCSC Xena. Only PDAC tumors were retained for analysis, as determined based on provided histology annotations (150 tumors).
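The unit conversion described above can be sketched as follows. This assumes that the TPM conversion is a per-sample rescaling of the back-transformed RSEM values so that each sample sums to one million; the exact procedure used in `TCGA_PAAD_analysis_final_PDAC.Rmd` may differ, and the function name `log2_rsem_to_log2_tpm` is hypothetical.

```r
# Convert a genes x samples matrix of log2(RSEM + 1) values to log2(TPM + 1).
# Assumption: TPM is obtained by rescaling each sample to sum to one million.
log2_rsem_to_log2_tpm <- function(log2_rsem) {
  rsem <- 2^log2_rsem - 1                          # undo log2(x + 1)
  tpm <- sweep(rsem, 2, colSums(rsem), "/") * 1e6  # per-sample rescaling
  log2(tpm + 1)
}

# Toy input (made-up values): 3 genes, 2 samples.
toy <- matrix(log2(c(10, 30, 60, 5, 5, 90) + 1), nrow = 3,
              dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))
toy_tpm <- log2_rsem_to_log2_tpm(toy)

# Sanity check: back-transformed values should sum to about 1e6 per sample.
colSums(2^toy_tpm - 1)
```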
The mouse RNA-seq data are available at the Gene Expression Omnibus under the accession number GSE129455. The human RNA-seq data are available at NCBI dbGaP under the accession number phs001840.v1.p1. Annotated and pre-processed PDAC scRNA-seq data from [Elyada et al.](https://doi.org/10.1158/2159-8290.CD-19-009) were kindly provided by Dr. David Tuveson (Cold Spring Harbor Laboratory). Please contact the Tuveson Lab to obtain the annotated data.
PDAC patient-derived xenograft (PDX) tumor RNA-seq data are available for download at --------.
CPTAC data should be placed in the directory `Bulk omics analyses/CPTAC PDAC Discovery Study/CPTAC PDAC data`. A zipped folder with the data, `CPTAC PDAC data.zip`, is provided in the directory `Bulk omics analyses/CPTAC PDAC Discovery Study`.
TCGA data should be placed in the directory `Bulk omics analyses/TCGA analysis/TCGA PAAD data`. A zipped folder with the data, `TCGA PAAD data.zip`, is provided in the directory `Bulk omics analyses/TCGA analysis`.
The scRNA-seq data should be placed in the directory `Elyada et al scRNA-seq/scRNAseq data`.
PDX RNA-seq data should be placed in the directory `Bulk omics analyses/PDX RNA-seq`.
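Before running the notebooks, it may help to confirm that the data directories listed above exist. The following is a minimal sketch in base R; `check_data_dirs()` is a hypothetical helper that is not part of this repository, and it assumes it is run from (or pointed at) the repository root.

```r
# Expected data directories, relative to the repository root
# (paths taken from the placement instructions above).
expected_dirs <- c(
  "Bulk omics analyses/CPTAC PDAC Discovery Study/CPTAC PDAC data",
  "Bulk omics analyses/TCGA analysis/TCGA PAAD data",
  "Elyada et al scRNA-seq/scRNAseq data",
  "Bulk omics analyses/PDX RNA-seq"
)

# check_data_dirs() is a hypothetical helper, not part of this repository.
# It returns a named logical vector: TRUE where the directory exists.
check_data_dirs <- function(repo_root = ".") {
  present <- dir.exists(file.path(repo_root, expected_dirs))
  names(present) <- expected_dirs
  present
}

# Report any directories that are missing.
status <- check_data_dirs(".")
if (any(!status)) {
  message("Missing: ", paste(names(status)[!status], collapse = "; "))
}
```

This only checks that the directories exist; it does not verify that the expected data files are inside them.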
There are three notebooks for running analyses on the CPTAC PDAC data. These notebooks are located under `Bulk omics analyses/CPTAC PDAC Discovery Study`:
- `CPTAC-PDAC_analysis_final.Rmd`: Generates the results found in Figure 1A-D, Figure 4A, and Supplementary Figure S2.
- `CPTAC-PDAC_analysis_RNAseq_final.Rmd`: Repeats most of the calculations in `CPTAC-PDAC_analysis_final.Rmd` using CPTAC PDAC transcriptomic (RNA-seq) data.
- `CPTAC-PDAC_imputations.Rmd`: Generates the imputed version of the proteomics data that is needed to run parts of `CPTAC-PDAC_analysis_final.Rmd`, particularly the code that generates the heatmap found in Figure 1A.
There is one notebook for running analyses on the TCGA PAAD data. This notebook is located under `Bulk omics analyses/TCGA analysis`:
- `TCGA_PAAD_analysis_final_PDAC.Rmd`: Generates the results found in Supplementary Figure S3.
There are two notebooks for running analyses on the Elyada et al. scRNA-seq data. These notebooks are located under `Elyada et al scRNA-seq/Code and analyses`:
- `hs-duct_analysis.Rmd`: Generates the results found in Figure 1E-G, Figure 4B, and Supplementary Figure S10.
- `ms-duct_analysis.Rmd`: Generates the results found in Supplementary Figure S4 and Supplementary Figure S8A.
There is one notebook for running analyses on the PDX tumor data. This notebook is located under `Bulk omics analyses/PDX RNA-seq`:
- `PDX_RNAseq_analysis.Rmd`: Generates the results found in Supplementary Figure S5C.
There is one notebook for comparing the genes that make up several gene sets of interest in this publication. This notebook is located under `Gene signature comparisons`:
- `Gene_signature_overlap_comparisons.Rmd`: Generates the results found in Supplementary Figure S1.