PsychENCODE Integrative Analysis




Integrative Analysis

Integrates data across the capstone projects to build a model that takes QTLs as inputs and provides both phenotype predictions and the functional modules involved.

Integrative Analysis for each brain phenotype

  1. Integrative model parameters for all phenotypes:

  2. Multi-level functional enrichment analysis (DSPN-mod) and Weighted Gene Co-expression Network Analysis (WGCNA) modules:

  3. Transcription Factor - Target Gene - Enhancer linkages:

    1. Gene regulatory network 1 (GRN 1):

    2. Gene regulatory network 2 (GRN 2):

  4. Hi-C-derived Enhancer - Gene linkages:

  5. Schizophrenia-associated genes:

  6. MATLAB code and formatted data for the DSPN:

  7. Figures of gene regulatory networks (GRNs) for each cell type considered:

    1. Excitatory Neurons: RCircos_GRN_Ex1 (pdf, 2 MB), RCircos_GRN_Ex2 (pdf, 1 MB), RCircos_GRN_Ex3e (pdf, 7 MB), RCircos_GRN_Ex4 (pdf, 2 MB), RCircos_GRN_Ex5b (pdf, 1 MB), RCircos_GRN_Ex6a (pdf, 2 MB), RCircos_GRN_Ex6b (pdf, 1 MB), RCircos_GRN_Ex8 (pdf, 3 MB), RCircos_GRN_ExS (pdf, 5 MB)
    2. Inhibitory Neurons: RCircos_GRN_In1a (pdf, 2 MB), RCircos_GRN_In1b (pdf, 1 MB), RCircos_GRN_In1c (pdf, 1 MB), RCircos_GRN_In3 (pdf, 2 MB), RCircos_GRN_In4a (pdf, 1 MB), RCircos_GRN_In4b (pdf, 2 MB), RCircos_GRN_In6a (pdf, 2 MB), RCircos_GRN_In6b (pdf, 2 MB), RCircos_GRN_In7 (pdf, 2 MB), RCircos_GRN_In8 (pdf, 1 MB), RCircos_GRN_InS (pdf, 6 MB)
    3. Non-neuronal cell types: RCircos_GRN_Astrocytes (pdf, 4 MB), RCircos_GRN_Endothelial (pdf, 2 MB), RCircos_GRN_Microglia (pdf, 2 MB), RCircos_GRN_Oligodendrocytes (pdf, 5 MB), RCircos_GRN_Oligodendrocyte_Progenitor_Cells_(OPCs) (pdf, 2 MB), RCircos_GRN_Pericytes (pdf, 2 MB)




Derived Data Types

Gene expression matrix, enhancer lists, eQTL and cQTL maps, DEX genes, gene co-expression modules, PCA/RCA-based clustering of RNA-seq and epigenetic data, and decomposition and deconvolution of bulk RNA-seq into cell-type-specific components.

PsychENCODE enhancer list and H3K27ac peaks

  1. PsychENCODE enhancer set for the PFC:

  2. H3K27ac peaks for the Prefrontal Cortex: DER-05_PFC_H3K27ac_peaks (bed, 3 MB)

  3. H3K27ac peaks for the Temporal Cortex: DER-06_TC_H3K27ac_peaks (bed, 3 MB)

  4. H3K27ac peaks for the Cerebellar Cortex: DER-07_CBC_H3K27ac_peaks (bed, 2 MB)
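The peak files above are distributed in BED format. As a minimal sketch, assuming the standard BED layout in which the first three tab-separated columns are chromosome, 0-based start and end (half-open intervals), the peaks can be intersected with a set of regions of interest in pure Python. The function names are hypothetical, and the quadratic scan is only for illustration; dedicated tools such as BEDTools are better suited to genome-wide files.

```python
from typing import Iterable, List, Tuple

# A BED interval: (chrom, start, end) in 0-based, half-open coordinates.
Interval = Tuple[str, int, int]


def read_bed(lines: Iterable[str]) -> List[Interval]:
    """Parse the first three BED columns, skipping blank, comment, track and browser lines."""
    intervals: List[Interval] = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\r\n").split("\t")
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """Half-open intervals overlap if they share a chromosome and at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def peaks_overlapping(peaks: List[Interval], regions: List[Interval]) -> List[Interval]:
    """Return the peaks that overlap at least one region (simple O(n*m) scan)."""
    return [p for p in peaks if any(overlaps(p, r) for r in regions)]
```

With a local copy of a peak file (the filename `DER-05_PFC_H3K27ac_peaks.bed` is an assumption), `read_bed(open("DER-05_PFC_H3K27ac_peaks.bed"))` yields the peak list. Because the intervals are half-open, a peak ending at position 200 does not overlap a region starting at 200.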

QTL Maps

Expression QTLs (eQTLs), chromatin QTLs (cQTLs), isoform percentage QTLs (isoQTLs), transcript expression QTLs (tQTLs) and cell fraction QTLs (fQTLs), with coordinates on both hg19 and hg38 (hg38 coordinates were converted from hg19 using UCSC liftOver; some QTLs could not be lifted over).

  1. List of eQTLs:

    1. Set with FDR < 0.05 and a filter requiring genes to have expression > 0.1 FPKM in at least 10 samples: DER-08a_hg19_eQTL.significant (txt, 359 MB) [RefSet] and DER-08a_hg38_eQTL.significant (txt, 359 MB)

    2. Set with Bonferroni-adjusted p-value < 0.05: DER-08b_hg19_eQTL.bonferroni (txt, 87 MB) and DER-08b_hg38_eQTL.bonferroni (txt, 87 MB)

    3. Set with FDR < 0.05 and a filter requiring genes to have expression > 0.1 FPKM in at least 150 samples: DER-08c_hg19_eQTL.FPKM01_min150 (txt, 326 MB) and DER-08c_hg38_eQTL.FPKM01_min150 (txt, 326 MB)

    4. Set with FDR < 0.05 and a filter requiring genes to have expression > 1 FPKM in at least 20% of the samples: DER-08d_hg19_eQTL.FPKM1_20per (txt, 255 MB) and DER-08d_hg38_eQTL.FPKM1_20per (txt, 255 MB)

  2. List of cQTLs: DER-09_hg19_cQTL.significant (txt, 328 KB) [RefSet] and DER-09_hg38_cQTL.significant (txt, 328 KB)

  3. List of isoQTLs and tQTLs:

    1. Core isoQTL set with FDR < 0.001: DER-10a_hg19_isoQTL.significant (txt, 373 MB) [RefSet] and DER-10a_hg38_isoQTL.significant (txt, 373 MB)

    2. Filtered isoQTL set with FDR < 0.001 and a filter requiring genes to have expression > 5 FPKM in all samples: DER-10b_hg19_isoQTL.FPKM5.all (txt, 76 MB) and DER-10b_hg38_isoQTL.FPKM5.all (txt, 76 MB)

    3. Core tQTL set with FDR < 0.001: DER-10c_hg19_tQTL.all (txt, 327 MB) and DER-10c_hg38_tQTL.all (txt, 327 MB)

    4. Filtered tQTL set with FDR < 0.001 and a filter requiring genes to have expression > 5 FPKM in all samples: DER-10d_hg19_tQTL.FPKM5.all (txt, 94 MB) and DER-10d_hg38_tQTL.FPKM5.all (txt, 94 MB)

  4. List of fQTLs: DER-11_hg19_fQTL.significant (txt, 244 KB) [RefSet] and DER-11_hg38_fQTL.significant (txt, 244 KB)

  5. List of multiQTLs (QTLs that overlap between two of the three categories: eQTLs, cQTLs and fQTLs): DER-12_hg19_multiQTL.list (txt, 4 KB)
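The QTL sets above differ in two settings: an expression filter on genes and a multiple-testing correction. A minimal NumPy sketch of both follows, assuming a genes x samples FPKM matrix. The function names are hypothetical, and the Benjamini-Hochberg procedure shown is the textbook FDR method, not necessarily the exact (for example, permutation-based) procedure used in the PsychENCODE QTL pipeline.

```python
import numpy as np


def expression_filter(fpkm: np.ndarray, min_fpkm: float, min_samples: int) -> np.ndarray:
    """Boolean mask over genes (rows) with FPKM > min_fpkm in at least min_samples samples (columns)."""
    return (fpkm > min_fpkm).sum(axis=1) >= min_samples


def benjamini_hochberg(pvals: np.ndarray) -> np.ndarray:
    """Benjamini-Hochberg adjusted p-values; thresholding them controls the false discovery rate."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def bonferroni(pvals: np.ndarray) -> np.ndarray:
    """Bonferroni-adjusted p-values; thresholding them controls the family-wise error rate."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)
```

Under these assumptions, the DER-08a filter corresponds to `expression_filter(fpkm, 0.1, 10)`, DER-08c to `expression_filter(fpkm, 0.1, 150)`, DER-08d to `expression_filter(fpkm, 1.0, int(np.ceil(0.2 * n_samples)))`, and the FPKM5 isoQTL/tQTL sets to `expression_filter(fpkm, 5.0, n_samples)`. Bonferroni is the stricter of the two corrections, which is consistent with the Bonferroni eQTL set being much smaller than the FDR-based sets.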

Differentially Expressed (DEX) and Spliced Genes/Transcripts and Gene/Isoform Co-expression modules

This resource provides sets of genes and transcripts with significantly different expression or splicing between groups of samples, along with gene and isoform co-expression modules.

  1. Cross-Disorder DEX Genes and Transcripts, and Differentially Spliced Genes of PsychENCODE samples (from the Cross-Disorder Analysis paper):

  2. Gene and Isoform Co-Expression Modules calculated using Weighted Gene Co-expression Network Analysis (WGCNA) on the PEC RNA-seq samples (from the Cross-Disorder Analysis; included as supplementary table S5 in Gandal et al. 2018; see Cross-disorder_README for details on annotations):
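WGCNA derives modules from a weighted co-expression network. The steps before clustering can be sketched in NumPy: raise gene-gene correlations to a soft-thresholding power β to obtain an adjacency matrix, then compute the topological overlap matrix (TOM), which scores two genes as similar when they share many strong neighbours. This is a simplified illustration, not the R WGCNA package or the settings used for the PEC modules; the default β = 6, the signed transform and the function names are assumptions, and the subsequent hierarchical clustering and dynamic tree cut are omitted.

```python
import numpy as np


def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6, signed: bool = True) -> np.ndarray:
    """Gene-gene adjacency from an expression matrix with samples as rows and genes as columns."""
    corr = np.corrcoef(expr, rowvar=False)  # genes x genes Pearson correlation
    if signed:
        # Signed network: anti-correlated genes get adjacency near 0.
        adj = ((1.0 + corr) / 2.0) ** beta
    else:
        # Unsigned network: strong positive or negative correlation both count.
        adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """Topological overlap matrix for a weighted adjacency with zero diagonal."""
    shared = adj @ adj          # l_ij = sum over u of a_iu * a_uj
    k = adj.sum(axis=1)         # connectivity k_i of each gene
    # TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij); the denominator is always >= 1.
    tom = (shared + adj) / (np.minimum.outer(k, k) + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom
```

Modules are then typically obtained by hierarchically clustering the dissimilarity `1 - tom`.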

Bulk RNA-seq Decomposition and Deconvolution with Single-cell Data

  1. Brain Cell-type Marker Genes and Single-cell Expression Data (in units of TPM), from PEC (Developmental), Darmanis et al. 2015 and Lake et al. 2016

  2. Brain Cell-type Marker Genes and Single-cell Expression Data (in units of UMI), from PEC (Adult) and Lake et al. 2018

  3. External references: Darmanis et al. 2015, Proc. Natl. Acad. Sci. U.S.A. 112(23), pp. 7285-90; Lake et al. 2016, Science 352(6293), pp. 1586-90; Lake et al. 2018, Nat. Biotechnol. 36(1), pp. 70-80

  4. Cell Fractions Derived from Deconvolution:

  5. Decomposition through Non-negative Matrix Factorization (NMF):
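Reference-based deconvolution estimates, for each bulk sample, the mix of cell types whose marker-gene profiles best reproduce the bulk expression. One common formulation, shown here as a minimal sketch and not necessarily the method used to produce the PsychENCODE cell fractions, is non-negative least squares (NNLS) against a genes x cell-types signature matrix built from single-cell data, with the weights normalized to sum to one. It assumes SciPy is available and that the bulk and signature matrices share the same gene order and units (for example, both in TPM); the function names are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_cell_fractions(signature: np.ndarray, bulk: np.ndarray) -> np.ndarray:
    """Cell-type fractions for one bulk sample.

    signature: genes x cell types matrix of marker-gene expression per cell type
    bulk:      vector of bulk expression for the same genes, in the same order
    """
    # Solve min ||signature @ w - bulk|| subject to w >= 0.
    weights, _residual = nnls(signature, bulk)
    total = weights.sum()
    # Normalize to proportions; leave an all-zero solution unchanged.
    return weights / total if total > 0 else weights


def deconvolve(signature: np.ndarray, bulk_matrix: np.ndarray) -> np.ndarray:
    """Apply estimate_cell_fractions to every sample (column); returns cell types x samples."""
    return np.column_stack(
        [estimate_cell_fractions(signature, bulk_matrix[:, j]) for j in range(bulk_matrix.shape[1])]
    )
```

The non-negativity constraint keeps the estimated contributions physically meaningful. The reference-free NMF decomposition instead factorizes the bulk matrix itself without a predefined signature matrix.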




Pipeline-Processing Results

RNA-seq quantifications, ChIP-seq signals and peaks, brain transcriptionally active regions (TARs), imputed genotypes (secured), and phenotypes.

Access to all files tagged as "private" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions. Note that Synapse files will be made accessible in December 2018.



Raw Data

Alignment files for the various experiments, array data from the SNP genotyping assays and phenotype metadata for the different studies in the consortium. External links are provided to the data sources on Synapse and to the GTEx Consortium and Roadmap Epigenomics Consortium web portals.


List of all datasets used in the integrative analysis

Table of datasets listing the source study, disease status of the samples, source tissue(s), downstream analyses conducted with the data and the number of datasets: RAW-01_PEC_Table_of_Datasets (xlsx, 28 KB)

Developmental Analysis Data

FASTA file for RNA-seq spike-in sequences:

*All datasets are aligned to the hg19 reference genome and the GENCODE v19 annotation unless otherwise noted.
The "RefSet" tag marks datasets that were used in the primary analyses of the manuscript.
