PsychENCODE Integrative Analysis




Integrative Analysis

An integration of data across the capstone projects to build a model that takes QTLs as inputs and provides both phenotype predictions and the functional modules involved.

Integrative Analysis for each brain phenotype

  1. Integrative model parameters for all phenotypes:

  2. Multi-level functional enrichment analysis (DSPN-mod) and Weighted Gene Co-expression Network Analysis (WGCNA) modules:

  3. Transcription Factor - Target Gene - Enhancer linkages: The Coding Region Transcription Start Sites (TSS) were obtained from the hg19-based GENCODE v19 annotation: TSS for individual transcripts (csv, 4 MB).

    1. Gene regulatory network 1 (GRN 1):

    2. Gene regulatory network 2 (GRN 2):

  4. HiC-derived Enhancer - Gene and Promoter linkages:

  5. HiC Chromatin loops associated only with the promoter regions:
  6. Schizophrenia-associated genes:

  7. Matlab code and formatted data for the DSPN:

  8. Figures of gene regulatory networks (GRNs) targeting cell type biomarker genes for each cell type considered:

    1. Excitatory Neurons: RCircos_GRN_Ex1 (pdf, 2 MB), RCircos_GRN_Ex2 (pdf, 1 MB), RCircos_GRN_Ex3e (pdf, 7 MB), RCircos_GRN_Ex4 (pdf, 2 MB), RCircos_GRN_Ex5b (pdf, 1 MB), RCircos_GRN_Ex6a (pdf, 2 MB), RCircos_GRN_Ex6b (pdf, 1 MB), RCircos_GRN_Ex8 (pdf, 3 MB), RCircos_GRN_ExS (pdf, 5 MB)
    2. Inhibitory Neurons: RCircos_GRN_In1a (pdf, 2 MB), RCircos_GRN_In1b (pdf, 1 MB), RCircos_GRN_In1c (pdf, 1 MB), RCircos_GRN_In3 (pdf, 2 MB), RCircos_GRN_In4a (pdf, 1 MB), RCircos_GRN_In4b (pdf, 2 MB), RCircos_GRN_In6a (pdf, 2 MB), RCircos_GRN_In6b (pdf, 2 MB), RCircos_GRN_In7 (pdf, 2 MB), RCircos_GRN_In8 (pdf, 1 MB), RCircos_GRN_InS (pdf, 6 MB)
    3. Non-neuronal cell types: RCircos_GRN_Astrocytes (pdf, 4 MB), RCircos_GRN_Endothelial (pdf, 2 MB), RCircos_GRN_Microglia (pdf, 2 MB), RCircos_GRN_Oligodendrocytes (pdf, 5 MB), RCircos_GRN_Oligodendrocyte_Progenitor_Cells_(OPCs) (pdf, 2 MB), RCircos_GRN_Pericytes (pdf, 2 MB)




Derived Data Types

Gene expression matrix, enhancer lists, eQTL and cQTL maps, DEX genes, gene co-expression modules, PCA/RCA-based clustering of RNA-seq data and epigenetic data, decomposition and deconvolution of cell-type-specific RNA-seq.

Controls Only: Merged PsychENCODE and GTEx gene expression matrix for PFC

Gene expression matrix for 532 control samples from the PFC (normalized from the original FPKM count matrix and filtered to retain only genes with FPKM >= 0.1 in at least 10 samples):
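
The expression filter described above can be sketched as follows. This is only an illustration of the stated rule (FPKM >= 0.1 in at least 10 samples), not the consortium's pipeline; the function name, the dictionary layout and the gene names are hypothetical.

```python
def filter_expressed_genes(fpkm_by_gene, min_fpkm=0.1, min_samples=10):
    """Keep genes whose FPKM is >= min_fpkm in at least min_samples samples.

    fpkm_by_gene maps a gene identifier to its list of per-sample FPKM values
    (a hypothetical in-memory layout chosen for this sketch).
    """
    kept = {}
    for gene, values in fpkm_by_gene.items():
        n_expressed = sum(1 for v in values if v >= min_fpkm)
        if n_expressed >= min_samples:
            kept[gene] = values
    return kept


# Toy matrix with made-up gene names and 12 samples per gene.
toy = {
    "GENE_A": [0.5] * 12,                # passes in all 12 samples
    "GENE_B": [0.2] * 9 + [0.0] * 3,     # passes in only 9 samples
    "GENE_C": [0.1] * 10 + [0.05] * 2,   # exactly at threshold in 10 samples
}
filtered = filter_expressed_genes(toy)
print(sorted(filtered))
```

The same function covers the other thresholds quoted on this page (for example 150 samples) by changing `min_samples`.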

PsychENCODE enhancer list and H3K27ac peaks

  1. PsychENCODE enhancer set for the PFC:

  2. H3K27ac peaks for the Prefrontal Cortex: DER-05_PFC_H3K27ac_peaks (bed, 3 MB)

  3. H3K27ac peaks for the Temporal Cortex: DER-06_TC_H3K27ac_peaks (bed, 3 MB)

  4. H3K27ac peaks for the Cerebellar Cortex: DER-07_CBC_H3K27ac_peaks (bed, 2 MB)
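
The peak files above are distributed in BED format. A minimal sketch for reading such a file is given here, assuming only the three mandatory BED columns (chromosome, 0-based start, exclusive end); any further columns in the actual DER-05 to DER-07 files are not described on this page and are ignored. The function names and the toy records are hypothetical.

```python
import io


def read_bed_intervals(handle):
    """Yield (chrom, start, end) tuples from a BED-formatted text stream."""
    for line in handle:
        line = line.strip()
        # Skip blank lines and the optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        yield fields[0], int(fields[1]), int(fields[2])


def total_peak_length(intervals):
    """Sum of interval lengths; BED is 0-based half-open, so length = end - start."""
    return sum(end - start for _, start, end in intervals)


# Toy input standing in for a peak file (coordinates are made up).
toy_bed = io.StringIO("chr1\t100\t600\tpeak_1\nchr2\t1000\t1250\tpeak_2\n")
peaks = list(read_bed_intervals(toy_bed))
print(len(peaks), total_peak_length(peaks))
```

For a real file, the `io.StringIO` object would be replaced by an opened file handle.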

QTL Maps

Expression QTLs (eQTLs), chromatin QTLs (cQTLs), isoform percentage QTLs (isoQTLs), transcript expression QTLs (tQTLs) and cell fraction QTLs (fQTLs) aligned to both hg19 and hg38 (converted from hg19 using UCSC liftOver; some QTLs failed to be lifted over).

  1. SNP information for all QTLs considered, including rsIDs (if available), location, and reference and alternate alleles: SNP Information Table with Alleles (txt, 208 MB)

  2. List of eQTLs:

    1. Full set of cis-eQTLs with no p-value or FDR filtering: Full_hg19_cis-eQTL (txt.gz, 3 GB)

    2. Set with FDR < 0.05 and a filter requiring genes to have an expression > 0.1 FPKM in at least 10 samples: DER-08a_hg19_eQTL.significant (txt, 359 MB, RefSet) and DER-08a_hg38_eQTL.significant (txt, 359 MB)

    3. Set with Bonferroni-adjusted FDR < 0.05: DER-08b_hg19_eQTL.bonferroni (txt, 87 MB) and DER-08b_hg38_eQTL.bonferroni (txt, 87 MB)

    4. Set with FDR < 0.05 and a filter requiring genes to have an expression > 0.1 FPKM in at least 150 samples: DER-08c_hg19_eQTL.FPKM01_min150 (txt, 326 MB) and DER-08c_hg38_eQTL.FPKM01_min150 (txt, 326 MB)

    5. Set with FDR < 0.05 and a filter requiring genes to have an expression > 1 FPKM in at least 20% of the samples: DER-08d_hg19_eQTL.FPKM1_20per (txt, 255 MB) and DER-08d_hg38_eQTL.FPKM1_20per (txt, 255 MB)

    6. Cross-Disorder Analysis Set used for summary-data-based Mendelian randomization (SMR), including 100 hidden covariate (HCP) factors: DER-08e_hg19_eQTL_HCP100_forSMR (txt, 520 MB) and, in binary format (.besd, .epi and .esi files zipped together), DER-08e_hg19_eQTL_HCP100_forSMR_binary_format (zip, 76 MB)

  3. List of cQTLs: DER-09_hg19_cQTL.significant (txt, 328 KB, RefSet) and DER-09_hg38_cQTL.significant (txt, 328 KB)

  4. List of isoQTLs:

    1. Core isoQTL set with FDR < 0.001: DER-10a_hg19_isoQTL.significant (txt, 373 MB, RefSet) and DER-10a_hg38_isoQTL.significant (txt, 373 MB)

    2. Filtered isoQTL set with FDR < 0.001 and a filter requiring genes to have an expression > 5 FPKM in all samples: DER-10b_hg19_isoQTL.FPKM5.all (txt, 76 MB) and DER-10b_hg38_isoQTL.FPKM5.all (txt, 76 MB)

    3. Core tQTL set with FDR < 0.001: DER-10c_hg19_tQTL.all (txt, 327 MB) and DER-10c_hg38_tQTL.all (txt, 327 MB)

    4. Filtered tQTL set with FDR < 0.001 and a filter requiring genes to have an expression > 5 FPKM in all samples: DER-10d_hg19_tQTL.FPKM5.all (txt, 94 MB) and DER-10d_hg38_tQTL.FPKM5.all (txt, 94 MB)

  5. List of fQTLs: DER-11_hg19_fQTL.significant (txt, 244 KB, RefSet) and DER-11_hg38_fQTL.significant (txt, 244 KB)

  6. List of multiQTLs (QTLs that overlap between two of the three categories eQTLs, cQTLs and fQTLs): DER-12_hg19_multiQTL.list (txt, 4 KB)
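
One way to picture the multiQTL overlap in the last item is as a membership count across QTL categories. The sketch below assumes that overlap is defined by identical SNP identifiers and that a SNP qualifies when it appears in at least two categories; the page does not state the exact criterion used to build DER-12, so both points are assumptions, and the function name and rsIDs are hypothetical.

```python
from collections import defaultdict


def find_multi_qtls(snps_by_category, min_categories=2):
    """Return {snp_id: sorted category names} for SNPs seen in >= min_categories."""
    categories_by_snp = defaultdict(set)
    for category, snps in snps_by_category.items():
        for snp in snps:
            categories_by_snp[snp].add(category)
    return {
        snp: sorted(cats)
        for snp, cats in categories_by_snp.items()
        if len(cats) >= min_categories
    }


# Toy SNP sets with made-up identifiers.
toy_qtls = {
    "eQTL": {"rs1", "rs2", "rs3"},
    "cQTL": {"rs2", "rs4"},
    "fQTL": {"rs2", "rs3"},
}
multi = find_multi_qtls(toy_qtls)
for snp in sorted(multi):
    print(snp, multi[snp])
```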

Differentially Expressed (DEX) and Spliced Genes/Transcripts and Gene/Isoform Co-expression modules

This resource provides sets of genes that exhibit significantly different expression levels between different groups of samples.

  1. Cross-Disorder DEX Genes and Transcripts, and Differentially Spliced Genes of PsychENCODE samples (from Cross-Disorder Analysis Paper):

  2. Gene and Isoform Co-Expression Modules calculated using Weighted Gene Co-expression Network Analysis (WGCNA) on the PEC RNA-seq samples (from the Cross-Disorder Analysis; included as supplementary table S5 in Gandal et al. 2018; see Cross-disorder_README for details on annotations):

Cross-Disorder Analysis TWAS weights

This resource provides the weights associated with the Transcriptome-wide Association Study (TWAS) conducted as part of the Cross-Disorder Analysis: PEC_TWAS_weights (txt.tar.gz, 896 MB)
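
In a TWAS, per-SNP weights are typically used to predict a gene's expression from genotype as a weighted sum of allele dosages. The sketch below illustrates only that general idea; the layout of the files inside PEC_TWAS_weights is not described on this page, so the dictionary structure, function name and rsIDs are hypothetical.

```python
def predict_expression(weights, dosages):
    """Weighted sum of allele dosages over SNPs that have both a weight and a dosage.

    weights: {snp_id: weight} for one gene (hypothetical layout).
    dosages: {snp_id: allele dosage in [0, 2]} for one individual.
    SNPs missing from the genotype data are skipped.
    """
    return sum(w * dosages[snp] for snp, w in weights.items() if snp in dosages)


# Toy values with made-up identifiers.
gene_weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 1.0}
sample_dosages = {"rs1": 2, "rs2": 1, "rs4": 0}
predicted = predict_expression(gene_weights, sample_dosages)
print(predicted)
```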

Bulk RNA-seq Decomposition and Deconvolution with Single-cell Data

  1. Brain Cell-type Marker Genes and Single-cell Expression Data (in units of TPM), from PEC (Developmental), Darmanis et al. 2015 and Lake et al. 2016

  2. Brain Cell-type Marker Genes and Single-cell Expression Data (in units of UMI), from PEC (Adult) and Lake et al. 2018

  3. External references: Darmanis et al. 2015, Proc. Nat. Acad. Sci. U.S.A. 112(23), Pgs. 7285-90; Lake et al. 2016, Science 352(6293), Pgs. 1586-90; Lake et al. 2018, Nat. Biotechnol. 36(1), Pgs. 70-80

  4. Cell Fractions Derived from Deconvolution:

  5. Decomposition through Non-negative Matrix Factorization (NMF):
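
As a rough illustration of the NMF decomposition named in the last item, the sketch below factors a small non-negative matrix V (genes x samples) into W (genes x components) and H (components x samples) using the standard multiplicative update rules. It is a generic textbook variant in pure Python, not the procedure or parameters used by the consortium; the toy matrix, rank, iteration count and function names are assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V ~ W H with non-negative W and H via multiplicative updates."""
    rng = random.Random(seed)
    n_rows, n_cols = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_rows)]
    h = [[rng.random() + 0.1 for _ in range(n_cols)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(n_cols)] for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps)
              for j in range(rank)] for i in range(n_rows)]
    return w, h


# Toy 4-gene x 3-sample matrix built from two non-negative signatures.
v = [[1, 2, 0], [2, 4, 0], [0, 1, 2], [0, 3, 6]]
w0, h0 = nmf(v, rank=2, n_iter=0)   # random starting point only
w, h = nmf(v, rank=2)               # after 200 update rounds
initial_error = frobenius_error(v, w0, h0)
final_error = frobenius_error(v, w, h)
print(initial_error, final_error)
```

The columns of W can be read as component signatures across genes and the rows of H as the contribution of each component to each sample.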




Pipeline-Processing Results

RNA-seq quantifications, ChIP-seq signals and peaks, Brain Transcriptionally Active Regions (TARs), Imputed Genotypes (secured), and Phenotypes.

Access to all files tagged as "controlled" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions.



Raw Data

Alignment files for the various experiments, chip arrays for the SNP genotyping assays and phenotype metadata for the different studies under the consortium; external links are provided for the data sources on Synapse, and the GTEx consortium and Roadmap Epigenomics Consortium web portals.

Access to all files tagged as "controlled" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions.

List of all datasets used in the integrative analysis

List of datasets, including source study, disease status of samples, source tissue(s), downstream analyses conducted using the data and the number of datasets: RAW-01_PEC_Table_of_Datasets (xlsx, 28 KB)

Genotypes

The integrative analyses, such as the QTL calculations, included samples external to the PsychENCODE consortium, so it is not appropriate to include the full genotype set on Synapse under the PsychENCODE project. We do not provide a merged genotype file of the remaining samples, to avoid confusion: results derived from this partial file would not match the full set of results presented here. Instead, please see the Genotype Sample ID map for a list of genotyping sample IDs for all individuals used in this analysis.

  1. Integrative Analysis and Cross-Disorder Analysis Genotype vcf files (link to Synapse):

  2. Metadata for imputed genotypes (link to Synapse):

Phenotypes

In the following we provide two links: one is for the full set of clinical metadata, available to those with full access permissions granted by Synapse; the other is a publicly accessible, limited set of clinical metadata.

  1. Full demographic and clinical information on individuals included in this analysis (link to Synapse):

  2. Publicly accessible, limited demographic and clinical information on individuals included in this analysis (link to Synapse):

Developmental Analysis Data

FASTA file for RNA-seq spike-in sequences:

*All datasets are aligned to the reference genome hg19 and GENCODE v19, unless mentioned otherwise.
The "RefSet" tag marks datasets that were used in the primary analyses of the manuscript.


Citation

For Cross-disorder data, including DEX and WGCNA modules, please cite: Gandal et al., Science 2018, Vol. 362, Issue 6420, eaat8127.

For all other data, please cite: Wang et al., Science 2018, Vol. 362, Issue 6420, eaat8464.
