Bulk download of PsychENCODE (PEC) "Integrative Analysis" and "Derived Data Types" files: PEC Datasets tgz
Interactive elements: PEC Interactive Portal
For citation purposes: Citation of data sources
An integration of data across the capstone projects to build a model that takes QTLs as inputs and provides both phenotype predictions and the functional modules involved.
Integrative model parameters for all phenotypes:
Conditional Restricted Boltzmann Machine:
Deep Structured Phenotype Network with Imputation:
Modular Deep Structured Phenotype Network (uses WGCNA modules shared in this section):
Full Deep Structured Phenotype Network:
Multi-level functional enrichment analysis (DSPN-mod) and Weighted Gene Co-Expression Analysis (WGCNA) modules:
Pathway and cell-type enrichment scores: INT-07_DSPN_prioritized_module_enrichments (xlsx, 120 KB)
WGCNA modules (Ensembl IDs): INT-08_WGCNA_modules_ensembl_ids (xlsx, 17 MB)
WGCNA modules (HGNC IDs): INT-09_WGCNA_modules_hgnc_ids (xlsx, 14 MB)
Transcription Factor - Target Gene - Enhancer linkages: The coding region transcription start sites (TSS) were obtained from the hg19-based GENCODE v19 annotation: TSS for individual transcripts (csv, 4 MB).
Gene regulatory network 1 (GRN 1):
Gene regulatory network 2 (GRN 2):
HiC-derived Enhancer - Gene and Promoter linkages:
Matlab code and formatted data for the DSPN:
Figures of gene regulatory networks (GRNs) targeting cell type biomarker genes for each cell type considered:
Gene expression matrix, enhancer lists, eQTL and cQTL maps, DEX genes, gene co-expression modules, PCA/RCA-based clustering of RNA-seq data and epigenetic data, decomposition and deconvolution of cell-type-specific RNA-seq.
Gene expression matrix for the PFC (normalized from original FPKM count matrix):
Gene expression matrix for the PFC (in TPM):
Gene expression matrix for 532 control samples from the PFC (normalized from original FPKM count matrix, filtered to retain only genes with FPKM >= 0.1 in at least 10 samples):
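The expression filter described for the 532-sample control matrix can be sketched as follows. This is a minimal illustration rather than the consortium's own code: the function name `filter_genes`, the dictionary layout, and the toy gene IDs and values are assumptions. Only the two thresholds (FPKM >= 0.1, at least 10 samples) come from the description above.

```python
def filter_genes(fpkm_by_gene, min_fpkm=0.1, min_samples=10):
    """Keep genes whose FPKM is >= min_fpkm in at least min_samples samples."""
    kept = {}
    for gene, values in fpkm_by_gene.items():
        # Count the samples in which this gene passes the expression threshold.
        n_expressed = sum(1 for v in values if v >= min_fpkm)
        if n_expressed >= min_samples:
            kept[gene] = values
    return kept


# Toy data (hypothetical gene IDs and values), 12 samples per gene.
toy = {
    "GENE_A": [0.5] * 12,               # passes in all 12 samples
    "GENE_B": [0.5] * 9 + [0.0] * 3,    # passes in only 9 samples
    "GENE_C": [0.1] * 10 + [0.05] * 2,  # sits exactly on both thresholds
}
filtered = filter_genes(toy)
print(sorted(filtered))
```

Both comparisons are inclusive, so a gene with exactly 10 samples at exactly 0.1 FPKM is retained.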
PsychENCODE enhancer set for the PFC:
H3K27ac peaks for the Prefrontal Cortex: DER-05_PFC_H3K27ac_peaks (bed, 3 MB)
H3K27ac peaks for the Temporal Cortex: DER-06_TC_H3K27ac_peaks (bed, 3 MB)
H3K27ac peaks for the Cerebellar Cortex: DER-07_CBC_H3K27ac_peaks (bed, 2 MB)
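As a quick sanity check on a downloaded peak file, the sketch below counts peaks and sums their lengths per chromosome. It assumes only the standard BED layout (chromosome, 0-based start, exclusive end in the first three columns). The function name `summarize_bed` and the toy intervals are illustrative; the exact columns of the DER-05 to DER-07 files are not described here, so any extra columns are ignored.

```python
from collections import defaultdict


def summarize_bed(lines):
    """Return {chrom: (n_peaks, total_bp)} for BED-formatted lines."""
    counts = defaultdict(int)
    lengths = defaultdict(int)
    for line in lines:
        line = line.strip()
        # Skip blank lines and optional BED header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        counts[chrom] += 1
        # BED intervals are half-open, so the length is end - start.
        lengths[chrom] += int(end) - int(start)
    return {chrom: (counts[chrom], lengths[chrom]) for chrom in counts}


# Toy intervals (hypothetical coordinates).
toy_bed = [
    "chr1\t1000\t1500",
    "chr1\t3000\t3200\tpeak_2",
    "chr2\t500\t900",
]
summary = summarize_bed(toy_bed)
print(summary)
```

The function accepts any iterable of lines, so an open file handle can be passed in directly instead of the toy list.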
Expression QTLs (eQTLs), chromatin QTLs (cQTLs), isoform percentage QTLs (isoQTLs), transcript expression QTLs (tQTLs) and cell fraction QTLs (fQTLs) aligned to both hg19 and hg38 (converted from hg19 using UCSC liftOver; some QTLs failed to be lifted over).
SNP information for all QTLs considered, including rsIDs (if available), location, and reference and alternate alleles: SNP Information Table with Alleles (txt, 208 MB)
List of eQTLs:
Full set of cis-eQTLs with no p-value or FDR filtering: Full_hg19_cis-eQTL (txt.gz, 3 GB)
Cross-Disorder Analysis Set used for summary-data-based Mendelian randomization (SMR), including 100 hidden covariate (HCP) factors: DER-08e_hg19_eQTL_HCP100_forSMR (txt, 520 MB); also available in binary format (.besd, .epi, and .esi files zipped together): DER-08e_hg19_eQTL_HCP100_forSMR_binary_format (zip, 76 MB)
List of isoQTLs:
List of multiQTLs (QTLs overlapping between two of the categories out of eQTLs, cQTLs and fQTLs): DER-12_hg19_multiQTL.list (txt, 4 KB)
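The idea of a multiQTL can be illustrated with a small set-based sketch. How overlap was defined in the original analysis is not stated here, so this example assumes the simplest reading, a SNP identifier shared by at least two categories. The function name `multi_qtl_snps` and the toy SNP IDs are hypothetical.

```python
from itertools import combinations


def multi_qtl_snps(qtl_sets):
    """Map each SNP found in two or more categories to those categories."""
    shared = {}
    # Compare every pair of categories (eQTL/cQTL, eQTL/fQTL, cQTL/fQTL).
    for (name_a, snps_a), (name_b, snps_b) in combinations(sorted(qtl_sets.items()), 2):
        for snp in snps_a & snps_b:
            shared.setdefault(snp, set()).update((name_a, name_b))
    return shared


# Toy SNP sets (hypothetical identifiers).
toy_sets = {
    "eQTL": {"snp1", "snp2", "snp3"},
    "cQTL": {"snp2", "snp4"},
    "fQTL": {"snp2", "snp3"},
}
multi = multi_qtl_snps(toy_sets)
for snp in sorted(multi):
    print(snp, sorted(multi[snp]))
```

SNPs that appear in only one category (`snp1` and `snp4` in the toy data) are left out of the result.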
This resource provides sets of genes that exhibit significantly different expression levels between different groups of samples.
Cross-Disorder DEX Genes and Transcripts, and Differentially Spliced Genes of PsychENCODE samples (from Cross-Disorder Analysis Paper):
Gene and Isoform Co-Expression Modules calculated using Weighted Gene Co-Expression Analysis (WGCNA) on the PEC RNA-seq samples (from the Cross-Disorder Analysis; included as supplementary table S5 in Gandal et al 2018; see Cross-disorder_README for details on annotations):
This resource provides the weights associated with the Transcriptome-wide Association Study (TWAS) conducted as part of the Cross-Disorder Analysis: PEC_TWAS_weights (txt.tar.gz, 896 MB)
Brain Cell-type Marker Genes and Single-cell Expression Data (in units of TPM), from PEC (Developmental), Darmanis et al. 2015 and Lake et al. 2016
Marker genes merged from Darmanis 2015 and Lake 2016 sources: DER-19_Single_cell_markergenes_TPM (xlsx, 44 KB)
Processed single-cell expression data merged from all three sources: DER-20_Single_cell_expression_processed_TPM (tsv, 275 MB)
Brain Cell-type Marker Genes and Single-cell Expression Data (in units of UMI), from PEC (Adult) and Lake et al. 2018
External references: Darmanis et al. 2015, Proc. Nat. Acad. Sci. U.S.A. 112(23), Pgs. 7285-90; Lake et al. 2016, Science 352(6293), Pgs. 1586-90; Lake et al. 2018, Nat. Biotechnol. 36(1), Pgs. 70-80
Cell Fractions Derived from Deconvolution:
Decomposition through Non-negative Matrix Factorization (NMF):
RNA-seq quantifications, ChIP-seq signals and peaks, Brain Transcriptionally Active Regions (TARs), Imputed Genotypes (secured), and Phenotypes. Access to all files tagged as "controlled" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions.
Signal tracks (.bigwig) and peak (.bedgraph) files calculated using PsychENCODE pipeline, available on Synapse:
Signal tracks (.bigwig) and peak (.bed, .narrowPeak, .gappedPeak, and .broadPeak) files calculated using PsychENCODE pipeline, available on Synapse:
Alignment files for the various experiments, chip arrays for the SNP genotyping assays and phenotype metadata for the different studies under the consortium; external links are provided for the data sources on Synapse, and the GTEx consortium and Roadmap Epigenomics Consortium web portals. Access to all files tagged as "controlled" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions.
List of datasets including source study, disease status of samples, source tissue(s), downstream analyses conducted using the data and the number of datasets: RAW-01_PEC_Table_of_Datasets (xlsx, 28 KB)
PsychENCODE Bulk RNA-seq alignment (.bam) files (link to Synapse):
GTEx Bulk RNA-seq alignment (.bam) files (link to GTEx consortium web portal):
PsychENCODE Single-cell RNA-seq (.fastq) files (link to Synapse):
PsychENCODE Bulk ChIP-seq alignment (.bam) files (link to Synapse):
Roadmap Epigenomics Consortium (REMC) Bulk ChIP-seq alignment (.tagAlign) files (link to the REMC web portal):
The integrative analyses, such as the QTL calculations, included samples external to the PsychENCODE consortium, so it is not appropriate to include the full genotype set on Synapse under the PsychENCODE project. To prevent confusion, we do not provide a merged genotype file of the remaining samples, because results computed from this partial file would not match the full sets of results presented here. Instead, please see Genotype Sample ID map for a list of genotyping sample IDs for all individuals used in this analysis.
Integrative Analysis and Cross-Disorder Analysis Genotype vcf files (link to Synapse):
Metadata for imputed genotypes (link to Synapse):
In the following we provide two links: one is for the full set of clinical metadata, available to those with full access permissions granted by Synapse; the other is a publicly accessible, limited set of clinical metadata.
Full demographic and clinical information on individuals included in this analysis (link to Synapse):
Publicly accessible, limited demographic and clinical information on individuals included in this analysis (link to Synapse):
RNA-seq and SNP array files used in the Cross-Disorder Analysis (link to Synapse):
FASTA file for RNA-seq spike-in sequences:
*All datasets are aligned to the reference genome hg19 and annotated with GENCODE v19, unless mentioned otherwise.
The "RefSet" tag marks datasets that were used in the primary analyses of the manuscript.
For Cross-disorder data, including DEX and WGCNA modules, please cite: Gandal et al, Science 2018, Vol. 362, Issue 6420, eaat8127.
For all other data please cite: Wang et al, Science 2018, Vol. 362, Issue 6420, eaat8464.