Bulk download of PsychENCODE (PEC) "Integrative Analysis" and "Derived Data Types" files: PEC Datasets tgz
Interactive elements: PEC Interactive Portal
Further details on data generation and analysis*: Integrative Flagship Paper and Supplement; Cross-Disorder Flagship Paper
For citation purposes: Citation of data sources
An integration of data across the capstone projects to build a model that takes QTLs as inputs and provides both phenotype predictions and the functional modules involved.
Integrative model parameters for all phenotypes:
Logistic Regression-Genotype:
Logistic Regression-Transcriptome:
Conditional Restricted Boltzmann Machine:
Deep Structured Phenotype Network with Imputation:
Modular Deep Structured Phenotype Network (uses WGCNA modules shared in this section):
Full Deep Structured Phenotype Network:
Multi-level functional enrichment analysis (DSPN-mod) and Weighted Gene Co-Expression Network Analysis (WGCNA) modules:
Pathway and cell-type enrichment scores: INT-07_DSPN_prioritized_module_enrichments (xlsx, 120 KB)
WGCNA modules (Ensembl IDs): INT-08_WGCNA_modules_ensembl_ids (xlsx, 17 MB)
WGCNA modules (HGNC IDs): INT-09_WGCNA_modules_hgnc_ids (xlsx, 14 MB)
Transcription Factor - Target Gene - Enhancer linkages: The coding-region transcription start sites (TSS) were obtained from the hg19-based GENCODE v19 annotation: TSS for individual transcripts (csv, 4 MB).
Gene regulatory network 1 (GRN 1):
Gene regulatory network 2 (GRN 2):
HiC-derived Enhancer - Gene and Promoter linkages:
Schizophrenia-associated genes:
Matlab code and formatted data for the DSPN:
Figures of gene regulatory networks (GRNs) targeting cell type biomarker genes for each cell type considered:
Gene expression matrix, enhancer lists, eQTL and cQTL maps, DEX genes, gene co-expression modules, PCA/RCA-based clustering of RNA-seq data and epigenetic data, decomposition and deconvolution of cell-type-specific RNA-seq.
Gene expression matrix for the PFC (normalized from original FPKM count matrix):
Gene expression matrix for the PFC (in TPM):
Gene expression matrix for 532 control samples from the PFC (normalized from original FPKM count matrix and filtered to retain only genes with FPKM >= 0.1 in at least 10 samples):
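The expression filter described for the control-sample matrix can be sketched as follows. This is a minimal illustration, not the consortium's pipeline: the function name `filter_genes`, the dictionary layout, and the toy values are assumptions made for the example; only the thresholds (FPKM >= 0.1 in at least 10 samples) come from the description above.

```python
def filter_genes(fpkm_by_gene, min_fpkm=0.1, min_samples=10):
    """Keep genes whose FPKM is >= min_fpkm in at least min_samples samples.

    fpkm_by_gene maps a gene ID to a list of per-sample FPKM values
    (a hypothetical in-memory layout, not the format of the released files).
    """
    kept = {}
    for gene, values in fpkm_by_gene.items():
        n_expressed = sum(1 for v in values if v >= min_fpkm)
        if n_expressed >= min_samples:
            kept[gene] = values
    return kept


# Toy matrix with 12 samples; gene IDs and values are made up.
toy = {
    "geneA": [0.5] * 12,               # passes the threshold in all 12 samples
    "geneB": [0.5] * 9 + [0.0] * 3,    # passes in only 9 samples
    "geneC": [0.1] * 10 + [0.05] * 2,  # exactly at both thresholds
}
filtered = filter_genes(toy)
print(sorted(filtered))
```

Both thresholds are parameters, so stricter or looser cut-offs can be tried without changing the function.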
PsychENCODE enhancer set for the PFC:
H3K27ac peaks for the Prefrontal Cortex: DER-05_PFC_H3K27ac_peaks (bed, 3 MB)
H3K27ac peaks for the Temporal Cortex: DER-06_TC_H3K27ac_peaks (bed, 3 MB)
H3K27ac peaks for the Cerebellar Cortex: DER-07_CBC_H3K27ac_peaks (bed, 2 MB)
Expression QTLs (eQTLs), chromatin QTLs (cQTLs), isoform percentage QTLs (isoQTLs), transcript expression QTLs (tQTLs) and cell fraction QTLs (fQTLs) aligned to both hg19 and hg38 (converted from hg19 using UCSC liftOver; some QTLs failed to be lifted over).
SNP information for all QTLs considered, including rsIDs (if available), location, and reference and alternate alleles: SNP Information Table with Alleles (txt, 208 MB)
List of eQTLs:
Full set of cis-eQTLs with no p-value or FDR filtering: Full_hg19_cis-eQTL (txt.gz, 3 GB)
Set with FDR < 0.05 and a filter requiring genes to have an expression > 0.1 FPKM in at least 10 samples: DER-08a_hg19_eQTL.significant (txt, 359 MB; RefSet) and DER-08a_hg38_eQTL.significant (txt, 359 MB)
Set with Bonferroni-adjusted FDR < 0.05: DER-08b_hg19_eQTL.bonferroni (txt, 87 MB) and DER-08b_hg38_eQTL.bonferroni (txt, 87 MB)
Set with FDR < 0.05 and a filter requiring genes to have an expression > 0.1 FPKM in at least 150 samples: DER-08c_hg19_eQTL.FPKM01_min150 (txt, 326 MB) and DER-08c_hg38_eQTL.FPKM01_min150 (txt, 326 MB)
Set with FDR < 0.05 and a filter requiring genes to have an expression > 1 FPKM in at least 20% of the samples: DER-08d_hg19_eQTL.FPKM1_20per (txt, 255 MB) and DER-08d_hg38_eQTL.FPKM1_20per (txt, 255 MB)
Cross-Disorder Analysis Set used for summary-data-based Mendelian randomization (SMR) including 100 hidden covariate (HCP) factors: DER-08e_hg19_eQTL_HCP100_forSMR (txt, 520 MB) and in binary format (.besd, .epi, .esi files zipped together) DER-08e_hg19_eQTL_HCP100_forSMR_binary_format (zip, 76 MB)
List of cQTLs: DER-09_hg19_cQTL.significant (txt, 328 KB; RefSet) and DER-09_hg38_cQTL.significant (txt, 328 KB)
List of isoQTLs:
Core isoQTL set with FDR < 0.001: DER-10a_hg19_isoQTL.significant (txt, 373 MB; RefSet) and DER-10a_hg38_isoQTL.significant (txt, 373 MB)
Filtered isoQTL set with FDR < 0.001 and a filter requiring genes to have an expression > 5 FPKM in all samples: DER-10b_hg19_isoQTL.FPKM5.all (txt, 76 MB) and DER-10b_hg38_isoQTL.FPKM5.all (txt, 76 MB)
List of fQTLs: DER-11_hg19_fQTL.significant (txt, 244 KB; RefSet) and DER-11_hg38_fQTL.significant (txt, 244 KB)
List of multiQTLs (QTLs that overlap between two of the three categories eQTLs, cQTLs and fQTLs): DER-12_hg19_multiQTL.list (txt, 4 KB)
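One way to read the multiQTL definition is as SNPs that are reported in at least two of the three QTL categories. The sketch below illustrates that reading with plain Python sets. The function name `multi_qtls`, the use of SNP IDs as keys, and the toy IDs are assumptions for illustration and do not describe how DER-12 was actually built.

```python
from collections import Counter


def multi_qtls(eqtl_snps, cqtl_snps, fqtl_snps, min_categories=2):
    """Return SNP IDs found in at least min_categories of the three sets."""
    counts = Counter()
    for snps in (eqtl_snps, cqtl_snps, fqtl_snps):
        counts.update(set(snps))  # de-duplicate within each category
    return {snp for snp, n in counts.items() if n >= min_categories}


# Made-up SNP IDs, for illustration only.
eqtl = {"snp1", "snp2", "snp3"}
cqtl = {"snp2", "snp4"}
fqtl = {"snp3", "snp4", "snp5"}
shared = multi_qtls(eqtl, cqtl, fqtl)
print(sorted(shared))
```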
This resource provides sets of genes that exhibit significantly different expression levels between different groups of samples.
Cross-Disorder DEX Genes and Transcripts, and Differentially Spliced Genes of PsychENCODE samples (from Cross-Disorder Analysis Paper):
Gene and Isoform Co-Expression Modules calculated using Weighted Gene Co-Expression Network Analysis (WGCNA) on the PEC RNA-seq samples (from the Cross-Disorder Analysis; included as supplementary table S5 in Gandal et al. 2018; see Cross-disorder_README for details on annotations):
This resource provides the weights associated with the Transcriptome-wide Association Study (TWAS) conducted as part of the Cross-Disorder Analysis: PEC_TWAS_weights (txt.tar.gz, 896 MB)
Brain Cell-type Marker Genes and Single-cell Expression Data (in units of TPM), from PEC (Developmental), Darmanis et al. 2015 and Lake et al. 2016
Marker genes merged from Darmanis 2015 and Lake 2016 sources: DER-19_Single_cell_markergenes_TPM (xlsx, 44 KB)
Processed single-cell expression data merged from all three sources: DER-20_Single_cell_expression_processed_TPM (tsv, 275 MB)
Brain Cell-type Marker Genes and Single-cell Expression Data (in units of UMI), from PEC (Adult) and Lake et al. 2018
Marker genes merged from both sources: DER-21_Single_cell_markergenes_UMI (xlsx, 112 KB)
Raw single-cell expression data merged from both sources: DER-22_Single_cell_expression_raw_UMI (tsv, 899 MB)
External references: Darmanis et al. 2015, Proc. Nat. Acad. Sci. U.S.A. 112(23), Pgs. 7285-90; Lake et al. 2016, Science 352(6293), Pgs. 1586-90; Lake et al. 2018, Nat. Biotechnol. 36(1), Pgs. 70-80
Cell Fractions Derived from Deconvolution:
Raw Fractions: DER-23_Cell_fractions_Raw (xlsx, 388 KB)
Normalized Fractions: DER-24_Cell_fractions_Normalized (xlsx, 388 KB)
Decomposition through Non-negative Matrix Factorization (NMF):
NMF Components: DER-25_NMF_comp (xlsx, 716 KB)
NMF Fractions/Coefficients: DER-26_NMF_coef (xlsx, 216 KB)
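For orientation, NMF approximates a non-negative matrix V (for example, genes by samples) as a product W H, where the columns of W play the role of components and H holds the per-sample coefficients. The sketch below uses the standard multiplicative update rules for the squared-error objective in pure Python. The algorithm choice, rank, iteration count, and toy matrix are assumptions, since this listing does not state how DER-25 and DER-26 were computed.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((x - y) ** 2 for rv, rw in zip(v, wh) for x, y in zip(rv, rw))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor V (n x m) into W (n x rank) and H (rank x m), all non-negative."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


# Toy 4 x 3 matrix built from two non-negative components (made-up values).
true_w = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0], [1.0, 1.0]]
true_h = [[1.0, 2.0, 0.5], [0.5, 0.0, 2.0]]
v = matmul(true_w, true_h)

w0, h0 = nmf(v, rank=2, n_iter=0)   # random initialisation only
w, h = nmf(v, rank=2, n_iter=200)   # same initialisation, then 200 updates
print(frobenius_error(v, w0, h0), frobenius_error(v, w, h))
```

The two printed numbers are the reconstruction error before and after the updates. Because the factorization is not unique, W and H recovered this way need not match the matrices used to build the toy example.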
RNA-seq quantifications, ChIP-seq signals and peaks, Brain Transcriptionally Active Regions (TARs), Imputed Genotypes (secured), and Phenotypes.
Access to all files tagged as "controlled" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions.
Signal tracks (.bigwig) and peak (.bedgraph) files calculated using the PsychENCODE pipeline, available on Synapse:
Signal tracks (.bigwig) and peak (.bed, .narrowPeak, .gappedPeak, and .broadPeak) files calculated using the PsychENCODE pipeline, available on Synapse:
Hi-C matrix for DLPFC at 10 kb resolution: PIP-01_DLPFC.10kb (txt.tar.gz, 6 GB)
Hi-C matrix for DLPFC at 40 kb resolution: PIP-02_DLPFC.40kb (txt.tar.gz, 6 GB)
Alignment files for the various experiments, chip arrays for the SNP genotyping assays and phenotype metadata for the different studies under the consortium; external links are provided for the data sources on Synapse, and the GTEx consortium and Roadmap Epigenomics Consortium web portals.
Access to all files tagged as "controlled" is login-secured. The raw data used in these publications are available to the research community as described under Access Instructions.
List of datasets including source study, disease status of samples, source tissue(s), downstream analyses conducted using the data and the number of datasets: RAW-01_PEC_Table_of_Datasets (xlsx, 28 KB)
Master table that includes mappings between matched RNA-seq, ChIP-seq and Genotype files (link to Synapse): RAW-02_Assay_Cross_Reference and Links from Individuals to Studies
PsychENCODE Bulk RNA-seq alignment (.bam) files (link to Synapse):
GTEx Bulk RNA-seq alignment (.bam) files (link to GTEX consortium web portal):
PsychENCODE Single-cell RNA-seq (.fastq) files (link to Synapse):
PsychENCODE Bulk ChIP-seq alignment (.bam) files (link to Synapse):
Roadmap Epigenomics Consortium (REMC) Bulk ChIP-seq alignment (.tagAlign) files (link to the REMC web portal):
The integrative analyses, such as the QTL calculations, included samples external to the PsychENCODE consortium, so it is not appropriate to include the full genotype set on Synapse under the PsychENCODE project. To avoid confusion, we do not provide a merged genotype file of the remaining samples, because results computed from this partial file would differ from the full results reported here. Instead, please see Genotype Sample ID map for a list of genotyping sample IDs for all individuals used in this analysis.
Integrative Analysis and Cross-Disorder Analysis Genotype vcf files (link to Synapse):
Metadata for imputed genotypes (link to Synapse):
In the following we provide two links: one is for the full set of clinical metadata, available to those with full access permissions granted by Synapse; the other is a publicly accessible, limited set of clinical metadata.
Full demographic and clinical information on individuals included in this analysis (link to Synapse):
Publicly accessible, limited demographic and clinical information on individuals included in this analysis (link to Synapse):
RNA-seq and SNP array files used in the Cross-Disorder Analysis (link to Synapse):
FASTA file for RNA-seq spike-in sequences:
*All datasets are aligned to the reference genome hg19 and GENCODE v19, unless mentioned otherwise.
The "RefSet" tag marks datasets that were used in the primary analyses of the manuscript.
For Cross-disorder data, including DEX and WGCNA modules, please cite: Gandal et al., Science 2018, Vol. 362, Issue 6420, eaat8127.
For all other data please cite: Wang et al., Science 2018, Vol. 362, Issue 6420, eaat8464.