Supervised enhancer prediction with epigenetic pattern recognition and targeted validation

Anurag Sethi*, Mengting Gu*, Emrah Gumusgoz, Landon Chan, Koon-Kiu Yan, Joel Rozowsky, Iros Barozzi, Veena Afzal, Jennifer A. Akiyama, Ingrid Plajzer-Frick, Chengfei Yan, Catherine S. Novak, Momoe Kato, Tyler H. Garvin, Quan Pham, Anne Harrington, Brandon J. Mannion, Elizabeth A. Lee, Yoko Fukuda-Yuzawa, Axel Visel, Diane E. Dickel, Kevin Y. Yip, Richard Sutton, Len A. Pennacchio & Mark Gerstein
Journal Papers: Nature Methods (2020)

Abstract

Enhancers are important non-coding elements, but they have traditionally been hard to characterize experimentally. The development of massively parallel assays allows the characterization of large numbers of enhancers for the first time. Here, we developed a framework using Drosophila STARR-seq to create shape-matching filters based on meta-profiles of epigenetic features. We integrated these features with supervised machine-learning algorithms to predict enhancers. We further demonstrated that our model could be transferred to predict enhancers in mammals. We comprehensively validated the predictions using a combination of in vivo and in vitro approaches, involving transgenic assays in mice and transduction-based reporter assays in human cell lines (153 enhancers in total). The results confirmed that our model can accurately predict enhancers in different species without re-parameterization. Finally, we examined the transcription factor binding patterns at predicted enhancers versus promoters. We demonstrated that these patterns enable the construction of a secondary model that effectively distinguishes enhancers and promoters.

An integrative ENCODE resource for cancer genomics

Jing Zhang*, Donghoon Lee*, Vineet Dhiman*, Peng Jiang*, Jie Xu*, Patrick McGillivray*, Hongbo Yang*, Jason Liu, William Meyerson, Declan Clarke, Mengting Gu, Shantao Li, Shaoke Lou, Jinrui Xu, Lucas Lochovsky, Matthew Ung, Lijia Ma, Shan Yu, Qin Cao, Arif Harmanci, Koon-Kiu Yan, Anurag Sethi, Gamze Gürsoy, Michael Rutenberg Schoenberg, Joel Rozowsky, Jonathan Warrell, Prashant Emani, Yucheng T. Yang, Timur Galeev, Xiangmeng Kong, Shuang Liu, Xiaotong Li, Jayanth Krishnan, Yanlin Feng, Juan Carlos Rivera-Mulia, Jessica Adrian, James R Broach, Michael Bolt, Jennifer Moran, Dominic Fitzgerald, Vishnu Dileep, Tingting Liu, Shenglin Mei, Takayo Sasaki, Claudia Trevilla-Garcia, Su Wang, Yanli Wang, Chongzhi Zang, Daifeng Wang, Robert J. Klein, Michael Snyder, David M. Gilbert, Kevin Yip, Chao Cheng, Feng Yue, X. Shirley Liu, Kevin P. White & Mark Gerstein
Journal Papers: Nature Communications (2020)

Abstract

ENCODE comprises thousands of functional genomics datasets, and the encyclopedia covers hundreds of cell types, providing a universal annotation for genome interpretation. However, for particular applications, it may be advantageous to use a customized annotation. Here, we develop such a custom annotation by leveraging advanced assays, such as eCLIP, Hi-C, and whole-genome STARR-seq on a number of data-rich ENCODE cell types. A key aspect of this annotation is comprehensive and experimentally derived networks of both transcription factors and RNA-binding proteins (TFs and RBPs). Cancer, a disease of system-wide dysregulation, is an ideal application for such a network-based annotation. Specifically, for cancer-associated cell types, we put regulators into hierarchies and measure their network change (rewiring) during oncogenesis. We also extensively survey TF-RBP crosstalk, highlighting how SUB1, a previously uncharacterized RBP, drives aberrant tumor expression and amplifies the effect of MYC, a well-known oncogenic TF. Furthermore, we show how our annotation allows us to place oncogenic transformations in the context of a broad cell space; here, many normal-to-tumor transitions move towards a stem-like state, while oncogene knockdowns show an opposing trend. Finally, we organize the resource into a coherent workflow to prioritize key elements and variants, in addition to regulators. We showcase the application of this prioritization to somatic burdening, cancer differential expression and GWAS. Targeted validations of the prioritized regulators, elements and variants using siRNA knockdowns, CRISPR-based editing, and luciferase assays demonstrate the value of the ENCODE resource.

Shaping the nebulous enhancer in the era of high-throughput assays and genome editing

Edwin Yu-Kiu Ho*, Qin Cao*, Mengting Gu, Ricky Wai-Lun Chan, Qiong Wu, Mark Gerstein, Kevin Y Yip
Journal Papers: Briefings in Bioinformatics (2020)

Abstract

Since the first discovery of transcriptional enhancers in 1981, their textbook definition has remained largely unchanged over the past 37 years. With the emergence of high-throughput assays and genome editing, which are switching the paradigm from bottom-up discovery and testing of individual enhancers to top-down profiling of enhancer activities genome-wide, it has become increasingly evident that this classical definition has left substantial gray areas in different aspects. Here we survey a representative set of recent research articles and report the definitions of enhancers they have adopted. The results reveal that a wide spectrum of definitions is used, usually without the definition stated explicitly, which could lead to difficulties in data interpretation and downstream analyses. Based on these findings, we discuss the practical implications and suggestions for future studies.

Genomics and data science: an application within an umbrella

Fábio CP Navarro, Hussein Mohsen, Chengfei Yan, Shantao Li, Mengting Gu, William Meyerson, Mark Gerstein
Journal Papers: Genome Biology (2019)

Abstract

Data science allows the extraction of practical insights from large-scale data. Here, we contextualize it as an umbrella term, encompassing several disparate subdomains. We focus on how genomics fits as a specific application subdomain, in terms of well-known 3 V data and 4 M process frameworks (volume-velocity-variety and measurement-mining-modeling-manipulation, respectively). We further analyze the technical and cultural “exports” and “imports” between genomics and other data-science subdomains (e.g., astronomy). Finally, we discuss how data value, privacy, and ownership are pressing issues for data science applications, in general, and are especially relevant to genomics, due to the persistent nature of DNA.

MicroRNA-dependent regulation of biomechanical genes establishes tissue stiffness homeostasis

Albertomaria Moro*, Tristan P Driscoll*, Liana C Boraas, William Armero, Dionna M Kasper, Nicolas Baeyens, Charlene Jouy, Venkatesh Mallikarjun, Joe Swift, Sang Joon Ahn, Donghoon Lee, Jing Zhang, Mengting Gu, Mark Gerstein, Martin Schwartz, Stefania Nicoli
Journal Papers: Nature Cell Biology (2019)

Abstract

Vertebrate tissues exhibit mechanical homeostasis, showing stable stiffness and tension over time and recovery after changes in mechanical stress. However, the regulatory pathways that mediate these effects are unknown. A comprehensive identification of Argonaute 2-associated microRNAs and mRNAs in endothelial cells identified a network of 122 microRNA families that target 73 mRNAs encoding cytoskeletal, contractile, adhesive and extracellular matrix (CAM) proteins. The level of these microRNAs increased in cells plated on stiff versus soft substrates, consistent with homeostasis, and suppressed targets via microRNA recognition elements within the 3′ untranslated regions of CAM mRNAs. Inhibition of DROSHA or Argonaute 2, or disruption of microRNA recognition elements within individual target mRNAs, such as connective tissue growth factor, induced hyper-adhesive, hyper-contractile phenotypes in endothelial and fibroblast cells in vitro, and increased tissue stiffness, contractility and extracellular matrix deposition in the zebrafish fin fold in vivo. Thus, a network of microRNAs buffers CAM expression to mediate tissue mechanical homeostasis.

Comprehensive functional genomic resource and integrative model for the human brain

Daifeng Wang*, Shuang Liu*, Jonathan Warrell*, Hyejung Won*, Xu Shi*, Fabio CP Navarro*, Declan Clarke*, Mengting Gu*, Prashant Emani*, Yucheng T Yang, Min Xu, Michael J Gandal, Shaoke Lou, Jing Zhang, Jonathan J Park, Chengfei Yan, Suhn Kyong Rhie, Kasidet Manakongtreecheep, Holly Zhou, Aparna Nathan, Mette Peters, Eugenio Mattei, Dominic Fitzgerald, Tonya Brunetti, Jill Moore, Yan Jiang, Kiran Girdhar, Gabriel E Hoffman, Selim Kalayci, Zeynep H Gümüş, Gregory E Crawford, Panos Roussos, Schahram Akbarian, Andrew E Jaffe, Kevin P White, Zhiping Weng, Nenad Sestan, Daniel H Geschwind, James A Knowles, Mark B Gerstein, PsychENCODE Consortium
Journal Papers: Science (2018)

Abstract

Despite progress in defining genetic risk for psychiatric disorders, their molecular mechanisms remain elusive. Addressing this, the PsychENCODE Consortium has generated a comprehensive online resource for the adult brain across 1866 individuals. The PsychENCODE resource contains ~79,000 brain-active enhancers, sets of Hi-C linkages, and topologically associating domains; single-cell expression profiles for many cell types; expression quantitative-trait loci (QTLs); and further QTLs associated with chromatin, splicing, and cell-type proportions. Integration shows that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). It also allows building of a gene regulatory network, linking genome-wide association study variants to genes (e.g., 321 for schizophrenia). We embed this network into an interpretable deep-learning model, which improves disease prediction by ~6-fold versus polygenic risk scores and identifies key genes and pathways in psychiatric disorders.

Network analysis as a grand unifier in biomedical data science

Patrick McGillivray, Declan Clarke, William Meyerson, Jing Zhang, Donghoon Lee, Mengting Gu, Sushant Kumar, Holly Zhou, Mark Gerstein
Journal Papers: Annual Review of Biomedical Data Science (2018)

Abstract

Biomedical data scientists study many types of networks, ranging from those formed by neurons to those created by molecular interactions. People often criticize these networks as uninterpretable diagrams termed hairballs; however, here we show that molecular biological networks can be interpreted in several straightforward ways. First, we can break down a network into smaller components, focusing on individual pathways and modules. Second, we can compute global statistics describing the network as a whole. Third, we can compare networks. These comparisons can be within the same context (e.g., between two gene regulatory networks) or cross-disciplinary (e.g., between regulatory networks and governmental hierarchies). The latter comparisons can transfer a formalism, such as that for Markov chains, from one context to another or relate our intuitions in a familiar setting (e.g., social networks) to the relatively unfamiliar molecular context. Finally, key aspects of molecular networks are dynamics and evolution, i.e., how they evolve over time and how genetic variants affect them. By studying the relationships between variants in networks, we can begin to interpret many common diseases, such as cancer and heart disease.

Dynamic RNA–protein interactions underlie the zebrafish maternal-to-zygotic transition

Vladimir Despic, Mario Dejung, Mengting Gu, Jayanth Krishnan, Jing Zhang, Lydia Herzel, Korinna Straube, Mark B Gerstein, Falk Butter, Karla M Neugebauer
Journal Papers: Genome Research (2017)

Abstract

During the maternal-to-zygotic transition (MZT), transcriptionally silent embryos rely on post-transcriptional regulation of maternal mRNAs until zygotic genome activation (ZGA). RNA-binding proteins (RBPs) are important regulators of post-transcriptional RNA processing events, yet their identities and functions during developmental transitions in vertebrates remain largely unexplored. Using mRNA interactome capture, we identified 227 RBPs in zebrafish embryos before and during ZGA, hereby named the zebrafish MZT mRNA-bound proteome. This protein constellation consists of many conserved RBPs, some of which are potential stage-specific mRNA interactors that likely reflect the dynamics of RNA–protein interactions during MZT. The enrichment of numerous splicing factors like hnRNP proteins before ZGA was surprising, because maternal mRNAs were found to be fully spliced. To address potentially unique roles of these RBPs in embryogenesis, we focused on Hnrnpa1. iCLIP and subsequent mRNA reporter assays revealed a function for Hnrnpa1 in the regulation of poly(A) tail length and translation of maternal mRNAs through sequence-specific association with 3′ UTRs before ZGA. Comparison of iCLIP data from two developmental stages revealed that Hnrnpa1 dissociates from maternal mRNAs at ZGA and instead regulates the nuclear processing of pri-mir-430 transcripts, which we validated experimentally. The shift from cytoplasmic to nuclear RNA targets was accompanied by a dramatic translocation of Hnrnpa1 and other pre-mRNA splicing factors to the nucleus in a transcription-dependent manner. Thus, our study identifies global changes in RNA–protein interactions during vertebrate MZT and shows that Hnrnpa1 RNA-binding activities are spatially and temporally coordinated to regulate RNA metabolism during early development.

MicroRNAs Establish Uniform Traits during the Architecture of Vertebrate Embryos

Dionna M Kasper, Albertomaria Moro, Emma Ristori, Anand Narayanan, Guillermina Hill-Teran, Elizabeth Fleming, Miguel Moreno-Mateos, Charles E Vejnar, Jing Zhang, Donghoon Lee, Mengting Gu, Mark Gerstein, Antonio Giraldez, Stefania Nicoli
Journal Papers: Developmental Cell (2017)

Abstract

Proper functioning of an organism requires cells and tissues to behave in uniform, well-organized ways. How this optimum of phenotypes is achieved during the development of vertebrates is unclear. Here, we carried out a multi-faceted and single-cell resolution screen of zebrafish embryonic blood vessels upon mutagenesis of single and multi-gene microRNA (miRNA) families. We found that embryos lacking particular miRNA-dependent signaling pathways develop a vascular trait similar to wild-type, but with a profound increase in phenotypic heterogeneity. Aberrant trait variance in miRNA mutant embryos uniquely sensitizes their vascular system to environmental perturbations. We discovered a previously unrecognized role for specific vertebrate miRNAs to protect tissue development against phenotypic variability. This discovery marks an important advance in our comprehension of how miRNAs function in the development of higher organisms.

Exon expression QTL (eeQTL) analysis highlights distant genomic variations associated with splicing regulation

Leying Guan, Qian Yang, Mengting Gu, Liang Chen, Xuegong Zhang
Journal Papers: Quantitative Biology (2014)

Abstract

Alternative splicing is a ubiquitous mechanism of post-transcriptional regulation of gene expression and produces multiple isoforms from the same genes. Expression quantitative trait loci (eQTL) analysis has been a major method for finding associations between gene expression and genomic variations. Differences in alternative splicing isoforms result from differences in the expression of exons. We propose to use exon expression QTL (eeQTL) to study the genomic variations that are associated with splicing regulation. A stringent criterion was adopted to study gene-level eQTLs and exon-level eeQTLs for both cis- and trans-factors. From experiments on an RNA-sequencing (RNA-Seq) data set of HapMap samples, we observed that compared with eQTLs, more eeQTL trans-factors can be found than cis-factors, and many of the eeQTLs cannot be found at the gene level. This work highlights that the regulation of exons adds another layer of regulation on gene expression, and that eeQTL analysis is a new approach for investigating genome-wide genomic variations that are involved in the regulation of alternative splicing.