We develop interpretable machine learning approaches, driven by rapidly growing quantities of biological and biomedical data, to understand the functional genomics and molecular mechanisms that link genotype to phenotype and to discover the engineering principles of complex biological systems. With applications to human diseases, we aim to turn those principles into knowledge that can be translated directly into biomedical research. In particular, our recent work uses multi-scale modeling and machine learning to study gene expression dynamics, gene regulatory networks, and circuits in brain disorders, development, and cancer.
Single-cell deconvolution and interpretable deep learning: we have built a comprehensive functional genomic resource for the human brain across 1866 individuals (resource.psychencode.org) using multi-omics data from PsychENCODE and other large consortia. It contains ~79K brain-active enhancers, sets of Hi-C linkages and TADs, single-cell expression profiles for many cell types, expression QTLs, and further QTLs associated with chromatin, splicing, and cell-type proportions. We deconvolved the bulk tissue expression across individuals using single-cell data and found that varying cell-type proportions largely account for the cross-population variation in expression (with >88% reconstruction accuracy). Leveraging our QTLs and Hi-C datasets, we predicted a full regulatory network, linking GWAS variants to genes (e.g., 321 for schizophrenia). We embedded this network into an interpretable deep-learning model, which improves disease prediction ~6X vs. polygenic risk scores and identifies key genes and pathways in psychiatric disorders. [Science 362, eaat8464, 2018]
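As a rough illustration of the deconvolution step described above, the following sketch estimates cell-type proportions by non-negative least squares. It assumes a simple linear mixing model (bulk expression is a non-negative combination of per-cell-type signatures); the function names and the variance-based accuracy score are hypothetical choices for this example and are not taken from the published pipeline.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve(bulk, signatures):
    """Estimate non-negative cell-type weights for each bulk sample.

    bulk:       (genes, samples) bulk tissue expression
    signatures: (genes, cell_types) mean single-cell expression per cell type
    Returns a (cell_types, samples) matrix of weights.
    Assumption: bulk is approximately signatures @ weights.
    """
    n_types = signatures.shape[1]
    n_samples = bulk.shape[1]
    weights = np.zeros((n_types, n_samples))
    for j in range(n_samples):
        # Solve min ||signatures @ w - bulk[:, j]|| subject to w >= 0.
        weights[:, j], _ = nnls(signatures, bulk[:, j])
    return weights


def to_fractions(weights):
    """Rescale each sample's weights so they sum to 1 (where possible)."""
    totals = weights.sum(axis=0, keepdims=True)
    return np.divide(weights, totals, out=np.zeros_like(weights), where=totals > 0)


def reconstruction_accuracy(bulk, signatures, weights):
    """Illustrative score: 1 - residual sum of squares / total sum of squares."""
    residual = bulk - signatures @ weights
    return 1.0 - np.sum(residual ** 2) / np.sum(bulk ** 2)
```

With real data, the signature matrix would typically be restricted to marker genes and both matrices normalized to comparable scales before fitting.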
Cross-species gene network clustering: we designed a novel cross-species clustering algorithm to identify conserved and species-specific gene and non-coding RNA regulatory modules during embryonic development in C. elegans and D. melanogaster. We found that in both species, orthologous genes work together more closely during the phylotypic developmental stage (aka the vertebrate body plan stage) than during other developmental stages. This lays the groundwork for characterizing evolutionary expression patterns during embryogenesis and enables us to systematically study interactions between evolutionarily conserved and species-specific functions during development. [Nature 512, 445–448, 2014; Genome Biology 15:R100, 2014]
Principal dynamic models in biological systems: we developed computational methods that identify the principal gene expression patterns of complex biological processes such as embryogenesis by integrating, for the first time, a state-space model with dimensionality reduction by matrix factorization. This approach provides a new analytical platform for extracting systematic and robust dynamic patterns from high-dimensional, complex, and noisy gene expression data [PLoS Computational Biology, 12(10): e1005146, 2016; PLoS ONE 7(1): e28805, 2012; IEEE/ACM Transactions on Computational Biology and Bioinformatics, 430-437, 2012].
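To make the idea of combining a state-space model with matrix factorization concrete, here is a minimal sketch under strong simplifying assumptions: a truncated SVD stands in for the dimensionality reduction, and the latent dynamics are assumed to be linear, time-invariant, and noise-free (z[t+1] = A z[t]). The function name and these modeling choices are illustrative and do not reproduce the published methods.

```python
import numpy as np


def principal_dynamics(expr, k):
    """Reduce a time series of expression and fit linear latent dynamics.

    expr: (genes, timepoints) expression matrix, columns ordered in time
    k:    number of latent components to keep
    Returns (loadings, states, A) where
      loadings is (genes, k), states is (k, timepoints),
      and A is the (k, k) matrix fitted so that states[:, t+1] ~ A @ states[:, t].
    """
    # Truncated SVD: expr ~ loadings @ states.
    U, s, Vt = np.linalg.svd(expr, full_matrices=False)
    loadings = U[:, :k]
    states = s[:k, None] * Vt[:k, :]

    # Least-squares fit of the state transition matrix.
    A = states[:, 1:] @ np.linalg.pinv(states[:, :-1])
    return loadings, states, A
```

The eigenvalues of `A` then summarize the dominant temporal patterns in this simplified model: real eigenvalues below 1 in magnitude correspond to decaying patterns, and complex pairs to oscillating ones.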
Gene regulatory logics: we developed a computational method that integrates ENCODE and TCGA data to identify the genome-wide regulatory logic of transcription factors and microRNAs, and we report the logic patterns observed in leukemia. Until then, similar logics had been reported only in simple organisms such as yeast. These results provide new insights into gene regulatory circuit logic in more complex biological systems such as cancer [PLoS Computational Biology 11(4): e1004132, 2015].
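One simple way to picture what a regulatory logic means is to binarize the activity of two regulators and a target gene across samples and ask which two-input logic gate best matches the data. The sketch below does this by exhaustive search over all 16 gates; the binarization, the agreement score, and the function name are assumptions made for illustration and are not the published method.

```python
from itertools import product

import numpy as np


def best_logic_gate(tf, mirna, target):
    """Find the two-input logic gate that best explains a target gene.

    tf, mirna, target: boolean sequences (one entry per sample) giving the
    binarized on/off state of a transcription factor, a microRNA, and the
    target gene.
    Returns (truth_table, agreement), where truth_table lists the gate output
    for inputs (tf, mirna) = (0,0), (0,1), (1,0), (1,1), and agreement is the
    fraction of samples whose target state matches the gate.
    """
    tf = np.asarray(tf, dtype=bool)
    mirna = np.asarray(mirna, dtype=bool)
    target = np.asarray(target, dtype=bool)

    # Row of the truth table that each sample falls into.
    row = 2 * tf.astype(int) + mirna.astype(int)

    best_table, best_score = None, -1.0
    for outputs in product([False, True], repeat=4):
        table = np.array(outputs)
        score = float(np.mean(table[row] == target))
        if score > best_score:
            best_table, best_score = outputs, score
    return best_table, best_score
```

For example, a target that is on only when the transcription factor is on and the microRNA is off would be matched by the truth table `(False, False, True, False)`.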
Inter-disciplinary network transferability: our recent review compared the characteristics of biological networks with those of networks in other disciplines and discussed the cross-disciplinary transferability of network formalisms to help gain novel biological insights at the system level. We illustrated how these comparisons benefit the field with specific examples related to network growth, organizational hierarchies, and the evolution of adaptive systems [Cell Systems, 2, 147-157, 2016].
Academic social network: we analyzed the academic social networks driven by large scientific consortia (Big Science), which revealed temporal dynamics of collaborative patterns between consortia members and non-member users [Trends in Genetics, 32, 251-253, 2016].