The Challenge

Laboratory of Noncoding genome & Data Science

Exploring the dark matter of the genome

Over 500 million years of evolution, the number of open reading frames (ORFs) that code for proteins has remained fairly constant at about 30,000. Hence, it is a conundrum how complex functions evolved in higher organisms with such a limited number of ORFs.

Our hypothesis is that the initial definition of ORFs has been very conservative, and that because of this we may have missed potential functions in vast stretches of the genome that are currently dubbed noncoding or junk.

In humans, it is important to revisit this definition of ORFs because almost 90% of disease-associated mutations map to regions of the genome that are currently dubbed noncoding or junk.

Our Focus

Interpretation and prioritization of mutations in novel ORFs

Our lab focuses on the functional interpretation of mutations that map to noncoding regions in cancer, schizophrenia, and rare diseases.


We use whole genome sequencing, transcriptomics, and proteomics data in one framework developed by us, which we call ‘systems proteotranscriptogenomics’, to identify novel ORFs.

Machine learning

We develop and use machine learning algorithms to interpret and prioritize variants in the novel ORFs.

Cloud Computing

Our studies are based on large-scale datasets that are publicly available. Hence, to access and analyse these huge datasets, we develop our own cloud platform using Amazon Web Services.