After the 2003 completion of the Human Genome Project – which sequenced all 3 billion “letters,” or base pairs, in the human genome – many thought that our DNA would become an open book. But a perplexing problem quickly emerged: although scientists could read the book, they could only interpret a small percentage of it.
The mysterious majority of our DNA – as much as 98 percent – does not code for proteins. Much of this “dark matter genome” is thought to be nonfunctional evolutionary leftovers that are just along for the ride. However, hidden among this noncoding DNA are many crucial regulatory elements that control the activity of thousands of genes. What is more, these elements play a major role in diseases such as cancer, heart disease, and autism, and they could hold the key to possible cures.
On Feb. 2, 2017, as part of a major ongoing effort to fully map and annotate the functional sequences of the human genome, including this silent majority, the National Institutes of Health (NIH) announced new grant funding for a nationwide project to set up five “characterization centers,” two of them at UC San Francisco. The centers will study how these regulatory elements influence gene expression and, consequently, cell behavior.
The project’s aim is for scientists to use the latest technology, such as genome editing, to gain insights into human biology that could one day lead to treatments for complex genetic diseases.
Importance of Genomic Grammar
After the shortfalls of the Human Genome Project became clear, the Encyclopedia of DNA Elements (ENCODE) Project was launched in September 2003 by the National Human Genome Research Institute (NHGRI). The goal of ENCODE is to find all the functional regions of the human genome, whether they form genes or not.
“The Human Genome Project mapped the letters of the human genome, but it didn’t tell us anything about the grammar: where the punctuation is, where the starts and ends are,” said NIH Program Director Elise Feingold, PhD. “That’s what ENCODE is trying to do.”
The initiative revealed that millions of these noncoding sequences perform essential regulatory actions, such as turning genes on or off in different types of cells. However, while scientists have established that these regulatory sequences have important functions, for most of them they do not know what function each sequence performs or which gene it affects. That is because the sequences are often located far from their target genes – in some cases millions of letters away. What’s more, many of the sequences have different effects in different types of cells.
The new grants from NHGRI will fund the five new centers to define the functions and gene targets of these regulatory sequences. At UCSF, the two centers will be based in the labs of Nadav Ahituv, PhD, and Yin Shen, PhD. The other three characterization centers will be housed at Stanford University, Cornell University, and the Lawrence Berkeley National Laboratory. Additional centers will continue to focus on mapping, computational analysis, data analysis, and data coordination.
Cellular Barcodes Reveal Regulatory Function
New technology has made identifying the function and targets of regulatory sequences much easier. Scientists can now manipulate cells to obtain more information about their DNA, and, thanks to high-throughput screening, they can do so in large batches, testing thousands of sequences in one experiment instead of one by one.
“It used to be extremely difficult to test for function in the noncoding part of the genome,” said Ahituv, a professor in the Department of Bioengineering and Therapeutic Sciences. “With a gene, it’s easier to assess the effect because there is a change in the corresponding protein. But with regulatory sequences, you don’t know what a change in DNA can lead to, so it’s hard to predict the functional output.”
Ahituv and Shen are both using innovative techniques to study enhancers, which play a fundamental role in gene expression. Every cell in the human body contains the same DNA. Whether a cell is a skin cell, a brain cell, or a heart cell is determined by which genes are turned on and off. Enhancers are the secret switches that turn on cell-type-specific genes.
During a previous phase of ENCODE, Ahituv and collaborator Jay Shendure, PhD, at the University of Washington, developed a technique called the lentivirus-based massively parallel reporter assay to identify enhancers. With the new grant, they will use this technology to test for enhancers among 100,000 regulatory sequences previously identified by ENCODE. [….]