Understanding human genome regulation through robust and multi-scale reference chromatin state annotations across hundreds of human cell types

Although researchers have identified tens of thousands of disease-associated genetic variants, the mechanisms by which most of these variants act remain unknown. Most are believed to affect regulatory elements, which remain incompletely annotated and understood. Large-scale projects have recently generated thousands of epigenomic data sets that measure the regulatory activity of the genome in human cells. However, interpreting these data to understand the link between genetic variation and disease requires computational methods.

We previously developed Segway, a computational method that annotates genomic regulatory elements on the basis of epigenomic data sets. Using these newly available epigenomic data sets, this project will annotate the genome in hundreds of human cell types and use these annotations to understand disease-associated genetic variation.

Additionally, we will develop computational methods that improve our ability to identify genomic elements. The outputs of this project will take three forms:

  1. General-purpose software for annotating the genome.
  2. Easy-to-use reference data sets.
  3. Insights into the link between genetic variation and chronic obstructive pulmonary disease (COPD).