Abstract

Computational efforts to identify functional elements within genomes leverage comparative sequence information by looking for regions that exhibit evidence of selective constraint. One way of detecting constrained elements is to follow a bottom-up approach: compute constraint scores for individual positions of a multiple alignment, then define constrained elements as segments of contiguous, high-scoring nucleotide positions. Here we present GERP++, a new tool that uses maximum likelihood evolutionary rate estimation for position-specific scoring and, in contrast to previous bottom-up methods, a novel dynamic programming approach to subsequently define constrained elements. GERP++ evaluates a richer set of candidate element breakpoints and ranks them based on statistical significance, eliminating the need for biased heuristic extension techniques. Using GERP++ we identify over 1.3 million constrained elements spanning over 7% of the human genome. We predict a higher fraction than earlier estimates, largely due to the annotation of longer constrained elements, which improves the one-to-one correspondence between predicted elements and known functional sequences. GERP++ is an efficient and effective tool that provides both nucleotide- and element-level constraint scores within deep multiple sequence alignments.
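
The bottom-up idea described in the abstract (score each alignment column, then group contiguous high-scoring columns into elements) can be illustrated with a minimal sketch. This is not the GERP++ algorithm: it substitutes a plain maximum-sum segment search (Kadane's algorithm) applied recursively, and it omits the statistical significance ranking the abstract mentions. The function names, the example scores, and the `min_total` cutoff are all hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[int, int, float]  # (start, end_exclusive, summed_score)


def best_segment(scores: List[float]) -> Segment:
    """Return the contiguous run with the largest positive summed score.

    Returns (0, 0, 0.0) when no run has a positive sum.
    """
    best: Segment = (0, 0, 0.0)
    run_start, run_sum = 0, 0.0
    for i, s in enumerate(scores):
        if run_sum <= 0:
            # A non-positive prefix can never help; restart the run here.
            run_start, run_sum = i, 0.0
        run_sum += s
        if run_sum > best[2]:
            best = (run_start, i + 1, run_sum)
    return best


def candidate_elements(scores: List[float], min_total: float) -> List[Segment]:
    """Greedily extract non-overlapping segments whose sum is >= min_total.

    Takes the best segment in a window, then searches the flanks on
    either side of it. `min_total` is an assumed, illustrative cutoff.
    """
    elements: List[Segment] = []
    windows = [(0, len(scores))]
    while windows:
        lo, hi = windows.pop()
        start, end, total = best_segment(scores[lo:hi])
        if start == end or total < min_total:
            continue
        elements.append((lo + start, lo + end, total))
        windows.append((lo, lo + start))   # left flank
        windows.append((lo + end, hi))     # right flank
    return sorted(elements)


# Hypothetical per-position constraint scores (positive = more constrained).
example_scores = [-1.0, 2.0, 3.0, -0.5, 2.5, -4.0, -1.0, 1.5, 2.0, -3.0]
example_elements = candidate_elements(example_scores, min_total=3.0)
print(example_elements)  # [(1, 5, 7.0), (7, 9, 3.5)]
```

A greedy extraction like this commits to the single best segment before looking at alternatives, which is the kind of limited breakpoint search the abstract contrasts with evaluating a richer set of candidates.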

Keywords

Constraint (computer-aided design); Genome; Computer science; Multiple sequence alignment; Leverage (statistics); Heuristic; Human genome; Set (abstract data type); Fraction (chemistry); Sequence (biology); Computational biology; Sequence alignment; Data mining; Artificial intelligence; Biology; Mathematics; Genetics; Gene

Publication Info

Year: 2010
Type: article
Volume: 6
Issue: 12
Pages: e1001025-e1001025
Citations: 1786
Access: Closed

Citation Metrics

1786 (OpenAlex)

Cite This

Eugene Davydov, David L. Goode, Marina Sirota et al. (2010). Identifying a High Fraction of the Human Genome to be under Selective Constraint Using GERP++. PLoS Computational Biology, 6(12), e1001025-e1001025. https://doi.org/10.1371/journal.pcbi.1001025

Identifiers

DOI: 10.1371/journal.pcbi.1001025