Abstract

HMMER is arguably the best tool for protein domain identification, which is essential for biological function prediction. Many software and hardware enhancements of HMMER exist; however, most of them do not scale to a large number of processors. The exponential growth in the number of protein sequences in public databases, which currently stands at more than 13 million, demands the use of HMMER on a very large scale. We have developed a highly scalable parallel (HSP) HMMER approach that enables identification of conserved functional domains in millions of proteins in less than a day using thousands of processing nodes on a supercomputer.

Keywords

Scalability, Computer science, Identification (biology), Domain (mathematical analysis), Supercomputer, Function (biology), Software, Set (abstract data type), Parallel computing, Operating system, Programming language, Biology, Ecology

Publication Info

Year: 2009
Type: article
Pages: 766-770
Citations: 12
Access: Closed

Citation Metrics

OpenAlex: 12
Influential: 2

Cite This

Bhanu Rekapalli, Christian Halloy, Igor B. Zhulin (2009). HSP-HMMER. Proceedings of the 2009 ACM symposium on Applied Computing, 766-770. https://doi.org/10.1145/1529282.1529443

Identifiers

DOI: 10.1145/1529282.1529443
