Abstract

Phylogenetic inference is a grand challenge in bioinformatics because of its immense computational requirements. The growing popularity of multi-gene alignments in biological studies, which typically provide a stable topological signal owing to a more favorable ratio of the number of base pairs to the number of sequences, together with the rapid accumulation of sequence data in general, poses new challenges for high-performance computing. In this paper, we demonstrate how state-of-the-art Maximum Likelihood (ML) programs can be efficiently scaled to the IBM BlueGene/L (BG/L) architecture by porting RAxML, which is currently among the fastest and most accurate programs for phylogenetic inference under the ML criterion. We simultaneously exploit the coarse-grained and fine-grained parallelism that is inherent in every ML-based biological analysis. Performance is assessed on two datasets: one with 212 sequences and 566,470 base pairs, and one with 2,182 sequences and 51,089 base pairs. To the best of our knowledge, these are the largest datasets analyzed under ML to date. The capability to analyze such datasets will help to address novel biological questions via phylogenetic analyses. Our experimental results indicate that the fine-grained parallelization scales well up to 1,024 processors. Moreover, a larger number of processors can be efficiently exploited by combining coarse-grained and fine-grained parallelism. Finally, we demonstrate that our parallelization scales equally well on an AMD Opteron cluster with a less favorable ratio of network latency to processor speed. In several cases we recorded super-linear speedups due to increased cache efficiency.
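The fine-grained parallelism described above relies on the fact that the log-likelihood of a tree is a sum of independent per-site (per-column) terms. Alignment columns can therefore be split across processors, and the partial sums can be combined with a single reduction. The Python sketch below illustrates this under stated assumptions. It uses a toy two-sequence Jukes-Cantor (JC69) model instead of a full tree. Threads stand in for the MPI processes a BlueGene/L run would use, and in CPython they give no real speedup, so only the data decomposition is the point. The helper names (`site_loglik`, `block_ranges`, `parallel_loglik`, `rank_layout`) are hypothetical and are not taken from RAxML. `rank_layout` shows one possible way to combine coarse-grained parallelism (independent searches) with fine-grained parallelism (column blocks) by grouping consecutive ranks.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Toy model (assumption): two aligned DNA sequences under Jukes-Cantor (JC69)
# with one branch length t. Real ML programs score full trees under richer
# models, but the log-likelihood is still a sum of per-site terms.

def site_loglik(a: str, b: str, t: float) -> float:
    """Log-likelihood of one alignment column under JC69."""
    e = math.exp(-4.0 * t / 3.0)
    p = 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e
    # 0.25 is the JC69 stationary frequency of base a.
    return math.log(0.25 * p)

def partial_loglik(seq_a: str, seq_b: str, t: float, start: int, end: int) -> float:
    """Sum of per-site log-likelihoods over columns [start, end)."""
    return sum(site_loglik(seq_a[i], seq_b[i], t) for i in range(start, end))

def block_ranges(n_sites: int, n_workers: int) -> list:
    """Split n_sites columns into n_workers contiguous, near-equal blocks."""
    base, extra = divmod(n_sites, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        end = start + base + (1 if w < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def parallel_loglik(seq_a: str, seq_b: str, t: float, n_workers: int) -> float:
    """Fine-grained scheme: each worker scores its own block of columns,
    then the partial sums are reduced (added) into the total."""
    ranges = block_ranges(len(seq_a), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = pool.map(lambda r: partial_loglik(seq_a, seq_b, t, *r), ranges)
        return sum(parts)

def rank_layout(rank: int, procs_per_search: int) -> tuple:
    """Hypothetical hybrid layout: consecutive ranks form one group that runs
    one independent search (coarse-grained); the local rank inside the group
    selects a block of columns (fine-grained)."""
    return rank // procs_per_search, rank % procs_per_search

if __name__ == "__main__":
    a = "ACGTACGTTA" * 50
    b = "ACGTTCGTAA" * 50
    serial = partial_loglik(a, b, 0.1, 0, len(a))
    par = parallel_loglik(a, b, 0.1, n_workers=4)
    print(abs(serial - par) < 1e-9)  # True
    print(rank_layout(5, 4))         # (1, 1)
```

Each worker touches only its own block of columns, so its working set shrinks as processors are added. This is one plausible way a smaller per-processor share fits better in cache. It is consistent with, but does not reproduce, the super-linear speedups reported above, where the speedup S(p) = T(1)/T(p) exceeds p.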

Keywords

Computer science, Parallel computing, Porting, Phylogenetic tree, Xeon Phi, IBM, Kernel (algebra), Supercomputer, Inference, Exploit, Artificial intelligence, Mathematics, Biology

Publication Info

Year: 2007
Type: article
Pages: 1-11
Citations: 134
Access: Closed

Cite This

Michael Ott, Jarosław Żola, Alexandros Stamatakis et al. (2007). Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L. pp. 1-11. https://doi.org/10.1145/1362622.1362628

Identifiers

DOI: 10.1145/1362622.1362628