"Multidomainbenchmark: A multi-domain query and subject database suite" by Hyrum D. Carroll, John L. Spouge et al.

Faculty Bibliography

Title

Multidomainbenchmark: A multi-domain query and subject database suite

Authors

Hyrum D. Carroll, Columbus State University
John L. Spouge, National Institutes of Health (NIH)
Mileidy Gonzalez, National Institutes of Health (NIH)

Document Type

Article

Publication Date

2-14-2019

Publication Title

BMC Bioinformatics

Volume

Keywords

Benchmark, Multi-domain, Query and subject

Abstract

© 2019 The Author(s).

Background: Genetic sequence database retrieval benchmarks play an essential role in evaluating the performance of sequence searching tools. To date, all phylogenetically diverse benchmarks known to the authors include only query sequences with single protein domains. Domains are the primary building blocks of protein structure and function. Independently, each domain can fulfill a single function, but most proteins (>80% in Metazoa) exist as multi-domain proteins. Multiple domain units combine in various arrangements or architectures to create different functions and are often under evolutionary pressures to yield new ones. Thus, it is crucial to create gold standards reflecting the multi-domain complexity of real proteins to more accurately evaluate sequence searching tools.

Description: This work introduces MultiDomainBenchmark (MDB), a database suite of 412 curated multi-domain queries and 227,512 target sequences, representing at least 5108 species and 1123 phylogenetically divergent protein families, their relevancy annotation, and domain location. Here, we use the benchmark to evaluate the performance of two commonly used sequence searching tools, BLAST/PSI-BLAST and HMMER. Additionally, we introduce a novel classification technique for multi-domain proteins to evaluate how well an algorithm recovers a domain architecture.

Conclusion: MDB is publicly available at http://csc.columbusstate.edu/carroll/MDB/.

Recommended Citation

Carroll, Hyrum D.; Spouge, John L.; and Gonzalez, Mileidy, "Multidomainbenchmark: A multi-domain query and subject database suite" (2019). Faculty Bibliography. 2803.
https://csuepress.columbusstate.edu/bibliography_faculty/2803
