Title
An efficient distributed protein disorder prediction with pasted samples
Document Type
Article
Publication Date
1-1-2018
Publication Title
Computers and Electrical Engineering
Volume
65
First Page
342
Last Page
356
Keywords
Big data, Machine learning, Pasting, Protein disorder classification, Statistical query learning
Abstract
© 2017 Elsevier Ltd
In this paper, we compare the prediction performance of a machine learning classifier constructed all at once in memory with that of an ensemble of models constructed with the pasting procedure for protein disorder prediction. The pasting procedure takes sample bites of the training data as input, constructs a classification predictor on each sample, and pastes the predictors together. This method has not previously been tested on protein structure data. With a sufficiently large sample size, we observed increased performance for the pasting procedure compared with a single model constructed at once in memory, for all window sizes. We attribute this increase to the robustness of the statistical query learning model. This procedure provides a means both to improve classification performance on the protein disorder prediction task and to construct models too large to be held in memory at once.
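The pasting procedure summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not name the base classifier, so a simple nearest-centroid learner stands in for it. Bites are drawn uniformly at random without replacement (the simple random-bite variant of pasting), and the pasted predictors are combined by majority vote. The names `NearestCentroid`, `paste`, `predict_pasted`, `n_bites`, and `bite_size` are hypothetical, as is the two-cluster toy data standing in for windowed residue features.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


class NearestCentroid:
    """Hypothetical base learner: predicts the label whose mean feature vector is closest."""

    def __init__(self) -> None:
        self.centroids: Dict[int, List[float]] = {}

    def fit(self, X: Sequence[Point], y: Sequence[int]) -> "NearestCentroid":
        sums: Dict[int, List[float]] = {}
        counts = Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
        # Mean feature vector per label seen in this bite.
        self.centroids = {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}
        return self

    def predict_one(self, x: Point) -> int:
        def dist2(c: List[float]) -> float:
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: dist2(self.centroids[lab]))


def paste(X: Sequence[Point], y: Sequence[int], n_bites: int, bite_size: int,
          seed: int = 0) -> List[NearestCentroid]:
    """Fit one model per random bite, sampled without replacement (assumed random-bite pasting)."""
    rng = random.Random(seed)
    indices = list(range(len(X)))
    models = []
    for _ in range(n_bites):
        bite = rng.sample(indices, bite_size)  # only this bite is needed to fit one model
        models.append(NearestCentroid().fit([X[i] for i in bite], [y[i] for i in bite]))
    return models


def predict_pasted(models: Sequence[NearestCentroid], x: Point) -> int:
    """Paste the predictors together by majority vote (assumed combination rule)."""
    votes = Counter(m.predict_one(x) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: label 0 ("ordered") near (0, 0), label 1 ("disordered") near (3, 3).
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(200)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(200)]
y = [0] * 200 + [1] * 200

models = paste(X, y, n_bites=11, bite_size=40)
print(predict_pasted(models, (0.0, 0.0)), predict_pasted(models, (3.0, 3.0)))
```

Because each bite is fitted independently, the calls inside `paste` could be spread across machines, and only one bite needs to be held in memory while its model is trained. The sketch simulates this sequentially on a single in-memory list.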
Recommended Citation
Smith, Denson; Yenduri, Sumanth; Iqbal, Sumaiya; and Krishna, P. Venkata, "An efficient distributed protein disorder prediction with pasted samples" (2018). Faculty Bibliography. 2907.
https://csuepress.columbusstate.edu/bibliography_faculty/2907