An efficient distributed protein disorder prediction with pasted samples
Computers and Electrical Engineering
Big data, Machine learning, Pasting, Protein disorder classification, Statistical query learning
© 2017 Elsevier Ltd
In this paper, we compare the prediction performance of a machine learning classifier built in memory on the full training set at once with that of an ensemble of models built with the pasting procedure for protein disorder prediction. The pasting procedure takes samples ("bites") of the training data as input, constructs a classification predictor on each sample, and pastes the predictors together. This method has not previously been tested on protein structure data. With a sufficiently large sample size, we observed better performance for the pasting procedure than for a single model built in memory at once, for all window sizes. We attribute this improvement to the robustness of the statistical query learning model. The procedure provides a means both to improve classification performance on the protein disorder prediction task and to construct models too large to be held in memory at once.
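The pasting procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the base classifier, so a nearest-centroid model stands in for it, the two-dimensional toy data are synthetic, and all function names (`train_centroid`, `paste`, `predict_pasted`) and the labels "ordered" and "disordered" are hypothetical. Combining the predictors by majority vote is likewise an assumption about how they are pasted together.

```python
import random
from collections import Counter


def train_centroid(sample):
    """Fit a stand-in base model: the mean feature vector of each class."""
    sums, counts = {}, Counter()
    for features, label in sample:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroid(model, features):
    """Return the label whose centroid is closest to the feature vector."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def paste(data, n_bites, bite_size, seed=0):
    """Train one model per bite; each bite is drawn without replacement."""
    rng = random.Random(seed)
    return [train_centroid(rng.sample(data, bite_size)) for _ in range(n_bites)]


def predict_pasted(models, features):
    """Combine the per-bite models by majority vote (an assumed rule)."""
    votes = Counter(predict_centroid(m, features) for m in models)
    return votes.most_common(1)[0][0]


def make_toy_data(n_per_class, seed=0):
    """Synthetic two-class data; NOT protein features from the paper."""
    rng = random.Random(seed)
    data = []
    for label, center in (("ordered", 0.0), ("disordered", 3.0)):
        for _ in range(n_per_class):
            point = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            data.append((point, label))
    return data


data = make_toy_data(n_per_class=100)
# Each bite holds 20 of the 200 examples, so no model ever sees the full set.
models = paste(data, n_bites=15, bite_size=20)
print(predict_pasted(models, [0.0, 0.0]), predict_pasted(models, [3.0, 3.0]))
```

Because each model is trained on a single bite, only `bite_size` examples need to be in memory at any one time, and the bites can be processed independently of one another.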
Smith, Denson; Yenduri, Sumanth; Iqbal, Sumaiya; and Krishna, P. Venkata, "An efficient distributed protein disorder prediction with pasted samples" (2018). Faculty Bibliography. 2907.