Title

An efficient distributed protein disorder prediction with pasted samples

Document Type

Article

Publication Date

1-1-2018

Publication Title

Computers and Electrical Engineering

Volume

65

First Page

342

Last Page

356

Keywords

Big data, Machine learning, Pasting, Protein disorder classification, Statistical query learning

Abstract

© 2017 Elsevier Ltd

In this paper, we compare the prediction performance of a machine learning classifier constructed at once in memory with that of an ensemble of models constructed with the pasting procedure, applied to protein disorder prediction. The pasting procedure takes sample bites of the training data as input, constructs a classification predictor on each sample, and pastes the predictors together. This method has not previously been tested on protein structure data. With a sufficiently large sample size, we observed higher performance for the pasting procedure than for a single model constructed at once in memory, for all window sizes. We attribute this improvement to the robustness of the statistical query learning model. The procedure therefore offers a way both to improve classification performance on the protein disorder prediction task and to construct models too large to be held in memory at once.
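The pasting procedure summarized in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the base classifier or how the predictors are combined, so the nearest-centroid base learner, the hypothetical names `paste` and `train_nearest_centroid`, uniform sampling of bites without replacement, and majority voting are all choices made for the example. The toy data are synthetic two-dimensional points, not protein sequence windows.

```python
import random
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# A labelled example: (feature vector, class label), e.g. 0 = ordered, 1 = disordered.
Example = Tuple[List[float], int]
Predictor = Callable[[List[float]], int]


def train_nearest_centroid(bite: Sequence[Example]) -> Predictor:
    """Hypothetical base learner: predict the label of the closest class centroid."""
    sums: Dict[int, List[float]] = {}
    counts: Counter = Counter()
    for x, y in bite:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    centroids = {y: [s / counts[y] for s in total] for y, total in sums.items()}

    def predict(x: List[float]) -> int:
        # Squared Euclidean distance to each class centroid; return the nearest label.
        return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))

    return predict


def paste(data: List[Example], bite_size: int, n_bites: int,
          train: Callable[[Sequence[Example]], Predictor], seed: int = 0) -> Predictor:
    """Train one model per random bite of the data and combine them by majority vote."""
    rng = random.Random(seed)
    models: List[Predictor] = []
    for _ in range(n_bites):
        # Assumption: each bite is a uniform sample drawn without replacement.
        bite = rng.sample(data, bite_size)
        models.append(train(bite))

    def predict(x: List[float]) -> int:
        # Assumption: the pasted predictors are combined by unweighted majority vote.
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy usage with two synthetic, well-separated classes (not protein data).
data_rng = random.Random(1)
toy_data: List[Example] = (
    [([data_rng.gauss(0, 1), data_rng.gauss(0, 1)], 0) for _ in range(100)]
    + [([data_rng.gauss(5, 1), data_rng.gauss(5, 1)], 1) for _ in range(100)]
)
clf = paste(toy_data, bite_size=20, n_bites=5, train=train_nearest_centroid)
print(clf([0.2, -0.1]), clf([5.1, 4.8]))
```

Each base model sees only one bite, so no single training call needs the full training set, which is the property the abstract points to for building models too large to hold in memory at once. In a genuinely out-of-core setting the bites would be read from disk or a distributed store rather than sampled from an in-memory list as in this toy example.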
