An Instance Selection Algorithm for Regression and its Application in Variance Reduction

The tradeoff between bias and variance is a well-known problem in machine learning: algorithms are expected to achieve a low training error without overfitting. In Genetic Fuzzy Systems (GFSs), overfitting is usually avoided by controlling the number of rules and/or the number of labels. In many other machine learning approaches, however, variance is reduced through the use of a validation set. Inspired by this idea, we propose an Instance Selection (IS) algorithm for regression problems, called Class Conditional Instance Selection for Regression (CCISR), which is based on CCIS [1]. The output of CCISR is used in a GFS to obtain Rule Bases with low variance: the rules are generated with an ad hoc data-driven method guided by the selected instances, while the error is still measured on the full training dataset. The combined system has been tested on 12 publicly available datasets, and the results were compared with those of other GFSs. Our approach is capable of reducing the number of rules while maintaining good accuracy.

keywords: Instance Selection, regression problems, Genetic Fuzzy Systems (GFSs), variance reduction
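
The evaluation scheme outlined in the abstract (build the model from the selected instances only, but measure the error on the full training dataset) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `select_instances` is a hypothetical placeholder that samples uniformly at random and is not CCISR, `fit_linear` is an ordinary least-squares model standing in for the GFS rule generation, and the data are synthetic.

```python
import numpy as np


def select_instances(X, y, keep_fraction=0.3, seed=0):
    """Hypothetical placeholder selector: uniform random sampling, NOT CCISR."""
    rng = np.random.default_rng(seed)
    n_keep = max(2, int(round(keep_fraction * len(X))))
    return np.sort(rng.choice(len(X), size=n_keep, replace=False))


def fit_linear(X, y):
    """Stand-in learner (least squares) used in place of GFS rule generation."""
    A = np.column_stack([X, np.ones(len(X))])  # add an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared error of the linear model on (X, y)."""
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((A @ coef - y) ** 2))


# Synthetic regression data (assumption: not one of the paper's datasets).
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0.0, 0.1, size=200)

idx = select_instances(X, y)            # indices of the selected instances
coef = fit_linear(X[idx], y[idx])       # model is built from the subset only
subset_error = mse(coef, X[idx], y[idx])
full_error = mse(coef, X, y)            # error is measured on the full training set

print(f"selected {len(idx)} of {len(X)} instances")
print(f"MSE on selected subset: {subset_error:.4f}")
print(f"MSE on full training set: {full_error:.4f}")
```

In this sketch the selected subset plays a role loosely analogous to a training split, while the remaining training instances act as an implicit check against overfitting to the subset; how CCISR actually chooses instances is not specified here.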