Special Issue Paper

Journal of the Operational Research Society (2009) 60, 1045–1055. doi:10.1057/palgrave.jors.2602651; published online 20 August 2008

Near-optimal feature selection for large databases

J Yang1 and S Ólafsson2

  1. 1Chonbuk National University, Jeonju, Korea
  2. 2Iowa State University, Ames, IA, USA

Correspondence: J Yang, Department of Industrial and Information Systems Engineering, Chonbuk National University, 418 Engineering Bldg. #6, Jeonju, 561-756, Korea. E-mail: jkyang@chonbuk.ac.kr

Received November 2007; Accepted April 2008; Published online 20 August 2008.

Top

Abstract

We analyse a new optimization-based approach for feature selection that uses the nested partitions method for combinatorial optimization as a heuristic search procedure to identify good feature subsets. In particular, we show how to improve the performance of the nested partitions method using random sampling of instances. The new approach uses a two-stage sampling scheme that determines the required sample size to guarantee convergence to a near-optimal solution. This approach therefore also has attractive theoretical characteristics. In particular, when the algorithm terminates in finite time, rigorous statements can be made concerning the quality of the final feature subset. Numerical results are reported to illustrate the key results, and show that the new approach is considerably faster than the original nested partitions method and other feature selection methods.

Keywords:

feature selection, data mining, combinatorial optimization, scalability

MORE ARTICLES LIKE THIS

These links to content published by Palgrave Macmillan are automatically generated.

RESEARCH

Near-optimal feature selection for large databases

Journal of the Operational Research Society Special Feature

An optimization approach to partitional data clustering

Journal of the Operational Research Society Special Feature

Extra navigation

.

Society resources

ADVERTISEMENT
JORS-Link to full archive